# 概述

特征提取 (feature extraction) 和特征选择 (feature selection) 是多特征处理的两种方式。其中，特征选择是在众多特征中选择出有用的特征，将无用特征直接抛弃。而特征提取则认为所有特征都有其存在的价值，需要通过某些算法找到特征的最佳组合，以反映多个特征背后的内在特征。

# 特征提取

特征提取的主要作用包括：

可视化 (visualization)
数据压缩 (data compression)
消除噪点 (noise removal)

特征提取的基本算法包括：

无监督算法
- 主成分分析 (Principal Component Analysis, PCA)
- 非负矩阵分解 (Nonnegative Matrix Factorization, NMF)
- 独立分量分析 (Independent Component Analysis, ICA)
有监督算法
- 线性判别分析 (Linear Discriminant Analysis, LDA)

两种算法的特征提取过程实际上是找到了原有特征的线性组合。

# PCA

主成分分析 (Principal Component Analysis, PCA) 是最基本、最常用的无监督特征提取方法。其想法是将高维度样本点有损压缩降低至低维度。在无监督情况下，我们需要最小化信息损失以达到最好的效果。而在有监督的情况下，则需要最大化类间距离 (LDA).

# MF

# LDA

线性判别分析 (Linear Discriminant Analysis, LDA) 是有监督的分类方法。

设输入为 $d$ 维样本点 $\{x_1, x_2, \dots, x_N\}$ . 其可以分为两类 $C_1,C_2$ , 数目分别为 $n_1,n_2$ . 我们希望寻找一个投影向量 (projection vector) $\theta \in \mathbb{R}^{d \times 1}$ , 以将各点 $x_i$ 映射到一条直线 $y_i = \theta x_i$ 上，使得映射后类内间距最小，类间间距最大。

于是，构造类内均值 $\mu_i = \frac{1}{n_i}\sum_{x \in C_i}x$ , 则投影后的类内均值为 $\tilde{\mu}_i = \frac{1}{n_i}\sum_{x\in C_i}y =\frac{1}{n_i} \sum_{x \in C_i}\theta^\top x = \theta^\top \mu_i$ .

构造类内散度矩阵 (Within-class scatter) 为

$S_w := \sum_{i=1,2}\sum_{x \in C_i}(x-\mu_i)(x-\mu_i)^\top \in \mathbb{R}^{d \times d}$

类间距离可以直接采用两类的均值进行刻画，于是构造类间散度矩阵 (Between-class scatter) 为

$S_b = (\mu_1-\mu_2)(\mu_1-\mu_2)^\top$

则优化目标为

$\left\{\begin{aligned} & \max J_1(\theta) = \theta^\top S_b \theta \\ & \min J_2(\theta) = \theta^\top S_w \theta \end{aligned}\right.$

注意到 $\theta$ 的长度本身不影响投影向量，仅 $\theta$ 的长度与之相关。因此不妨设 $\theta^\top S_w\theta = 1$ , 则优化目标转化为

$\min_\theta -\theta^\top S_b \theta$

采用拉格朗日乘子法求极值，令 $\lambda$ 为拉格朗日乘子，则转化为

$\min J(\theta) = -\theta^\top S_b \theta+\lambda(\theta^\top S_w \theta-1)$

求偏导得

$\frac{\partial J}{\partial \theta} = -2 S_b\theta + 2\lambda S_w \theta = 0$

即

$S_b \theta = \lambda S_w \theta$

注意到，此处的拉格朗日乘子法得到的根一定对应最小值，因为我也不知道。

# 特征选择

# 概述

我们首先给出一个特征选择的定义：

对一系列特征 $X=\{X_1,X_2,\dots,x_D\}$ 以及目标变量 $Y$ , 寻找一个能够实现 $Y$ 上最佳分类的最小集合 $\mathcal S$ 的过程称为特征选择。

特征选择的主要算法包括包装法 (wrapper methods), 过滤法 (filter methods) 和嵌入法 (embedded methods).

特征处理与 LDA 算法

# 概述

# 特征提取

# PCA

# MF

# LDA

# 特征选择

# 概述

# Warpper Method

# Filter Method

# Embedded Method

# 概述

# 特征提取

# PCA

# MF

# LDA

# 特征选择

# 概述

# Warpper Method

# Filter Method

# Embedded Method

机器学习实验报告(2)：多变量线性回归

OpenGL 学习记录