# Experiment 5: SVM

This is the report of Experiment 5: SVM.

# Linear SVM - basic

# Purpose

In this part, we will use a basic linear SVM to classify a group of data. The training set and test set are given as the files training_1.txt and test_1.txt.

# Procedure

A basic linear SVM with a regularization (slack) term can be formulated as

$$
\begin{aligned}
& \min_{w,\,b,\,\xi} && \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i \\
& \;\text{s.t.} && y^{(i)}\left(w^\top \boldsymbol{x}^{(i)} + b\right) \geq 1-\xi_i, & \forall i = 1,2,\dots,n \\
& && \xi_i \geq 0, & \forall i = 1,2,\dots,n
\end{aligned}
$$

Its Lagrangian dual problem is

$$
\begin{aligned}
& \max_{\alpha} && \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y^{(i)} y^{(j)} \left\langle \boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j)} \right\rangle \\
& \;\text{s.t.} && 0 \leq \alpha_i \leq C, & \forall i = 1,2,\dots,n \\
& && \sum_{i=1}^n \alpha_i y^{(i)} = 0
\end{aligned}
$$

After solving this dual problem, we can recover the parameters $w$ and $b$ by

$$
\left\{\begin{aligned}
& w = \sum_{i=1}^n \alpha_i y^{(i)}\boldsymbol{x}^{(i)} \\
& b = y^{(s)} - \sum_{i=1}^n \alpha_i y^{(i)} \left\langle \boldsymbol{x}^{(i)}, \boldsymbol{x}^{(s)}\right\rangle, & \forall s \in \mathcal S
\end{aligned}\right.
$$

where $\mathcal{S} := \{i \mid \alpha_i \neq 0\}$ is the support set.
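
As an illustration, the dual problem above can be handed to a generic QP solver. Below is a minimal sketch using cvxopt; the actual experiment code may differ, and `fit_linear_svm` is a hypothetical helper name.

```python
# A sketch of solving the soft-margin SVM dual with cvxopt and recovering w, b.
# X is an (n, d) float array, y an (n,) array with labels +1/-1.
import numpy as np
from cvxopt import matrix, solvers

def fit_linear_svm(X, y, C=100.0):
    n = X.shape[0]
    Yx = y[:, None] * X
    Q = Yx @ Yx.T                                   # Q_ij = y_i y_j <x_i, x_j>
    P = matrix(Q)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))  # encodes 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.astype(float).reshape(1, -1))      # encodes sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

    w = ((alpha * y)[:, None] * X).sum(axis=0)      # w = sum_i alpha_i y_i x_i
    support = alpha > 1e-6                          # numerical proxy for alpha_i != 0
    b_val = np.mean(y[support] - X[support] @ w)    # average b over the support set
    return w, b_val, alpha
```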

Then we can draw the decision boundary and use the test set to examine whether the SVM is reliable.
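
For the 2-D data in this part, the decision boundary $w^\top \boldsymbol{x} + b = 0$ is a straight line, so it can be drawn directly. A minimal plotting sketch (the function name and plotting details are my assumptions):

```python
# A sketch of plotting 2-D data points and the line w . x + b = 0 with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(X, y, w, b):
    plt.scatter(X[y == 1, 0], X[y == 1, 1], marker='o', label='y = +1')
    plt.scatter(X[y == -1, 0], X[y == -1, 1], marker='x', label='y = -1')
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    plt.plot(xs, -(w[0] * xs + b) / w[1], 'k-', label='decision boundary')  # assumes w[1] != 0
    plt.legend()
    plt.show()
```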

# Results & Answers

The training set and the decision boundary are shown below.

<img src="../../../images/machineLearning/MLExp/Exp5/C=100.jpg" style="zoom:67%;" />

Then I drew the decision boundaries for $C=0.01$, $C=1$, and $C=100$ on the same diagram, but there is no significant difference among them.

<img src="../../../images/machineLearning/MLExp/Exp5/2.jpg" style="zoom:67%;" />

The result of this classifier on the test set is

```
success: 500
err: 0
```

No test sample is misclassified. That's amazing!

# Linear SVM for Handwritten Digit Recognition

# Purpose

Now we will train a linear SVM on a simplified MNIST dataset that contains only images of the digits 0 and 1. We need to classify them and test whether the SVM classifier is reliable.

# Procedure

The main procedure is the same as in the first part, but we need to load the data and extract the features first.

We use the local binary pattern (LBP) algorithm to extract the features.
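
A minimal LBP feature sketch using scikit-image is shown below; the experiment may implement LBP by hand, and the 'uniform' variant plus the histogram summary are my assumptions.

```python
# A sketch of LBP feature extraction: per-pixel LBP codes summarized as a histogram.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature(img, P=8, R=1):
    codes = local_binary_pattern(img, P, R, method='uniform')   # img: 2-D grayscale array
    n_bins = P + 2                                              # 'uniform' yields P + 2 code values
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist                                                 # feature vector for the image
```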

Another difference is that we cannot visualize the data directly, so we evaluate the SVM classifier by its error rate and print the misclassified cases.
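
The evaluation step can be sketched as follows, assuming a `predict` function that returns $\pm 1$ labels (names are hypothetical).

```python
# A sketch of evaluating the classifier by counting misclassified samples.
import numpy as np

def evaluate(predict, X, y, name="test"):
    wrong = np.flatnonzero(predict(X) != y)   # indices of misclassified samples
    print(f"{name} suc: {len(y) - len(wrong)}")
    print(f"{name} err: {len(wrong)}")
    return wrong                              # indices can be used to display the error cases
```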

# Results & Answers

# Without regularization

Without the regularization term (equivalently, $C = +\infty$), the classification results are as follows.

```
train suc: 12665
train err: 0
test suc: 2112
test err: 3
```

Only 3 images are misclassified. One of them is:

<img src="../../../images/machineLearning/MLExp/Exp5/err%20case1.jpg" style="zoom:50%;" />

Perhaps its features are not distinctive enough.

# With different regularization terms

The order of magnitude of the $\alpha_i$ is about 1e-6, so I chose the regularization parameter $C$ between 1e-4 and 1e-9. The results are as follows; a sketch of the sweep loop is given after the table.

| $C$ | error rate on train set | error rate on test set |
| --- | --- | --- |
| $+\infty$ | 0% (0/12665) | 0.1354% (3/2115) |
| 1e-4 | 43.6163% (5524/12665) | 44.3972% (939/2115) |
| 1e-5 | 43.6163% (5524/12665) | 44.3972% (939/2115) |
| 3e-6 | 12.0490% (1526/12665) | 10.9693% (232/2115) |
| 2e-6 | 5.9850% (758/12665) | 5.5792% (118/2115) |
| 1e-6 | 1.3502% (171/12665) | 1.5130% (32/2115) |
| 1e-7 | 0.2448% (31/12665) | 0.2709% (6/2115) |
| 1e-8 | 0.3079% (39/12665) | 0.1891% (4/2115) |
| 1e-9 | 0.4816% (61/12665) | 0.3310% (7/2115) |
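
A minimal sketch of the sweep loop, reusing the hypothetical `fit_linear_svm` helper sketched in the first part; `X_train`, `y_train`, `X_test`, `y_test` stand for the extracted LBP features and labels, and the exact experiment code may differ.

```python
# A sketch of sweeping the regularization parameter C and measuring error rates.
import numpy as np

for C in [1e-4, 1e-5, 3e-6, 2e-6, 1e-6, 1e-7, 1e-8, 1e-9]:
    w, b, alpha = fit_linear_svm(X_train, y_train, C=C)
    train_err = np.mean(np.sign(X_train @ w + b) != y_train)   # fraction of training errors
    test_err = np.mean(np.sign(X_test @ w + b) != y_test)      # fraction of test errors
    print(f"C={C:g}: train err {train_err:.4%}, test err {test_err:.4%}")
```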

I don't know why the error rate changes like this. (That's also why this report is past the deadline...)

# Non-Linear SVM

# Purpose

In this part, we will construct a non-linear SVM for classification. The given dataset is training_3.txt.

# Procedure

Unlike a linear SVM, a non-linear SVM uses a kernel $\kappa(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j)})$ to implicitly map the data from a low-dimensional space to a high-dimensional space. In this experiment, we use the Radial Basis Function (RBF) kernel, which has the formula

$$
\kappa(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j)}) = \exp\left(-\gamma \left\|\boldsymbol{x}^{(i)} - \boldsymbol{x}^{(j)}\right\|^2\right)
$$

where $\gamma > 0$ is a hyperparameter.

Because we change the kernel, the decision function $f$ also changes. It can now be calculated as

$$
f(\boldsymbol{x}) = \sum_{i=1}^n \alpha_i y^{(i)} \kappa\left(\boldsymbol{x}^{(i)}, \boldsymbol{x}\right) + b
$$

where $b = y^{(s)} - \sum_{i=1}^n \alpha_i y^{(i)} \kappa\left(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(s)}\right)$ for any $s \in \mathcal S$, and $\mathcal{S}$ is the support set.
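
A minimal sketch of the RBF kernel and the resulting decision function, assuming `alpha`, `b`, and the training data come from the same dual solver as before (helper names are hypothetical):

```python
# A sketch of the RBF kernel and the kernel decision function f(x).
import numpy as np

def rbf_kernel(X1, X2, gamma=10.0):
    # kappa(x, z) = exp(-gamma * ||x - z||^2), evaluated for all pairs of rows.
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def decision_function(X_new, X, y, alpha, b, gamma=10.0):
    # f(x) = sum_i alpha_i y_i kappa(x_i, x) + b; sign(f) gives the predicted label.
    K = rbf_kernel(X, X_new, gamma)   # shape: (n_train, n_new)
    return (alpha * y) @ K + b
```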

# Results and Answers

First, let's see the dataset without any decision boundary.

<img src="../../../images/machineLearning/MLExp/Exp5/dataset.jpg" style="zoom:67%;" />

It is easy to see that no linear boundary can separate all the data points perfectly, so we need a non-linear one.

With different values of $\gamma$, the results are as follows.

# $\gamma = 1$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=1.jpg" style="zoom:67%;" />

# $\gamma = 10$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=10.jpg" style="zoom:67%;" />

# $\gamma = 100$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=100.jpg" style="zoom:67%;" />

# $\gamma = 1000$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=1000.jpg" style="zoom:67%;" />

We can see that $\gamma = 10$ may be the best choice: a smaller $\gamma$ gives an overly smooth boundary, while a larger $\gamma$ fits too tightly around individual training points.