# Experiment 5: SVM

This is the report of Experiment 5: SVM.

# Linear SVM - basic

# Purpose

In this part, we will use a basic linear SVM to classify a group of data. The training set and test set are given as the files training_1.txt and test_1.txt.

# Procedure

A basic linear SVM with a regularization (slack) term can be formulated as

$$
\begin{aligned}
& \min_{w,\,b,\,\xi} && \frac{1}{2} \|w\|^2 + C \sum_{i=1}^n \xi_i \\
& \;\text{s.t.} && y^{(i)}\left(w^\top \boldsymbol{x}^{(i)} + b\right) \geq 1-\xi_i, & \forall i = 1,2,\dots,n \\
& && \xi_i \geq 0, & \forall i = 1,2,\dots,n
\end{aligned}
$$

Its Lagrangian dual problem is

$$
\begin{aligned}
& \max_{\alpha} && \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y^{(i)} y^{(j)} \left\langle \boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j)} \right\rangle \\
& \;\text{s.t.} && 0 \leq \alpha_i \leq C, & \forall i = 1,2,\dots,n \\
& && \sum_{i=1}^n \alpha_i y^{(i)} = 0
\end{aligned}
$$

After solving this dual problem, we can recover the parameters $w$ and $b$ by

$$
\left\{\begin{aligned}
& w = \sum_{i=1}^n \alpha_i y^{(i)}\boldsymbol{x}^{(i)} \\
& b = y^{(s)} - \sum_{i=1}^n \alpha_i y^{(i)} \left\langle \boldsymbol{x}^{(i)}, \boldsymbol{x}^{(s)}\right\rangle, & \forall s \in \mathcal S
\end{aligned}\right.
$$

where $\mathcal{S} := \{i \mid \alpha_i \neq 0\}$ is the support set.
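
As an illustration, the dual problem above can be handed to a generic QP solver. Below is a minimal sketch using cvxopt; the actual experiment code may differ, and `fit_linear_svm` is a hypothetical helper name.

```python
# A sketch of solving the soft-margin SVM dual with cvxopt and recovering w, b.
# X is an (n, d) float array, y an (n,) array with labels +1/-1.
import numpy as np
from cvxopt import matrix, solvers

def fit_linear_svm(X, y, C=100.0):
    n = X.shape[0]
    Yx = y[:, None] * X
    Q = Yx @ Yx.T                                   # Q_ij = y_i y_j <x_i, x_j>
    P = matrix(Q)
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))  # encodes 0 <= alpha_i <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.astype(float).reshape(1, -1))      # encodes sum_i alpha_i y_i = 0
    b = matrix(0.0)
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])

    w = ((alpha * y)[:, None] * X).sum(axis=0)      # w = sum_i alpha_i y_i x_i
    support = alpha > 1e-6                          # numerical proxy for alpha_i != 0
    b_val = np.mean(y[support] - X[support] @ w)    # average b over the support set
    return w, b_val, alpha
```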

Then we can draw the decision boundary and use the test set to examine whether the SVM is reliable.
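
For the 2-D data in this part, the decision boundary $w^\top \boldsymbol{x} + b = 0$ is a straight line, so it can be drawn directly. A minimal plotting sketch (the function name and plotting details are my assumptions):

```python
# A sketch of plotting 2-D data points and the line w . x + b = 0 with matplotlib.
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(X, y, w, b):
    plt.scatter(X[y == 1, 0], X[y == 1, 1], marker='o', label='y = +1')
    plt.scatter(X[y == -1, 0], X[y == -1, 1], marker='x', label='y = -1')
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    plt.plot(xs, -(w[0] * xs + b) / w[1], 'k-', label='decision boundary')  # assumes w[1] != 0
    plt.legend()
    plt.show()
```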

# Results & Answers

The training set and the decision boundary are shown below.

<img src="../../../images/machineLearning/MLExp/Exp5/C=100.jpg" style="zoom:67%;" />

Then I drew the decision boundaries for $C=0.01$, $C=1$, and $C=100$ on the same diagram, but there is no significant difference among them.

<img src="../../../images/machineLearning/MLExp/Exp5/2.jpg" style="zoom:67%;" />

The result of this classifier on the test set is

```
success: 500
err: 0
```

No test sample is misclassified. That's amazing!

# Linear SVM for Handwritten Digit Recognition

# Purpose

Now we will train a linear SVM on a simplified MNIST dataset that contains only images of the digits 0 and 1. We need to classify them and test whether the SVM classifier is reliable.

# Procedure

The main procedure is the same as in the first part, but we need to load the data and extract the features first.

We use the local binary pattern (LBP) algorithm to extract the features.
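
A minimal LBP feature sketch using scikit-image is shown below; the experiment may implement LBP by hand, and the 'uniform' variant plus the histogram summary are my assumptions.

```python
# A sketch of LBP feature extraction: per-pixel LBP codes summarized as a histogram.
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_feature(img, P=8, R=1):
    codes = local_binary_pattern(img, P, R, method='uniform')   # img: 2-D grayscale array
    n_bins = P + 2                                              # 'uniform' yields P + 2 code values
    hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
    return hist                                                 # feature vector for the image
```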

Another difference is that we cannot visualize the data directly, so we evaluate the SVM classifier by its error rate and print the misclassified cases.
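
The evaluation step can be sketched as follows, assuming a `predict` function that returns $\pm 1$ labels (names are hypothetical).

```python
# A sketch of evaluating the classifier by counting misclassified samples.
import numpy as np

def evaluate(predict, X, y, name="test"):
    wrong = np.flatnonzero(predict(X) != y)   # indices of misclassified samples
    print(f"{name} suc: {len(y) - len(wrong)}")
    print(f"{name} err: {len(wrong)}")
    return wrong                              # indices can be used to display the error cases
```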

# Results & Answers

# Without regularization

Without the regularization term (equivalently, $C = +\infty$), the classification results are as follows.

```
train suc: 12665
train err: 0
test suc: 2112
test err: 3
```

Only 3 images are misclassified. One of them is:

<img src="../../../images/machineLearning/MLExp/Exp5/err%20case1.jpg" style="zoom:50%;" />

Perhaps its features are not distinctive enough.

# With different regularization terms

The order of magnitude of the $\alpha_i$ is about 1e-6, so I chose the regularization parameter $C$ between 1e-4 and 1e-9. The results are as follows; a sketch of the sweep loop is given after the table.

| $C$ | error rate on train set | error rate on test set |
| --- | --- | --- |
| $+\infty$ | 0% (0/12665) | 0.1354% (3/2115) |
| 1e-4 | 43.6163% (5524/12665) | 44.3972% (939/2115) |
| 1e-5 | 43.6163% (5524/12665) | 44.3972% (939/2115) |
| 3e-6 | 12.0490% (1526/12665) | 10.9693% (232/2115) |
| 2e-6 | 5.9850% (758/12665) | 5.5792% (118/2115) |
| 1e-6 | 1.3502% (171/12665) | 1.5130% (32/2115) |
| 1e-7 | 0.2448% (31/12665) | 0.2709% (6/2115) |
| 1e-8 | 0.3079% (39/12665) | 0.1891% (4/2115) |
| 1e-9 | 0.4816% (61/12665) | 0.3310% (7/2115) |
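
A minimal sketch of the sweep loop, reusing the hypothetical `fit_linear_svm` helper sketched in the first part; `X_train`, `y_train`, `X_test`, `y_test` stand for the extracted LBP features and labels, and the exact experiment code may differ.

```python
# A sketch of sweeping the regularization parameter C and measuring error rates.
import numpy as np

for C in [1e-4, 1e-5, 3e-6, 2e-6, 1e-6, 1e-7, 1e-8, 1e-9]:
    w, b, alpha = fit_linear_svm(X_train, y_train, C=C)
    train_err = np.mean(np.sign(X_train @ w + b) != y_train)   # fraction of training errors
    test_err = np.mean(np.sign(X_test @ w + b) != y_test)      # fraction of test errors
    print(f"C={C:g}: train err {train_err:.4%}, test err {test_err:.4%}")
```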

I don't know why the error rate changes like this. (That's also why this report is past the deadline...)

# Non-Linear SVM

# Purpose

In this part, we will construct a non-linear SVM for classification. The given dataset is training_3.txt.

# Procedure

Unlike a linear SVM, a non-linear SVM uses a kernel $\kappa(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j)})$ to implicitly map the data from a low-dimensional space to a high-dimensional space. In this experiment, we use the Radial Basis Function (RBF) kernel, which has the formula

$$
\kappa(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(j)}) = \exp\left(-\gamma \left\|\boldsymbol{x}^{(i)} - \boldsymbol{x}^{(j)}\right\|^2\right)
$$

where $\gamma > 0$ is a hyperparameter.

Because we change the kernel, the decision function $f$ also changes. It can now be calculated as

$$
f(\boldsymbol{x}) = \sum_{i=1}^n \alpha_i y^{(i)} \kappa\left(\boldsymbol{x}^{(i)}, \boldsymbol{x}\right) + b
$$

where $b = y^{(s)} - \sum_{i=1}^n \alpha_i y^{(i)} \kappa\left(\boldsymbol{x}^{(i)}, \boldsymbol{x}^{(s)}\right)$ for any $s \in \mathcal S$, and $\mathcal{S}$ is the support set.
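
A minimal sketch of the RBF kernel and the resulting decision function, assuming `alpha`, `b`, and the training data come from the same dual solver as before (helper names are hypothetical):

```python
# A sketch of the RBF kernel and the kernel decision function f(x).
import numpy as np

def rbf_kernel(X1, X2, gamma=10.0):
    # kappa(x, z) = exp(-gamma * ||x - z||^2), evaluated for all pairs of rows.
    sq_dist = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dist)

def decision_function(X_new, X, y, alpha, b, gamma=10.0):
    # f(x) = sum_i alpha_i y_i kappa(x_i, x) + b; sign(f) gives the predicted label.
    K = rbf_kernel(X, X_new, gamma)   # shape: (n_train, n_new)
    return (alpha * y) @ K + b
```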

# Results and Answers

First, let's see the dataset without any decision boundary.

<img src="../../../images/machineLearning/MLExp/Exp5/dataset.jpg" style="zoom:67%;" />

It is easy to see that no linear boundary can separate all the data points perfectly, so we need a non-linear one.

With different values of $\gamma$, the results are as follows.

# $\gamma = 1$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=1.jpg" style="zoom:67%;" />

# $\gamma = 10$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=10.jpg" style="zoom:67%;" />

# $\gamma = 100$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=100.jpg" style="zoom:67%;" />

# $\gamma = 1000$

<img src="../../../images/machineLearning/MLExp/Exp5/gamma=1000.jpg" style="zoom:67%;" />

We can see that $\gamma = 10$ may be the best choice: a smaller $\gamma$ gives an overly smooth boundary, while a larger $\gamma$ fits too tightly around individual training points.