Investigate the effect of the regularization parameter on underfitting and overfitting.

# Experiment 6: Regularization

This is the report of Experiment 6: Regularization.

# Purpose

In this experiment, we examine the effect of the regularization term on underfitting and overfitting.

The first group of data is a single-variable regression problem. We will try fitting a 5th-order polynomial to a data set of only 7 points.

The second group of data is a classification problem solved with logistic regression. Here, too, we want to find the influence of the regularization parameter on underfitting and overfitting.

# Hypothesis

For the first sub-experiment, our hypothesis will be

$$h_\theta(x) = \theta_0 + \sum_{i=1}^5\theta_i x^i$$

For the second sub-experiment, our hypothesis will be

$$h_\theta(x) = g(\theta^\top x) = \frac{1}{1+e^{-\theta^\top x}} = \mathbb{P}(y=1\big|x;\theta)$$

# Procedure

# Regularized Linear Regression

With the hypothesis

$$h_\theta(x) = \theta_0 + \sum_{i=1}^5\theta_i x^i$$

we can get the cost function with respect to $\theta$:

$$J(\theta) = \frac{1}{2m}\left(\sum_{i=1}^m\left(h_{\theta}(x^{(i)})-y^{(i)}\right)^2 + \lambda \sum_{j=1}^n \theta_j^2 \right)$$

where $\lambda$ is the regularization parameter.

Using the normal equation, we obtain

$$\theta = \left(X^\top X + \lambda\,\mathrm{diag}\left\{0,1,\dots,1\right\}_{n+1}\right)^{-1}X^\top y$$

For $\lambda = 0, 1, 10$, we obtained the figures shown below:
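The closed-form solution above can be sketched in Python with NumPy. The data below is synthetic and only illustrative; the experiment's actual 7 points are not reproduced here:

```python
import numpy as np

def fit_regularized(x, y, degree=5, lam=1.0):
    """Fit a polynomial of the given degree via the regularized normal equation."""
    # Design matrix with columns [1, x, x^2, ..., x^degree]
    X = np.vander(x, degree + 1, increasing=True)
    n = X.shape[1]
    # Regularization matrix diag{0, 1, ..., 1}: the intercept theta_0 is not penalized
    R = np.eye(n)
    R[0, 0] = 0.0
    # theta = (X^T X + lam * R)^(-1) X^T y, solved without forming the inverse
    return np.linalg.solve(X.T @ X + lam * R, X.T @ y)

# Synthetic data set of 7 points (hypothetical, not the experiment's data)
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 7)
y = x**2 + 0.1 * rng.standard_normal(7)

for lam in (0.0, 1.0, 10.0):
    theta = fit_regularized(x, y, lam=lam)
    print(lam, np.round(theta, 3))
```

As $\lambda$ grows, the penalty shrinks $\theta_1,\dots,\theta_5$ toward zero, which is exactly the flattening of the curve observed in the figures.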

# Regularized Logistic Regression

With the hypothesis

$$h_\theta(x) = g(\theta^\top x) = \frac{1}{1+e^{-\theta^\top x}} = \mathbb{P}(y=1\big|x;\theta)$$

the cost function in regularized logistic regression is

$$J(\theta) = -\frac{1}{m}\sum_{i=1}^m\left(y^{(i)}\log\left(h_{\theta}(x^{(i)})\right) + (1-y^{(i)})\log\left(1-h_{\theta}(x^{(i)})\right)\right) + \frac{\lambda}{2m}\sum_{j=1}^n\theta_j^2$$

Newton's method update rule is

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1}\nabla_\theta J$$

After 10 iterations, we obtained the results shown below:
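The Newton update above can be sketched as follows, assuming a design matrix whose first column is the intercept. The toy data and feature mapping are hypothetical; the experiment's actual data is not reproduced:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, lam=1.0, iters=10):
    """Regularized logistic regression trained with Newton's method."""
    m, n = X.shape
    theta = np.zeros(n)
    # diag{0, 1, ..., 1}: the intercept theta_0 is not regularized
    R = np.eye(n)
    R[0, 0] = 0.0
    for _ in range(iters):
        h = sigmoid(X @ theta)
        # Gradient of J: (1/m) X^T (h - y) + (lam/m) R theta
        grad = X.T @ (h - y) / m + (lam / m) * R @ theta
        # Hessian of J: (1/m) X^T diag(h(1-h)) X + (lam/m) R
        H = (X.T * (h * (1 - h))) @ X / m + (lam / m) * R
        # theta <- theta - H^{-1} grad, solved as a linear system
        theta -= np.linalg.solve(H, grad)
    return theta

# Toy 1-D data with an intercept column (illustrative only)
X = np.array([[1.0, v] for v in np.linspace(-2.0, 2.0, 20)])
y = (X[:, 1] > 0).astype(float)
theta = newton_logistic(X, y, lam=1.0, iters=10)
print(np.round(theta, 3))
```

Note that the $\frac{\lambda}{m}R$ term also makes the Hessian strictly positive definite, so the linear solve stays well conditioned even on separable data.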

# Answers to the Questions

# Question 1

From looking at these graphs (in Exp6-1), what conclusions can you make about how the regularization parameter affects your model?

I found that the larger $\lambda$ is, the smoother the curve of the polynomial becomes; that is, the fit approaches a straight line.

# Question 2

How does $\lambda$ affect the results?

Not only does $\lambda$ affect the shape of the decision boundary, it also affects its topological structure. For example, there is a hole in the boundary when $\lambda = 0$, and the hole disappears when $\lambda = 1$ or $10$.