Solving a logistic regression model with Newton's method.

# Experiment 4: Logistic Regression and Newton's Method

This is the report of Experiment 4: Logistic Regression and Newton's Method.

# Purpose

In this experiment, we want to implement logistic regression on a classification problem.

The inputs $\{x^{(i)}\}$ are each student's scores on two standardized exams. Each label $y^{(i)}$ indicates whether the student was admitted.

# Hypothesis

We hypothesize that there exists a constant $k$ such that, for each $i$, the admission probability can be written as $P(x^{(i)}_1 + kx^{(i)}_2)$.

# Procedure

The prediction task is a classification problem. Under the hypothesis, we want to separate the data points into two groups with a line: the positive group contains the students who were admitted, and the negative group contains the students who were not. I find this line with Newton's method.

We model the admission probability with the hypothesis function

$$h_\theta(x) := g(\theta^\top x) = \frac{1}{1+e^{-\theta^\top x}} = \mathbb{P}(y=1 \mid x;\theta)$$
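The hypothesis function can be sketched in plain Python as follows. This is a minimal illustration rather than the code used in the experiment: the names `sigmoid` and `hypothesis` are my own, and I assume that each input vector starts with an intercept term of 1.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + exp(-z)), evaluated in a stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # for negative z, exp(z) cannot overflow
    return e / (1.0 + e)

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x); x is assumed to start with the intercept term 1."""
    return sigmoid(sum(t * v for t, v in zip(theta, x)))

# With theta = 0 the model is undecided, so every input maps to 0.5.
print(hypothesis([0.0, 0.0, 0.0], [1.0, 50.0, 50.0]))
```

Splitting on the sign of `z` is a design choice that keeps `math.exp` from overflowing when $\theta^\top x$ is a large negative number.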

Through maximum likelihood estimation, we obtain the cost function $J(\theta)$, which is the negative mean log-likelihood:

$$J(\theta) := -\frac{1}{m}\sum_{i=1}^m\left(y^{(i)} \log\left(h_{\theta}(x^{(i)})\right) + (1-y^{(i)})\log\left(1-h_{\theta}(x^{(i)})\right)\right)$$
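As a rough sketch, $J(\theta)$ can be evaluated directly from this formula. The function name `cost` and the two-point toy data set are hypothetical and only serve as an illustration; each row of `X` is assumed to start with an intercept term of 1.

```python
import math

def cost(theta, X, y):
    """J(theta): the sum of per-sample log-likelihood terms, scaled by -1/m."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(t * v for t, v in zip(theta, xi))
        # Direct form of the sigmoid; fine for moderate z, but exp(-z)
        # can overflow if z is a very large negative number.
        h = 1.0 / (1.0 + math.exp(-z))
        total += yi * math.log(h) + (1 - yi) * math.log(1.0 - h)
    return -total / m

# Hypothetical toy data: at theta = 0 every prediction is 0.5, so J = log 2.
X_toy = [[1.0, 2.0], [1.0, 3.0]]
y_toy = [0, 1]
print(cost([0.0, 0.0], X_toy, y_toy), math.log(2))
```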

We want to minimize the function with Newton's method, whose update rule is

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1}\nabla_\theta J$$

The gradient $\nabla_\theta J$ is

$$\nabla_\theta J = \frac{1}{m}\sum_{i=1}^m\left(h_{\theta}(x^{(i)})-y^{(i)}\right)x^{(i)}$$

The Hessian $H$, whose inverse appears in the update rule, is

$$H = \frac{1}{m}\sum_{i=1}^m x^{(i)}{x^{(i)}}^\top h_\theta(x^{(i)})\left(1-h_\theta(x^{(i)})\right)$$
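The Newton update can be sketched in plain Python as below. This is a minimal illustration under my own assumptions, not the code that produced the results in this report: the names `sigmoid`, `solve`, and `newton_logistic` and the toy data are hypothetical, each row of `X` is assumed to start with an intercept term of 1, and $\theta$ is initialized to zero. Instead of forming $H^{-1}$ explicitly, the sketch solves the linear system $H s = \nabla_\theta J$ for the step $s$.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def newton_logistic(X, y, iterations=10):
    """Run a fixed number of Newton steps on J(theta), starting from theta = 0."""
    m, n = len(X), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        grad = [0.0] * n
        H = [[0.0] * n for _ in range(n)]
        for xi, yi in zip(X, y):
            h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
            for j in range(n):
                grad[j] += (h - yi) * xi[j] / m
                for k in range(n):
                    H[j][k] += h * (1.0 - h) * xi[j] * xi[k] / m
        step = solve(H, grad)  # step = H^{-1} grad, without inverting H
        theta = [t - s for t, s in zip(theta, step)]
    return theta

# Hypothetical toy data (intercept, one feature); the classes overlap,
# so the maximum likelihood estimate is finite.
X_toy = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
y_toy = [0, 0, 1, 0, 1, 1]
print(newton_logistic(X_toy, y_toy))
```

A fixed iteration count keeps the sketch short; a stopping test on the change in $J(\theta)$ or on the gradient norm would be a natural alternative.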

So we can compute $\hat\theta$ with Newton's method. After convergence, we draw the decision boundary of the classification problem. The boundary line satisfies

$$\mathbb{P}(y=1 \mid x;\theta) = h_{\theta}(x) = 0.5$$

which means

$$\theta^\top x = 0$$
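To plot the line, the boundary condition can be solved for the Exam 2 score as a function of the Exam 1 score. The sketch below assumes that $x = (1, x_1, x_2)^\top$ with a leading intercept term, so that $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$; the function name `boundary_exam2` is hypothetical, and the value of $\theta$ is the one reported in the answers of this report.

```python
def boundary_exam2(theta, exam1):
    """Exam 2 score that lies on the decision boundary for a given Exam 1 score.

    Assumes theta = (theta0, theta1, theta2) with theta2 != 0 and
    x = (1, exam1, exam2), so theta0 + theta1*exam1 + theta2*exam2 = 0.
    """
    t0, t1, t2 = theta
    return -(t0 + t1 * exam1) / t2

theta = (-16.378740, 0.148341, 0.158908)  # value reported under Question 1

# Two points are enough to draw the straight boundary line.
for exam1 in (20.0, 80.0):
    print(exam1, boundary_exam2(theta, exam1))
```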

# Answers to the Questions

# Question 1

After $5$ iterations, I obtained

$$\theta = (-16.378740, 0.148341, 0.158908)^\top$$

[Figure: result of 1500 iterations]

# Question 2

The probability that a student with a score of $20$ on Exam 1 and a score of $80$ on Exam 2 is admitted is $h_\theta(x) = 0.331978$.
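This value can be checked by plugging the reported $\theta$ into the hypothesis function. The snippet below is only a sanity check, assuming the input layout $x = (1, \text{Exam 1}, \text{Exam 2})$ with a leading intercept term. Because the reported $\theta$ is rounded to six decimals, the result agrees with the value above to roughly four decimal places (about $0.332$) rather than exactly.

```python
import math

# Value reported under Question 1 (rounded to six decimals in the report).
theta = (-16.378740, 0.148341, 0.158908)
x = (1.0, 20.0, 80.0)  # assumed layout: intercept, Exam 1 score, Exam 2 score

z = sum(t * v for t, v in zip(theta, x))  # theta^T x
p_admit = 1.0 / (1.0 + math.exp(-z))      # h_theta(x) = P(y = 1 | x; theta)
print(f"P(admitted) = {p_admit:.4f}")
```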