The Components of an Operations Research Project
Formulation, Mathematical Modeling, Data Collection, Validation/Analysis, Interpretation/Implementation.

An operations research course roughly covers the following topics:
- Mathematical Optimization: Linear Programming (LP), Integer Linear Programming, Non-Linear Programming
- Dynamic Programming
- Graphs
- Scheduling

# Linear Programming

# Concepts of Linear Programming

# Basic Forms of LP

Linear Programming (LP) is the most fundamental class of optimization problems: there is an objective function $f(\boldsymbol{x})$ to optimize and a set of constraints. By adjusting the signs of the constraint and objective coefficients, we can always make the optimization direction a minimization, write every linear inequality as $\mathrm{LHS} \geq \mathrm{RHS}$, and require all variables to be nonnegative. An LP in this shape is called the normal form of LP: one minimization objective $f(\boldsymbol{x})$ plus several linear inequalities as constraints, written as:
$$
\begin{aligned}
& \min && f(\boldsymbol{x}) = \sum_{i=1}^n c_i x_i = \boldsymbol{c}^\top \boldsymbol{x}, && \boldsymbol{c}, \boldsymbol{x} \in \mathbb{R}^n \\
& \text{s.t.} && A\boldsymbol{x} \geq \boldsymbol{b}, && A \in \mathbb{R}^{m \times n},\ \boldsymbol{b} \in \mathbb{R}^m \\
& && x_i \geq 0, && \forall i = 1, 2, \dots, n
\end{aligned}
$$
If we now introduce a slack (surplus) variable for each constraint,

$$
x_{n+k} = \sum_{i=1}^n a_{k,i} x_i - b_k \geq 0, \quad k = 1, \dots, m,
$$

then $A\boldsymbol{x} \geq \boldsymbol{b}$ becomes $(A \;\; {-I_m})\,\boldsymbol{x}' = \boldsymbol{b}$, where $\boldsymbol{x}' = (\boldsymbol{x}^\top, x_{n+1}, \dots, x_{n+m})^\top$. (Note the sign: for $\geq$ constraints the slack is subtracted from the left-hand side, so that it stays nonnegative.) An LP rewritten this way is called the standard form of LP, written as
$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& \text{s.t.} && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$
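Both forms can be solved numerically; a minimal sketch using `scipy.optimize.linprog`, which expects $A_{ub}\,x \leq b_{ub}$, so a normal-form LP is passed with negated $A$ and $\boldsymbol{b}$ (the tiny instance below is made up for illustration):

```python
import numpy as np
from scipy.optimize import linprog

# Normal form: min c^T x  s.t.  A x >= b, x >= 0.
# linprog expects A_ub x <= b_ub, so negate A and b.
c = np.array([1.0, 2.0])        # example objective
A = np.array([[1.0, 1.0]])      # one constraint: x1 + x2 >= 1
b = np.array([1.0])

res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
print(res.x, res.fun)           # optimum at x = (1, 0), value 1
```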
Note that the linear system $A\boldsymbol{x} = \boldsymbol{b}$ has $n$ unknowns. The equations of $A\boldsymbol{x} = \boldsymbol{b}$ are generally assumed to be mutually independent, so $m = \operatorname{rank} A \leq n$. Unless stated otherwise, we assume $A \in \mathbb{R}^{m \times n}$, $\boldsymbol{b} \in \mathbb{R}^m$, $\boldsymbol{c}, \boldsymbol{x} \in \mathbb{R}^n$.
# Solutions of an LP

For a standard-form LP, let $B$ be an $m \times m$ nonsingular submatrix of $A$; then $B$ is called a basis of $A$. The $m$ linearly independent columns of $B$ are the basis vectors; the corresponding $m$ components of $\boldsymbol{x}$ are the basic variables, collected into $\boldsymbol{x}_B \in \mathbb{R}^m$; the remaining $n - m$ components are the nonbasic variables. A solution in which every nonbasic variable is $0$ is called a basic solution. If some basic variable is also $0$, the basic solution is called degenerate.
We may permute the columns of $A$ without affecting the solutions of the original LP (only their order changes). Hence in formal derivations we usually assume the first $m$ components of $\boldsymbol{x}$ are the basic variables and the rest are nonbasic, so that $\boldsymbol{x} = (\boldsymbol{x}_B^\top, \boldsymbol{0}^\top)^\top$.
# Solving an LP

Fundamental Theorem of LP: if an LP has a feasible solution, then it has a basic feasible solution; if an LP has an optimal feasible solution, then it has an optimal basic feasible solution.
Proof. Suppose $\boldsymbol{x} = (x_1, x_2, \dots, x_n)^\top$ is a feasible solution; WLOG write it as $\boldsymbol{x} = (x_1, \dots, x_p, 0, \dots, 0)^\top$ with $x_i > 0$ for all $1 \leq i \leq p$. Write $A = (\boldsymbol{a}_1, \boldsymbol{a}_2, \dots, \boldsymbol{a}_n)$ by columns. If $\boldsymbol{a}_1, \dots, \boldsymbol{a}_p$ are linearly independent, then $p \leq m$ and $\boldsymbol{x}$ is already basic feasible. Otherwise some nontrivial combination $\sum_{i=1}^p \lambda_i \boldsymbol{a}_i = \boldsymbol{0}$ exists; moving along $\boldsymbol{x} \pm \varepsilon \boldsymbol{\lambda}$ preserves $A\boldsymbol{x} = \boldsymbol{b}$, and increasing $\varepsilon$ until some positive component reaches $0$ reduces $p$. Repeating this yields a basic feasible solution.
(w.r.t. = with respect to.)
# The Simplex Method

# The Essence of the Simplex Method

Geometrically, the simplex method searches among the vertices of the feasible polyhedron and checks whether each one is optimal.
# Steps of the Simplex Method

The simplex algorithm proceeds as follows:
1. Find an initial basic feasible solution $x$ w.r.t. a basis $B$.
2. Transform $[A \; b]$ into $[I_m \; Y \; \boldsymbol{y}_0]$.
3. Compute the reduced costs $r_i = c_i - z_i$ for all $i \geq m+1$, where $z_i = \sum_{l=1}^m c_l y_{l,i}$.
4. If $r_i \geq 0$ for all $i$, then $x$ is optimal; stop. Otherwise pick some $q$ with $r_q < 0$ (e.g., always the most negative one).
5. If $y_{i,q} \leq 0$ for all $1 \leq i \leq m$, stop and report "unbounded"; otherwise let
   $$p = \operatorname*{arg\,min}\left\{ \left. \frac{y_{i,0}}{y_{i,q}} \,\right|\, y_{i,q} > 0 \right\}.$$
6. Set $B' = \{\boldsymbol{a}_q\} \cup B \setminus \{\boldsymbol{a}_p\}$, update $[Y, \boldsymbol{y}_0]$, and go to Step 3.

(Exercise: why is $B'$ still a basis?)
# A Simplex Example

$$
\begin{aligned}
& \min && Z = -x_1 - 2x_2 - x_3 \\
& \text{s.t.} && 2x_1 + x_2 - x_3 \leq 2 \\
& && 2x_1 - x_2 + 5x_3 \leq 6 \\
& && 4x_1 + x_2 + x_3 \leq 6 \\
& && x_1, x_2, x_3 \geq 0
\end{aligned}
$$
The worked iterations are omitted here.
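The omitted iterations can be cross-checked numerically; a sketch with `scipy.optimize.linprog` (the expected optimum in the comment was worked out by hand from the constraints):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([-1.0, -2.0, -1.0])
A_ub = np.array([[2.0,  1.0, -1.0],
                 [2.0, -1.0,  5.0],
                 [4.0,  1.0,  1.0]])
b_ub = np.array([2.0, 6.0, 6.0])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3)
print(res.x, res.fun)   # expected: x = (0, 4, 2), Z = -10
```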
# The Two-Phase Method

The execution of the simplex method has two main parts: finding an initial solution and iterating toward the optimum. For problems with constraints of the form $A\boldsymbol{x} \leq \boldsymbol{b}$, the slack variables give a rather weak initial solution (usually $x_i = 0$ for all $i$). But for LPs whose initial solution is hard to obtain, we resort to the two-phase method to find a suitable starting point.
The two-phase method is essentially an improved way of finding the initial solution inside the simplex method. Its algorithm has two parts:
1. Keeping the constraints unchanged, solve the LP whose objective is $\min \sum x_{art}$, where the $x_{art}$ are the artificial variables.
2. Starting from the basic feasible solution produced by Phase 1, iterate to the optimal solution.

Note that finding an initial solution for Phase 1 is easy, and the final solution of Phase 1 is a basic feasible solution of the original problem. This simplifies the search for an initial solution in the simplex method.
Concretely, the two-phase method works as follows. For the constrained problem
$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& \text{s.t.} && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$
construct artificial variables $x_{n+1}, x_{n+2}, \dots, x_{n+m}$ and solve
$$
\begin{aligned}
& \min && g = \sum_{i=n+1}^{n+m} x_i \\
& \text{s.t.} && A\boldsymbol{x} + (x_{n+1}, x_{n+2}, \dots, x_{n+m})^\top = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}, \quad x_{art} \geq 0
\end{aligned}
$$
Then, if the minimum is $g = \sum x_{art} = 0$, the corresponding $\boldsymbol{x}$ is an initial (basic feasible) solution of the original problem.
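Phase 1 can be sketched directly in numpy: append one artificial variable per row and minimize their sum (the constraint data below is made up; any solver can play the role of the inner simplex):

```python
import numpy as np
from scipy.optimize import linprog

# Original constraints A x = b, x >= 0 (this instance is feasible: x = (1, 1)).
A = np.array([[1.0,  1.0],
              [1.0, -1.0]])
b = np.array([2.0, 0.0])
m, n = A.shape

# Phase 1: add one artificial variable per row and minimize their sum.
A_phase1 = np.hstack([A, np.eye(m)])
c_phase1 = np.concatenate([np.zeros(n), np.ones(m)])

res = linprog(c_phase1, A_eq=A_phase1, b_eq=b, bounds=[(0, None)] * (n + m))
g = res.fun              # g == 0  <=>  the original problem is feasible
x0 = res.x[:n]           # a feasible starting point for Phase 2
```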
# The Simplex Tableau

Assumption: the first $m$ columns of $A$ form a basis $B$. Write $A = [B, D]$, $\boldsymbol{c} = [\boldsymbol{c}_B^\top, \boldsymbol{c}_D^\top]^\top$, $\boldsymbol{x} = [\boldsymbol{x}_B^\top, \boldsymbol{x}_D^\top]^\top$. The problem becomes $\min \boldsymbol{c}_B^\top \boldsymbol{x}_B + \boldsymbol{c}_D^\top \boldsymbol{x}_D$, s.t. $B\boldsymbol{x}_B + D\boldsymbol{x}_D = \boldsymbol{b}$ and $\boldsymbol{x}_B, \boldsymbol{x}_D \geq \boldsymbol{0}$.
For the basic solution $\boldsymbol{x} = [\boldsymbol{x}_B^\top, \boldsymbol{0}^\top]^\top$: $\boldsymbol{x}_B = B^{-1}\boldsymbol{b}$, i.e. $\boldsymbol{x} = [(B^{-1}\boldsymbol{b})^\top, \boldsymbol{0}^\top]^\top$, with objective value $Z_0 = \boldsymbol{c}_B^\top B^{-1} \boldsymbol{b}$.
Consider the $(m+1) \times (n+1)$ tableau

$$
\left[\begin{matrix} A & \boldsymbol{b} \\ \boldsymbol{c}^\top & 0 \end{matrix}\right]
= \left[\begin{matrix} B & D & \boldsymbol{b} \\ \boldsymbol{c}_B^\top & \boldsymbol{c}_D^\top & 0 \end{matrix}\right]
$$
Left-multiplying by an $(m+1) \times (m+1)$ matrix gives

$$
\left[\begin{matrix} B^{-1} & \boldsymbol{0} \\ \boldsymbol{0}^\top & 1 \end{matrix}\right]
\left[\begin{matrix} B & D & \boldsymbol{b} \\ \boldsymbol{c}_B^\top & \boldsymbol{c}_D^\top & 0 \end{matrix}\right]
= \left[\begin{matrix} I_m & B^{-1}D = Y & B^{-1}\boldsymbol{b} = \boldsymbol{y}_0 \\ \boldsymbol{c}_B^\top & \boldsymbol{c}_D^\top & 0 \end{matrix}\right]
$$
Continuing the transformation, we obtain
$$
\left[\begin{matrix} I_m & \boldsymbol{0} \\ -\boldsymbol{c}_B^\top & 1 \end{matrix}\right]
\left[\begin{matrix} I_m & B^{-1}D & B^{-1}\boldsymbol{b} \\ \boldsymbol{c}_B^\top & \boldsymbol{c}_D^\top & 0 \end{matrix}\right]
= \left[\begin{matrix} I_m & B^{-1}D & B^{-1}\boldsymbol{b} \\ \boldsymbol{0}^\top & \boldsymbol{c}_D^\top - \boldsymbol{c}_B^\top B^{-1}D & -\boldsymbol{c}_B^\top B^{-1}\boldsymbol{b} \end{matrix}\right] \quad (*)
$$
Here the bottom-right entry of $(*)$, $-\boldsymbol{c}_B^\top B^{-1}\boldsymbol{b} = -Z_0$, is the negated objective value, and $\boldsymbol{c}_D^\top - \boldsymbol{c}_B^\top B^{-1} D = \boldsymbol{r}_D^\top$ is the row of reduced costs.
So this computation yields $\boldsymbol{r}_D$ and $-Z_0$ as a by-product, provided the bottom-left block of $(*)$ is $\boldsymbol{0}$.
A result left as an exercise:
For all $j \geq m+1$, after a pivot on entry $(p, q)$ the tableau

$$
\left[\begin{matrix} Y & \boldsymbol{y}_0 \\ \boldsymbol{c}_D^\top & 0 \end{matrix}\right]
\to
\left[\begin{matrix} Y' & \boldsymbol{y}_0' \\ {\boldsymbol{c}_D'}^\top & Z' \end{matrix}\right]
$$

can be updated row by row: for all $1 \leq i \leq m+1$,

$$
y_{ij}' = y_{ij} - \frac{y_{pj}}{y_{pq}}\, y_{iq} \quad (i \neq p), \qquad y_{pj}' = \frac{y_{pj}}{y_{pq}}.
$$
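The update rule is just one Gaussian-elimination step on the tableau; a minimal numpy sketch (the sample tableau reuses the data of the worked example that follows):

```python
import numpy as np

def pivot(T, p, q):
    """Pivot tableau T (constraint rows first, cost row last) on entry (p, q):
    y'_pj = y_pj / y_pq for the pivot row, and
    y'_ij = y_ij - (y_pj / y_pq) * y_iq for every other row i."""
    T = T.astype(float).copy()
    T[p, :] /= T[p, q]                    # normalize the pivot row
    for i in range(T.shape[0]):
        if i != p:
            T[i, :] -= T[i, q] * T[p, :]  # eliminate column q elsewhere
    return T

T = np.array([[ 2.0,  1.0, 1.0, 0.0, 3.0],
              [ 1.0,  4.0, 0.0, 1.0, 4.0],
              [-7.0, -6.0, 0.0, 0.0, 0.0]])
T1 = pivot(T, 0, 0)
```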
Example:
$$
\begin{aligned}
& \max && 7x_1 + 6x_2 \\
& \text{s.t.} && 2x_1 + x_2 \leq 3 \\
& && x_1 + 4x_2 \leq 4 \\
& && x_1, x_2 \geq 0
\end{aligned}
$$
Standardizing, we obtain
$$
\begin{aligned}
& \min && -7x_1 - 6x_2 \\
& \text{s.t.} && 2x_1 + x_2 + x_3 = 3 \\
& && x_1 + 4x_2 + x_4 = 4 \\
& && x_1, x_2, x_3, x_4 \geq 0
\end{aligned}
$$
Then,
$$
A = \left[\begin{matrix} 2 & 1 & 1 & 0 \\ 1 & 4 & 0 & 1 \end{matrix}\right]
$$
So,
$$
\left[\begin{matrix} A & \boldsymbol{b} \\ \boldsymbol{c}^\top & 0 \end{matrix}\right]
= \left[\begin{matrix} 2 & 1 & 1 & 0 & 3 \\ 1 & 4 & 0 & 1 & 4 \\ -7 & -6 & 0 & 0 & 0 \end{matrix}\right]
$$
and the initial basic feasible solution is $\boldsymbol{x} = [x_1, x_2, x_3, x_4]^\top = [0, 0, 3, 4]^\top$.
The reduced costs are $r_1 = -7$, $r_2 = -6$, so $q = 1$.
Pivoting on $(p, q) = (1, 1)$ gives

$$
\left[\begin{matrix} 1 & \frac{1}{2} & \frac{1}{2} & 0 & \frac{3}{2} \\ 0 & \frac{7}{2} & -\frac{1}{2} & 1 & \frac{5}{2} \\ 0 & -\frac{5}{2} & \frac{7}{2} & 0 & \frac{21}{2} \end{matrix}\right]
$$
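One more pivot reaches optimality. A sketch of the full loop (most-negative-cost entering rule, min-ratio leaving test), assuming the tableau layout above, with the optimal corner value $-Z_0 = 86/7$ worked out by hand:

```python
import numpy as np

# Tableau [A | b] with the cost row appended (from the example above).
T = np.array([[ 2.0,  1.0, 1.0, 0.0, 3.0],
              [ 1.0,  4.0, 0.0, 1.0, 4.0],
              [-7.0, -6.0, 0.0, 0.0, 0.0]])
m = 2

while True:
    r = T[-1, :-1]                      # current reduced-cost row
    if (r >= -1e-12).all():
        break                           # all reduced costs >= 0: optimal
    q = int(np.argmin(r))               # entering column (most negative cost)
    mask = T[:m, q] > 1e-12
    assert mask.any(), "unbounded"
    ratios = np.full(m, np.inf)
    ratios[mask] = T[:m, -1][mask] / T[:m, q][mask]
    p = int(np.argmin(ratios))          # leaving row (min-ratio test)
    T[p, :] /= T[p, q]
    for i in range(m + 1):
        if i != p:
            T[i, :] -= T[i, q] * T[p, :]

print(T[-1, -1])                        # -Z0 = 86/7, i.e. max 7x1 + 6x2 = 86/7
```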
A counterexample for the simplex method (it cycles forever):
$$
\begin{aligned}
& \min && -\frac{3}{4}x_1 + 20x_2 - \frac{1}{2}x_3 + 6x_4 \\
& \text{s.t.} && \frac{1}{4}x_1 - 8x_2 - x_3 + 9x_4 + x_5 = 0 \\
& && \frac{1}{2}x_1 - 12x_2 - \frac{1}{2}x_3 + 3x_4 + x_6 = 0 \\
& && x_3 + x_7 = 1 \\
& && x_1, \dots, x_7 \geq 0
\end{aligned}
$$
# Improvement of the Simplex Method (Revised Simplex)

$$[A, \boldsymbol{b}] \leadsto [I_m, Y, \boldsymbol{y}_0] \leadsto [I_m, Y', \boldsymbol{y}_0'] \leadsto \cdots$$
If $n \gg m$, the vast majority of columns never get the chance to enter the basis.
# Summary of the Procedure

1. Form a revised tableau $[B^{-1},\; \boldsymbol{y}_0 = B^{-1}\boldsymbol{b}]$.
2. Calculate $\boldsymbol{r}_D^\top = \boldsymbol{c}_D^\top - \boldsymbol{\lambda}^\top D$, where $\boldsymbol{\lambda}^\top = \boldsymbol{c}_B^\top B^{-1}$.
3. If $r_j \geq 0$ for all $j$, output $\boldsymbol{x}$ and stop.
4. Select $q$ with $r_q < 0$ and compute $\boldsymbol{y}_q = B^{-1}\boldsymbol{a}_q$.
5. If no $y_{iq} > 0$, stop (the problem is unbounded); else $p = \operatorname*{arg\,min}\{ y_{i0}/y_{iq} : y_{iq} > 0 \}$.
6. Update $B^{-1}_{\text{new}} = E \times B^{-1}$, where $E$ is the corresponding elementary pivot matrix.

The work per step is reduced.
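Step 2 is the core saving: only the $m \times m$ matrix $B^{-1}$ is maintained, and nonbasic columns are priced one at a time. A small numpy sketch of the pricing step (the data is made up; for clarity $B^{-1}$ is computed with `inv`, whereas a real implementation would update it):

```python
import numpy as np

# Made-up standard-form data: columns of A, min c^T x s.t. A x = b, x >= 0.
A = np.array([[2.0, 1.0, 1.0, 0.0],
              [1.0, 4.0, 0.0, 1.0]])
b = np.array([3.0, 4.0])
c = np.array([-7.0, -6.0, 0.0, 0.0])

basis = [2, 3]                        # current basic columns (the slacks)
nonbasic = [0, 1]

B_inv = np.linalg.inv(A[:, basis])    # in practice updated, not re-inverted
lam = c[basis] @ B_inv                # simplex multipliers: lambda^T = c_B^T B^{-1}
r_D = c[nonbasic] - lam @ A[:, nonbasic]   # reduced costs r_D^T = c_D^T - lambda^T D
y0 = B_inv @ b                        # current basic solution x_B
```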
Example:
$$
\left\{\begin{aligned}
& \min && 3x_1 + 5x_2 \\
& \text{s.t.} && x_1 + x_2 \leq 4 \\
& && 5x_1 + 3x_2 \geq 8 \\
& && x_1, x_2 \geq 0
\end{aligned}\right.
$$
# The Dual of a Linear Program

Duality appears in many areas, e.g. approximation algorithms and LP theory.
# The Concept of the Dual Problem

For a normal-form linear program
$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& \text{s.t.} && A\boldsymbol{x} \geq \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$
its dual problem is defined as
$$
\begin{aligned}
& \max && \boldsymbol{b}^\top \boldsymbol{y} \\
& \text{s.t.} && \boldsymbol{y}^\top A \leq \boldsymbol{c}^\top \\
& && \boldsymbol{y} \geq \boldsymbol{0}
\end{aligned}
$$
From this one can derive that for the problem
$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& \text{s.t.} && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$
the dual problem is
$$
\begin{aligned}
& \max && \boldsymbol{b}^\top \boldsymbol{y} \\
& \text{s.t.} && \boldsymbol{y}^\top A \leq \boldsymbol{c}^\top
\end{aligned}
$$
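The primal-dual relationship can be checked numerically; a sketch solving a normal-form primal and its dual with `scipy.optimize.linprog` (example data made up; `linprog` only minimizes, so the dual objective is negated):

```python
import numpy as np
from scipy.optimize import linprog

c = np.array([2.0, 3.0])
A = np.array([[1.0, 1.0],
              [1.0, 2.0]])
b = np.array([2.0, 3.0])

# Primal: min c^T x,  A x >= b,  x >= 0
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# Dual:   max b^T y,  A^T y <= c,  y >= 0
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # equal optimal values: strong duality
```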
# Properties of the Dual

Claim: the dual of the dual is the primal.
Lemma (weak duality): suppose $x$ and $y$ are feasible solutions to the primal and dual LPs respectively. Then $c^\top x \geq y^\top b$. That is, of a primal-dual pair, the minimization problem always has the larger value.
Lemma: if one of the pair is unbounded, then the other has no feasible solution.
Theorem: suppose $x_0, y_0$ are feasible solutions to the primal and dual respectively. If $c^\top x_0 = b^\top y_0$, then $x_0$ and $y_0$ are both optimal. (Equal objective values certify optimality.)
Proof: omitted.
# Other Fast Methods for Solving LPs

# The Ellipsoid Algorithm

Due to Khachiyan.
Definition of an ellipsoid:
$E = E(A, a) = \{x \in \mathbb{R}^n \mid (x - a)^\top A^{-1} (x - a) \leq 1\}$. Its width is $\sqrt{\alpha}$ and its radius $\sqrt{\beta}$, where $\alpha, \beta$ are the smallest and largest eigenvalues of $A$ respectively.
For an LP problem:
$$
\left\{\begin{aligned}
& \min && c^\top x \\
& \text{s.t.} && Ax \geq b
\end{aligned}\right.
$$
# Karmarkar's Algorithm

Reference (in Chinese): 【线性规划 (七)】内点法 - 王源的文章 - 知乎
Definition: canonical form of LP
$$
\begin{aligned}
& \min && c^\top x \\
& \text{s.t.} && Ax = 0 \\
& && \textstyle\sum_i x_i = 1 \\
& && x_i \geq 0
\end{aligned}
$$
Definition: $\Omega = \{x : Ax = 0\}$, $\Delta = \{x : e^\top x = 1\}$.
The nullspace of a matrix $M$ is $\{x \mid Mx = 0\}$. The center of $\Delta$ is $\alpha_0 = \frac{e}{n} = [1/n, 1/n, \dots, 1/n]^\top$.
$$
\Omega \cap \Delta = \left\{ x : \begin{bmatrix} A \\ e^\top \end{bmatrix} x = \begin{bmatrix} \boldsymbol{0} \\ 1 \end{bmatrix} \right\}
$$
Assumptions:
1. $\alpha_0 \in \Omega$.
2. The optimal value of $c^\top x$ is $0$.
3. $\operatorname{rank}\begin{bmatrix} A \\ e^\top \end{bmatrix} = m + 1$.
4. There exists $q \in \mathbb{R}^+$ such that if $\dfrac{c^\top x}{c^\top x_0} \leq 2^{-q}$, then $x$ is regarded as optimal.

(To be organized.)
# Integer Linear Programming

# Concept

An Integer Linear Programming (ILP) problem has the following form:
$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& \text{s.t.} && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0} \\
& && \boldsymbol{x} \in \mathbb{Z}^n
\end{aligned}
$$
We first give a few examples of ILP problems.
# The Travelling Salesman Problem (TSP)

# Gomory's Cutting-Plane Method

# Convex Optimization

# Basic Concepts

If the feasible set $\Omega$ is a convex set and the objective function $f$ is a convex function, the problem is called a convex optimization problem.
# Convex Sets

$S \subset \mathbb{R}^n$ is convex if and only if for all $a, b \in S$ and all $\alpha \in (0, 1)$ we have $\alpha a + (1 - \alpha) b \in S$.
# Convex Functions

Let $S \subset \mathbb{R}^n$ be convex. If for all $a, b \in S$ and all $\alpha \in (0, 1)$,

$$
f(\alpha a + (1 - \alpha) b) \leq \alpha f(a) + (1 - \alpha) f(b),
$$

then $f$ is called a convex function.
# Properties and Examples

If $f_1, f_2$ are convex functions, then:

- $\alpha f_1$ is convex for any $\alpha > 0$;
- $f_1 + f_2$ is convex.

# Example 1

$f(x) = \|x\|$ is convex. This follows from the Cauchy-Schwarz inequality:
$$
\|x + y\|^2 = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \leq \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2
$$
# Example 2

Given a convex set $S \subset \mathbb{R}^n$, a convex function $f : S \to \mathbb{R}$ on it, and a constant $c \in \mathbb{R}$, the level set $H_S(f, c) = \{x \in S : f(x) \leq c\}$ is convex.
Proof.
For any $x_1, x_2 \in H_S(f, c)$ and any $\alpha \in (0, 1)$, convexity of $f$ gives $f(\alpha x_1 + (1 - \alpha) x_2) \leq \alpha f(x_1) + (1 - \alpha) f(x_2)$.
Moreover $x_1, x_2 \in H_S(f, c)$, so $\alpha f(x_1) + (1 - \alpha) f(x_2) \leq \alpha c + (1 - \alpha) c = c$.
Hence $\alpha x_1 + (1 - \alpha) x_2 \in H_S(f, c)$.
# Example 3

For $f : S \to \mathbb{R}$, the epigraph of $f$ is defined as
$$
\operatorname{epi}(f) = \left\{ \begin{bmatrix} x \\ c \end{bmatrix} : x \in S,\; c \in \mathbb{R},\; f(x) \leq c \right\}
$$
# Theorem

Let $S \subset \mathbb{R}^n$ be a nonempty open convex set and let $f : S \to \mathbb{R}$ with $f \in C^1$. Then $f$ is convex on $S$ $\iff$ for all $x, y \in S$, $f(y) \geq f(x) + \nabla f^\top(x)(y - x)$.
Proof:
"⇒ \Rightarrow ⇒  ":
If $f \in C^1$ is convex, then
$$
\forall x, y \in S,\ \alpha \in (0, 1): \quad f(\alpha y + (1 - \alpha)x) \leq \alpha f(y) + (1 - \alpha) f(x)
$$
So
$$
f\big(x + \alpha(y - x)\big) - f(x) \leq \alpha \big(f(y) - f(x)\big)
$$
By Taylor's theorem:
$$
f\big(x + \alpha(y - x)\big) - f(x) = \alpha \nabla f^\top(x)(y - x) + o(\|\alpha(y - x)\|)
$$
Dividing by $\alpha > 0$ and letting $\alpha \to 0$, since

$$
\lim_{\alpha \to 0} \frac{o(\|\alpha(y - x)\|)}{\alpha} = 0,
$$

we obtain

$$
\nabla f^\top(x)(y - x) \leq f(y) - f(x).
$$
"⇐ \Leftarrow ⇐  ":
Define $z = \alpha u + (1 - \alpha)v$ for $u, v \in S$, $\alpha \in (0, 1)$.
Then $z \in S$ (by convexity of $S$), and
$$
f(u) \geq f(z) + \nabla f^\top(z)(u - z), \qquad f(v) \geq f(z) + \nabla f^\top(z)(v - z)
$$
Taking $\alpha$ times the first plus $(1 - \alpha)$ times the second:

$$
\alpha f(u) + (1 - \alpha) f(v) \geq f(z) + \nabla f^\top(z)\big(\alpha(u - z) + (1 - \alpha)(v - z)\big) = f(z),
$$

since $\alpha(u - z) + (1 - \alpha)(v - z) = \alpha u + (1 - \alpha)v - z = \boldsymbol{0}$.
# Theorem

Let $S \subset \mathbb{R}^n$ (open, convex), $f : S \to \mathbb{R}$ and $f \in C^2$. Then $f$ is convex $\iff$ for all $x \in S$, the Hessian $H(x)$ of $f$ at $x$ is positive semi-definite (i.e. $d^\top H(x)\, d \geq 0$ for all $d$).
Proof:

"$\Leftarrow$": By Taylor's theorem, there exists $\alpha \in (0, 1)$ with $f(y) = f(x) + \nabla f^\top(x)(y - x) + \frac{1}{2}(y - x)^\top H\big(x + \alpha(y - x)\big)(y - x)$. The quadratic term is nonnegative, so $f(y) \geq f(x) + \nabla f^\top(x)(y - x)$, and convexity follows from the previous theorem.
"⇒ \Rightarrow ⇒  ":
By contraposition.(逆否命题) ∃ x ∈ S , ∃ d ∈ R n \exist x \in S, \exist d \in \mathbb{R}^n ∃ x ∈ S , ∃ d ∈ R n   s.t. d ⊤ H ( x ) d < 0 d^\top H(x) d < 0 d ⊤ H ( x ) d < 0  .
S S S   is open, so ∃ t ∈ R \exist t \in \mathbb{R} ∃ t ∈ R   with y + t d ∈ S y + td \in S y + t d ∈ S  . Because H H H   is continuous, ∀ z = α x + ( 1 − α ) y \forall z = \alpha x + (1-\alpha) y ∀ z = α x + ( 1 − α ) y  , d ⊤ H ( z ) d < 0 d^\top H(z) d < 0 d ⊤ H ( z ) d < 0  .
f ( y ) = f ( x ) + ∇ f ⊤ ( x ) ( y − x ) + 1 2 ( y − x ) ⊤ H ( x + α ( y − z ) ) ( y − z ) = f ( x ) + 1 2 f ⊤ ( y − x ) + 1 2 d ⊤ H ( z ) d < 0 f(y) = f(x) + \nabla f^\top (x)(y-x) + \frac{1}{2} (y-x)^\top H(x + \alpha(y-z))(y-z) = f(x) + \frac{1}{2}f^\top(y-x) + \frac{1}{2}d^\top H(z) d < 0 f ( y ) = f ( x ) + ∇ f ⊤ ( x ) ( y − x ) + 2 1  ( y − x ) ⊤ H ( x + α ( y − z ) ) ( y − z ) = f ( x ) + 2 1  f ⊤ ( y − x ) + 2 1  d ⊤ H ( z ) d < 0  .
# Example

$f(x) = \frac{1}{2}x^\top A x + b^\top x + c$, where $A \in \mathbb{R}^{n \times n}$ with $A = A^\top$, and $x, b \in \mathbb{R}^n$, $c \in \mathbb{R}$.
Then $\nabla f(x) = Ax + b$, and the Hessian is $F(x) = A$.
So $f$ is convex $\iff$ $A$ is positive semi-definite.
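For this quadratic example the convexity test is a one-line eigenvalue check; a sketch (matrices made up):

```python
import numpy as np

def is_convex_quadratic(A, tol=1e-10):
    """f(x) = 0.5 x^T A x + b^T x + c (A symmetric) is convex
    iff every eigenvalue of A is >= 0 (positive semi-definite)."""
    return bool(np.all(np.linalg.eigvalsh(A) >= -tol))

A1 = np.array([[2.0, 0.0], [0.0, 1.0]])    # PSD -> convex
A2 = np.array([[1.0, 0.0], [0.0, -1.0]])   # indefinite -> not convex
```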
# Convex Programming

$$
\begin{aligned}
& \min && f(x) && f \text{ convex} \\
& \text{s.t.} && x \in \Omega && \Omega \text{ convex}
\end{aligned}
$$
Then $x^\star$ is a global optimizer $\iff$ $x^\star$ is a local optimizer.
Proof : "⇐ \Leftarrow ⇐  ":
x ⋆ x^\star x ⋆   is global ⇒ y ∈ Ω : f ( y ) < f ( x ⋆ ) \Rightarrow y \in \Omega: f(y) < f(x^\star) ⇒ y ∈ Ω : f ( y ) < f ( x ⋆ )  .
Proposition: if $f$ is convex on $\Omega$, then the set of all optimizers of $f$ is convex.
Proposition 2: let $Q \subset \mathbb{R}^n$ be open and convex, and $f \in C^1$. If there exists $x^\star \in Q$ such that for all $x \in Q$ with $x \neq x^\star$ we have $\nabla f^\top(x)(x - x^\star) \geq 0$, then $x^\star$ is a global optimizer.
Note: $\nabla f(x^\star) = 0 \Rightarrow x^\star$ is an optimizer.
So how do we actually find a local optimizer?
# One-Dimensional Optimization

The problem is simply to find the minimum, $\min f(x)$.
# Golden Section Search

Assumption: $f$ is unimodal on $[a, b]$, i.e. $[a, b]$ contains exactly one local minimizer.
We seek $\min f(x)$.
The number of iterations can be computed as follows (the stopping condition is $b_k - a_k < \varepsilon$):
if $b_i - a_i = (1 - \rho)(b_{i-1} - a_{i-1})$, then we stop once $(b_0 - a_0)(1 - \rho)^n \leq \varepsilon$.
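A minimal golden-section implementation following this shrinking rule, with $\rho = 1 - 1/\varphi \approx 0.382$ so each step keeps a fraction $1 - \rho \approx 0.618$ of the interval and reuses one previous function evaluation (the test function is made up):

```python
import math

def golden_section(f, a, b, eps=1e-8):
    """Minimize a unimodal f on [a, b] until the interval width is < eps."""
    rho = 1 - 1 / ((1 + math.sqrt(5)) / 2)     # ~0.381966
    x1, x2 = a + rho * (b - a), b - rho * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > eps:
        if f1 < f2:                 # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + rho * (b - a)  # only one new evaluation per step
            f1 = f(x1)
        else:                       # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = b - rho * (b - a)
            f2 = f(x2)
    return (a + b) / 2

xmin = golden_section(lambda x: (x - 1.5) ** 2, 0.0, 4.0)
```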
# Fibonacci Method

Is there a scheme that saves more computation than golden section? Consider using a different $\rho_k$ at each step. We require
$$
\begin{aligned}
& \min && \prod_{i=1}^N (1 - \rho_i) \\
& \text{s.t.} && \rho_{k+1}(1 - \rho_k) = 1 - 2\rho_k \quad \forall k \\
& && \rho_k \in (0, 1/2)
\end{aligned}
$$
We state the answer to this optimization problem directly; it involves the Fibonacci sequence $\{F_n\}$: $\rho_k = 1 - F_{N+1-k}/F_{N+2-k}$, and in particular $\rho_1 = 1 - F_N/F_{N+1}$.
Then $\min \prod_{i=1}^N (1 - \rho_i) = \dfrac{1}{F_{N+1}}$, so to reach accuracy $\varepsilon$ it suffices to choose $N$ with $\dfrac{b_0 - a_0}{F_{N+1}} \leq \varepsilon$.
# Newton's Method

Requires $f \in C^2$.
Newton's method for root finding iterates
$$
x_i = x_{i-1} - \frac{f(x_{i-1})}{f'(x_{i-1})}
$$
To minimize $f$ we need $x^\star$ with $f'(x^\star) = 0$, so we simply apply Newton's method to $f'$: $x_i = x_{i-1} - \dfrac{f'(x_{i-1})}{f''(x_{i-1})}$.
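A sketch in Python (the sample function is made up): Newton's root-finding iteration applied to $f'$ converges to a stationary point of $f$.

```python
def newton_minimize(df, d2f, x0, tol=1e-10, max_iter=100):
    """Find a stationary point of f by applying Newton's root-finding
    iteration to f': x_{k+1} = x_k - f'(x_k) / f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = x^4 - 3x^2 + 2, so f'(x) = 4x^3 - 6x and f''(x) = 12x^2 - 6.
# Starting at x0 = 1.0, the iteration converges to sqrt(3/2).
x_star = newton_minimize(lambda x: 4 * x**3 - 6 * x,
                         lambda x: 12 * x**2 - 6, x0=1.0)
```

Note that Newton's method only finds a stationary point; which one it reaches depends on the starting point, and a minimum is only guaranteed when $f'' > 0$ nearby.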
# Unconstrained Optimization

# Optimality Conditions for Unconstrained Problems

$\min f(x)$, $x \in \mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}$.
- Necessary condition: used to show that a point is not optimal.
- Sufficient condition: used to show that a point is optimal.

Here we introduce only one necessary condition:
# FONC (First-Order Necessary Condition)

Let $f \in C^1$. If $x^\star$ is a local minimizer, then $\nabla f(x^\star) = 0$; that is, a local optimum must be a stationary point.
Proof:
Claim: for any feasible direction $d$ at $x^\star$, we have $d^\top \nabla f(x^\star) = 0$.
Define $x(\alpha) := x^\star + \alpha d$ and $\phi(\alpha) = f(x(\alpha))$.
Then by Taylor's theorem:
$$
f(x^\star + \alpha d) - f(x^\star) = \phi(\alpha) - \phi(0) = \phi'(0)\,\alpha + o(\alpha) = \alpha\, d^\top \nabla f(x^\star) + o(\alpha)
$$
Since $x^\star$ is a local minimizer, $f(x^\star + \alpha d) \geq f(x^\star)$ for small $\alpha > 0$, so $d^\top \nabla f(x^\star) \geq 0$. Replacing $d$ by $-d$ likewise gives $-d^\top \nabla f(x^\star) \geq 0$. Hence $d^\top \nabla f(x^\star) = 0$ for all $d$, and therefore $\nabla f(x^\star) = 0$.
# SONC (Second-Order Necessary Condition)

Skipped; we move straight to the sufficient condition.
# SOSC (Second-Order Sufficient Condition)

Let $f \in C^2$. If $\nabla f(x^\star) = 0$ and $F(x^\star) \succ 0$ (positive definite Hessian), then $x^\star$ is a strict local minimizer.
Current theory only provides sufficient conditions for strict local optimality, not for (non-strict) local optimality.
Proof: by Taylor's theorem,

$$
f(x^\star + \alpha d) - f(x^\star) = \frac{\alpha^2}{2}\, d^\top F(x^\star)\, d + o(\alpha^2) > 0
$$

for any direction $d \neq 0$ and all sufficiently small $\alpha \neq 0$.
Example:
$f(x) = x_1^2 + x_2^2$.
$$
\nabla f(x) = \left[\begin{matrix} 2x_1 \\ 2x_2 \end{matrix}\right]
$$
Setting $\nabla f(x) = 0$ gives $x^\star = (0, 0)^\top$; the Hessian is $F(x) = 2I_2 \succ 0$, so by SOSC $x^\star$ is a strict local (in fact global) minimizer.
# Unconstrained Optimization Algorithms

# Steepest Descent

$$
x_{k+1} = x_k + \alpha_k d_k, \quad \alpha_k \in \mathbb{R},\ d_k \in \mathbb{R}^n
$$
$$
f(x_{k+1}) - f(x_k) = \alpha_k \nabla f(x_k)^\top d_k + o(\|\alpha_k d_k\|)
$$
Here $d_k = -\dfrac{\nabla f(x_k)}{\|\nabla f(x_k)\|}$; the step $\alpha_k$ can be fixed or varying. If we take $\alpha_k = \operatorname*{arg\,min}_{\alpha > 0} f(x_k - \alpha \nabla f(x_k))$, i.e. solve a one-dimensional optimization problem whose only unknown is $\alpha$, the method is called steepest descent.
Note that steepest descent differs from plain gradient descent: in unoptimized gradient descent the step size (learning rate) $\alpha$ is fixed in advance, whereas here $\alpha$ is obtained by solving $\operatorname*{arg\,min}_\alpha f(x_k - \alpha \nabla f(x_k))$.
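For a quadratic $f(x) = \frac{1}{2}x^\top Q x - b^\top x$ the inner line search has the closed form $\alpha_k = \dfrac{g_k^\top g_k}{g_k^\top Q g_k}$, which makes a sketch easy (the matrix and vector below are made up):

```python
import numpy as np

Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])          # symmetric positive definite
b = np.array([1.0, 2.0])
x = np.zeros(2)

for _ in range(200):
    g = Q @ x - b                   # gradient of 0.5 x^T Q x - b^T x
    if np.linalg.norm(g) < 1e-12:
        break
    alpha = (g @ g) / (g @ Q @ g)   # exact line search along -g
    x = x - alpha * g

x_star = np.linalg.solve(Q, b)      # true minimizer solves Q x = b
```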
# Properties of Steepest Descent

Property 1: $x_{k+1} - x_k$ is orthogonal to $x_k - x_{k-1}$; i.e. any two consecutive steps of steepest descent are orthogonal.
Proof:
$$
\langle x_{k+1} - x_k,\; x_k - x_{k-1} \rangle = \alpha_k \alpha_{k-1} \langle \nabla f(x_k), \nabla f(x_{k-1}) \rangle
$$
Applying the FONC to $\phi(\alpha) = f(x_k - \alpha \nabla f(x_k))$ at its minimizer:

$$
\phi'(\alpha_k) = 0
$$

So $\nabla f\big(x_k - \alpha_k \nabla f(x_k)\big)^\top \big({-\nabla f(x_k)}\big) = 0$, which means $\nabla f(x_{k+1})^\top \nabla f(x_k) = 0$.
Property 2: if $\nabla f(x_k) \neq 0$, then $f(x_{k+1}) < f(x_k)$; the function value strictly decreases.
Proof:
$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$, where $\alpha_k = \operatorname*{arg\,min}_{\alpha > 0} f(x_k - \alpha \nabla f(x_k))$.
So for all $\alpha \neq \alpha_k$, $\phi_k(\alpha) = f(x_k - \alpha \nabla f(x_k)) \geq \phi_k(\alpha_k)$.
$\phi_k'(0) = -\big(\nabla f(x_k - 0 \cdot \nabla f(x_k))\big)^\top \nabla f(x_k) = -\|\nabla f(x_k)\|^2 < 0$.
So there exists $\omega > 0$ such that for all $\bar{\alpha} \in (0, \omega]$ we have $\phi_k(\bar{\alpha}) < \phi_k(0) = f(x_k)$; hence $f(x_{k+1}) = \phi_k(\alpha_k) \leq \phi_k(\bar{\alpha}) < f(x_k)$.
# Conjugate Direction Methods

# Concepts and Properties

First, the definition of conjugate vectors: $p, q \in \mathbb{R}^n$ are $Q$-conjugate if $p^\top Q q = 0$. If $Q = I_n$, $Q$-conjugacy is exactly orthogonality.
If $Q$ is a real symmetric positive definite matrix and $d^0, d^1, \dots, d^k$ are nonzero, pairwise $Q$-conjugate vectors, then these vectors are linearly independent.
# Property

For a quadratic objective, the conjugate direction method reaches the optimum in at most $n$ steps.
#  改进k : = 0 k:=0 k : = 0  , initialize x 0 x_0 x 0  g 0 = ∇ f ( x 0 ) g_0 = \nabla f(x_0) g 0  = ∇ f ( x 0  )  . If x 0 = 0 x_0 = 0 x 0  = 0  , then stop ; else, d 0 = − g 0 d_0 = -g_0 d 0  = − g 0   .α k = − g k ⊤ d k d k ⊤ Q d k , x k + 1 = x k + α k d k \alpha_k = -\frac{g_k^\top d_k}{d_k ^\top Qd_k}, x_{k+1} = x_k + \alpha_k d_k α k  = − d k ⊤  Q d k  g k ⊤  d k   , x k + 1  = x k  + α k  d k  g k + 1 = ∇ f ( x k + 1 ) g_{k+1} = \nabla f(x_{k+1}) g k + 1  = ∇ f ( x k + 1  )  . If g k + 1 = 0 g_{k+1} = 0 g k + 1  = 0  , then STOP .\beta_k = -\frac{g_k^\top Q d_k} d k + 1 = − g k + 1 + β k d k d_{k+1} = -g_{k+1}+\beta_k d_k d k + 1  = − g k + 1  + β k  d k  k : = k + 1 k:=k+1 k : = k + 1  . Go to STEP 3 改进后的方法不需要实现给出共轭方向,而是采用类似 Schmidt  正交化的方式得到各共轭向量。证明上述改进的正确性,也只需要证明该方法给出的向量是共轭向量。
# Constrained Optimization

# Lagrange's Theorem (FONC)

If $x^\star$ is a local minimizer of $f$ subject to $h(x) = 0$ (with $f, h \in C^1$) and $x^\star$ is a regular point of the constraints, then there exists $\lambda^\star \in \mathbb{R}^m$ such that $\nabla f(x^\star) + \sum_{i=1}^m \lambda_i^\star \nabla h_i(x^\star) = 0$.
# Inequality Constraints

$$
\begin{aligned}
& \min && f(x) && f : \mathbb{R}^n \to \mathbb{R} \\
& \text{s.t.} && h(x) = 0 && h : \mathbb{R}^n \to \mathbb{R}^m,\ m \leq n \\
& && g(x) \leq 0 && g : \mathbb{R}^n \to \mathbb{R}^p
\end{aligned}
$$
# Related Concepts

If an inequality constraint $g_i(x) \leq 0$ holds with equality at $x^\star$ (i.e. $g_i(x^\star) = 0$), the constraint is said to be active there; otherwise it is inactive.
Suppose $x^\star$ satisfies the constraints above, i.e. $h(x^\star) = 0$ and $g(x^\star) \leq 0$. Let $J(x^\star) := \{i \mid g_i(x^\star) = 0\}$ (the active set). If the gradients $\nabla h_i(x^\star)$ together with $\nabla g_j(x^\star)$, $j \in J(x^\star)$, are linearly independent, then $x^\star$ is called a regular point.
# The KKT Theorem (FONC)

Let $f, h, g \in C^1$, and let $x^\star$ be a regular point that is a local minimizer subject to the constraints above. Then there exist $\lambda^\star \in \mathbb{R}^m$ and $\mu^\star \in \mathbb{R}^p$ satisfying:
1. $\dfrac{\partial L}{\partial x_i} = 0$ for all $i$, where $L = f(x) + {\lambda^\star}^\top h(x) + {\mu^\star}^\top g(x)$ (Lagrangian stationarity);
2. $h(x^\star) = 0$, $g(x^\star) \leq 0$ (the original constraints, i.e. primal feasibility);
3. $\mu^\star \geq 0$ (dual feasibility);
4. ${\mu^\star}^\top g(x^\star) = 0$ (complementary slackness).

Together these four are called the KKT conditions.
Solving this system of conditions directly is quite hard; one usually starts from the equality conditions and uses a case analysis (on which inequality constraints are active) to reduce the number of unknowns before solving.
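A tiny worked instance of this case analysis (problem made up): minimize $x_1^2 + x_2^2$ subject to $g(x) = 1 - x_1 - x_2 \leq 0$. Guessing that the constraint is active reduces the KKT system to three linear equations in $(x_1, x_2, \mu)$:

```python
import numpy as np

# Active case: g(x) = 1 - x1 - x2 = 0 with multiplier mu >= 0.
# Stationarity: 2*x1 - mu = 0 and 2*x2 - mu = 0.
# Linear system in the unknowns (x1, x2, mu):
K = np.array([[2.0, 0.0, -1.0],
              [0.0, 2.0, -1.0],
              [1.0, 1.0,  0.0]])
rhs = np.array([0.0, 0.0, 1.0])
x1, x2, mu = np.linalg.solve(K, rhs)

# mu = 1 >= 0 (dual feasibility) and the constraint is tight
# (complementary slackness), so (1/2, 1/2) is the KKT point.
```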
# Dynamic Programming

"You have all learned this, so we will not cover it."