Components of an Operations Research Project
Formulation → Mathematical Modeling → Data Collection → Validation/Analysis → Interpretation/Implementation. An operations research course roughly covers the following topics:
Mathematical Optimization: Linear Programming (LP), Integer Linear Programming, Non-Linear Programming, Dynamic Programming, Graphs, Scheduling.

# Linear Programming

# Concepts of Linear Programming

# Basic Forms of LP

Linear programming (LP) is one of the most fundamental classes of optimization problems: it has an objective function $f(\boldsymbol{x})$ to be optimized and a set of constraints. By adjusting the signs of the objective and constraint coefficients, we can always make the objective a minimization, write every linear inequality as $\mathrm{LHS} \geq \mathrm{RHS}$, and require every variable to be nonnegative. An LP in this form is called the normal form of LP; it consists of one minimization objective $f(\boldsymbol{x})$ and several linear inequalities as constraints:

$$
\begin{aligned}
& \min && f(\boldsymbol{x}) = \sum_{i=1}^n c_i x_i = \boldsymbol{c}^\top\boldsymbol{x}, && \boldsymbol{c}, \boldsymbol{x} \in \mathbb{R}^n \\
& s.t. && A\boldsymbol{x} \geq \boldsymbol{b}, && A \in \mathbb{R}^{m\times n},\; \boldsymbol{b} \in \mathbb{R}^m \\
& && x_i \geq 0, && \forall i = 1, 2, \dots, n
\end{aligned}
$$
If we now introduce slack (surplus) variables

$$
x_{n+k} = \sum_{i=1}^n a_{k,i} x_i - b_k \geq 0, \quad k = 1, \dots, m,
$$

then $A\boldsymbol{x} \geq \boldsymbol{b}$ becomes $(A \; {-I_m})\,\boldsymbol{x}' = \boldsymbol{b}$, where $\boldsymbol{x}' = (\boldsymbol{x}^\top, x_{n+1}, \dots, x_{n+m})^\top$. (Subtracting the slack keeps $\boldsymbol{x}' \geq \boldsymbol{0}$; for constraints $A\boldsymbol{x} \leq \boldsymbol{b}$ one adds it instead, giving $(A \; I_m)$.) An LP after this transformation is called the standard form of LP, written as

$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& s.t. && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$
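Not part of the original notes: a minimal sketch of solving a standard-form LP with `scipy.optimize.linprog`. The data below is an assumed toy instance, not taken from the notes.

```python
import numpy as np
from scipy.optimize import linprog

# Standard-form LP: min c^T x  s.t.  A x = b, x >= 0.
# Assumed toy instance: min x1 + 2*x2  s.t.  x1 + x2 = 1.
c = np.array([1.0, 2.0])
A_eq = np.array([[1.0, 1.0]])
b_eq = np.array([1.0])

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2)
print(res.x, res.fun)   # optimum x = (1, 0), objective value 1
```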
Note that the system $A\boldsymbol{x} = \boldsymbol{b}$ has $n$ unknowns. Its equations are generally assumed to be mutually independent, so $m = \operatorname{rank} A \leq n$. Unless stated otherwise, we assume $A \in \mathbb{R}^{m\times n}$, $\boldsymbol{b} \in \mathbb{R}^m$, $\boldsymbol{c}, \boldsymbol{x} \in \mathbb{R}^n$.
# Solutions of a Linear Program

For a standard-form LP, let $B$ be an $m$-th order nonsingular square submatrix of $A$; then $B$ is called a basis of $A$. The $m$ linearly independent columns of $B$ are the basis vectors. The corresponding $m$ components of $\boldsymbol{x}$ are the basic variables, collected into $\boldsymbol{x}_B \in \mathbb{R}^m$; the remaining $n - m$ components are the nonbasic variables. A solution whose nonbasic variables are all $0$ is a basic solution. If in addition some basic variable equals $0$, it is a degenerate basic solution.

We may permute the columns of $A$ without affecting the solutions of the LP (only their order changes). Hence, in formal derivations we usually assume the first $m$ components of $\boldsymbol{x}$ are the basic variables and the rest are nonbasic, so that $\boldsymbol{x} = (\boldsymbol{x}_B^\top, \boldsymbol{0}^\top)^\top$.
# Solving a Linear Program

Fundamental Theorem of LP: if an LP has a feasible solution, then it has a basic feasible solution; if it has an optimal feasible solution, then it has an optimal basic feasible solution.

Proof. Suppose $\boldsymbol{x} = (x_1, x_2, \dots, x_n)^\top$ is a feasible solution; without loss of generality write it as $\boldsymbol{x} = (x_1, x_2, \dots, x_p, 0, \dots, 0)^\top$ with $x_i > 0$ for all $1 \leq i \leq p$. Write $A = (\boldsymbol{a}_1, \boldsymbol{a}_2, \dots, \boldsymbol{a}_n)$ by columns.
w.r.t.: with respect to.
# Simplex Method

# The Essence of the Simplex Method

Geometrically, the simplex method searches among the vertices of the feasible polyhedron, checking whether each visited vertex is optimal.

# Steps of the Simplex Method

The simplex algorithm proceeds as follows:
1. Find an initial basic feasible solution $x$ w.r.t. a basis $B$.
2. Transform $[A \; b]$ into $[I_m \; Y \; \boldsymbol{y}_0]$.
3. Compute $r_i = c_i - z_i$ for all $i \geq m+1$, where $z_i = \sum_{l=1}^m c_l y_{l,i}$.
4. If $r_i \geq 0$ for all $i$, then $x$ is optimal; stop. Otherwise, pick $q$ with $r_q < 0$ (conventionally the most negative one).
5. If $y_{i,q} \leq 0$ for all $1 \leq i \leq m$, stop and report "unbounded"; otherwise let
$$p = \argmin \left\{\left.\frac{y_{i,0}}{y_{i,q}} \right| y_{i,q} > 0\right\}.$$
6. Set $B' = \{a_q\} \cup B \setminus \{a_p\}$, update $[Y, \boldsymbol{y}_0]$, and go to step 3.

(Exercise: why is $B'$ still a basis?)
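Not from the notes: the steps above can be sketched as a small tableau implementation. This is a teaching sketch, not a production solver; it assumes the basis columns of $A$ already form an identity (as after adding slacks), $b \geq 0$, and no cycling (no anti-cycling rule is implemented).

```python
import numpy as np

def simplex(A, b, c, basis):
    """Tableau simplex for min c@x s.t. A@x = b, x >= 0.
    Assumes the given basis columns of A form an identity and b >= 0."""
    m, n = A.shape
    T = np.zeros((m + 1, n + 1))
    T[:m, :n], T[:m, n], T[m, :n] = A, b, c
    basis = list(basis)
    for i, j in enumerate(basis):        # price out: last row becomes reduced costs
        T[m] -= T[m, j] * T[i]
    while True:
        r = T[m, :n]
        if np.all(r >= -1e-9):           # all reduced costs >= 0: optimal
            x = np.zeros(n)
            x[basis] = T[:m, n]
            return x, -T[m, n]
        q = int(np.argmin(r))            # entering column: most negative r_q
        col = T[:m, q]
        if np.all(col <= 1e-9):
            raise ValueError("unbounded")
        ratio = np.full(m, np.inf)
        ratio[col > 1e-9] = T[:m, n][col > 1e-9] / col[col > 1e-9]
        p = int(np.argmin(ratio))        # leaving row: minimum-ratio test
        T[p] /= T[p, q]
        for i in range(m + 1):           # pivot on (p, q)
            if i != p:
                T[i] -= T[i, q] * T[p]
        basis[p] = q

# the worked example from the tableau section below, with slacks x3, x4
x, z = simplex(np.array([[2.0, 1, 1, 0], [1, 4, 0, 1]]),
               np.array([3.0, 4.0]),
               np.array([-7.0, -6.0, 0.0, 0.0]),
               basis=[2, 3])
print(x, z)   # x = (8/7, 5/7, 0, 0), z = -86/7
```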
# A Simplex Example

$$
\begin{aligned}
& \min && Z = -x_1 - 2x_2 - x_3 \\
& s.t. && 2x_1 + x_2 - x_3 \leq 2 \\
& && 2x_1 - x_2 + 5x_3 \leq 6 \\
& && 4x_1 + x_2 + x_3 \leq 6 \\
& && x_1, x_2, x_3 \geq 0
\end{aligned}
$$

The worked steps are omitted.
# Two-Phase Method

The simplex method consists of two main parts: finding an initial solution and iterating toward the optimum. For constraints of type $A\boldsymbol{x} \leq \boldsymbol{b}$, slack variables immediately yield a rather weak initial solution (typically $x_i = 0$ for all $i$). For LPs where an initial solution is hard to obtain, we turn to the two-phase method to find a suitable one.

The two-phase method is a refinement of the initial-solution step of the simplex method. Its algorithm has two parts:

1. Keeping the constraints unchanged, solve the LP whose objective is $\sum x_{art}$, where the $x_{art}$ are artificial variables.
2. Starting from the basic feasible solution obtained in phase 1, iterate to the optimal solution.

Note that an initial solution for phase 1 is easy to find, and the final solution of phase 1 is a basic feasible solution of the original problem; this simplifies the search for an initial solution in the simplex method.
Concretely, the two-phase method works as follows. For the constrained problem

$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& s.t. && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$

introduce artificial variables $x_{n+1}, x_{n+2}, \dots, x_{n+m}$ and solve

$$
\begin{aligned}
& \min && g = \sum_{i=n+1}^{n+m} x_i \\
& s.t. && A\boldsymbol{x} + (x_{n+1}, x_{n+2}, \dots, x_{n+m})^\top = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}, \; x_{art} \geq 0
\end{aligned}
$$

If the minimum satisfies $g = \sum x_{art} = 0$, then the corresponding $\boldsymbol{x}$ is an initial solution of the original problem.
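Not from the notes: phase 1 can be sketched with `scipy.optimize.linprog` on assumed data — append one artificial variable per row and minimize their sum $g$.

```python
import numpy as np
from scipy.optimize import linprog

# Assumed system: x1 + x2 + x3 = 2, x1 - x2 = 0, x >= 0.
A = np.array([[1.0, 1.0, 1.0],
              [1.0, -1.0, 0.0]])
b = np.array([2.0, 0.0])
m, n = A.shape

A1 = np.hstack([A, np.eye(m)])                  # columns for the artificials
c1 = np.concatenate([np.zeros(n), np.ones(m)])  # objective g = sum of artificials
res = linprog(c1, A_eq=A1, b_eq=b)              # x >= 0 is linprog's default

x0 = res.x[:n]        # g* = 0, so this is a feasible start for phase 2
print(res.fun, x0)
```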
# The Simplex Tableau

Assumption: the first $m$ columns of $A$ form a basis $B$, so $A = [B, D]$, $c = [c_B^\top, c_D^\top]^\top$, $x = [x_B^\top, x_D^\top]^\top$. The problem becomes $\min c_B^\top x_B + c_D^\top x_D$, s.t. $B x_B + D x_D = b$ and $x_B, x_D \geq 0$.

For the basic solution $x = [x_B^\top, 0^\top]^\top$: $x_B = B^{-1} b$, i.e. $x = [(B^{-1}b)^\top, 0^\top]^\top$, with objective value $Z_0 = c_B^\top B^{-1} b$.

Tableau:
For the $(m+1) \times (n+1)$ matrix

$$
\left[\begin{matrix} A & b \\ c^\top & 0 \end{matrix}\right] = \left[\begin{matrix} B & D & b \\ c_B^\top & c_D^\top & 0 \end{matrix}\right]
$$

there exists an $(m+1) \times (m+1)$ matrix such that

$$
\left[\begin{matrix} B^{-1} & \boldsymbol{0} \\ \boldsymbol{0}^\top & 1 \end{matrix}\right] \times \left[\begin{matrix} B & D & b \\ c_B^\top & c_D^\top & 0 \end{matrix}\right] = \left[\begin{matrix} I_m & B^{-1}D = Y & B^{-1}b = \boldsymbol{y}_0 \\ c_B^\top & c_D^\top & 0 \end{matrix}\right]
$$
Transforming further,

$$
\left[\begin{matrix} I_m & \boldsymbol{0} \\ -c_B^\top & 1 \end{matrix}\right] \times \left[\begin{matrix} I_m & B^{-1}D & B^{-1}b \\ c_B^\top & c_D^\top & 0 \end{matrix}\right] = \left[\begin{matrix} I_m & B^{-1}D & B^{-1}b \\ \boldsymbol{0}^\top & c_D^\top - c_B^\top B^{-1} D & -c_B^\top B^{-1} b \end{matrix}\right] \quad (*)
$$

The bottom-right entry of $(*)$, $-c_B^\top B^{-1} b = -Z_0$, is the negated objective value, and $c_D^\top - c_B^\top B^{-1} D = r_D^\top$ is the vector of reduced costs.

So $r_D$ and $-Z_0$ come out as a by-product, provided the bottom-left block of $(*)$ is $\boldsymbol{0}^\top$.
A result left as an exercise: after pivoting on the element $y_{pq}$, the tableau updates as

$$
\left[\begin{matrix} Y & \boldsymbol{y}_0 \\ c_D^\top & 0 \end{matrix}\right] \to \left[\begin{matrix} Y' & \boldsymbol{y}_0' \\ {c_D'}^\top & Z' \end{matrix}\right]
$$

and for every row $1 \leq i \leq m+1$ and every column $j$ it suffices to take

$$
y_{ij}' = y_{ij} - \frac{y_{pj}}{y_{pq}} y_{iq} \;\; (i \neq p), \qquad y_{pj}' = \frac{y_{pj}}{y_{pq}}.
$$
Example:

$$
\begin{aligned}
& \max && 7x_1 + 6x_2 \\
& s.t. && 2x_1 + x_2 \leq 3 \\
& && x_1 + 4x_2 \leq 4 \\
& && x_1, x_2 \geq 0
\end{aligned}
$$

Converting to standard form gives

$$
\begin{aligned}
& \min && -7x_1 - 6x_2 \\
& s.t. && 2x_1 + x_2 + x_3 = 3 \\
& && x_1 + 4x_2 + x_4 = 4 \\
& && x_1, x_2, x_3, x_4 \geq 0
\end{aligned}
$$

Then,

$$
A = \left[\begin{matrix} 2 & 1 & 1 & 0 \\ 1 & 4 & 0 & 1 \end{matrix}\right]
$$

So,

$$
\left[\begin{matrix} A & b \\ c^\top & 0 \end{matrix}\right] = \left[\begin{matrix} 2 & 1 & 1 & 0 & 3 \\ 1 & 4 & 0 & 1 & 4 \\ -7 & -6 & 0 & 0 & 0 \end{matrix}\right]
$$

and $x = [x_1, x_2, x_3, x_4]^\top = [0, 0, 3, 4]^\top$.

So $r_1 = -7$, $r_2 = -6 \Rightarrow q = 1$. Pivoting on column $1$ gives

$$
\left[\begin{matrix} 1 & \frac{1}{2} & \frac{1}{2} & 0 & \frac{3}{2} \\ 0 & \frac{7}{2} & -\frac{1}{2} & 1 & \frac{5}{2} \\ 0 & -\frac{5}{2} & \frac{7}{2} & 0 & \frac{21}{2} \end{matrix}\right]
$$
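As a cross-check (not in the original notes), scipy's LP solver on the same problem reproduces the value obtained by finishing the pivots by hand:

```python
from scipy.optimize import linprog

# min -7x1 - 6x2  s.t.  2x1 + x2 <= 3,  x1 + 4x2 <= 4,  x >= 0
res = linprog([-7, -6], A_ub=[[2, 1], [1, 4]], b_ub=[3, 4])
print(res.x, res.fun)   # x = (8/7, 5/7), objective -86/7
```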
A counterexample for the simplex method (it cycles and never terminates):

$$
\begin{aligned}
& \min && -\frac{3}{4}x_1 + 20x_2 - \frac{1}{2}x_3 + 6x_4 \\
& s.t. && \frac{1}{4}x_1 - 8x_2 - x_3 + 9x_4 + x_5 = 0 \\
& && \frac{1}{2}x_1 - 12x_2 - \frac{1}{2}x_3 + 3x_4 + x_6 = 0 \\
& && x_3 + x_7 = 1 \\
& && x_1, \dots, x_7 \geq 0
\end{aligned}
$$
# Improvement of the Simplex Method

$$[A, b] \leadsto [I_m, Y, \boldsymbol{y}_0] \leadsto [I_m, Y', \boldsymbol{y}_0'] \leadsto \cdots$$

When $n \gg m$, the vast majority of columns never get a chance to enter the basis.

# Summary of the Revised Procedure

1. Form a revised tableau $[B^{-1}, \boldsymbol{y}_0 = B^{-1} b]$.
2. Calculate $r_D^\top = c_D^\top - \lambda^\top D$, where $\lambda^\top = c_B^\top B^{-1}$.
3. If $r_j \geq 0$ for all $j$, output $x$ and stop.
4. Select $q$ with $r_q < 0$ and compute $\boldsymbol{y}_q = B^{-1} \boldsymbol{a}_q$. If no $y_{iq} > 0$, stop (unbounded).
5. $p = \argmin \{ y_{i0} / y_{iq} : y_{iq} > 0 \}$.
6. $B^{-1}_{\text{new}} = E \times B^{-1}$ for an elementary pivot matrix $E$; go to step 2.

The amount of computation per step is reduced.
Example:

$$
\left\{\begin{aligned}
& \min && 3x_1 + 5x_2 \\
& s.t. && x_1 + x_2 \leq 4 \\
& && 5x_1 + 3x_2 \geq 8 \\
& && x_1, x_2 \geq 0
\end{aligned}\right.
$$
# Duality in Linear Programming

Duality is applied in approximation algorithms, LP, and many other areas.

# The Dual Problem

For a normal-form linear program

$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& s.t. && A \boldsymbol{x} \geq \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$

its dual problem is defined as

$$
\begin{aligned}
& \max && \boldsymbol{b}^\top \boldsymbol{y} \\
& s.t. && \boldsymbol{y}^\top A \leq \boldsymbol{c}^\top \\
& && \boldsymbol{y} \geq \boldsymbol{0}
\end{aligned}
$$
From this one can derive that for the problem

$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& s.t. && A \boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0}
\end{aligned}
$$

the dual problem is

$$
\begin{aligned}
& \max && \boldsymbol{b}^\top \boldsymbol{y} \\
& s.t. && \boldsymbol{y}^\top A \leq \boldsymbol{c}^\top
\end{aligned}
$$
# Properties of the Dual Problem

Claim: the dual of the dual is the primal.

Lemma (weak duality): suppose $x$ and $y$ are feasible solutions to the primal and dual LPs respectively. Then $c^\top x \geq y^\top b$; in a primal-dual pair, the minimization side always attains the larger value.

Lemma: if one of the two problems is unbounded, then the other has no feasible solution.

Theorem: suppose $x_0, y_0$ are feasible solutions to the primal and dual respectively. If $c^\top x_0 = b^\top y_0$, then $x_0$ and $y_0$ are both optimal: whenever the two objective values meet, the solutions must be optimal.

Proof: omitted.
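Not from the notes: the duality relations can be checked numerically on an assumed small normal-form pair — scipy solves both problems and the two optimal values coincide.

```python
import numpy as np
from scipy.optimize import linprog

# primal: min c^T x  s.t. A x >= b, x >= 0
# dual:   max b^T y  s.t. A^T y <= c, y >= 0   (assumed toy data)
A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([2.0, 3.0])

primal = linprog(c, A_ub=-A, b_ub=-b)     # rewrite A x >= b as -A x <= -b
dual = linprog(-b, A_ub=A.T, b_ub=c)      # max b^T y as min -b^T y
print(primal.fun, -dual.fun)              # the two optimal values coincide
```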
# Other Fast Methods for Solving LPs

# The Ellipsoid Algorithm

Due to Khachiyan.

Definition of an ellipsoid:

$E = E(A, a) = \{x \in \mathbb{R}^n \mid (x - a)^\top A^{-1} (x - a) \leq 1\}$. Its width is $\sqrt{\alpha}$ and its radius $\sqrt{\beta}$, where $\alpha$ and $\beta$ are the smallest and largest eigenvalues of $A$, respectively.
For an LP problem:

$$
\left\{\begin{aligned}
& \min && c^\top x \\
& s.t. && A x \geq b
\end{aligned}\right.
$$
# Karmarkar Algorithm

Reference (in Chinese): 【线性规划 (七)】内点法 - 王源的文章 - 知乎.

Definition (canonical form of LP):

$$
\begin{aligned}
& \min && c^\top x \\
& s.t. && A x = 0 \\
& && \textstyle\sum_i x_i = 1 \\
& && x_i \geq 0
\end{aligned}
$$

Define $\Omega = \{x : Ax = 0\}$ and $\Delta = \{x : e^\top x = 1\}$.

The nullspace of a matrix $M$ is $\{x \mid Mx = 0\}$. The center of $\Delta$ is $\alpha_0 = \frac{e}{n} = [1/n, 1/n, \dots, 1/n]^\top$.

$$
\Omega \cap \Delta = \left\{ x : \begin{bmatrix} A \\ e^\top \end{bmatrix} x = \begin{bmatrix} \boldsymbol{0} \\ 1 \end{bmatrix} \right\}
$$
Assumptions:

1. $\alpha_0 \in \Omega$.
2. The optimal value of $c^\top x$ is $0$.
3. $\operatorname{rank}\begin{bmatrix} A \\ e^\top \end{bmatrix} = m + 1$.
4. There exists $q \in \mathbb{R}^+$ such that if $\dfrac{c^\top x}{c^\top x_0} \leq 2^{-q}$, then $x$ is accepted as optimal.

(These notes are still to be organized.)
# Integer Linear Programming

# Concepts

An integer linear programming (ILP) problem has the following form:

$$
\begin{aligned}
& \min && \boldsymbol{c}^\top \boldsymbol{x} \\
& s.t. && A\boldsymbol{x} = \boldsymbol{b} \\
& && \boldsymbol{x} \geq \boldsymbol{0} \\
& && \boldsymbol{x} \in \mathbb{Z}^n
\end{aligned}
$$

We first give a few examples of ILP problems.
# The Traveling Salesman Problem (TSP)

# Gomory's Cutting-Plane Method

# Convex Optimization

# Basic Concepts

If the feasible set $\Omega$ is a convex set and the objective function $f$ is a convex function, the problem is called a convex optimization problem.

# Convex Sets

$S \subset \mathbb{R}^n$ is convex if and only if for all $a, b \in S$ and all $\alpha \in (0, 1)$, we have $\alpha a + (1 - \alpha) b \in S$.

# Convex Functions

Let $S \subset \mathbb{R}^n$ be convex. If for all $a, b \in S$ and all $\alpha \in (0, 1)$,

$$
f(\alpha a + (1 - \alpha) b) \leq \alpha f(a) + (1 - \alpha) f(b),
$$

then $f$ is called a convex function.
# Properties and Examples

If $f_1, f_2$ are convex functions, then

- $\alpha f_1$ is convex for any $\alpha > 0$;
- $f_1 + f_2$ is convex.

# Example 1

$f(x) = \|x\|$ is convex. This follows from the triangle inequality, which the Cauchy-Schwarz inequality yields:

$$
\|x + y\|^2 = \|x\|^2 + 2\langle x, y\rangle + \|y\|^2 \leq \|x\|^2 + 2\|x\|\,\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2
$$
# Example 2

Given a convex set $S \subset \mathbb{R}^n$, a convex function $f : S \to \mathbb{R}$ on it, and a constant $c \in \mathbb{R}$, the level set $H_S(f, c) = \{x \in S : f(x) \leq c\}$ is convex.
Proof

For all $x_1, x_2 \in H_S$ and all $\alpha \in (0, 1)$, convexity of $f$ gives $f(\alpha x_1 + (1 - \alpha) x_2) \leq \alpha f(x_1) + (1 - \alpha) f(x_2)$.

Since $x_1, x_2 \in H_S$, we further have $\alpha f(x_1) + (1 - \alpha) f(x_2) \leq \alpha c + (1 - \alpha) c = c$.

Hence $\alpha x_1 + (1 - \alpha) x_2 \in H_S(f, c)$.
# Example 3

For $f : S \to \mathbb{R}$, define the epigraph of $f$ as

$$
\mathrm{epi}(f) = \left\{ \begin{bmatrix} x \\ c \end{bmatrix} : x \in S, \; c \in \mathbb{R}, \; f(x) \leq c \right\}
$$

($f$ is convex if and only if $\mathrm{epi}(f)$ is a convex set.)
# Theorem

Let $S \subset \mathbb{R}^n$ be a nonempty open convex set and let $f : S \to \mathbb{R}$ with $f \in C^1$. Then $f$ is convex on $S$ $\iff$ for all $x, y \in S$, $f(y) \geq f(x) + \nabla f^\top(x)(y - x)$.
Proof:

"$\Rightarrow$":

If $f \in C^1$ is convex, then

$$
\forall x, y \in S, \; \alpha \in (0, 1): \quad f(\alpha y + (1 - \alpha) x) \leq \alpha f(y) + (1 - \alpha) f(x)
$$

So

$$
f\big(x + \alpha(y - x)\big) - f(x) \leq \alpha \big(f(y) - f(x)\big)
$$

By Taylor's theorem:

$$
f\big(x + \alpha(y - x)\big) - f(x) = \alpha \nabla f^\top(x)(y - x) + o(\|\alpha(y - x)\|)
$$

Dividing both sides by $\alpha > 0$ and letting $\alpha \to 0$,

$$
\lim_{\alpha \to 0} \frac{o(\|\alpha(y - x)\|)}{\alpha} = 0,
$$

so the remainder term vanishes in the limit and

$$
\nabla f^\top(x)(y - x) \leq f(y) - f(x)
$$
"⇐ \Leftarrow ⇐ ":
Def z = α u + ( 1 − α ) v z = \alpha u + (1-\alpha)v z = α u + ( 1 − α ) v , for u , v ∈ S , α ∈ ( 0 , 1 ) u,v \in S, \alpha\in(0,1) u , v ∈ S , α ∈ ( 0 , 1 ) .
So z ∈ S z \in S z ∈ S , and
f ( u ) ≥ f ( z ) + ∇ f ⊤ ( z ) ( u − z ) f ( v ) ≥ f ( z ) + ∇ f ⊤ ( z ) ( v − z ) f(u) \geq f(z) + \nabla f^\top(z)(u-z)\\ f(v) \geq f(z) + \nabla f^\top(z)(v-z) f ( u ) ≥ f ( z ) + ∇ f ⊤ ( z ) ( u − z ) f ( v ) ≥ f ( z ) + ∇ f ⊤ ( z ) ( v − z )
Then
α f ( u ) + ( 1 − α ) f ( v ) ≥ f ( z ) + ∇ f ⊤ ( z ) ( α ( u − z ) + ( 1 − α ) ( v − z ) ) ≥ f ( z ) \alpha f(u) + (1-\alpha)f(v) \geq f(z) + \nabla f^\top(z)\big(\alpha(u-z) + (1-\alpha)(v-z)\big) \geq f(z) α f ( u ) + ( 1 − α ) f ( v ) ≥ f ( z ) + ∇ f ⊤ ( z ) ( α ( u − z ) + ( 1 − α ) ( v − z ) ) ≥ f ( z )
# Thm

Let $S \subset \mathbb{R}^n$ be open and convex, $f : S \to \mathbb{R}$, $f \in C^2$. Then $f$ is convex $\iff$ for all $x \in S$, the Hessian $H(x)$ of $f$ at $x$ is positive semi-definite ($d^\top H(x) d \geq 0$ for all $d$).
Proof:

"$\Leftarrow$":

By Taylor's theorem, there exists $\alpha \in (0, 1)$ such that $f(y) = f(x) + \nabla f^\top(x)(y - x) + \frac{1}{2}(y - x)^\top H\big(x + \alpha(y - x)\big)(y - x)$. If $H$ is positive semi-definite everywhere, the quadratic term is nonnegative, so $f(y) \geq f(x) + \nabla f^\top(x)(y - x)$ and convexity follows from the previous theorem.

"$\Rightarrow$":

By contraposition: suppose there exist $x \in S$ and $d \in \mathbb{R}^n$ such that $d^\top H(x) d < 0$.

$S$ is open, so there exists $t \in \mathbb{R}$ with $y = x + t d \in S$. Because $H$ is continuous, taking $t$ small enough gives $d^\top H(z) d < 0$ for every $z = \alpha x + (1 - \alpha) y$, $\alpha \in (0, 1)$.

Then, for some such $z$ on the segment,

$$
f(y) = f(x) + \nabla f^\top(x)(y - x) + \frac{1}{2}(y - x)^\top H(z)(y - x) = f(x) + \nabla f^\top(x)(y - x) + \frac{t^2}{2}\, d^\top H(z)\, d < f(x) + \nabla f^\top(x)(y - x),
$$

contradicting the first-order characterization of convexity.
# Example

$f(x) = \frac{1}{2} x^\top A x + b^\top x + c$, where $A \in \mathbb{R}^{n \times n}$ with $A = A^\top$, $x, b \in \mathbb{R}^n$, $c \in \mathbb{R}$.

Then $\nabla f(x) = A x + b$, and the Hessian is $F(x) = A$.

So $f$ is convex $\iff$ $A$ is positive semi-definite.
# Convex Programs

$$
\begin{aligned}
\min \; & f(x) && f \ \text{convex} \\
s.t. \; & x \in \Omega && \Omega \ \text{convex}
\end{aligned}
$$

Then $x^\star$ is a global optimizer $\iff$ $x^\star$ is a local optimizer.
Proof : "⇐ \Leftarrow ⇐ ":
x ⋆ x^\star x ⋆ is global ⇒ y ∈ Ω : f ( y ) < f ( x ⋆ ) \Rightarrow y \in \Omega: f(y) < f(x^\star) ⇒ y ∈ Ω : f ( y ) < f ( x ⋆ ) .
Prop:

If $f$ is convex on $\Omega$, then the set of all optimizers of $f$ is convex.

Prop 2:

Let $Q \subset \mathbb{R}^n$ be open and convex and $f \in C^1$. If there exists $x^\star \in Q$ such that for all $x \in Q$ with $x \neq x^\star$ we have $\nabla f^\top(x)(x - x^\star) \geq 0$, then $x^\star$ is a global optimizer.

Note: $\nabla f(x^\star) = 0 \Rightarrow x^\star$ is an optimizer.
So how do we actually find a local optimizer?

# One-Dimensional Optimization

The problem is simply to find a minimum, $\min f(x)$.
# Golden Section Method

Assumption: $f$ is unimodal on $[a, b]$, i.e. $[a, b]$ contains exactly one local minimizer.

We seek $\min f(x)$.

The number of iterations can be computed as follows (the stopping condition is $b_k - a_k < \varepsilon$):

since $b_i - a_i = (1 - \rho)(b_{i-1} - a_{i-1})$, we need $(b_0 - a_0)(1 - \rho)^n \leq \varepsilon$.
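Not from the notes: a minimal golden-section search, assuming a unimodal objective. With $\rho = (3 - \sqrt{5})/2$, the interval shrinks by the factor $1 - \rho \approx 0.618$ per iteration while reusing one function evaluation.

```python
import math

def golden_section(f, a, b, eps=1e-8):
    """Golden-section search for the minimizer of a unimodal f on [a, b]."""
    rho = (3 - math.sqrt(5)) / 2          # 1 - rho = 0.618...
    x1, x2 = a + rho * (b - a), b - rho * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > eps:
        if f1 < f2:                       # minimizer lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = a + rho * (b - a)
            f1 = f(x1)
        else:                             # minimizer lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = b - rho * (b - a)
            f2 = f(x2)
    return (a + b) / 2

xm = golden_section(lambda x: (x - 1.3) ** 2, 0.0, 2.0)
print(xm)   # approx 1.3
```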
# Fibonacci Method

Is there a scheme that saves more evaluations than golden section? Consider using a different $\rho_k$ in each step. We require

$$
\begin{aligned}
& \min && \prod_{i=1}^N (1 - \rho_i) \\
& s.t. && \rho_{k+1}(1 - \rho_k) = 1 - 2\rho_k, && \forall k \\
& && \rho_k \in (0, 1/2)
\end{aligned}
$$

We state the answer to this optimization problem directly; it involves the Fibonacci sequence $\{F_n\}$: $\rho_k = 1 - F_{N+1-k}/F_{N+2-k}$, and in particular $\rho_1 = 1 - F_N/F_{N+1}$.

Then $\min \prod_{i=1}^N (1 - \rho_i) = \dfrac{1}{F_{N+1}}$, and $N$ is chosen so that $F_{N+1} > 1/\varepsilon$.
# Newton's Method

Requires $f \in C^2$.

Newton's method for root finding iterates

$$
x_i = x_{i-1} - \frac{f(x_{i-1})}{f'(x_{i-1})}
$$

To find a minimum we seek $x^\star$ such that $f'(x^\star) = 0$, so we simply apply Newton's method to $f'$, iterating $x_i = x_{i-1} - f'(x_{i-1})/f''(x_{i-1})$.
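Not from the notes: a minimal sketch of Newton's method applied to $f'$, on an assumed objective $f(x) = e^x - 2x$, whose minimizer is $x^\star = \ln 2$.

```python
import math

def newton_min(df, d2f, x0, tol=1e-12, max_iter=100):
    """Newton iteration on f' (solves f'(x) = 0), a 1-D minimization sketch."""
    x = x0
    for _ in range(max_iter):
        step = df(x) / d2f(x)   # Newton step for the root of f'
        x -= step
        if abs(step) < tol:
            break
    return x

# f(x) = e^x - 2x:  f'(x) = e^x - 2 vanishes at x = ln 2
xs = newton_min(lambda x: math.exp(x) - 2, lambda x: math.exp(x), x0=1.0)
print(xs)   # approx 0.6931
```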
# Unconstrained Optimization

# Optimality Conditions for Unconstrained Problems

$\min f(x)$, $x \in \mathbb{R}^n$, $f : \mathbb{R}^n \to \mathbb{R}$.

- Necessary condition: used to show a point is not optimal.
- Sufficient condition: used to certify a point as optimal.

Here we introduce only one necessary condition:

# FONC (First-Order Necessary Condition)

Let $f \in C^1$. If $x^\star$ is a local minimizer, then $\nabla f(x^\star) = 0$: a local optimum must be a stationary point.
Proof:

Claim: for any feasible direction $d$ at $x^\star$, we have $d^\top \nabla f(x^\star) = 0$.

Define $x(\alpha) := x^\star + \alpha d$ and $\phi(\alpha) = f(x(\alpha))$.

Then by Taylor's theorem:

$$
f(x^\star + \alpha d) - f(x^\star) = \phi(\alpha) - \phi(0) = \phi'(0)\alpha + o(\alpha) = \alpha\, d^\top \nabla f(x^\star) + o(\alpha)
$$

Since $x^\star$ is a local minimizer, $f(x^\star + \alpha d) \geq f(x^\star)$ for small $\alpha > 0$, so $d^\top \nabla f(x^\star) \geq 0$. Replacing $d$ with $-d$ gives $-d^\top \nabla f(x^\star) \geq 0$. Hence $d^\top \nabla f(x^\star) = 0$ for every $d$, and therefore $\nabla f(x^\star) = 0$.
# SONC (Second-Order Necessary Condition)

Skipped; we go directly to the sufficient condition.

# SOSC (Second-Order Sufficient Condition)

Let $f \in C^2$. If $\nabla f(x^\star) = 0$ and $F(x^\star) > 0$ (the Hessian is positive definite), then $x^\star$ is a strict local minimizer.

Research so far only provides sufficient conditions for strict local optimality, not sufficient conditions for (non-strict) local optimality.
Proof:

Apply Taylor's theorem. With $\nabla f(x^\star) = 0$,

$$
f(x^\star + \alpha d) - f(x^\star) = \frac{\alpha^2}{2}\, d^\top F(x^\star)\, d + o(\|\alpha d\|^2) > 0
$$

for every unit direction $d$ and all sufficiently small $\alpha > 0$.
Example:

$f(x) = x_1^2 + x_2^2$.

$$
\nabla f(x) = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}
$$

(The rest of the computation is omitted.)
# Unconstrained Optimization Algorithms

# Steepest Descent

$x_{k+1} = x_k + \alpha_k d_k$, with $\alpha_k \in \mathbb{R}$, $d_k \in \mathbb{R}^n$.

$$
f(x_{k+1}) - f(x_k) = \alpha_k \nabla f(x_k)^\top d_k + o(\|\alpha_k d_k\|)
$$

Here $d_k = -\dfrac{\nabla f(x_k)}{\|\nabla f(x_k)\|}$. The step $\alpha_k$ may be fixed or varying. If we take $\alpha_k = \argmin_{\alpha > 0} f(x_k - \alpha \nabla f(x_k))$, a one-dimensional optimization problem in which $\alpha$ is the only unknown, the method is called the steepest descent method.

Note that steepest descent differs from plain gradient descent: in unrefined gradient descent the step size (learning rate) $\alpha$ is fixed in advance, whereas here $\alpha$ is obtained by solving $\argmin_\alpha f(x_k - \alpha \nabla f(x_k))$.
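Not from the notes: a sketch of steepest descent on an assumed quadratic $f(x) = \frac{1}{2}x^\top Q x - b^\top x$ with symmetric positive definite $Q$, where the exact line-search step has the closed form $\alpha_k = g_k^\top g_k / (g_k^\top Q g_k)$.

```python
import numpy as np

def steepest_descent(Q, b, x0, tol=1e-10, max_iter=500):
    """Steepest descent for f(x) = 0.5 x^T Q x - b^T x, Q SPD;
    each step uses the exact line-search alpha in closed form."""
    x = x0.astype(float)
    for _ in range(max_iter):
        g = Q @ x - b                     # gradient at x
        if np.linalg.norm(g) < tol:
            break
        alpha = (g @ g) / (g @ Q @ g)     # argmin of f along -g
        x = x - alpha * g
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(Q, b, np.zeros(2))
print(x)                                  # solves Q x = b
```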
# Properties of Steepest Descent

Property 1: $x_{k+1} - x_k$ is orthogonal to $x_k - x_{k-1}$; any two consecutive steps of steepest descent are mutually orthogonal.

Proof:

$$
\langle x_{k+1} - x_k, x_k - x_{k-1} \rangle = \alpha_k \alpha_{k-1} \langle \nabla f(x_k), \nabla f(x_{k-1}) \rangle
$$

By the FONC applied to $\phi_k(\alpha) = f(x_k - \alpha \nabla f(x_k))$:

$$
\phi_k'(\alpha_k) = 0
$$

So $\nabla f\big(x_k - \alpha_k \nabla f(x_k)\big)^\top \big(-\nabla f(x_k)\big) = 0$, which means $\nabla f(x_{k+1})^\top \nabla f(x_k) = 0$.
Property 2: if $\nabla f(x_k) \neq 0$, then $f(x_{k+1}) < f(x_k)$: the function value strictly decreases.

Proof:

$x_{k+1} = x_k - \alpha_k \nabla f(x_k)$, where $\alpha_k = \argmin_{\alpha \geq 0} f(x_k - \alpha \nabla f(x_k))$.

So for all $\alpha \neq \alpha_k$, $\phi_k(\alpha) = f(x_k - \alpha \nabla f(x_k)) \geq \phi_k(\alpha_k)$.

$\phi_k'(0) = -\big(\nabla f(x_k - 0 \cdot \nabla f(x_k))\big)^\top \nabla f(x_k) = -\|\nabla f(x_k)\|^2 < 0$.

So there exists $\omega > 0$ such that $\phi_k(\bar{\alpha}) < \phi_k(0)$ for all $\bar{\alpha} \in (0, \omega]$, and hence $f(x_{k+1}) = \phi_k(\alpha_k) \leq \phi_k(\bar{\alpha}) < \phi_k(0) = f(x_k)$.
# Conjugate Direction Method

# Concepts and Properties

First, the definition of conjugacy of two vectors: $p, q \in \mathbb{R}^n$ are $Q$-conjugate if $p^\top Q q = 0$. If $Q = I_n$, $Q$-conjugacy reduces to orthogonality.

Let $Q$ be a symmetric positive definite matrix. If $d^0, d^1, \dots, d^k$ are nonzero and pairwise $Q$-conjugate, then these vectors are linearly independent.

# Properties

For a quadratic objective, the conjugate direction method reaches the optimum in at most $n$ steps.

# Improvement (Conjugate Gradient)

1. $k := 0$; initialize $x_0$ and $g_0 = \nabla f(x_0)$. If $g_0 = 0$, stop; else set $d_0 = -g_0$.
2. $\alpha_k = -\dfrac{g_k^\top d_k}{d_k^\top Q d_k}$, $x_{k+1} = x_k + \alpha_k d_k$.
3. $g_{k+1} = \nabla f(x_{k+1})$. If $g_{k+1} = 0$, stop.
4. $\beta_k = \dfrac{g_{k+1}^\top Q d_k}{d_k^\top Q d_k}$.
5. $d_{k+1} = -g_{k+1} + \beta_k d_k$.
6. $k := k + 1$; go to step 2.

The improved method does not require the conjugate directions to be given in advance; it constructs them on the fly, in a manner similar to Gram-Schmidt orthogonalization. To prove its correctness, it suffices to show that the vectors it generates are indeed pairwise $Q$-conjugate.
# Constrained Optimization

# Lagrange's Theorem (FONC)

If $x^\star$ is a local minimizer of $f$ subject to $h(x) = 0$ and is a regular point of these constraints, then there exists $\lambda^\star \in \mathbb{R}^m$ such that $\nabla f(x^\star) + \nabla h(x^\star)^\top \lambda^\star = 0$.

# Inequality Constraints

$$
\begin{aligned}
& \min && f(x) && f : \mathbb{R}^n \to \mathbb{R} \\
& s.t. && h(x) = 0 && h : \mathbb{R}^n \to \mathbb{R}^m, \; m \leq n \\
& && g(x) \leq 0 && g : \mathbb{R}^n \to \mathbb{R}^p
\end{aligned}
$$
# Concepts

If an inequality constraint $g_i(x) \leq 0$ holds with equality at $x^\star$, the constraint is called active there; otherwise it is inactive.

Suppose $x^\star$ satisfies the constraints, i.e. $h(x^\star) = 0$ and $g(x^\star) \leq 0$. Let $J(x^\star) := \{i \mid g_i(x^\star) = 0\}$. If the gradients $\nabla h_i(x^\star)$ together with $\nabla g_j(x^\star)$, $j \in J(x^\star)$, are linearly independent, then $x^\star$ is called regular.
# KKT Theorem (FONC)

Let $f, h, g \in C^1$, and let $x^\star$ be a regular point and a local minimizer subject to the constraints above. Then there exist $\lambda^\star \in \mathbb{R}^m$ and $\mu^\star \in \mathbb{R}^p$ satisfying:

1. $\dfrac{\partial L}{\partial x_i} = 0$ for all $i$, where $L = f(x) + {\lambda^\star}^\top h(x) + {\mu^\star}^\top g(x)$ (the Lagrangian stationarity equation);
2. $h(x^\star) = 0$, $g(x^\star) \leq 0$ (the constraints themselves);
3. $\mu^\star \geq 0$ (dual feasibility);
4. ${\mu^\star}^\top g(x^\star) = 0$ (complementary slackness).

Together these four are called the KKT conditions.

Solving this system of conditions directly is far from easy; one usually starts from the equality conditions and reduces the number of unknowns by case analysis on which constraints are active, then solves.
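Not from the notes: the KKT conditions checked on an assumed toy problem, $\min x_1^2 + x_2^2$ s.t. $1 - x_1 - x_2 \leq 0$. By hand, stationarity $2x - \mu(1,1)^\top = 0$ together with the active constraint $x_1 + x_2 = 1$ gives $x^\star = (1/2, 1/2)$ and $\mu^\star = 1$; scipy's constrained solver reproduces this.

```python
import numpy as np
from scipy.optimize import minimize

# min x1^2 + x2^2  s.t.  x1 + x2 >= 1  (scipy "ineq" means fun(x) >= 0)
res = minimize(lambda x: x @ x, x0=[0.0, 0.0],
               constraints=[{"type": "ineq", "fun": lambda x: x[0] + x[1] - 1}])
x = res.x
mu = 2 * x[0]            # multiplier recovered from the stationarity equation
print(x, mu)             # approx (0.5, 0.5) and 1.0
```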
# Dynamic Programming

"You have all learned this before, so we won't cover it."