矩阵学习笔记
参考教材 Matrix Analysis and Applied Linear Algebra
Ch1 线性方程
- 方程解的三种情况:唯一解、无解和无穷解
- Gauss 消元计算复杂度 $O(\dfrac{n^3}{3})$
- Gauss-Jordan 消元复杂度 $O(\dfrac{n^3}{2})$
- 浮点数表示:$f=\pm.d_1d_2\cdots d_t\times \beta^\epsilon$
- Partial Pivoting:选所在列下面模最大的系数,只交换行
- Complete Pivoting:选所在位置右下区域模最大的系数,同时交换行和列
- 病态系统:系统微小扰动导致解的巨变
Ch2 矩形系统和阶梯形式
-
行阶梯形式: \(\left( {\begin{array}{*{20}{c}} \mathbf{*}&*&*&*\\ 0&0&0&\mathbf{*} \\ 0&0&0&0 \end{array}} \right)\)
-
$rank(\mathbf{A})$ = number of pivots
-
约化行阶梯形式($\mathbf{E_A}$): \(\left( {\begin{array}{*{20}{c}} \mathbf{1}&*&0&*\\ 0&0&\mathbf{1}&* \\ 0&0&0&0 \end{array}} \right)\)
Ch3 矩阵代数
-
矩阵乘法不满足交换律
-
$\mathbf{(AB)^T=B^TA^T}$
-
Gauss-Jordan 消元法求矩阵的逆: $\mathbf{[A \vdots I] \to [I \vdots A^{-1}]}$ ,时间复杂度 $O(n^3)$
-
Sherman-Morrison-Woodburg 公式 \((A+CD^T)^{-1}=A^{-1}-A^{-1}C(I+D^TA^{-1}C)^{-1}D^{T}A^{-1}\)
-
Neumann 级数:如果 $\lim_{n\to \infty}A^n=0$,那么 \((I-A)^{-1}=I+A+A^2+\cdots=\sum^{\infty}_{k=0}A^k\)
-
病态(ill conditioned),条件数 \(k = ||A||\,||A^{-1}||\)
-
基础矩阵 \((1-uv^T)^{-1}=I-\dfrac{uv^T}{v^Tu-1}\)
-
LU 分解
Ch4 向量空间
Ch5 范数、内积和正交
Ch6 行列式
Ch7 特征值和特征向量
Ch8 Perron-Frobenius理论
矩阵求导
Numerator layout
\[\begin{align} \frac{dy}{d\mathbf{x}} &= \left[ \frac{\partial y}{\partial x_1} \frac{\partial y}{\partial x_2} \cdots \frac{\partial y}{\partial x_n} \right],\\ \frac{d\mathbf{y}}{dx} &= \begin{bmatrix} \frac{\partial y_1}{\partial x}\\ \frac{\partial y_2}{\partial x}\\ \vdots\\ \frac{\partial y_m}{\partial x}\\ \end{bmatrix},\\ \frac{d\mathbf{y}}{d\mathbf{x}} &= \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n}\\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}\\ \end{bmatrix},\\ \frac{dy}{d\mathbf{X}} &= \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{p1}}\\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{p2}}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial y}{\partial x_{1q}} & \frac{\partial y}{\partial x_{2q}} & \cdots & \frac{\partial y}{\partial x_{pq}}\\ \end{bmatrix}. \end{align}\]常用公式
\[\begin{align} dy&=\frac{dy}{dx}dx\\ dy&=\frac{dy}{d\mathbf{x}}d\mathbf{x}\\ dy&=\mathrm{tr}(\frac{dy}{d\mathbf{X}}d\mathbf{X})\\ d\mathbf{y}&=\frac{d\mathbf{y}}{dx}dx\\ d\mathbf{y}&=\frac{d\mathbf{y}}{d\mathbf{x}}d\mathbf{x}\\ d\mathbf{Y}&=\frac{d\mathbf{Y}}{dx}dx\\ d\mathbf{A}&=\mathbf{0}\\ d(a\mathbf{X})&=a~d\mathbf{X}\\ d(\mathbf{X+Y})&=d\mathbf{X}+d\mathbf{Y}\\ d({\mathbf{X}\mathbf{Y}})&=(d\mathbf{X})\mathbf{Y}+\mathbf{X}(d\mathbf{Y})\\ d({\mathbf{X_1}\mathbf{X_2}\cdots\mathbf{X_n}})&=(d\mathbf{X_1})\mathbf{X_2}\cdots\mathbf{X_n}+\mathbf{X_1}(d\mathbf{X_2})\cdots\mathbf{X_n}+\cdots+\mathbf{X_1}\mathbf{X_2}\cdots (d\mathbf{X_n})\\ d({\mathbf{A}\mathbf{X}}\mathbf{B}+\mathbf{C})&=\mathbf{A}(d\mathbf{X})\mathbf{B}\\ d({\mathbf{X}\otimes\mathbf{Y}})&=(d\mathbf{X})\otimes\mathbf{Y}+\mathbf{X}\otimes(d\mathbf{Y})\\ d({\mathbf{X}\circ\mathbf{Y}})&=(d\mathbf{X})\circ\mathbf{Y}+\mathbf{X}\circ(d\mathbf{Y})\\ d(\mathbf{X}^\top )&=(d\mathbf{X})^\top\\ d(\mathbf{X}^{-1})&=-\mathbf{X}^{-1}(d\mathbf{X})\mathbf{X}^{-1}\\ d(\mathrm{tr}(\mathbf{X}))&=\mathrm{tr}(d\mathbf{X})\\ d(|\mathbf{X}|)&=\mathrm{tr}(adj(\mathbf{X})d\mathbf{X})=|\mathbf{X}|\mathrm{tr}(\mathbf{X}^{-1}d\mathbf{X})\\ df(x,y,z)&=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial y}dy+\frac{\partial f}{\partial z}dz\\ df(x,\mathbf{y},\mathbf{Z})&=\frac{\partial f}{\partial x}dx+\frac{\partial f}{\partial \mathbf{y}}d\mathbf{y}+\mathrm{tr}(\frac{\partial f}{\partial \mathbf{Z}}d\mathbf{Z})\\ d\mathbf{f}(x,\mathbf{y})&=\frac{\partial \mathbf{f}}{\partial x}dx+\frac{\partial \mathbf{f}}{\partial \mathbf{y}}d\mathbf{y}\\ d\mathbf{F}(x,y,z)&=\frac{\partial \mathbf{F}}{\partial x}dx+\frac{\partial \mathbf{F}}{\partial y}dy+\frac{\partial \mathbf{F}}{\partial z}dz \end{align}\]常用结论
\[\begin{align} \frac{d(\mathbf{x}^\top \mathbf{x})}{d\mathbf{x}}&=2\mathbf{x}^\top\\ \frac{d||\mathbf{W}\mathbf{x}+\mathbf{b}||_2^2}{d\mathbf{W}}&=2\mathbf{x}(\mathbf{W}\mathbf{x}+\mathbf{b})^\top\\ \frac{d(\ln|\mathbf{X}|)}{d\mathbf{X}}&=\mathbf{X}^{-1}\\ \frac{d(\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}))}{d\mathbf{X}}&=\mathbf{B}\mathbf{A}\\ \end{align}\]注:以上公式参考 Matrix-Calculus