For a single-variable linear function $f(x)=ax$, the derivative is $\frac{df}{dx}=a$.
For the multivariate scalar function $f(x)=a^Tx$, $\frac{\partial{f}}{\partial{x}}=a$.
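Proposition: Since a scalar equals its own trace, $g(x)=Tr[a^Tx]$ is the same function as $f(x)=a^Tx$, and hence $\frac{\partial{g}}{\partial{x}}=a$.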
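Definition of Trace:
- $Tr[A]=\sum_i A_{ii}$ for a square matrix $A$.

Properties 1 (properties of the trace):
- (1) $Tr[A+B]=Tr[A]+Tr[B]$
- (2) $Tr[cA]=c\,Tr[A]$
- (3) $Tr[A]=Tr[A^T]$
- (4) $Tr[A_1 A_2 \cdots A_n]=Tr[A_n A_1 \cdots A_{n-1}]$
- (5) $Tr[A^T B]=\sum_i\sum_jA_{ij}B_{ij}$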
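The trace identities above can be spot-checked numerically. The following is a minimal sketch in plain Python, assuming small random $3\times 3$ matrices; the helper names (`trace`, `transpose`, `matmul`, `rand_matrix`) are illustrative and not part of the original notes. It checks transpose invariance, the cyclic property for three factors, and the identity $Tr[A^TB]=\sum_i\sum_j A_{ij}B_{ij}$.

```python
import random

def trace(M):
    """Sum of the diagonal entries of a square matrix stored as a list of rows."""
    return sum(M[i][i] for i in range(len(M)))

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    cols = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)  # fixed seed so the run is reproducible
A, B, C = (rand_matrix(3, 3, rng) for _ in range(3))

# Transpose invariance: Tr[A] = Tr[A^T]
gap_transpose = abs(trace(A) - trace(transpose(A)))

# Cyclic property with three factors: Tr[ABC] = Tr[CAB]
gap_cyclic = abs(trace(matmul(matmul(A, B), C)) - trace(matmul(matmul(C, A), B)))

# Inner-product form: Tr[A^T B] = sum_ij A_ij * B_ij
elementwise = sum(A[i][j] * B[i][j] for i in range(3) for j in range(3))
gap_inner = abs(trace(matmul(transpose(A), B)) - elementwise)

print(gap_transpose, gap_cyclic, gap_inner)  # all three should be at rounding level
```

A numerical check like this does not prove the identities, but it is a quick way to catch a misremembered ordering in the cyclic property.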
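For the multivariate scalar function $f(x)=Tr[A^{T}x]$, we have $\frac{\partial{f}}{\partial{x}}=A \quad \forall A,x\in \mathbb{R}^{m\times n}$.
This follows directly from Properties 1(5): $f(x)=\sum_i\sum_j A_{ij}x_{ij}$, so $\frac{\partial{f}}{\partial{x_{ij}}}=A_{ij}$.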
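As a sanity check on $\frac{\partial}{\partial x}Tr[A^Tx]=A$, the sketch below estimates every partial derivative by central differences and compares the result with $A$. The matrices and the helper `numerical_gradient` are assumptions made for illustration only.

```python
def f(A, X):
    """f(X) = Tr[A^T X], evaluated as the elementwise sum of A_ij * X_ij."""
    return sum(a * x for ra, rx in zip(A, X) for a, x in zip(ra, rx))

def numerical_gradient(func, X, eps=1e-6):
    """Central-difference estimate of the matrix of partials d func / d X_ij."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            old = X[i][j]
            X[i][j] = old + eps
            up = func(X)
            X[i][j] = old - eps
            down = func(X)
            X[i][j] = old  # restore the entry before moving on
            grad[i][j] = (up - down) / (2 * eps)
    return grad

A = [[1.0, -2.0, 0.5], [3.0, 0.0, -1.5]]   # arbitrary 2 x 3 example
X = [[0.2, 0.4, -0.6], [1.0, -0.8, 0.3]]

grad = numerical_gradient(lambda M: f(A, M), X)
max_err = max(abs(g - a) for rg, ra in zip(grad, A) for g, a in zip(rg, ra))
print(max_err)  # expected to be tiny, since f is linear in X
```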
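For any scalar function $f:\mathbb{R}^{m\times n}\rightarrow \mathbb{R}$, we have $\frac{\partial{f}}{\partial{x}}=\frac{\partial{Tr[f]}}{\partial{x}}$, because a scalar equals its own trace. The identities in Properties 1 can then be used to transform the expression into the form of Theorem 2.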
Define the matrix differential as follows: $$ d A= \left[\begin{array}{ccc} d A_{11}& \cdots & d A_{1n}\\ \vdots & \ddots & \vdots \\ d A_{m1}& \cdots & d A_{mn} \end{array}\right] $$
By the definition of the trace and Definition 2: $$ d\,Tr[A] = Tr[ d A] $$
For a scalar function $f:\mathbb{R}^{m\times n}\rightarrow \mathbb{R}$, $df=Tr[(\frac{\partial{f}}{\partial{x}})^T d x]$. (This establishes the relationship between the matrix differential and the matrix derivative.)
Proof: $LHS= d f=\sum_{ij}\frac{\partial{f}}{\partial{x_{ij}}}\, d x_{ij}$
Applying Properties 1, Definition 2, and Definition 1 in turn: $RHS = Tr[(\frac{\partial{f}}{\partial{x}})^T d x]=\sum_{ij}(\frac{\partial{f}}{\partial{x}})_{ij}( d x)_{ij}=\sum_{ij}(\frac{\partial{f}}{\partial{x}})_{ij}\, d x_{ij}=\sum_{ij}\frac{\partial{f}}{\partial{x_{ij}}}\, d x_{ij}=LHS$
From Definition 2, we obtain the following properties of the matrix differential:
- $d(cA)=c\,dA$
- $d(A+B)=dA + dB$
- $d(AB)=(dA)B+A\,dB$
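The product rule $d(AB)=(dA)B+A\,dB$ is a first-order statement: the exact change of $AB$ differs from it only by the second-order term $(dA)(dB)$. A minimal sketch of this, assuming arbitrary $2\times 2$ matrices and perturbation directions `U` and `V` (all chosen here for illustration), shrinks the perturbation by a factor of 10 and checks that the residual shrinks by roughly a factor of 100.

```python
def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(M, t):
    return [[t * v for v in row] for row in M]

def max_abs(M):
    return max(abs(v) for row in M for v in row)

A = [[1.0, 2.0], [0.5, -1.0]]
B = [[0.0, 1.5], [-2.0, 1.0]]
U = [[0.3, -0.7], [1.1, 0.2]]   # direction of the perturbation dA
V = [[-0.4, 0.9], [0.6, 0.5]]   # direction of the perturbation dB

residuals = []
for t in (1e-2, 1e-3):
    dA, dB = scale(U, t), scale(V, t)
    exact_change = sub(matmul(add(A, dA), add(B, dB)), matmul(A, B))
    first_order = add(matmul(dA, B), matmul(A, dB))   # (dA)B + A dB
    residuals.append(max_abs(sub(exact_change, first_order)))

print(residuals)  # the second residual should be about 100x smaller than the first
```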
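With these tools, the derivative of a scalar function $f:\mathbb{R}^{m\times n}\rightarrow \mathbb{R}$ can be found with the following procedure:
- Write $df$ as a trace, i.e., $df=d\,Tr[f]=Tr[df]$.
- Use the trace properties in Properties 1 to simplify $df$ into a linear combination of terms of the form $Tr[A^T dx]$.
- Apply Theorem 4 to read off $\frac{\partial{f}}{\partial{x}}$.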
- $f(x)=x^TAx,\ A\in\mathbb{R}^{n \times n},\ x\in \mathbb{R}^n$
$$ df=Tr[d(x^TAx)]=Tr[(dx)^TAx+x^TA\,dx]=Tr[x^TA^T dx]+Tr[x^TA\,dx]=Tr[(x^TA^T+x^TA)\,dx] $$
Hence, $$ \frac{\partial{f}}{\partial{x}}=(x^TA^T+x^TA)^T=(A+A^T)x $$
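The closed form $(A+A^T)x$ can be compared against a finite-difference gradient. The sketch below assumes a deliberately non-symmetric $3\times 3$ matrix so that the $A^T$ term matters; the function names are illustrative.

```python
def quad_form(A, x):
    """f(x) = x^T A x for a square matrix A and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def closed_form_gradient(A, x):
    """(A + A^T) x, the gradient derived via the trace technique."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numerical_gradient(A, x, eps=1e-6):
    """Central-difference estimate of each partial derivative of quad_form."""
    grad = []
    for k in range(len(x)):
        up = x[:k] + [x[k] + eps] + x[k + 1:]
        down = x[:k] + [x[k] - eps] + x[k + 1:]
        grad.append((quad_form(A, up) - quad_form(A, down)) / (2 * eps))
    return grad

A = [[2.0, -1.0, 0.0], [4.0, 0.5, 3.0], [1.0, 1.0, -2.0]]  # non-symmetric on purpose
x = [1.0, -2.0, 0.5]

exact = closed_form_gradient(A, x)
approx = numerical_gradient(A, x)
max_err = max(abs(e - a) for e, a in zip(exact, approx))
print(exact, max_err)
```

If $A$ happens to be symmetric, the result reduces to the familiar $2Ax$.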
- $XX^{-1}=I\to d (XX^{-1})= d I \to (d X)X^{-1}+X\, d X^{-1}=0 \to d X^{-1}=-X^{-1} (d X)X^{-1}$
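The differential of the inverse can be checked the same way. The sketch below uses a $2\times 2$ matrix so the inverse has a simple closed form (the helper `inverse_2x2` and the chosen numbers are illustrative assumptions). It compares the actual change $(X+dX)^{-1}-X^{-1}$ with the first-order prediction $-X^{-1}(dX)X^{-1}$.

```python
def inverse_2x2(M):
    """Closed-form inverse of a 2 x 2 matrix (assumes a nonzero determinant)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

X = [[2.0, 1.0], [1.0, 3.0]]
U = [[0.5, -1.0], [2.0, 0.25]]   # direction of the perturbation dX
t = 1e-4
dX = [[t * u for u in row] for row in U]

X_inv = inverse_2x2(X)
X_perturbed = [[x + d for x, d in zip(rx, rd)] for rx, rd in zip(X, dX)]

# Actual change of the inverse
actual = [[p - q for p, q in zip(rp, rq)]
          for rp, rq in zip(inverse_2x2(X_perturbed), X_inv)]

# First-order prediction: -X^{-1} (dX) X^{-1}
predicted = [[-v for v in row] for row in matmul(matmul(X_inv, dX), X_inv)]

residual = max(abs(a - p) for ra, rp in zip(actual, predicted) for a, p in zip(ra, rp))
first_order_size = max(abs(p) for row in predicted for p in row)
print(residual, first_order_size)  # the residual should be far smaller than the prediction
```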
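The above summarizes derivatives of scalar functions with respect to $x$. Next we define the derivative of a vector function with respect to a vector $x$: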
For a vector function $f=[f_1,f_2,\dots,f_n]^T$ with $f_i=f_i(x)$ and $x=[x_1,x_2,\dots,x_m]^T$, we define: $$ \frac{\partial{f}}{\partial{x}}= \left[\begin{array}{cccc} \frac{\partial f_1}{\partial x_{1}}& \frac{\partial f_2}{\partial x_{1}} & \cdots & \frac{\partial f_n}{\partial x_{1}}\\ \frac{\partial f_1}{\partial x_{2}} & \frac{\partial f_2}{\partial x_{2}} & \cdots & \frac{\partial f_n}{\partial x_{2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial x_{m}}& \frac{\partial f_2}{\partial x_{m}}& \cdots & \frac{\partial f_n}{\partial x_{m}} \end{array}\right] $$
Suppose $f:\mathbb{R}^m \to \mathbb{R}^n$. The Jacobian matrix of $f$ is $$ J(f)=[\frac{\partial{f}}{\partial{x_1}} \cdots\frac{\partial{f}}{\partial{x_m}}]= \left[\begin{array}{ccc} \frac{\partial f_1}{\partial x_{1}}& \cdots & \frac{\partial f_1}{\partial x_{m}}\\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_{1}}& \cdots & \frac{\partial f_n}{\partial x_{m}} \end{array}\right] $$
Suppose $f:\mathbb{R}^m \to \mathbb{R}$. The Hessian matrix of $f$ is $$ H(f)={\begin{bmatrix}{\frac {\partial ^{2}f}{\partial x_{1}^{2}}}&{\frac {\partial ^{2}f}{\partial x_{1}\,\partial x_{2}}}&\cdots &{\frac {\partial ^{2}f}{\partial x_{1}\,\partial x_{m}}}\\{\frac {\partial ^{2}f}{\partial x_{2}\,\partial x_{1}}}&{\frac {\partial ^{2}f}{\partial x_{2}^{2}}}&\cdots &{\frac {\partial ^{2}f}{\partial x_{2}\,\partial x_{m}}}\\\vdots &\vdots &\ddots &\vdots \\{\frac {\partial ^{2}f}{\partial x_{m}\,\partial x_{1}}}&{\frac {\partial ^{2}f}{\partial x_{m}\,\partial x_{2}}}&\cdots &{\frac {\partial ^{2}f}{\partial x_{m}^{2}}}\end{bmatrix}}=J(\nabla f) $$
Using Definition 3, the Jacobian and Hessian matrices can be rewritten as: $$ \begin{array}{c} J(f)=(\frac{\partial{f}}{\partial{x}})^T\\ H(f)=\frac{\partial}{\partial{x}}(\frac{\partial{f}}{\partial{x}}) \end{array} $$
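To make the relation $H(f)=J(\nabla f)$ concrete, consider the quadratic form $f(x)=x^TAx$ from earlier: its gradient is $(A+A^T)x$, so its Hessian should be the constant matrix $A+A^T$. The sketch below estimates the Hessian with second-order central differences; the helper `numerical_hessian` and the step size are assumptions chosen for illustration.

```python
def quad_form(A, x):
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numerical_hessian(func, x, eps=1e-3):
    """Estimate d^2 func / (dx_i dx_j) with a four-point central difference."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate func at x + si*eps*e_i + sj*eps*e_j
                y = list(x)
                y[i] += si * eps
                y[j] += sj * eps
                return func(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * eps * eps)
    return H

A = [[2.0, -1.0, 0.0], [4.0, 0.5, 3.0], [1.0, 1.0, -2.0]]
x = [1.0, -2.0, 0.5]

H = numerical_hessian(lambda v: quad_form(A, v), x)
expected = [[A[i][j] + A[j][i] for j in range(3)] for i in range(3)]  # A + A^T
max_err = max(abs(H[i][j] - expected[i][j]) for i in range(3) for j in range(3))
print(max_err)
```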
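For $f:\mathbb{R}^{m}\to \mathbb{R}^{n}$, $df=(\frac{\partial{f}}{\partial{x}})^T dx$.
Proof: it suffices to show that $df_j=((\frac{\partial{f}}{\partial{x}})^T dx)_j \quad \forall j$. By Definition 3, $(\frac{\partial{f}}{\partial{x}})_{ij}=\frac{\partial f_j}{\partial x_i}$, so $((\frac{\partial{f}}{\partial{x}})^T dx)_j=\sum_i \frac{\partial f_j}{\partial x_i}\,dx_i=df_j$.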
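The layout conventions are easy to mix up, so here is a small numerical sketch. It assumes a hypothetical map $f:\mathbb{R}^2\to\mathbb{R}^3$ chosen only for illustration, builds $\frac{\partial f}{\partial x}$ in the $m\times n$ layout of the definition above, transposes it to get the Jacobian, and checks that $(\frac{\partial f}{\partial x})^T dx$ predicts the change in $f$ to first order.

```python
def f(x):
    """Hypothetical example map from R^2 to R^3."""
    x1, x2 = x
    return [x1 * x2, x1 ** 2, x1 + 3.0 * x2]

def derivative_layout(func, x, eps=1e-6):
    """Return the m x n matrix whose (i, j) entry approximates d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        rows.append([(a - b) / (2 * eps) for a, b in zip(func(up), func(down))])
    return rows

x = [1.5, -0.5]
D = derivative_layout(f, x)            # 2 x 3, the layout of df/dx
J = [list(col) for col in zip(*D)]     # 3 x 2, the Jacobian J(f) = (df/dx)^T

# Analytic Jacobian of the example map, for comparison
J_exact = [[x[1], x[0]], [2 * x[0], 0.0], [1.0, 3.0]]
jac_err = max(abs(J[r][c] - J_exact[r][c]) for r in range(3) for c in range(2))

# First-order prediction df = (df/dx)^T dx versus the actual change in f
dx = [1e-4, -2e-4]
df_pred = [sum(D[i][j] * dx[i] for i in range(2)) for j in range(3)]
df_true = [a - b for a, b in zip(f([x[0] + dx[0], x[1] + dx[1]]), f(x))]
diff_err = max(abs(p - t) for p, t in zip(df_pred, df_true))
print(jac_err, diff_err)  # diff_err is second order in dx
```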
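For derivatives of vector functions, the overall procedure is the same as in Conclusion 1, except that the trace can no longer be applied freely, because $f$ is no longer a scalar.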