From 7deb2d812b428fffdf7b635e7363f7f3976eae13 Mon Sep 17 00:00:00 2001
From: Cloga Chen
Date: Tue, 29 Dec 2015 11:15:11 +0800
Subject: [PATCH] 2.7

---
 ...\344\274\230\350\247\243-checkpoint.ipynb" | 826 +++++++++++++++++-
 ...\346\225\260\345\255\246-checkpoint.ipynb" |  51 +-
 ...346\234\200\344\274\230\350\247\243.ipynb" | 712 ++++++++++++++-
 ...345\217\267\346\225\260\345\255\246.ipynb" |  51 +-
 4 files changed, 1531 insertions(+), 109 deletions(-)

diff --git "a/.ipynb_checkpoints/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243-checkpoint.ipynb" "b/.ipynb_checkpoints/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243-checkpoint.ipynb"
index cc75b2c..1269f05 100644
--- "a/.ipynb_checkpoints/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243-checkpoint.ipynb"
+++ "b/.ipynb_checkpoints/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243-checkpoint.ipynb"
@@ -143,7 +143,7 @@
    },
    {
     "cell_type": "code",
-    "execution_count": 3,
+    "execution_count": 4,
     "metadata": {
      "collapsed": false
     },
@@ -154,7 +154,7 @@
        "0.6999999997839409"
       ]
      },
-     "execution_count": 3,
+     "execution_count": 4,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -266,71 +266,835 @@
     "\n",
     "上面的梯度下降算法是玩具不会被用于真实的问题。\n",
     "\n",
-    "正如从上面例子中看到的,简单梯度下降算法的一个问题是,它试着摇摆穿越峡谷,每次跟随梯度的方法,以便穿越峡谷。共轭梯度通过添加*摩擦力*项来解决这个问题: each step depends on the two last values of the gradient and sharp turns are reduced.\n"
+    "正如从上面的例子中看到的,简单梯度下降算法的一个问题是,它会在峡谷两侧来回震荡:每一步都沿着梯度方向前进,结果反复横穿峡谷。共轭梯度通过添加一个*摩擦力*项来解决这个问题:每一步都依赖于梯度最近的两次取值,从而减少急转弯。\n",
+    "\n",
+    "**共轭梯度下降**\n",
+    "\n",
+    "状况糟糕的非二次函数。\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_6.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_106.png)\n",
+    "\n",
+    "状况糟糕的极端非二次函数。\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_7.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_107.png)\n",
+    "\n",
+    "在scipy中,基于共轭梯度的下降方法名称中都带有'cg'。用简单共轭梯度法来最小化函数的是[scipy.optimize.fmin_cg()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cg.html#scipy.optimize.fmin_cg):\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 5,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Optimization terminated successfully.\n",
+      "         Current function value: 0.000000\n",
+      "         Iterations: 13\n",
+      "         Function evaluations: 120\n",
+      "         Gradient evaluations: 30\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "array([ 0.99998968,  0.99997855])"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):   # Rosenbrock函数\n",
+    "    return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n",
+    "optimize.fmin_cg(f, [2, 2])    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "这些方法需要用到函数的梯度。算法可以自行数值计算梯度,但如果显式传入梯度,性能会更好:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 6,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
"source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 13\n", + " Function evaluations: 30\n", + " Gradient evaluations: 30\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 0.99999199, 0.99998336])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_cg(f, [2, 2], fprime=fprime) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "注意函数只会评估30次,相对的没有梯度是120次。\n", + "\n", + "### 2.7.2.3 牛顿和拟牛顿法\n", + "\n", + "#### 2.7.2.3.1 牛顿法: 使用Hessian (二阶微分)\n", + "\n", + "[牛顿法](http://en.wikipedia.org/wiki/Newton%27s_method_in_optimization)使用局部二元近似来计算跳跃的方向。为了这个目的,他们依赖于函数的前两个导数*梯度*和[Hessian](http://en.wikipedia.org/wiki/Hessian_matrix)。\n", + "\n", + "**状况糟糕的二元函数:**\n", + "\n", + "注意,因为二元近似是精确的,牛顿法是非常快的。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_8.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_108.png)\n", + "\n", + "**状况糟糕的非二元函数:**\n", + "\n", + "这里我们最优化高斯分布,通常在它的二元近似的下面。因此,牛顿法超调量并且导致震荡。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_9.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_109.png)\n", + "\n", + "**状况糟糕的极端非二元函数:**\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_10.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_110.png)\n", + "\n", + "在scipy中, 最优化的牛顿法在[scipy.optimize.fmin_ncg()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_ncg.html#scipy.optimize.fmin_ncg)实现 (cg这里是指一个内部操作的事实,Hessian翻转, 使用共轭梯度来进行)。[scipy.optimize.fmin_tnc()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_tnc.html#scipy.optimize.fmin_tnc) 可以被用于限制问题,尽管没有那么多用途:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 9\n", + " Function evaluations: 11\n", + " Gradient evaluations: 51\n", + " Hessian evaluations: 0\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1., 1.])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x): # rosenbrock函数\n", + " return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n", + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_ncg(f, [2, 2], fprime=fprime) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "注意与共轭梯度(上面的)相比,牛顿法需要较少的函数评估,更多的梯度评估,因为它使用它近似Hessian。让我们计算Hessian并将它传给算法:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 9\n", + " Function evaluations: 11\n", + " Gradient evaluations: 19\n", + " Hessian evaluations: 9\n" + ] + }, + { + 
"data": { + "text/plain": [ + "array([ 1., 1.])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def hessian(x): # Computed with sympy\n", + " return np.array(((1 - 4*x[1] + 12*x[0]**2, -4*x[0]), (-4*x[0], 2)))\n", + "optimize.fmin_ncg(f, [2, 2], fprime=fprime, fhess=hessian) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **注意**:在超高维,Hessian的翻转代价高昂并且不稳定 (大规模 > 250)。\n", + "\n", + "> **注意**:牛顿最优化算法不应该与基于相同原理的牛顿根发现法相混淆,[scipy.optimize.newton()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.newton.html#scipy.optimize.newton)。\n", + "\n", + "#### 2.7.2.3.2 拟牛顿方法: 进行着近似Hessian\n", + "\n", + "**BFGS**: BFGS (Broyden-Fletcher-Goldfarb-Shanno算法) 改进了每一步对Hessian的近似。\n", + "\n", + "**状况糟糕的二元函数:**\n", + "\n", + "在准确的二元函数中, BFGS并不像牛顿法那么快,但是还是很快。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_11.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_111.png)\n", + "\n", + "**状况糟糕的非二元函数:**\n", + "\n", + "这种情况下BFGS比牛顿好, 因为它的曲度经验估计比Hessian给出的好。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_12.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_112.png)\n", + "\n", + "**状况糟糕的极端非二元函数:**\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_13.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_113.png)" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 16\n", + " Function evaluations: 24\n", + " Gradient evaluations: 24\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1.00000017, 1.00000026])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x): # rosenbrock函数\n", + " return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n", + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_bfgs(f, [2, 2], fprime=fprime)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**L-BFGS**: 限制内存的BFGS介于BFGS和共轭梯度之间: 在非常高的维度 (> 250) 计算和翻转的Hessian矩阵的成本非常高。L-BFGS保留了低秩的版本。此外,scipy版本, [scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b), 包含箱边界:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 1.00000005, 1.00000009]),\n", + " 1.4417677473011859e-15,\n", + " {'funcalls': 17,\n", + " 'grad': array([ 1.02331202e-07, -2.59299369e-08]),\n", + " 'nit': 16,\n", + " 'task': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',\n", + " 'warnflag': 0})" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x): # rosenbrock函数\n", + " return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n", + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_l_bfgs_b(f, [2, 2], fprime=fprime) " + ] + }, + { 
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "> **注意**:如果你不为L-BFGS求解器指定梯度,就需要像上面这样添加approx_grad=1。"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.7.2.4 无梯度方法\n",
+    "\n",
+    "#### 2.7.2.4.1 打靶法: Powell算法\n",
+    "\n",
+    "几乎算得上一种梯度方法。\n",
+    "\n",
+    "**状况糟糕的二次函数:**\n",
+    "\n",
+    "Powell法对低维的局部病态状况并不很敏感。\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_14.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_114.png)\n",
+    "\n",
+    "**状况糟糕的极端非二次函数:**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_16.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_116.png)\n",
+    "\n",
+    "#### 2.7.2.4.2 单纯形法: Nelder-Mead\n",
+    "\n",
+    "Nelder-Mead算法是二分法在高维空间中的推广。这个算法通过不断改进一个包裹着最小值的[单纯形](http://en.wikipedia.org/wiki/Simplex)来工作;单纯形是区间和三角形在高维空间中的推广。\n",
+    "\n",
+    "**长处**: 对噪声很稳健,因为它不依赖于计算梯度。因此,它可以用于并非局部光滑的函数,比如实验数据点,只要函数在大尺度上呈钟形。但是,它在光滑、无噪声的函数上比基于梯度的方法慢。\n",
+    "\n",
+    "**状况糟糕的非二次函数:**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_17.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_117.png)\n",
+    "\n",
+    "**状况糟糕的极端非二次函数:**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_18.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_118.png)\n",
+    "\n",
+    "在scipy中, [scipy.optimize.fmin()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin) 实现了Nelder-Mead法:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 11,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Optimization terminated successfully.\n",
+      "         Current function value: 0.000000\n",
+      "         Iterations: 46\n",
+      "         Function evaluations: 91\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "array([ 0.99998568,  0.99996682])"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):   # Rosenbrock函数\n",
+    "    return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n",
+    "optimize.fmin(f, [2, 2])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.7.2.5 全局最优化算法\n",
+    "\n",
+    "如果你的问题不能保证局部最低点唯一(除非函数是凸的,否则很难验证),而且你也没有可以把优化起点设在答案附近的先验知识,那么你可能需要全局最优化算法。\n",
+    "\n",
+    "#### 2.7.2.5.1 暴力: 网格搜索\n",
+    "\n",
+    "[scipy.optimize.brute()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brute.html#scipy.optimize.brute)在参数网格上逐点评价函数,并返回使函数最小的参数。各参数的范围由传给[numpy.mgrid](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mgrid.html#numpy.mgrid)的区间指定。默认情况下,每个方向取20步:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 4,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([ 1.00001462,  1.00001547])"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):   # Rosenbrock函数\n",
+    "    return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n",
+    "optimize.brute(f, ((-1, 2), (-1, 2)))"
+   ]
+  },
+  {
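+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "网格搜索的成本随维度增加而迅速增长。作为一个示意(假设你的scipy版本不低于0.12;本单元格未运行,输出从略),还可以试试在随机跳跃和局部优化之间交替的[scipy.optimize.basinhopping()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.basinhopping.html#scipy.optimize.basinhopping):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# 示意:basinhopping适合有许多局部极小值的问题(沿用上文的Rosenbrock函数f)\n",
+    "optimize.basinhopping(f, [2, 2], niter=100)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.7.3 使用scipy优化的现实指南\n",
+    "\n",
+    "### 2.7.3.1 选择一个方法\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_compare_optimizers_1.png)\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**没有关于梯度的知识:**\n",
+    "\n",
+    "- 一般来说,倾向于BFGS 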
([scipy.optimize.fmin_bfgs()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_bfgs.html#scipy.optimize.fmin_bfgs)) 或 L-BFGS ([scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b)),即使你不得不对梯度做数值近似。\n",
+    "- 在状况良好的问题上,Powell ([scipy.optimize.fmin_powell()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_powell.html#scipy.optimize.fmin_powell)) 和 Nelder-Mead ([scipy.optimize.fmin()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin)) 这两种无梯度方法在高维上效果良好,但它们在状况糟糕的问题上会失效。\n",
+    "\n",
+    "---\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**有关于梯度的知识:**\n",
+    "\n",
+    "- BFGS ([scipy.optimize.fmin_bfgs()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_bfgs.html#scipy.optimize.fmin_bfgs)) 或 L-BFGS ([scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b))。\n",
+    "- BFGS的计算开销要大于L-BFGS,而L-BFGS又比共轭梯度法开销大。另一方面,BFGS通常比CG(共轭梯度法)需要更少的函数评估。因此,在优化计算开销低的函数时,共轭梯度法比BFGS更好。\n",
+    "\n",
+    "---\n",
+    "\n",
+    "---\n",
+    "**带有Hessian**:\n",
+    "\n",
+    "- 如果你可以计算Hessian, 推荐牛顿法 ([scipy.optimize.fmin_ncg()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_ncg.html#scipy.optimize.fmin_ncg))。\n",
+    "\n",
+    "**如果测量有噪声**:\n",
+    "\n",
+    "使用Nelder-Mead ([scipy.optimize.fmin()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin)) 或者 Powell ([scipy.optimize.fmin_powell()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_powell.html#scipy.optimize.fmin_powell))。\n",
+    "\n",
+    "### 2.7.3.2 让优化器更快\n",
+    "\n",
+    "- 选择正确的方法 (见上面),如果可以的话,计算解析的梯度和Hessian。\n",
+    "- 可能的时候使用[预条件化 (preconditioning)](http://en.wikipedia.org/wiki/Preconditioner)。\n",
+    "- 聪明地选择你的起点。例如,如果你正在运行许多相似的优化,就用前一次的结果热启动。\n",
+    "- 如果不需要很高的精度,就放宽容差。\n",
+    "\n",
+    "### 2.7.3.3 计算梯度\n",
+    "\n",
+    "花力气计算梯度乃至Hessian虽然枯燥,但是值得。用[Sympy](http://www.scipy-lectures.org/packages/sympy.html#sympy)做符号计算会非常方便(见下面的示意片段)。\n",
+    "\n",
+    "优化收敛不佳的一个常见原因是计算梯度时的人为错误。你可以用[scipy.optimize.check_grad()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.check_grad.html#scipy.optimize.check_grad)检查梯度是否正确。它返回你给出的梯度与数值计算出的梯度之间差异的范数:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 9,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "2.384185791015625e-07"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "optimize.check_grad(f, fprime, [2, 2])"
+   ]
+  },
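+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "下面是刚才提到的示意片段:用Sympy对Rosenbrock函数做符号求导,核对上文手写的fprime(假设环境中已安装sympy;本单元格未运行,输出从略):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# 示意:对Rosenbrock函数符号求导,结果应与上文的fprime一致\n",
+    "import sympy\n",
+    "a, b = sympy.symbols('x0 x1')\n",
+    "rosen = .5*(1 - a)**2 + (b - a**2)**2\n",
+    "print([sympy.diff(rosen, v) for v in (a, b)])"
+   ]
+  },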
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "也可以用[scipy.optimize.approx_fprime()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html#scipy.optimize.approx_fprime)来查看你的错误。\n",
+    "\n",
+    "### 2.7.3.4 合成练习\n",
+    "\n",
+    "**练习: 简单的 (?) 二次函数**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_exercise_ill_conditioned_1.png)\n",
+    "\n",
+    "用K[0]作为起始点优化下列函数:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 2,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
    "outputs": [],
-   "source": []
+   "source": [
+    "np.random.seed(0)\n",
+    "K = np.random.normal(size=(100, 100))\n",
+    "\n",
+    "def f(x):\n",
+    "    return np.sum((np.dot(K, x - 1))**2) + np.sum(x**2)**2"
+   ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "计时你的方法。找出最快的方法。为什么BFGS表现不好?\n",
+    "\n",
+    "**练习:局部平坦的最小值**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_exercise_flat_minimum_0.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_exercise_flat_minimum_1.png)\n",
+    "\n",
+    "考虑一下函数$exp(-1/(.1*x^2 + y^2))$。这个函数在(0,0)处有一个最小值。从起点(1,1)开始,试着让结果与这个最小值的偏差在1e-8以内。\n",
+    "\n",
+    "## 2.7.4 特殊案例: 非线性最小二乘\n",
+    "\n",
+    "### 2.7.4.1 最小化向量函数的范数\n",
+    "\n",
+    "最小二乘法,即向量函数范数的最小化,有特定的结构,可以被[scipy.optimize.leastsq()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq)中实现的[Levenberg–Marquardt 算法](https://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm)利用。\n",
+    "\n",
+    "让我们试一下最小化下面向量函数的范数:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 5,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,\n",
+       "         0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ]), 2)"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):\n",
+    "    return np.arctan(x) - np.arctan(np.linspace(0, 1, len(x)))\n",
+    "x0 = np.zeros(10)\n",
+    "optimize.leastsq(f, x0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "这用了67次函数评估(用'full_output=1'试一下)。如果我们自己计算范数并且使用一个好的通用优化器(BFGS)会怎么样:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 6,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Optimization terminated successfully.\n",
+      "         Current function value: 0.000000\n",
+      "         Iterations: 11\n",
+      "         Function evaluations: 144\n",
+      "         Gradient evaluations: 12\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "array([ -7.44987291e-09,   1.11112265e-01,   2.22219893e-01,\n",
+       "         3.33331914e-01,   4.44449794e-01,   5.55560493e-01,\n",
+       "         6.66672149e-01,   7.77779758e-01,   8.88882036e-01,\n",
+       "         1.00001026e+00])"
+      ]
+     },
+     "execution_count": 6,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def g(x):\n",
+    "    return np.sum(f(x)**2)\n",
+    "optimize.fmin_bfgs(g, x0)    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "BFGS需要更多的函数调用,并且给出了一个不那么精确的结果。\n",
+    "\n",
+    "注意,只有当输出向量的维度很大(大于要优化的参数个数)时,`leastsq`相对BFGS才有优势。\n",
+    "\n",
+    "如果函数是线性的,这就是一个线性代数问题,应该用[scipy.linalg.lstsq()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html#scipy.linalg.lstsq)解决。\n",
+    "\n",
+    "### 2.7.4.2 曲线拟合\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_curve_fit_1.png)\n",
+    "\n",
+    "用非线性模型拟合数据时经常遇到最小二乘问题。虽然可以自己构建优化问题,scipy为此提供了一个帮助函数: [scipy.optimize.curve_fit()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit):"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 7,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(array([ 1.50600889,  0.98754323]), array([[ 0.00030286, -0.00045233],\n",
+       "       [-0.00045233,  0.00098838]]))"
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(t, omega, phi):\n",
+    "    return np.cos(omega * t + phi)\n",
+    "x = np.linspace(0, 3, 50)\n",
+    "y = f(x, 1.5, 1) + .1*np.random.normal(size=50)\n",
+    "optimize.curve_fit(f, x, y)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**练习**\n",
+    "\n",
+    "用omega = 3做相同的练习。困难在哪里?\n",
+    "\n",
+    "## 2.7.5 有限制条件的优化\n",
+    "\n",
+    "### 2.7.5.1 箱边界\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_constraints_2.png)\n",
+    "\n",
+    "箱边界是指对优化中的每个变量分别加以限制。注意,一些最初不是箱边界形式的问题可以通过变量代换改写成这种形式。\n",
+    "\n",
+    "- [scipy.optimize.fminbound()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fminbound.html#scipy.optimize.fminbound)进行一维优化。\n",
+    "- [scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b)是带有箱边界约束的拟牛顿法:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 8,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "(array([ 1.5,  1.5]),\n",
+       " 1.5811388300841898,\n",
+       " {'funcalls': 12,\n",
+       "  'grad': array([-0.94868331, -0.31622778]),\n",
+       "  'nit': 2,\n",
+       "  'task': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',\n",
+       "  'warnflag': 0})"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):\n",
+    "    return np.sqrt((x[0] - 3)**2 + (x[1] - 2)**2)\n",
+    "optimize.fmin_l_bfgs_b(f, np.array([0, 0]), approx_grad=1, bounds=((-1.5, 1.5), (-1.5, 1.5)))    "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.7.5.2 通用限制\n",
+    "\n",
+    "等式和不等式限制以函数的形式给出: f(x) = 0 和 g(x) < 0。\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_non_bounds_constraints_1.png)\n",
+    "\n",
+    "- [scipy.optimize.fmin_slsqp()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_slsqp.html#scipy.optimize.fmin_slsqp) 序列最小二乘规划: 支持等式和不等式限制:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 10,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Optimization terminated successfully. 
(Exit mode 0)\n", + " Current function value: 2.47487373504\n", + " Iterations: 5\n", + " Function evaluations: 20\n", + " Gradient evaluations: 5\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1.25004696, 0.24995304])" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x):\n", + " return np.sqrt((x[0] - 3)**2 + (x[1] - 2)**2)\n", + "\n", + "def constraint(x):\n", + " return np.atleast_1d(1.5 - np.sum(np.abs(x)))\n", + "\n", + "optimize.fmin_slsqp(f, np.array([0, 0]), ieqcons=[constraint, ])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- [scipy.optimize.fmin_cobyla()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cobyla.html#scipy.optimize.fmin_cobyla)通过线性估计的限定优化:只有不相等限制:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1.25009622, 0.24990378])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "optimize.fmin_cobyla(f, np.array([0, 0]), cons=constraint)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "上面这个问题在统计中被称为[Lasso](http://en.wikipedia.org/wiki/Lasso_(statistics)#LASSO_method)问题, 有许多解决它的高效方法 (比如在[scikit-learn](http://scikit-learn.org/)中)。一般来说,当特定求解器存在时不需要使用通用求解器。\n", + "\n", + "**拉格朗日乘子法**\n", + "\n", + "如果你有足够的数学知识,许多限定优化问题可以被转化为非限定性优化问题,使用被称为拉格朗日乘子法的数学技巧。" + ] } ], "metadata": { @@ -349,7 +1113,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", - "version": "2.7.10" + "version": "2.7.11" } }, "nbformat": 4, diff --git "a/.ipynb_checkpoints/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246-checkpoint.ipynb" "b/.ipynb_checkpoints/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246-checkpoint.ipynb" index bf7e8df..d1674c0 100644 --- "a/.ipynb_checkpoints/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246-checkpoint.ipynb" +++ "b/.ipynb_checkpoints/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246-checkpoint.ipynb" @@ -1,23 +1,34 @@ { - "metadata": { - "name": "", - "signature": "sha256:4241644a06829855e5a79a3650ab26bc658c3eae01016cde5ae1f1fc3eb9aada" - }, - "nbformat": 3, - "nbformat_minor": 0, - "worksheets": [ + "cells": [ { - "cells": [ - { - "cell_type": "code", - "collapsed": false, - "input": [], - "language": "python", - "metadata": {}, - "outputs": [] - } - ], - "metadata": {} + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [] } - ] -} \ No newline at end of file + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.11" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git "a/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243.ipynb" "b/2.7. 
\346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243.ipynb" index 4b15e23..1269f05 100644 --- "a/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243.ipynb" +++ "b/2.7. \346\225\260\345\255\246\344\274\230\345\214\226\357\274\232\346\211\276\345\210\260\345\207\275\346\225\260\347\232\204\346\234\200\344\274\230\350\247\243.ipynb" @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "code", - "execution_count": 3, + "execution_count": 2, "metadata": { "collapsed": false }, @@ -376,89 +376,725 @@ "\n", "**状况糟糕的二元函数:**\n", "\n", - "Note that, as the quadratic approximation is exact, the Newton method is blazing fast" + "注意,因为二元近似是精确的,牛顿法是非常快的。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_8.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_108.png)\n", + "\n", + "**状况糟糕的非二元函数:**\n", + "\n", + "这里我们最优化高斯分布,通常在它的二元近似的下面。因此,牛顿法超调量并且导致震荡。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_9.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_109.png)\n", + "\n", + "**状况糟糕的极端非二元函数:**\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_10.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_110.png)\n", + "\n", + "在scipy中, 最优化的牛顿法在[scipy.optimize.fmin_ncg()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_ncg.html#scipy.optimize.fmin_ncg)实现 (cg这里是指一个内部操作的事实,Hessian翻转, 使用共轭梯度来进行)。[scipy.optimize.fmin_tnc()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_tnc.html#scipy.optimize.fmin_tnc) 可以被用于限制问题,尽管没有那么多用途:" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 9\n", + " Function evaluations: 11\n", + " Gradient evaluations: 51\n", + " Hessian evaluations: 0\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1., 1.])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x): # rosenbrock函数\n", + " return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n", + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_ncg(f, [2, 2], fprime=fprime) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "注意与共轭梯度(上面的)相比,牛顿法需要较少的函数评估,更多的梯度评估,因为它使用它近似Hessian。让我们计算Hessian并将它传给算法:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 7, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 9\n", + " Function evaluations: 11\n", + " Gradient evaluations: 19\n", + " Hessian evaluations: 9\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1., 1.])" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def hessian(x): # Computed with sympy\n", + " 
return np.array(((1 - 4*x[1] + 12*x[0]**2, -4*x[0]), (-4*x[0], 2)))\n", + "optimize.fmin_ncg(f, [2, 2], fprime=fprime, fhess=hessian) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **注意**:在超高维,Hessian的翻转代价高昂并且不稳定 (大规模 > 250)。\n", + "\n", + "> **注意**:牛顿最优化算法不应该与基于相同原理的牛顿根发现法相混淆,[scipy.optimize.newton()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.newton.html#scipy.optimize.newton)。\n", + "\n", + "#### 2.7.2.3.2 拟牛顿方法: 进行着近似Hessian\n", + "\n", + "**BFGS**: BFGS (Broyden-Fletcher-Goldfarb-Shanno算法) 改进了每一步对Hessian的近似。\n", + "\n", + "**状况糟糕的二元函数:**\n", + "\n", + "在准确的二元函数中, BFGS并不像牛顿法那么快,但是还是很快。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_11.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_111.png)\n", + "\n", + "**状况糟糕的非二元函数:**\n", + "\n", + "这种情况下BFGS比牛顿好, 因为它的曲度经验估计比Hessian给出的好。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_12.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_112.png)\n", + "\n", + "**状况糟糕的极端非二元函数:**\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_13.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_113.png)" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 9, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 16\n", + " Function evaluations: 24\n", + " Gradient evaluations: 24\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1.00000017, 1.00000026])" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x): # rosenbrock函数\n", + " return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n", + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_bfgs(f, [2, 2], fprime=fprime)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**L-BFGS**: 限制内存的BFGS介于BFGS和共轭梯度之间: 在非常高的维度 (> 250) 计算和翻转的Hessian矩阵的成本非常高。L-BFGS保留了低秩的版本。此外,scipy版本, [scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b), 包含箱边界:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 8, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 1.00000005, 1.00000009]),\n", + " 1.4417677473011859e-15,\n", + " {'funcalls': 17,\n", + " 'grad': array([ 1.02331202e-07, -2.59299369e-08]),\n", + " 'nit': 16,\n", + " 'task': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',\n", + " 'warnflag': 0})" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x): # rosenbrock函数\n", + " return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n", + "def fprime(x):\n", + " return np.array((-2*.5*(1 - x[0]) - 4*x[0]*(x[1] - x[0]**2), 2*(x[1] - x[0]**2)))\n", + "optimize.fmin_l_bfgs_b(f, [2, 2], fprime=fprime) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "> **注意**:如果你不为L-BFGS求解器制定梯度,你需要添加approx_grad=1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.7.2.4 
无梯度方法\n",
+    "\n",
+    "#### 2.7.2.4.1 打靶法: Powell算法\n",
+    "\n",
+    "几乎算得上一种梯度方法。\n",
+    "\n",
+    "**状况糟糕的二次函数:**\n",
+    "\n",
+    "Powell法对低维的局部病态状况并不很敏感。\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_14.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_114.png)\n",
+    "\n",
+    "**状况糟糕的极端非二次函数:**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_16.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_116.png)\n",
+    "\n",
+    "#### 2.7.2.4.2 单纯形法: Nelder-Mead\n",
+    "\n",
+    "Nelder-Mead算法是二分法在高维空间中的推广。这个算法通过不断改进一个包裹着最小值的[单纯形](http://en.wikipedia.org/wiki/Simplex)来工作;单纯形是区间和三角形在高维空间中的推广。\n",
+    "\n",
+    "**长处**: 对噪声很稳健,因为它不依赖于计算梯度。因此,它可以用于并非局部光滑的函数,比如实验数据点,只要函数在大尺度上呈钟形。但是,它在光滑、无噪声的函数上比基于梯度的方法慢。\n",
+    "\n",
+    "**状况糟糕的非二次函数:**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_17.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_117.png)\n",
+    "\n",
+    "**状况糟糕的极端非二次函数:**\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_18.png)\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_gradient_descent_118.png)\n",
+    "\n",
+    "在scipy中, [scipy.optimize.fmin()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin) 实现了Nelder-Mead法:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 11,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Optimization terminated successfully.\n",
+      "         Current function value: 0.000000\n",
+      "         Iterations: 46\n",
+      "         Function evaluations: 91\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "array([ 0.99998568,  0.99996682])"
+      ]
+     },
+     "execution_count": 11,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):   # Rosenbrock函数\n",
+    "    return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n",
+    "optimize.fmin(f, [2, 2])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### 2.7.2.5 全局最优化算法\n",
+    "\n",
+    "如果你的问题不能保证局部最低点唯一(除非函数是凸的,否则很难验证),而且你也没有可以把优化起点设在答案附近的先验知识,那么你可能需要全局最优化算法。\n",
+    "\n",
+    "#### 2.7.2.5.1 暴力: 网格搜索\n",
+    "\n",
+    "[scipy.optimize.brute()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.brute.html#scipy.optimize.brute)在参数网格上逐点评价函数,并返回使函数最小的参数。各参数的范围由传给[numpy.mgrid](http://docs.scipy.org/doc/numpy/reference/generated/numpy.mgrid.html#numpy.mgrid)的区间指定。默认情况下,每个方向取20步:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 4,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "array([ 1.00001462,  1.00001547])"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def f(x):   # Rosenbrock函数\n",
+    "    return .5*(1 - x[0])**2 + (x[1] - x[0]**2)**2\n",
+    "optimize.brute(f, ((-1, 2), (-1, 2)))"
+   ]
+  },
+  {
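+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "网格搜索的成本随维度增加而迅速增长。作为一个示意(假设你的scipy版本不低于0.12;本单元格未运行,输出从略),还可以试试在随机跳跃和局部优化之间交替的[scipy.optimize.basinhopping()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.basinhopping.html#scipy.optimize.basinhopping):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# 示意:basinhopping适合有许多局部极小值的问题(沿用上文的Rosenbrock函数f)\n",
+    "optimize.basinhopping(f, [2, 2], niter=100)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## 2.7.3 使用scipy优化的现实指南\n",
+    "\n",
+    "### 2.7.3.1 选择一个方法\n",
+    "\n",
+    "![](http://www.scipy-lectures.org/_images/plot_compare_optimizers_1.png)\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**没有关于梯度的知识:**\n",
+    "\n",
+    "- 一般来说,倾向于BFGS 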
([scipy.optimize.fmin_bfgs()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_bfgs.html#scipy.optimize.fmin_bfgs)) 或 L-BFGS ([scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b)),即使你不得不对梯度做数值近似。\n",
+    "- 在状况良好的问题上,Powell ([scipy.optimize.fmin_powell()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_powell.html#scipy.optimize.fmin_powell)) 和 Nelder-Mead ([scipy.optimize.fmin()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin)) 这两种无梯度方法在高维上效果良好,但它们在状况糟糕的问题上会失效。\n",
+    "\n",
+    "---\n",
+    "\n",
+    "---\n",
+    "\n",
+    "**有关于梯度的知识:**\n",
+    "\n",
+    "- BFGS ([scipy.optimize.fmin_bfgs()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_bfgs.html#scipy.optimize.fmin_bfgs)) 或 L-BFGS ([scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b))。\n",
+    "- BFGS的计算开销要大于L-BFGS,而L-BFGS又比共轭梯度法开销大。另一方面,BFGS通常比CG(共轭梯度法)需要更少的函数评估。因此,在优化计算开销低的函数时,共轭梯度法比BFGS更好。\n",
+    "\n",
+    "---\n",
+    "\n",
+    "---\n",
+    "**带有Hessian**:\n",
+    "\n",
+    "- 如果你可以计算Hessian, 推荐牛顿法 ([scipy.optimize.fmin_ncg()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_ncg.html#scipy.optimize.fmin_ncg))。\n",
+    "\n",
+    "**如果测量有噪声**:\n",
+    "\n",
+    "使用Nelder-Mead ([scipy.optimize.fmin()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin.html#scipy.optimize.fmin)) 或者 Powell ([scipy.optimize.fmin_powell()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_powell.html#scipy.optimize.fmin_powell))。\n",
+    "\n",
+    "### 2.7.3.2 让优化器更快\n",
+    "\n",
+    "- 选择正确的方法 (见上面),如果可以的话,计算解析的梯度和Hessian。\n",
+    "- 可能的时候使用[预条件化 (preconditioning)](http://en.wikipedia.org/wiki/Preconditioner)。\n",
+    "- 聪明地选择你的起点。例如,如果你正在运行许多相似的优化,就用前一次的结果热启动。\n",
+    "- 如果不需要很高的精度,就放宽容差。\n",
+    "\n",
+    "### 2.7.3.3 计算梯度\n",
+    "\n",
+    "花力气计算梯度乃至Hessian虽然枯燥,但是值得。用[Sympy](http://www.scipy-lectures.org/packages/sympy.html#sympy)做符号计算会非常方便(见下面的示意片段)。\n",
+    "\n",
+    "优化收敛不佳的一个常见原因是计算梯度时的人为错误。你可以用[scipy.optimize.check_grad()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.check_grad.html#scipy.optimize.check_grad)检查梯度是否正确。它返回你给出的梯度与数值计算出的梯度之间差异的范数:"
+   ]
+  },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 9,
    "metadata": {
-    "collapsed": true
+    "collapsed": false
    },
-   "outputs": [],
-   "source": []
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "2.384185791015625e-07"
+      ]
+     },
+     "execution_count": 9,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "optimize.check_grad(f, fprime, [2, 2])"
+   ]
+  },
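+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "下面是刚才提到的示意片段:用Sympy对Rosenbrock函数做符号求导,核对上文手写的fprime(假设环境中已安装sympy;本单元格未运行,输出从略):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "collapsed": false
+   },
+   "outputs": [],
+   "source": [
+    "# 示意:对Rosenbrock函数符号求导,结果应与上文的fprime一致\n",
+    "import sympy\n",
+    "a, b = sympy.symbols('x0 x1')\n",
+    "rosen = .5*(1 - a)**2 + (b - a**2)**2\n",
+    "print([sympy.diff(rosen, v) for v in (a, b)])"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "也可以用[scipy.optimize.approx_fprime()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.approx_fprime.html#scipy.optimize.approx_fprime)来查看你的错误。\n",
+    "\n",
+    "### 2.7.3.4 合成练习\n",
+    "\n",
+    "**练习: 简单的 (?) 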
二次函数**\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_exercise_ill_conditioned_1.png)\n", + "\n", + "用K[0]作为起始点优化下列函数:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 2, "metadata": { - "collapsed": true + "collapsed": false }, "outputs": [], - "source": [] + "source": [ + "np.random.seed(0)\n", + "K = np.random.normal(size=(100, 100))\n", + "\n", + "def f(x):\n", + " return np.sum((np.dot(K, x - 1))**2) + np.sum(x**2)**2" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "计时你的方法。找到最快的方法。为什么BFGS不好用了?\n", + "\n", + "**练习:局部扁平最小化**\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_exercise_flat_minimum_0.png)\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_exercise_flat_minimum_1.png)\n", + "\n", + "考虑一下函数$exp(-1/(.1*x^2 + y^2)$。这个函数在(0,0)存在一个最小值。从起点(1,1)开始,试着在$1e-8$达到这个最低点。\n", + "\n", + "## 2.7.4 特殊案例: 非线性最小二乘\n", + "\n", + "### 2.7.4.1 最小化向量函数的基准\n", + "\n", + "最小二乘法,向量函数基准值的最小化,有特定的结构可以用在[scipy.optimize.leastsq()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.leastsq.html#scipy.optimize.leastsq)中实现的[Levenberg–Marquardt 算法](https://en.wikipedia.org/wiki/Levenberg-Marquardt_algorithm)。\n", + "\n", + "让我们试一下最小化下面向量函数的基准:" + ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 5, "metadata": { - "collapsed": true + "collapsed": false }, - "outputs": [], - "source": [] + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 0. , 0.11111111, 0.22222222, 0.33333333, 0.44444444,\n", + " 0.55555556, 0.66666667, 0.77777778, 0.88888889, 1. ]), 2)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x):\n", + " return np.arctan(x) - np.arctan(np.linspace(0, 1, len(x)))\n", + "x0 = np.zeros(10)\n", + "optimize.leastsq(f, x0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "这用了67次函数评估(用'full_output=1'试一下)。如果我们自己计算基准并且使用一个更好的通用优化器(BFGS)会怎么样:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully.\n", + " Current function value: 0.000000\n", + " Iterations: 11\n", + " Function evaluations: 144\n", + " Gradient evaluations: 12\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ -7.44987291e-09, 1.11112265e-01, 2.22219893e-01,\n", + " 3.33331914e-01, 4.44449794e-01, 5.55560493e-01,\n", + " 6.66672149e-01, 7.77779758e-01, 8.88882036e-01,\n", + " 1.00001026e+00])" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def g(x):\n", + " return np.sum(f(x)**2)\n", + "optimize.fmin_bfgs(g, x0) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "BFGS需要更多的函数调用,并且给出了一个并不精确的结果。\n", + "\n", + "注意只有当输出向量的维度非常大,比需要优化的函数还要大,`leastsq`与BFGS相类比才是有趣的。\n", + "\n", + "如果函数是线性的,这是一个线性代数问题,应该用[scipy.linalg.lstsq()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.lstsq.html#scipy.linalg.lstsq)解决。\n", + "\n", + "### 2.7.4.2 曲线拟合\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_curve_fit_1.png)\n", + "\n", + "最小二乘问题通常出现在拟合数据的非线性拟合时。当我们自己构建优化问题时,scipy提供了这种目的的一个帮助函数: [scipy.optimize.curve_fit()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html#scipy.optimize.curve_fit):" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "collapsed": false + }, + "outputs": [ + { 
+ "data": { + "text/plain": [ + "(array([ 1.50600889, 0.98754323]), array([[ 0.00030286, -0.00045233],\n", + " [-0.00045233, 0.00098838]]))" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(t, omega, phi):\n", + " return np.cos(omega * t + phi)\n", + "x = np.linspace(0, 3, 50)\n", + "y = f(x, 1.5, 1) + .1*np.random.normal(size=50)\n", + "optimize.curve_fit(f, x, y)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**练习**\n", + "\n", + "用omega = 3来进行相同的练习。困难是什么?\n", + "\n", + "## 2.7.5 有限制条件的优化\n", + "\n", + "### 2.7.5.1 箱边界\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_constraints_2.png)\n", + "\n", + "箱边界是指限制优化的每个函数。注意一些最初不是写成箱边界的问题可以通过改变变量重写。\n", + "\n", + "- [scipy.optimize.fminbound()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fminbound.html#scipy.optimize.fminbound)进行一维优化\n", + "- [scipy.optimize.fmin_l_bfgs_b()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_l_bfgs_b.html#scipy.optimize.fmin_l_bfgs_b)带有边界限制的quasi-Newton方法:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(array([ 1.5, 1.5]),\n", + " 1.5811388300841898,\n", + " {'funcalls': 12,\n", + " 'grad': array([-0.94868331, -0.31622778]),\n", + " 'nit': 2,\n", + " 'task': 'CONVERGENCE: NORM_OF_PROJECTED_GRADIENT_<=_PGTOL',\n", + " 'warnflag': 0})" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x):\n", + " return np.sqrt((x[0] - 3)**2 + (x[1] - 2)**2)\n", + "optimize.fmin_l_bfgs_b(f, np.array([0, 0]), approx_grad=1, bounds=((-1.5, 1.5), (-1.5, 1.5))) " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.7.5.2 通用限制\n", + "\n", + "相等和不相等限制特定函数: f(x) = 0 and g(x)< 0。\n", + "\n", + "![](http://www.scipy-lectures.org/_images/plot_non_bounds_constraints_1.png)\n", + "\n", + "- [scipy.optimize.fmin_slsqp()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_slsqp.html#scipy.optimize.fmin_slsqp) 序列最小二乘程序: 相等和不相等限制:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Optimization terminated successfully. 
(Exit mode 0)\n", + " Current function value: 2.47487373504\n", + " Iterations: 5\n", + " Function evaluations: 20\n", + " Gradient evaluations: 5\n" + ] + }, + { + "data": { + "text/plain": [ + "array([ 1.25004696, 0.24995304])" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def f(x):\n", + " return np.sqrt((x[0] - 3)**2 + (x[1] - 2)**2)\n", + "\n", + "def constraint(x):\n", + " return np.atleast_1d(1.5 - np.sum(np.abs(x)))\n", + "\n", + "optimize.fmin_slsqp(f, np.array([0, 0]), ieqcons=[constraint, ])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "- [scipy.optimize.fmin_cobyla()](http://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.fmin_cobyla.html#scipy.optimize.fmin_cobyla)通过线性估计的限定优化:只有不相等限制:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "collapsed": false + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([ 1.25009622, 0.24990378])" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "optimize.fmin_cobyla(f, np.array([0, 0]), cons=constraint)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "上面这个问题在统计中被称为[Lasso](http://en.wikipedia.org/wiki/Lasso_(statistics)#LASSO_method)问题, 有许多解决它的高效方法 (比如在[scikit-learn](http://scikit-learn.org/)中)。一般来说,当特定求解器存在时不需要使用通用求解器。\n", + "\n", + "**拉格朗日乘子法**\n", + "\n", + "如果你有足够的数学知识,许多限定优化问题可以被转化为非限定性优化问题,使用被称为拉格朗日乘子法的数学技巧。" + ] } ], "metadata": { @@ -477,7 +1113,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", - "version": "2.7.10" + "version": "2.7.11" } }, "nbformat": 4, diff --git "a/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246.ipynb" "b/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246.ipynb" index bf7e8df..d1674c0 100644 --- "a/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246.ipynb" +++ "b/3.2. Sympy\357\274\232Python\344\270\255\347\232\204\347\254\246\345\217\267\346\225\260\345\255\246.ipynb" @@ -1,23 +1,34 @@ { - "metadata": { - "name": "", - "signature": "sha256:4241644a06829855e5a79a3650ab26bc658c3eae01016cde5ae1f1fc3eb9aada" - }, - "nbformat": 3, - "nbformat_minor": 0, - "worksheets": [ + "cells": [ { - "cells": [ - { - "cell_type": "code", - "collapsed": false, - "input": [], - "language": "python", - "metadata": {}, - "outputs": [] - } - ], - "metadata": {} + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": false + }, + "outputs": [], + "source": [] } - ] -} \ No newline at end of file + ], + "metadata": { + "kernelspec": { + "display_name": "Python 2", + "language": "python", + "name": "python2" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2", + "version": "2.7.11" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +}