  • Gradient descent MATLAB program

    2019-06-04 10:46:49
    Gradient descent MATLAB program; the parameters must be entered manually.
  • Gradient descent is a type of iterative method that can be used to solve least-squares problems (both linear and nonlinear). When solving for the parameters of a machine-learning model, i.e., the unco... In machine learning, two gradient descent variants have been developed on top of the basic method: stochastic gradient descent and batch gradient descent.
  • Gradient descent MATLAB program; the parameters must be entered manually.
  • A MATLAB implementation of gradient descent, demonstrated by finding the minimum of x^2 + y^2.
  • ... and stochastic gradient descent. http://blog.csdn.net/pennyliang/article/details/6998517 is the original author's blog; the code there is in C, while here I use the more convenient MATLAB approach. Batch gradient descent accumulates the parameter upd...

    【Machine Learning Lab 1】 Batch gradient descent and stochastic gradient descent

    http://blog.csdn.net/pennyliang/article/details/6998517

    The link above is the original author's blog. The code there is written in C; here I use the more convenient MATLAB approach.

    Batch gradient descent accumulates the parameter updates over all samples and then applies them in one batch. It is a training scheme for when the entire training set is known, but it is not well suited to large-scale data.

    Stochastic gradient descent updates the parameters immediately, one sample at a time, as training proceeds. It is commonly used on large training sets, but it tends to converge to a local optimum.

    For details, see Andrew Ng's Machine Learning lecture slides (reference [1]).

    Possible improvements

    1) Verifying sample reliability and feature completeness

    For example, there may be outliers. An outlier could be a measurement error, or it could come from an unmodeled feature: a piece of clothing that scores 1 for color and 1 for fabric yet sells for 100 million yuan turns out to carry Yao Ming's signature, a feature that was not considered, so a training error appears. The task is to identify why outliers arise in the training samples.

    2) Improving batch gradient descent

    Run batch gradient descent in parallel.

    3) Improving stochastic gradient descent

    Find a suitable training path (learning order) that maximizes the chance of reaching the global optimum.

    4) Checking the reasonableness of the hypothesis

    Test whether H(X) is reasonable.

    5) Increasing the dimensionality

    Higher dimensionality versus overfitting: increasing the dimensionality improves the fit on the training set but hurts generalization to the test set. Can a reasonable trade-off be found?

    Below is an experiment I ran.

    Suppose we have the following training samples for pricing clothes, stored in the matrix in the code. The first column is the color score and the second the fabric score; for example, the first sample (1, 4) means the item scores 1 for color and 4 for fabric. What we train is theta, the weights of color and fabric in the price. These weights are unknown and must be learned; the supervision is the known true prices of the four samples: 19 yuan, ..., 20 yuan.

    Both batch gradient descent and stochastic gradient descent arrive at theta_C = {3, 4}^T.

    /*
    Matrix_A        theta_C     Matrix_A * theta_C
    1   4           ?           19
    2   5           ?           26
    5   1                       19
    4   2                       20
    */

    Batch gradient descent: [screenshot of the original MATLAB code; not recoverable]

    Stochastic gradient descent: [screenshot of the original MATLAB code; not recoverable]
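    Since the original screenshots cannot be recovered, here is a minimal sketch of what the two variants look like on this example (my own reconstruction, not the author's code; the names A, price, and the step size eta are mine). Both converge to theta ≈ {3, 4}^T:

    % Minimal sketch (not the original screenshots): batch GD vs. stochastic GD
    % on the clothing-price example, where Matrix_A * theta_C = price
    A     = [1 4; 2 5; 5 1; 4 2];     % color score, fabric score
    price = [19; 26; 19; 20];         % known prices
    eta   = 0.01;                     % assumed learning rate

    % Batch gradient descent: every update uses all four samples
    theta = zeros(2,1);
    for iter = 1:5000
        theta = theta - eta * (1/4) * A' * (A*theta - price);
    end
    fprintf('batch GD:      theta = [%.3f %.3f]\n', theta);

    % Stochastic gradient descent: every update uses one randomly chosen sample
    theta = zeros(2,1);
    for iter = 1:20000
        i = randi(4);
        theta = theta - eta * A(i,:)' * (A(i,:)*theta - price(i));
    end
    fprintf('stochastic GD: theta = [%.3f %.3f]\n', theta);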

    References:

    【2】 http://www.cnblogs.com/rocketfan/archive/2011/02/27/1966325.html
    【3】 http://www.dsplog.com/2011/10/29/batch-gradient-descent/
    【4】 http://ygc.name/2011/03/22/machine-learning-ex2-linear-regression/

  • Nine-axis sensor attitude estimation (complementary filter and gradient descent, MATLAB)
  • Gradient descent, stochastic gradient descent, mini-batch gradient descent, momentum gradient descent, and Nesterov accelerated gradient. Preface; Gradient Descent (GD); Univariate Linear Regression; Batch Gradient ...

    Preface

    This article walks through several common gradient descent algorithms using a univariate linear regression model. For better comparison, we set different learning rates for the two parameters.

    Gradient Descent (GD)

    Gradient descent, also called steepest descent, is a common first-order iterative optimization algorithm for finding a local minimum of a function, and a standard method for unconstrained optimization. Its basic update rule is:

    $\theta := \theta - \eta\,\frac{\partial J(\theta)}{\partial \theta}$
    where $J(\theta)$ is the loss as a function of the parameter $\theta$, and $\eta$ is the learning rate (a positive scalar), also called the step size. The idea of gradient descent is to update $\theta$ iteratively in the direction of the negative gradient.

    A simple example gives some intuition. Let $J(\theta)$ be the quadratic $J(\theta) = \theta^2$, with initial point $(\theta, J(\theta)) = (9, 81)$ and $\eta = 0.2$. On the first iteration the gradient is $\frac{\partial J(\theta)}{\partial \theta} = 2\theta = 18$, so the update is $\theta := \theta - \eta\,\frac{\partial J(\theta)}{\partial \theta} = 9 - 0.2 \times 18 = 5.4$. Repeating this step, the iterate approaches the minimizer $\theta = 0$.

    As the figure below shows, convergence slows down as the gradient shrinks:

    [figure: gradient descent iterations on y = x², learning rate 0.2]

    The MATLAB code for this demo:

    % Written by: Weichen Gu, date: 4/18/2020
    clc; clf; clear;
    xdata = linspace(-10,10,1000);                   % x range
    f = @(xdata)(xdata.^2);                         % Quadratic function - loss function
    
    ydata = f(xdata);
    plot(xdata,ydata,'c','linewidth',2);
    title('y = x^2 (learning rate = 0.2)');
    hold on;
    
    x = [9 f(9)];            % Initial point
    slope = inf;               % Slope
    LRate = 0.2;               % Learning rate
    slopeThresh = 0.0001;      % Slope threshold for iteration
    
    plot(x(1),x(2),'r*');
    
    while abs(slope) > slopeThresh
        [tmp,slope] = gradientDescent(x(1),f,LRate);
        x1 = [tmp f(tmp)];
        line([x(1),x1(1)],[x(2),x1(2)],'color','k','linestyle','--','linewidth',1);
        plot(x1(1),x1(2),'r*');
        legend('y = x^2');
        drawnow;
        x = x1;
    end
    hold off
    
    function [xOut,slope] = gradientDescent(xIn,f, eta)
        syms x;
        slope = double(subs(diff(f(x)),xIn));
        xOut = xIn- eta*(slope);
    end

    The learning rate determines how fast gradient descent converges; a well-chosen learning rate noticeably speeds up convergence. Learning rates that are too small or too large should both be avoided:

    • A learning rate that is too small makes convergence very slow and requires many iterations.
    • A learning rate that is too large causes oscillation, failure to converge, or even divergence.

    As shown below:

    [figure: gradient descent with learning rates that are too small and too large]
    One practical approach is to start with a fairly large step and then gradually shrink it to tune convergence, but in practice finding the best learning rate is generally difficult.
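    As a small illustration of this trade-off (a sketch of my own, with assumed learning-rate values), the loop below reruns the y = x² example with several learning rates and reports how many iterations each needs to reach |slope| < 10⁻⁴; the oversized rate simply hits the iteration cap:

    % Minimal sketch: effect of the learning rate on J(theta) = theta^2
    f  = @(t) t.^2;                      % loss
    df = @(t) 2*t;                       % its gradient
    for eta = [0.01 0.2 0.9 1.01]        % the last value is deliberately too large
        theta = 9; iters = 0;
        while abs(df(theta)) > 1e-4 && iters < 1e4
            theta = theta - eta*df(theta);   % gradient descent step
            iters = iters + 1;
        end
        fprintf('eta = %.2f -> %5d iterations, final theta = %g\n', eta, iters, theta);
    end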

    We should also keep in mind that gradient descent is a method for finding a local minimum of a function; as the figure below shows, different initial points may end up at different local minima:

    [figure: different initializations converging to different local minima]
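    To make this concrete, here is a tiny sketch of my own (the function f(θ) = θ⁴ − 3θ² + θ is an assumed example, not from the original post) in which two different starting points end up in two different local minima:

    % Minimal sketch: different initializations reach different local minima
    f  = @(t) t.^4 - 3*t.^2 + t;     % a non-convex function with two local minima
    df = @(t) 4*t.^3 - 6*t + 1;      % its gradient
    for theta0 = [-2 2]              % two different starting points
        theta = theta0;
        for k = 1:200
            theta = theta - 0.05*df(theta);   % plain gradient descent, fixed step
        end
        fprintf('start at %+d -> theta = %.4f, f(theta) = %.4f\n', theta0, theta, f(theta));
    end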

    Univariate Linear Regression

    We now use the univariate linear regression model from Andrew Ng's course to illustrate several gradient descent strategies. The model is
    $h_{\theta}(x) = \theta_0 + \theta_1 x$

    It is a simple linear fitting model, and its objective function is convex (by the first- and second-derivative test), so the minimum found by gradient descent is the global optimum of the loss. The loss (objective) function of univariate linear regression is
    $J(\theta_0,\theta_1) = \frac{1}{2m}\sum_{i=1}^{m}\big(h_{\theta}(x^{(i)})-y^{(i)}\big)^2$
    where $h_{\theta}(x^{(i)})$ is the prediction of the regression model and $m$ is the number of samples.
    Our goal is the parameter values that minimize the objective:
    $(\hat\theta_0,\hat\theta_1) = \arg\min_{\theta_0,\theta_1} J(\theta_0,\theta_1)$

    Taking the gradient of the loss gives:

    $\dfrac{\partial J(\theta_0,\theta_1)}{\partial \theta_0} = \dfrac{1}{m}\sum_{i=1}^{m}\big(\theta_0+\theta_1 x^{(i)} - y^{(i)}\big)$, $\qquad \dfrac{\partial J(\theta_0,\theta_1)}{\partial \theta_1} = \dfrac{1}{m}\sum_{i=1}^{m}\big(\theta_0+\theta_1 x^{(i)} - y^{(i)}\big)\,x^{(i)}$
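    Both partial derivatives can be computed with a single vectorized expression, which is exactly what the GD helper in the scripts below relies on (dx = 1/m * X'*(X*theta - Y)). A small stand-alone check, with toy values of my own:

    % Minimal sketch: vectorized gradient of the univariate-LR loss
    % X is m-by-2 ([1 x]), theta = [theta0; theta1]
    m     = 5;
    x     = (1:m)';                        % toy inputs (assumed values)
    y     = 3 + 2*x;                       % toy targets generated with theta0 = 3, theta1 = 2
    X     = [ones(m,1) x];                 % design matrix [1 x]
    theta = [0; 0];
    grad  = (1/m) * X' * (X*theta - y);    % [dJ/dtheta0; dJ/dtheta1]
    % grad(1) == mean(theta(1) + theta(2)*x - y)
    % grad(2) == mean((theta(1) + theta(2)*x - y) .* x)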

    Batch Gradient Descent (Batch GD)

    Batch gradient descent computes the gradient of the loss over the entire data set to update the parameters $\theta$:
    $\theta := \theta - \eta\cdot\dfrac{1}{m}\sum_{i=1}^{m}\dfrac{\partial J(\theta, x^{(i)}, y^{(i)})}{\partial \theta}$
    Advantages: it takes the whole data set into account, so updates are stable, oscillation is small, and every step moves in the right direction.
    Disadvantages: on a large data set the time and memory cost is high.

    The figure below shows the fit obtained with batch gradient descent; the linear regression model gradually converges to a good result.
    [figure: data fitting with batch gradient descent]

    The MATLAB code:

    % Written by Weichen Gu, date 4/19/2020
    clear, clf, clc; 
    
    data = linspace(-20,20,100);                    % x range
    col = length(data);                             % Obtain the number of x
    data = [data;0.5*data + wgn(1,100,1).*2+10];    % Generate dataset - y = 0.5 * x + wgn^2 + 10;
    X = [ones(1, col); data(1,:)]';                 % X ->[1;X];
    
    plot(data(1,:),data(2,:),'r.','MarkerSize',10); % Plot data
    title('Data fiting using Univariate Linear Regression');
    axis([-30,30,-10,30])
    hold on;
    
    theta =[0;0];           % Initialize parameters
    LRate = [0.1; 0.002]; % Learning rate
    thresh = 0.5;           % Threshold of loss for jumping iteration
    iteration = 40;        % The number of teration
    
    lineX = linspace(-30,30,100);
    [row, col] = size(data)                     % Obtain the size of dataset
    lineMy = [lineX;theta(1)*lineX+theta(2)];   % Fitting line
    hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2);  % draw fitting line
    
    for iter = 1 : iteration
        delete(hLine)       % set(hLine,'visible','off')
        
        [thetaOut] = GD(X,data(2,:)',theta,LRate); % Gradient Descent algorithm
        theta = thetaOut;   % Update parameters(theta)
        
        loss = getLoss(X,data(2,:)',col,theta); % Obtain the loss 
        lineMy(2,:) = theta(2)*lineX+theta(1); % Fitting line
        hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2); % draw fitting line
        
        drawnow()
        
        if(loss < thresh)
            break;
        end
    end
    
    hold off
    
    
    function [thetaOut] = GD(X,Y,theta,LRate)
        dataSize = length(X);               % Obtain the number of data 
        dx = 1/dataSize*(X'*(X*theta-Y));   % Obtain the gradient
        thetaOut = theta -LRate.*dx;         % gradient descent method
    end
    
    
    function [Z] = getLoss(X,Y, num,theta)
        Z= 1/(2*num)*sum((X*theta-Y).^2);
    end
    

    To see the batch gradient descent process directly, and to compare it with the later algorithms, we visualize it in 3D and plot the fit (1), the 3D loss surface (2), the loss contours (3), and the loss per iteration (4).

    [figure: batch GD - fit, 3D loss surface, contour plot, and loss curve]

    As the figure shows, batch gradient descent converges stably with little oscillation.

    The full MATLAB implementation:

    % Written by Weichen Gu, date 4/19/2020
    clear, clf, clc;
    data = linspace(-20,20,100);                    % x range
    col = length(data);                             % Obtain the number of x
    data = [data;0.5*data + wgn(1,100,1)+10];       % Generate dataset - y = 0.5 * x + wgn^2 + 10;
    X = [ones(1, col); data(1,:)]';                 % X ->[1;X];
    
    t1=-40:0.1:50;
    t2=-4:0.1:4;
    [meshX,meshY]=meshgrid(t1,t2);
    meshZ = getTotalCost(X, data(2,:)', col, meshX,meshY);
    
    theta =[-30;-4];        % Initialize parameters
    LRate = [0.1; 0.002];   % Learning rate
    thresh = 0.5;           % Threshold of loss for jumping iteration
    iteration = 50;         % The number of teration
    
    lineX = linspace(-30,30,100);
    [row, col] = size(data)                                     % Obtain the size of dataset
    lineMy = [lineX;theta(1)*lineX+theta(2)];                   % Fitting line
    hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2);    % draw fitting line
    
    loss = getLoss(X,data(2,:)',col,theta);                     % Obtain current loss value
    
    subplot(2,2,1);
    plot(data(1,:),data(2,:),'r.','MarkerSize',10);
    title('Data fiting using Univariate LR');
    axis([-30,30,-10,30])
    xlabel('x');
    ylabel('y');
    hold on;
    
    % Draw 3d loss surfaces
    subplot(2,2,2)
    mesh(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('3D surfaces for loss')
    hold on;
    scatter3(theta(1),theta(2),loss,'r*');
    
    % Draw loss contour figure
    subplot(2,2,3)
    contour(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('Contour figure for loss')
    hold on;
    plot(theta(1),theta(2),'r*')
    
    % Draw loss with iteration
    subplot(2,2,4)
    hold on;
    title('Loss when using Batch GD');
    xlabel('iter');
    ylabel('loss');
    plot(0,loss,'b*');
    
    set(gca,'XLim',[0 iteration]);
    %set(gca,'YLim',[0 4000]);
    hold on;
    
    for iter = 1 : iteration
        delete(hLine) % set(hLine,'visible','off')
        
        [thetaOut] = GD(X,data(2,:)',theta,LRate); % Gradient Descent algorithm
        subplot(2,2,3);
        line([theta(1),thetaOut(1)],[theta(2),thetaOut(2)],'color','k')
    
        theta = thetaOut;
        loss = getLoss(X,data(2,:)',col,theta); % Obtain losw
        
        
        lineMy(2,:) = theta(2)*lineX+theta(1); % Fitting line
        subplot(2,2,1);
        hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2); % draw fitting line
        %legend('training data','linear regression');
        
        subplot(2,2,2);
        scatter3(theta(1),theta(2),loss,'r*');
        
        subplot(2,2,3);
        plot(theta(1),theta(2),'r*')
        
        subplot(2,2,4)
        plot(iter,loss,'b*');
        
        drawnow();
        
        if(loss < thresh)
            break;
        end
    end
    
    hold off
    
    
    function [Z] = getTotalCost(X,Y, num,meshX,meshY);
        [row,col] = size(meshX);
        Z = zeros(row, col);
        for i = 1 : row
            theta = [meshX(i,:); meshY(i,:)];
            Z(i,:) =  1/(2*num)*sum((X*theta-repmat(Y,1,col)).^2);   
        end
    
    end
    
    function [Z] = getLoss(X,Y, num,theta)
        Z= 1/(2*num)*sum((X*theta-Y).^2);
    end
    
    function [thetaOut] = GD(X,Y,theta,eta)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        thetaOut = theta -eta.*dx;                  % Update parameters(theta)
    end
    

    Stochastic Gradient Descent (SGD)

    Because batch gradient descent is costly in time and memory, we instead draw one random sample to estimate the gradient and update $\theta$:
    $\theta := \theta - \eta\cdot\dfrac{\partial J(\theta, x^{(i)}, y^{(i)})}{\partial \theta}$
    The steps are:

    1. Randomly shuffle the data set.
    2. For each sample in turn, update the parameters $\theta$.
    3. Repeat until the loss is small enough or the iteration limit is reached.

    Advantages: training is fast; each iteration needs only one sample, so the total time and memory cost is low compared with batch gradient descent.
    Disadvantages: the noise leads to high variance; an iteration does not necessarily move toward the optimum, and the trajectory oscillates heavily.
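    For reference, a per-sample update with no batching machinery looks like the sketch below (a minimal version of my own; in the full script that follows, SGD is instead obtained from the mini-batch routine with batchSize = 1):

    % Minimal sketch of plain SGD for the univariate LR model above
    % X is m-by-2 ([1 x]), Y is m-by-1, theta = [theta0; theta1], eta is the learning rate
    function thetaOut = SGDstep(X, Y, theta, eta)
        m = size(X, 1);
        idx = randperm(m);                        % one pass over shuffled samples = one epoch
        for i = idx
            g = X(i,:)' * (X(i,:)*theta - Y(i));  % gradient estimated from a single sample
            theta = theta - eta .* g;             % immediate update
        end
        thetaOut = theta;
    end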

    The figure below shows the iterations of stochastic gradient descent:
    [figure: SGD - fit, 3D loss surface, contour plot, and loss curve]

    Below is the MATLAB code. Since SGD is the special case of mini-batch gradient descent with batchSize = 1, I put the mini-batch code in this section and set batchSize = 1.

    % Written by Weichen Gu, date 4/19/2020
    clear, clf, clc;
    data = linspace(-20,20,100);                    % x range
    col = length(data);                             % Obtain the number of x
    data = [data;0.5*data + wgn(1,100,1)+10];       % Generate dataset - y = 0.5 * x + wgn^2 + 10;
    X = [ones(1, col); data(1,:)]';                 % X ->[1;X];
    
    t1=-40:0.1:50;
    t2=-4:0.1:4;
    [meshX,meshY]=meshgrid(t1,t2);
    meshZ = getTotalCost(X, data(2,:)', col, meshX,meshY);
    
    theta =[-30;-4];        % Initialize parameters
    LRate = [0.1; 0.002]  % Learning rate
    thresh = 0.5;           % Threshold of loss for jumping iteration
    iteration = 100;        % The number of teration
    
    lineX = linspace(-30,30,100);
    [row, col] = size(data)                                     % Obtain the size of dataset
    lineMy = [lineX;theta(1)*lineX+theta(2)];                   % Fitting line
    hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2);    % draw fitting line
    
    loss = getLoss(X,data(2,:)',col,theta);                     % Obtain current loss value
    
    subplot(2,2,1);
    plot(data(1,:),data(2,:),'r.','MarkerSize',10);
    title('Data fiting using Univariate LR');
    axis([-30,30,-10,30])
    xlabel('x');
    ylabel('y');
    hold on;
    
    % Draw 3d loss surfaces
    subplot(2,2,2)
    mesh(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('3D surfaces for loss')
    hold on;
    scatter3(theta(1),theta(2),loss,'r*');
    
    % Draw loss contour figure
    subplot(2,2,3)
    contour(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('Contour figure for loss')
    hold on;
    plot(theta(1),theta(2),'r*')
    
    % Draw loss with iteration
    subplot(2,2,4)
    hold on;
    title('Loss when using SGD');
    xlabel('iter');
    ylabel('loss');
    plot(0,loss,'b*');
    
    set(gca,'XLim',[0 iteration]);
    hold on;
    
    batchSize = 1;
    for iter = 1 : iteration
        delete(hLine) % set(hLine,'visible','off')
    
        
        %[thetaOut] = GD(X,data(2,:)',theta,LRate); % Gradient Descent algorithm
        [thetaOut] = MBGD(X,data(2,:)',theta,LRate,batchSize);
        subplot(2,2,3);
        line([theta(1),thetaOut(1)],[theta(2),thetaOut(2)],'color','k')
    
        theta = thetaOut;
        loss = getLoss(X,data(2,:)',col,theta); % Obtain losw
        
        
        lineMy(2,:) = theta(2)*lineX+theta(1); % Fitting line
        subplot(2,2,1);
        hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2); % draw fitting line
        %legend('training data','linear regression');
        
        subplot(2,2,2);
        scatter3(theta(1),theta(2),loss,'r*');
        
        subplot(2,2,3);
        plot(theta(1),theta(2),'r*')
        
        
        subplot(2,2,4)
        plot(iter,loss,'b*');
        
        drawnow();
        
        if(loss < thresh)
            break;
        end
    end
    
    hold off
    
    
    function [Z] = getTotalCost(X,Y, num,meshX,meshY);
        [row,col] = size(meshX);
        Z = zeros(row, col);
        for i = 1 : row
            theta = [meshX(i,:); meshY(i,:)];
            Z(i,:) =  1/(2*num)*sum((X*theta-repmat(Y,1,col)).^2);   
        end
    
    end
    
    
    function [Z] = getLoss(X,Y, num,theta)
        Z= 1/(2*num)*sum((X*theta-Y).^2);
    end
    
    function [thetaOut] = GD(X,Y,theta,eta)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        thetaOut = theta -eta.*dx;                  % Update parameters(theta)
    end
    
    % @ Depscription: 
    %       Mini-batch Gradient Descent (MBGD)
    %       Stochastic Gradient Descent(batchSize = 1) (SGD)
    % @ param:
    %       X - [1 X_] X_ is actual X; Y - actual Y
    %       theta - theta for univariate linear regression y_pred = theta_0 + theta1*x
    %       eta - learning rate;
    %
    function [thetaOut] = MBGD(X,Y,theta, eta,batchSize) 
        dataSize = length(X);           % obtain the number of data 
        k = fix(dataSize/batchSize);    % obtain the number of batch which has absolutely same size: k = batchNum-1;    
        batchIdx = randperm(dataSize);  % randomly sort for every epoch for achiving sample diversity
        
        batchIdx1 = reshape(batchIdx(1:k*batchSize),k,batchSize);   % batches which has absolutely same size
        batchIdx2 = batchIdx(k*batchSize+1:end);                    % ramained batch
        
        for i = 1 : k
            thetaOut = GD(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta);
        end
        if(~isempty(batchIdx2))
            thetaOut = GD(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta);
        end
    end
    

    Mini-batch Gradient Descent (Mini-batch GD)

    Stochastic gradient descent converges quickly but fluctuates a lot, and the fluctuation around the optimum makes it hard to tell whether it has converged to a reasonable value; batch gradient descent is stable but costly in time and memory. Mini-batch gradient descent is a compromise between the two: each update uses n samples to train the parameters:
    $\theta := \theta - \eta\cdot\dfrac{1}{n}\sum_{i=1}^{n}\dfrac{\partial J(\theta, x^{(i)}, y^{(i)})}{\partial \theta}$
    The steps are:

    1. Randomly shuffle the data set.
    2. Split the data set into $\frac{m}{n}$ subsets; if samples are left over, put them in an extra subset of their own.
    3. Run batch gradient descent on each subset in turn to update the parameters $\theta$.
    4. Repeat until the loss is small enough or the iteration limit is reached.

    The figure below shows the iterations of mini-batch gradient descent with batchSize = 32. Mini-batching combines the characteristics of batch and stochastic gradient descent and achieves relatively fast, relatively stable convergence.

    [figure: mini-batch GD - fit, 3D loss surface, contour plot, and loss curve]

    The corresponding code:

    % Written by Weichen Gu, date 4/19/2020
    clear, clf, clc;
    data = linspace(-20,20,100);                    % x range
    col = length(data);                             % Obtain the number of x
    data = [data;0.5*data + wgn(1,100,1)+10];       % Generate dataset - y = 0.5 * x + wgn^2 + 10;
    X = [ones(1, col); data(1,:)]';                 % X ->[1;X];
    
    t1=-40:0.1:50;
    t2=-4:0.1:4;
    [meshX,meshY]=meshgrid(t1,t2);
    meshZ = getTotalCost(X, data(2,:)', col, meshX,meshY);
    
    theta =[-30;-4];        % Initialize parameters
    LRate = [0.1; 0.002]  % Learning rate
    thresh = 0.5;           % Threshold of loss for jumping iteration
    iteration = 50;        % The number of teration
    
    lineX = linspace(-30,30,100);
    [row, col] = size(data)                                     % Obtain the size of dataset
    lineMy = [lineX;theta(1)*lineX+theta(2)];                   % Fitting line
    hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2);    % draw fitting line
    
    loss = getLoss(X,data(2,:)',col,theta);                     % Obtain current loss value
    
    subplot(2,2,1);
    plot(data(1,:),data(2,:),'r.','MarkerSize',10);
    title('Data fiting using Univariate LR');
    axis([-30,30,-10,30])
    xlabel('x');
    ylabel('y');
    hold on;
    
    % Draw 3d loss surfaces
    subplot(2,2,2)
    mesh(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('3D surfaces for loss')
    hold on;
    scatter3(theta(1),theta(2),loss,'r*');
    
    % Draw loss contour figure
    subplot(2,2,3)
    contour(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('Contour figure for loss')
    hold on;
    plot(theta(1),theta(2),'r*')
    
    % Draw loss with iteration
    subplot(2,2,4)
    hold on;
    title('Loss when using Mini-Batch GD');
    xlabel('iter');
    ylabel('loss');
    plot(0,loss,'b*');
    
    set(gca,'XLim',[0 iteration]);
    %set(gca,'YLim',[0 4000]);
    hold on;
    
    batchSize = 32;
    for iter = 1 : iteration
        delete(hLine) % set(hLine,'visible','off')
    
        
        %[thetaOut] = GD(X,data(2,:)',theta,LRate); % Gradient Descent algorithm
        [thetaOut] = MBGD(X,data(2,:)',theta,LRate,batchSize);
        subplot(2,2,3);
        line([theta(1),thetaOut(1)],[theta(2),thetaOut(2)],'color','k')
    
        theta = thetaOut;
        loss = getLoss(X,data(2,:)',col,theta); % Obtain losw
        
        
        lineMy(2,:) = theta(2)*lineX+theta(1); % Fitting line
        subplot(2,2,1);
        hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2); % draw fitting line
        %legend('training data','linear regression');
        
        subplot(2,2,2);
        scatter3(theta(1),theta(2),loss,'r*');
        
        subplot(2,2,3);
        plot(theta(1),theta(2),'r*')
        
        
        subplot(2,2,4)
        plot(iter,loss,'b*');
        
        drawnow();
        
        if(loss < thresh)
            break;
        end
    end
    
    hold off
    
    
    function [Z] = getTotalCost(X,Y, num,meshX,meshY);
        [row,col] = size(meshX);
        Z = zeros(row, col);
        for i = 1 : row
            theta = [meshX(i,:); meshY(i,:)];
            Z(i,:) =  1/(2*num)*sum((X*theta-repmat(Y,1,col)).^2);   
        end
    
    end
    
    
    function [Z] = getLoss(X,Y, num,theta)
        Z= 1/(2*num)*sum((X*theta-Y).^2);
    end
    
    function [thetaOut] = GD(X,Y,theta,eta)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        thetaOut = theta -eta.*dx;                  % Update parameters(theta)
    end
    
    % @ Depscription: 
    %       Mini-batch Gradient Descent (MBGD)
    %       Stochastic Gradient Descent(batchSize = 1) (SGD)
    % @ param:
    %       X - [1 X_] X_ is actual X; Y - actual Y
    %       theta - theta for univariate linear regression y_pred = theta_0 + theta1*x
    %       eta - learning rate;
    %
    function [thetaOut] = MBGD(X,Y,theta, eta,batchSize) 
        dataSize = length(X);           % obtain the number of data 
        k = fix(dataSize/batchSize);    % obtain the number of batch which has absolutely same size: k = batchNum-1;    
        batchIdx = randperm(dataSize);  % randomly sort for every epoch for achiving sample diversity
        
        batchIdx1 = reshape(batchIdx(1:k*batchSize),k,batchSize);   % batches which has absolutely same size
        batchIdx2 = batchIdx(k*batchSize+1:end);                    % ramained batch
        
        for i = 1 : k
            thetaOut = GD(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta);
        end
        if(~isempty(batchIdx2))
            thetaOut = GD(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta);
        end
    end
    

    Gradient Descent with Momentum

    Momentum gradient descent uses an exponentially weighted moving average to correct the current gradient with past gradient information and so speed up learning. Writing the gradient term as
    $\dfrac{1}{m}\sum_{i=1}^{m}\dfrac{\partial J(\theta, x^{(i)}, y^{(i)})}{\partial \theta} = \nabla_\theta J(\theta)$
    plain gradient descent can be written compactly as
    $\theta := \theta - \eta\cdot\nabla_\theta J(\theta)$
    For momentum gradient descent, the momentum term is initialized to $m = 0$ and the update is
    $m := \gamma\cdot m + \eta\cdot\nabla_\theta J(\theta), \qquad \theta := \theta - m$
    where $m$ is the momentum term (initialized to 0) and $\gamma$ is its hyperparameter, usually no larger than 0.9. Note that the farther a past gradient lies from the current point, the less it contributes to the current update.

    Advantages: past gradient information improves the descent speed; when the current gradient agrees in direction with the previous ones convergence is accelerated, and when it disagrees the step is damped. In other words it both speeds up convergence and reduces oscillation, corresponding to the $\theta_0$ and $\theta_1$ directions in the figure.
    Disadvantages: on a long descent the accumulated momentum can grow so large that it overshoots the minimum.
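    Stripped of the plotting scaffolding in the script below, the momentum update itself is only two lines. A minimal sketch (using v for the momentum term so it does not clash with the sample count m):

    % Minimal sketch of one momentum update; grad is the current (mini-batch) gradient
    function [thetaOut, v] = momentumStep(grad, theta, v, eta, gamma)
        v        = gamma*v + eta.*grad;   % exponentially weighted accumulation of gradients
        thetaOut = theta - v;             % move by the accumulated momentum, not the raw gradient
    end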

    The figure below shows the iterations of momentum gradient descent.
    [figure: momentum GD - fit, 3D loss surface, contour plot, and loss curve]

    The corresponding code:

    % Written by Weichen Gu, date 4/19/2020
    clear, clf, clc;
    data = linspace(-20,20,100);                    % x range
    col = length(data);                             % Obtain the number of x
    data = [data;0.5*data + wgn(1,100,1)+10];       % Generate dataset - y = 0.5 * x + wgn^2 + 10;
    X = [ones(1, col); data(1,:)]';                 % X ->[1;X];
    
    t1=-40:0.1:50;
    t2=-4:0.1:4;
    [meshX,meshY]=meshgrid(t1,t2);
    meshZ = getTotalCost(X, data(2,:)', col, meshX,meshY);
    
    theta =[-30;-4];        % Initialize parameters
    LRate = [0.1; 0.002]  % Learning rate
    thresh = 0.5;           % Threshold of loss for jumping iteration
    iteration = 50;        % The number of teration
    
    lineX = linspace(-30,30,100);
    [row, col] = size(data)                                     % Obtain the size of dataset
    lineMy = [lineX;theta(1)*lineX+theta(2)];                   % Fitting line
    hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2);    % draw fitting line
    
    loss = getLoss(X,data(2,:)',col,theta);                     % Obtain current loss value
    
    subplot(2,2,1);
    plot(data(1,:),data(2,:),'r.','MarkerSize',10);
    title('Data fiting using Univariate LR');
    axis([-30,30,-10,30])
    xlabel('x');
    ylabel('y');
    hold on;
    
    % Draw 3d loss surfaces
    subplot(2,2,2)
    mesh(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('3D surfaces for loss')
    hold on;
    scatter3(theta(1),theta(2),loss,'r*');
    
    % Draw loss contour figure
    subplot(2,2,3)
    contour(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('Contour figure for loss')
    hold on;
    plot(theta(1),theta(2),'r*')
    
    % Draw loss with iteration
    subplot(2,2,4)
    hold on;
    title('Loss when using Momentum GD');
    xlabel('iter');
    ylabel('loss');
    plot(0,loss,'b*');
    
    set(gca,'XLim',[0 iteration]);
    
    hold on;
    
    batchSize = 32;
    grad = 0; gamma = 0.5;
    
    for iter = 1 : iteration
        delete(hLine) % set(hLine,'visible','off')
    
        
        %[thetaOut] = GD(X,data(2,:)',theta,LRate); % Gradient Descent algorithm
        %[thetaOut] = MBGD(X,data(2,:)',theta,LRate,20);
        [thetaOut,grad] = MBGDM(X,data(2,:)',theta,LRate,batchSize,grad,gamma);
        subplot(2,2,3);
        line([theta(1),thetaOut(1)],[theta(2),thetaOut(2)],'color','k')
    
        theta = thetaOut;
        loss = getLoss(X,data(2,:)',col,theta); % Obtain losw
        
        
        lineMy(2,:) = theta(2)*lineX+theta(1); % Fitting line
        subplot(2,2,1);
        hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2); % draw fitting line
        %legend('training data','linear regression');
        
        subplot(2,2,2);
        scatter3(theta(1),theta(2),loss,'r*');
        
        subplot(2,2,3);
        plot(theta(1),theta(2),'r*')
        
        
        subplot(2,2,4)
        plot(iter,loss,'b*');
    
        drawnow();
        
        if(loss < thresh)
            break;
        end
    end
    
    hold off
    
    
    function [Z] = getTotalCost(X,Y, num,meshX,meshY);
        [row,col] = size(meshX);
        Z = zeros(row, col);
        for i = 1 : row
            theta = [meshX(i,:); meshY(i,:)];
            Z(i,:) =  1/(2*num)*sum((X*theta-repmat(Y,1,col)).^2);   
        end
    
    end
    
    
    function [Z] = getLoss(X,Y, num,theta)
        Z= 1/(2*num)*sum((X*theta-Y).^2);
    end
    
    function [thetaOut] = GD(X,Y,theta,eta)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        thetaOut = theta -eta.*dx;                  % Update parameters(theta)
    end
    % Gradient Descent with Momentum
    function [thetaOut, momentum] = GDM(X,Y,theta,eta,momentum,gamma)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        momentum = gamma*momentum +eta.*dx;         % Obtain the momentum of gradient
        thetaOut = theta - momentum;                % Update parameters(theta)
    end
    
    % @ Depscription: 
    %       Mini-batch Gradient Descent (MBGD)
    %       Stochastic Gradient Descent(batchSize = 1) (SGD)
    % @ param:
    %       X - [1 X_] X_ is actual X; Y - actual Y
    %       theta - theta for univariate linear regression y_pred = theta_0 + theta1*x
    %       eta - learning rate;
    %
    function [thetaOut] = MBGD(X,Y,theta, eta,batchSize) 
        dataSize = length(X);           % obtain the number of data 
        k = fix(dataSize/batchSize);    % obtain the number of batch which has absolutely same size: k = batchNum-1;    
        batchIdx = randperm(dataSize);  % randomly sort for every epoch for achiving sample diversity
        
        batchIdx1 = reshape(batchIdx(1:k*batchSize),k,batchSize);   % batches which has absolutely same size
        batchIdx2 = batchIdx(k*batchSize+1:end);                    % ramained batch
        
        for i = 1 : k
            thetaOut = GD(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta);
        end
        if(~isempty(batchIdx2))
            thetaOut = GD(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta);
        end
    end
    
    function [thetaOut,grad] = MBGDM(X,Y,theta, eta,batchSize,grad,gamma) 
        dataSize = length(X);           % obtain the number of data 
        k = fix(dataSize/batchSize);    % obtain the number of batch which has absolutely same size: k = batchNum-1;    
        batchIdx = randperm(dataSize);  % randomly sort for every epoch for achiving sample diversity
        
        batchIdx1 = reshape(batchIdx(1:k*batchSize),k,batchSize);   % batches which has absolutely same size
        batchIdx2 = batchIdx(k*batchSize+1:end);                    % ramained batch
        
        for i = 1 : k
            [thetaOut,grad] = GDM(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta,grad,gamma);
            
        end
        if(~isempty(batchIdx2))
            [thetaOut,grad] = GDM(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta,grad,gamma);
        end
    end
    
    

    Nesterov Accelerated Gradient (NAG)

    Nesterov accelerated gradient differs from momentum gradient descent in that it updates the momentum with a "future" gradient, i.e. it brings the predicted next gradient $\nabla_\theta J(\theta-\gamma\cdot m)$ into the update:
    $m := \gamma\cdot m + \eta\cdot\nabla_\theta J(\theta-\gamma\cdot m), \qquad \theta := \theta - m$
    The figure below illustrates the two formulas:

    1. Applying momentum gradient descent at the starting point, we first compute the weighted gradient $\eta\cdot\nabla_1$ there and combine it with the previous momentum $\gamma\cdot m$ to obtain the next position of $\theta$.
    2. Applying Nesterov accelerated gradient at the starting point, we first use the previous momentum $\gamma\cdot m$ to predict the next position of $\theta$, and then add the weighted gradient computed at that predicted position.

    [figure: momentum update versus Nesterov look-ahead update]
    Advantages:

    1. Compared with momentum gradient descent, NAG converges faster because it takes the predicted future gradient into account (see the figure above).
    2. When the update is large, NAG suppresses oscillation. Suppose the starting point is to the left of the optimum and $\gamma m$ lands to its right. Plain momentum then adds $\eta\nabla_1$ and carries the iterate even farther past the optimum, whereas NAG first jumps to the point given by $\gamma m$, finds a positive gradient there, and adds $\eta\nabla_2$ in the opposite direction, which damps the oscillation.
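    The only difference from the momentum step is where the gradient is evaluated: at the look-ahead point θ − γ·m instead of at θ. A minimal sketch (gradFun is a hypothetical handle that returns the mini-batch gradient at a given θ):

    % Minimal sketch of one NAG update; gradFun(theta) returns the (mini-batch) gradient at theta
    function [thetaOut, v] = nagStep(gradFun, theta, v, eta, gamma)
        g        = gradFun(theta - gamma*v);   % gradient at the look-ahead position
        v        = gamma*v + eta.*g;           % same momentum accumulation as before
        thetaOut = theta - v;                  % parameter update
    end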

    The figure below shows the iterations of Nesterov accelerated gradient:

    [figure: NAG - fit, 3D loss surface, contour plot, and loss curve]

    Below is the code for Nesterov accelerated gradient:

    
    % Written by Weichen Gu, date 4/19/2020
    clear, clf, clc;
    data = linspace(-20,20,100);                    % x range
    col = length(data);                             % Obtain the number of x
    data = [data;0.5*data + wgn(1,100,1)+10];       % Generate dataset - y = 0.5 * x + wgn^2 + 10;
    X = [ones(1, col); data(1,:)]';                 % X ->[1;X];
    
    t1=-40:0.1:50;
    t2=-4:0.1:4;
    [meshX,meshY]=meshgrid(t1,t2);
    meshZ = getTotalCost(X, data(2,:)', col, meshX,meshY);
    
    theta =[-30;-4];        % Initialize parameters
    LRate = [0.1; 0.002]  % Learning rate
    thresh = 0.5;           % Threshold of loss for jumping iteration
    iteration = 50;        % The number of teration
    
    lineX = linspace(-30,30,100);
    [row, col] = size(data)                                     % Obtain the size of dataset
    lineMy = [lineX;theta(1)*lineX+theta(2)];                   % Fitting line
    hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2);    % draw fitting line
    
    loss = getLoss(X,data(2,:)',col,theta);                     % Obtain current loss value
    
    subplot(2,2,1);
    plot(data(1,:),data(2,:),'r.','MarkerSize',10);
    title('Data fiting using Univariate LR');
    axis([-30,30,-10,30])
    xlabel('x');
    ylabel('y');
    hold on;
    
    % Draw 3d loss surfaces
    subplot(2,2,2)
    mesh(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('3D surfaces for loss')
    hold on;
    scatter3(theta(1),theta(2),loss,'r*');
    
    % Draw loss contour figure
    subplot(2,2,3)
    contour(meshX,meshY,meshZ)
    xlabel('θ_0');
    ylabel('θ_1');
    title('Contour figure for loss')
    hold on;
    plot(theta(1),theta(2),'r*')
    
    % Draw loss with iteration
    subplot(2,2,4)
    hold on;
    title('Loss when using NAG');
    xlabel('iter');
    ylabel('loss');
    plot(0,loss,'b*');
    
    set(gca,'XLim',[0 iteration]);
    
    hold on;
    
    batchSize = 32;
    grad = 0; gamma = 0.5;
    
    for iter = 1 : iteration
        delete(hLine) % set(hLine,'visible','off')
    
        
        %[thetaOut] = GD(X,data(2,:)',theta,LRate); % Gradient Descent algorithm
        %[thetaOut] = MBGD(X,data(2,:)',theta,LRate,20);
        [thetaOut,grad] = MBGDM(X,data(2,:)',theta,LRate,batchSize,grad,gamma);
        subplot(2,2,3);
        line([theta(1),thetaOut(1)],[theta(2),thetaOut(2)],'color','k')
    
        theta = thetaOut;
        loss = getLoss(X,data(2,:)',col,theta); % Obtain losw
        
        
        lineMy(2,:) = theta(2)*lineX+theta(1); % Fitting line
        subplot(2,2,1);
        hLine = plot(lineMy(1,:),lineMy(2,:),'c','linewidth',2); % draw fitting line
        %legend('training data','linear regression');
        
        subplot(2,2,2);
        scatter3(theta(1),theta(2),loss,'r*');
        
        subplot(2,2,3);
        plot(theta(1),theta(2),'r*')
        
        
        subplot(2,2,4)
        plot(iter,loss,'b*');
    
        drawnow();
        
        if(loss < thresh)
            break;
        end
    end
    
    hold off
    
    
    function [Z] = getTotalCost(X,Y, num,meshX,meshY);
        [row,col] = size(meshX);
        Z = zeros(row, col);
        for i = 1 : row
            theta = [meshX(i,:); meshY(i,:)];
            Z(i,:) =  1/(2*num)*sum((X*theta-repmat(Y,1,col)).^2);   
        end
    
    end
    
    
    function [Z] = getLoss(X,Y, num,theta)
        Z= 1/(2*num)*sum((X*theta-Y).^2);
    end
    
    function [thetaOut] = GD(X,Y,theta,eta)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        thetaOut = theta -eta.*dx;                  % Update parameters(theta)
    end
    % Gradient Descent with Momentum
    function [thetaOut, momentum] = GDM(X,Y,theta,eta,momentum,gamma)
        dataSize = length(X);                       % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*theta-Y));          % Obtain the gradient of Loss function
        momentum = gamma*momentum +eta.*dx;         % Obtain the momentum of gradient
        thetaOut = theta - momentum;                % Update parameters(theta)
    end
    
    % @ Depscription: 
    %       Mini-batch Gradient Descent (MBGD)
    %       Stochastic Gradient Descent(batchSize = 1) (SGD)
    % @ param:
    %       X - [1 X_] X_ is actual X; Y - actual Y
    %       theta - theta for univariate linear regression y_pred = theta_0 + theta1*x
    %       eta - learning rate;
    %
    function [thetaOut] = MBGD(X,Y,theta, eta,batchSize) 
        dataSize = length(X);           % obtain the number of data 
        k = fix(dataSize/batchSize);    % obtain the number of batch which has absolutely same size: k = batchNum-1;    
        batchIdx = randperm(dataSize);  % randomly sort for every epoch for achiving sample diversity
        
        batchIdx1 = reshape(batchIdx(1:k*batchSize),k,batchSize);   % batches which has absolutely same size
        batchIdx2 = batchIdx(k*batchSize+1:end);                    % ramained batch
        
        for i = 1 : k
            thetaOut = GD(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta);
        end
        if(~isempty(batchIdx2))
            thetaOut = GD(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta);
        end
    end
    
    % Nesterov Accelerated Gradient (NAG)
    function [thetaOut, momentum] = NAG(X,Y,theta,eta,momentum,gamma)
        dataSize = length(X);                                   % Obtain the number of data
        dx = 1/dataSize.*(X'*(X*(theta- gamma*momentum)-Y));    % Obtain the gradient of Loss function
        momentum = gamma*momentum +eta.*dx;                     % Obtain the momentum of gradient
        thetaOut = theta - momentum;                            % Update parameters(theta)
    end
    
    function [thetaOut,grad] = MBGDM(X,Y,theta, eta,batchSize,grad,gamma)
        dataSize = length(X);           % obtain the number of data 
        k = fix(dataSize/batchSize);    % obtain the number of batch which has absolutely same size: k = batchNum-1;    
        batchIdx = randperm(dataSize);  % randomly sort for every epoch for achiving sample diversity
        
        batchIdx1 = reshape(batchIdx(1:k*batchSize),k,batchSize);   % batches which has absolutely same size
        batchIdx2 = batchIdx(k*batchSize+1:end);                    % ramained batch
        
        for i = 1 : k
            %[thetaOut,grad] = GDM(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta,grad,gamma);
            [thetaOut,grad] = NAG(X(batchIdx1(i,:),:),Y(batchIdx1(i,:)),theta,eta,grad,gamma);
        end
        if(~isempty(batchIdx2))
            %[thetaOut,grad] = GDM(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta,grad,gamma);
            [thetaOut,grad] = NAG(X(batchIdx2,:),Y(batchIdx2),thetaOut,eta,grad,gamma);
        end
        
    end
    

    After raising the learning rate $\eta$ for $\theta_1$ to 0.005, i.e. $\eta = [0.1, 0.005]$, we obtain another set of results for comparison.

    Iterations of batch gradient descent and stochastic gradient descent:
    [figures: batch GD and SGD with η = [0.1, 0.005]]
    Iterations of mini-batch gradient descent and momentum gradient descent:
    [figures: mini-batch GD and momentum GD with η = [0.1, 0.005]]
    Iterations of Nesterov accelerated gradient:
    [figure: NAG with η = [0.1, 0.005]]

    Closing remarks

    In the next post we will cover several common gradient descent algorithms with adaptive learning rates: AdaGrad, RMSProp, AdaDelta, Adam, and Nadam.

  • Gradient descent in MATLAB

    2015-11-13 16:14:27
    rand('state',0);randn('state',0); n=50;N=1000;x=linspace(-3,3,n)';X=linspace(-3,3,N)';... The convergence speed of gradient descent depends strongly on the step size e and on the convergence test norm(t-t0).
    rand('state',0);randn('state',0);
    n=50;N=1000;x=linspace(-3,3,n)';X=linspace(-3,3,N)';   % training inputs x, test inputs X
    pix=pi*x;y=sin(pix)./(pix)+0.1*x+0.05*randn(n,1);      % noisy sinc-like training targets
    hh=2*0.3^2;t0=randn(n,1);e=0.1;                        % kernel bandwidth, initial weights, step size e
    for o=1:n*1000
        i=ceil(rand*n);                                    % pick one training sample at random
        ki=exp(-(x-x(i)).^2/hh);                           % Gaussian-kernel features of sample i
        t=t0-e*ki*(ki'*t0-y(i));                           % stochastic gradient step on the weights
        if norm(t-t0)<0.000001,break,end                   % stop when the update becomes tiny
        t0=t;
    end
    K=exp(-(repmat(X.^2,1,n)+repmat(x.^2',N,1)-2*X*x')/hh); % kernel matrix between test and training points
    F=K*t;                                                  % predictions on the test grid
    figure(1);clf;hold on;axis([-2.8 2.8 -0.5 1.2]);

    plot(X,F,'g+');plot(x,y,'bo');

    The convergence speed of gradient descent depends strongly on the step size, i.e. e, and on the convergence test norm(t-t0) < 0.000001.




  • Gradient descent code: function [ theta, J_history ] = GradinentDecent( X, y, theta, alpha, num_iter ) m = length(y); J_history = zeros(20, 1); i = 0; temp = 0; for iter = 1:num_iter temp = temp +1; theta = theta - ...

    Gradient descent code:

    function [ theta, J_history ] = GradinentDecent( X, y, theta, alpha, num_iter )
        m = length(y);
        J_history = zeros(20, 1);
        i = 0;
        temp = 0;
        for iter = 1:num_iter
            temp = temp + 1;
            theta = theta - alpha / m * X' * (X*theta - y);
            if temp >= 100
                temp = 0;
                i = i + 1;
                J_history(i) = ComputeCost(X, y, theta);
            end
        end
    end

    Stochastic gradient descent code:

    function [ theta, J_history ] = StochasticGD( X, y, theta, alpha, num_iter )
        m = length(y);
        J_history = zeros(20, 1);
        temp = 0;
        n = 0;
        for iter = 1:num_iter
            temp = temp + 1;
            index = randi(m);
            theta = theta - alpha * (X(index, :) * theta - y(index)) * X(index, :)';
            if temp >= 100
                temp = 0;
                n = n + 1;
                J_history(n) = ComputeCost(X, y, theta);
            end
        end
    end

    Variance-reduced gradient descent (SVRG):

    function [ theta_old, J_history ] = SVRG( X, y, theta, alpha )
        theta_old = theta;
        n = length(y);
        J_history = zeros(20, 1);
        m = 2 * n;
        for i = 1:20
            theta_ = theta_old;
            Mu = 1/n * X' * (X*theta_ - y);
            theta_0 = theta_;
            for j = 1:m
                index = randi(n);
                GD_one = (X(index, :) * theta_0 - y(index)) * X(index, :)';
                GD_ = (X(index, :) * theta_ - y(index)) * X(index, :)';
                theta_t = theta_0 - alpha * (GD_one - GD_ + Mu);
                theta_0 = theta_t;
            end
            J_history(i) = ComputeCost(X, y, theta_t);
            theta_old = theta_t;
        end
    end

    Loss function:

    function J = ComputeCost( X, y, theta )
        m = length(y);
        J = sum((X*theta - y).^2) / (2*m);
    end

    Main program:

    %% clean workspace
    clc;
    clear;
    close all;

    %% plot data
    fprintf('plot data... \n');
    X = load('ex2x.dat');
    y = load('ex2y.dat');
    m = length(y);
    figure;
    plot(X, y, 'o');

    %% gradient descent
    fprintf('Running gradient descent... \n');
    X = [ones(m,1), X];
    theta_SGD = zeros(2, 1);
    theta_GD = zeros(2, 1);
    theta_SVRG = zeros(2, 1);
    Iteration = 2000;
    alpha = 0.015;
    alpha1 = 0.025;
    [theta, J] = StochasticGD(X, y, theta_SGD, alpha, Iteration);
    [theta1, J1] = GradinentDecent(X, y, theta_GD, alpha, Iteration);
    [theta2, J2] = SVRG(X, y, theta_SVRG, alpha1);
    fprintf('SGD: %f %f\n', theta(1), theta(2));
    fprintf('GD: %f %f\n', theta1(1), theta1(2));
    fprintf('SVRG: %f %f\n', theta2(1), theta2(2));
    hold on;
    plot(X(:, 2), X*theta, 'r-');
    plot(X(:, 2), X*theta1, 'g-');
    plot(X(:, 2), X*theta2, 'b-');
    legend('', 'SGD', 'GD', 'SVRG');
    x_j = 1:1:20;
    figure;
    hold on;
    plot(x_j, J, 'b-');
    plot(x_j, J1, 'g-');
    plot(x_j, J2, 'r-');
    legend('SGD', 'GD', 'SVRG');
    xlabel('epoch')
    ylabel('loss')

    Experimental results:

    [figures: fitted lines for SGD, GD, and SVRG, and loss-vs-epoch curves]

    Original post: https://www.cnblogs.com/ryluo/p/10173822.html

  • Conjugate gradient MATLAB program. %conjugate gradient methods %method: FR, PRP, HS, DY, CD, WYL, LS %exact line search, gradient-based stopping criterion. function [ m,k,d,a,X,g1,fv] conjgradme G,b,c,X,e,method; if nargin 6 error 'exactly 6 input arguments are required'; ...
  • Stochastic gradient descent algorithm in MATLAB

    2015-09-29 22:09:19
    x=[1 1;1 2;1 3; 1 4]; y=[1.1;2.2;2.7;3.8]; rate=0.001; w = zeros(1,size(x,2)); iter = 100; while(iter >0) for i=1:size(x,1) for j=1:size(w,2) w(j)=w(j)+rate*(y(i)-w(1,:)*x(i,:)')*x
  • Most data-science algorithms are optimization problems, and the algorithm used most often for them is gradient descent. Gradient descent may sound mysterious, but after reading this article your impression of it will probably change. A house-price prediction problem is used as the running example, with MATLAB source files provided for study.
  • MATLAB implementation of steepest descent, Newton's method, and the conjugate gradient method with worked examples (5 pages). Assignment topic and requirements: 1. Course: Optimization Methods 2. ...
  • Gradient descent algorithm in MATLAB

    2015-09-29 22:29:16
    x=[1 1;1 2;1 3; 1 4]; y=[1.1;2.2;2.7;3.8]; rate=0.01; w = zeros(1,size(x,2)); iter = 50; while(iter >0) for i=1:size(x,1) t=sum(y-x*w'); for j=1:size(w,2) w(j)=w(j)+rate*t*x(i,j);... e
  • Runs once the path to the source data is changed.
  • Includes least-squares fitting for single-feature samples, the algebraic form of gradient descent for single-feature samples, and the matrix form of gradient descent for multi-feature samples. The matrix version applies standard-deviation normalization (optional; can be commented out). Comes with fairly detailed comments.
  • MATLAB implementation of the stochastic gradient descent (SGD) algorithm; data sets can be downloaded from the UCI repository.
  • Steepest (gradient) descent method, a MATLAB program with detailed comments.
  • Gradient descent with related MATLAB material; for the full walkthrough see my blog post 《逻辑与思考系列[1/300]: 梯度下降法matlab实践》.
  • GradDescent: a MATLAB implementation of gradient descent for multivariate linear regression.
