Preface
I have recently started working through Andrew Ng's deep learning tutorial. Partly to keep myself motivated and partly to have something to look back on later, I am recording study notes, one entry per chapter, all collected in this single post for easy lookup, so I will keep updating it.
Linear Regression
This chapter briefly reviews linear regression (the tutorial assumes this background already and mainly uses it as a vehicle for reviewing gradient descent; anyone working through these tutorials should have the basic machine-learning prerequisites, so I will skip the fundamentals). The exercise is to compute the objective function and the gradient with respect to every parameter. The formulas are written out below, followed by my loop-based implementation.
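In the tutorial's notation (as I read it, x^{(j)} is the j-th example, stored as column j of X), the objective and gradient that the loops compute should be:

$$J(\theta) = \frac{1}{2}\sum_{j=1}^{m}\big(\theta^\top x^{(j)} - y^{(j)}\big)^2, \qquad \frac{\partial J}{\partial \theta_i} = \sum_{j=1}^{m} x_i^{(j)}\big(\theta^\top x^{(j)} - y^{(j)}\big).$$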
function [f,g] = linear_regression(theta, X,y)
%
% Arguments:
% theta - A vector containing the parameter values to optimize.
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The target value for each example. y(j) is the target for example j.
%
m=size(X,2); % number of examples
n=size(X,1); % feature dimension
f=0;
g=zeros(size(theta));
%
% TODO: Compute the linear regression objective by looping over the examples in X.
% Store the objective function value in 'f'.
%
% TODO: Compute the gradient of the objective with respect to theta by looping over
% the examples in X and adding up the gradient for each example. Store the
% computed gradient in 'g'.
%%% YOUR CODE HERE %%%
for j = 1:m
f = f + 0.5*(theta'*X(:,j)-y(j))^2;
end
% ----------
for i = 1:n
for j = 1:m
g(i) = g(i) + X(i,j)*(theta'*X(:,j)-y(j)) % no trailing semicolon, so the value is printed on every iteration (hence the long run time below)
end
end
The final results are:
Optimization took 128.640734 seconds. % it took this long because the loop was printing its variables on every iteration
RMS training error: 4.843147
RMS testing error: 4.151706
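For reference, the RMS error that ex1a_linreg.m reports is, as far as I can tell, simply the root of the mean squared residual on the corresponding data set:

$$\mathrm{RMS} = \sqrt{\frac{1}{m}\sum_{j=1}^{m}\big(\theta^\top x^{(j)} - y^{(j)}\big)^2}.$$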
Logistic Regression
Although it is called regression, this is really classification. The chapter's exercise is a handwritten-digit classifier, and only the simplest 0-versus-1 case, so the accuracy comes out extremely high. The objective and gradient are written out below, followed by my loop-based implementation.
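With σ the logistic function, the hypothesis, objective (negative log-likelihood), and gradient that the loops implement should be:

$$h_\theta(x) = \sigma(\theta^\top x) = \frac{1}{1+e^{-\theta^\top x}}$$

$$J(\theta) = -\sum_{j=1}^{m}\Big[y^{(j)}\log h_\theta\big(x^{(j)}\big) + \big(1-y^{(j)}\big)\log\Big(1-h_\theta\big(x^{(j)}\big)\Big)\Big], \qquad \frac{\partial J}{\partial \theta_i} = \sum_{j=1}^{m} x_i^{(j)}\Big(h_\theta\big(x^{(j)}\big) - y^{(j)}\Big).$$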
function [f,g] = logistic_regression(theta, X,y)
%
% Arguments:
% theta - A column vector containing the parameter values to optimize.
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The label for each example. y(j) is the j'th example's label.
%
m=size(X,2); % number of training images
n=size(X,1); % number of pixels per image + 1
% initialize objective value and gradient.
f = 0;
g = zeros(size(theta));
%
% TODO: Compute the objective function by looping over the dataset and summing
% up the objective values for each example. Store the result in 'f'.
%
% TODO: Compute the gradient of the objective by looping over the dataset and summing
% up the gradients (df/dtheta) for each example. Store the result in 'g'.
%
%%% YOUR CODE HERE %%%
for j = 1:m
f = f - ( y(j)*log(1/(1+exp(-theta'*X(:,j)))) + (1-y(j))*log(1-(1/(1+exp(-theta'*X(:,j))))) );
end
% ----------
for i = 1:n
for j = 1:m
g(i) = g(i) + X(i,j)*(1/(1+exp(-theta'*X(:,j)))-y(j));
end
end
Results:
Optimization took 7874.049756 seconds. % I waited an eternity for this one
Training accuracy: 100.0%
Test accuracy: 100.0%
Vectorization
Vectorization is a great time saver: put plainly, it uses MATLAB's strength at matrix computation to make up for its weakness at loops. My vectorized linear regression code:
function [f,g] = linear_regression_vec(theta, X,y)
%
% Arguments:
% theta - A vector containing the parameter values to optimize.
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The target value for each example. y(j) is the target for example j.
%
m=size(X,2);
% initialize objective value and gradient.
f = 0;
g = zeros(size(theta));
%
% TODO: Compute the linear regression objective function and gradient
% using vectorized code. (It will be just a few lines of code!)
% Store the objective function value in 'f', and the gradient in 'g'.
%
%%% YOUR CODE HERE %%%
f = sum((theta'*X - y).^2) * 0.5;
y_hat = theta'*X;
g = X*(y_hat' - y');
Results:
Optimization took 0.108650 seconds.
RMS training error: 4.650101
RMS testing error: 4.856230
That is a huge saving in both time and effort. The i,j subscripts and the transposes can be dizzying, though; in practice it helps to step through in the debugger, inspect your data, and then adjust the subscripts and decide where to transpose (the whole point is just to make the matrices conformable for multiplication); a quick shape check like the sketch below is what I use. It also helps to keep the meaning of the common variables fixed within an experiment: throughout this tutorial, for example, m is the number of examples and n is the feature dimension.
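Here is the kind of shape check I run before trusting a vectorized expression (a minimal sketch, assuming the linear-regression variables X, y, and theta are already in the workspace, with X stored as features-by-examples and y as a row vector):

% n features, m examples: X is n-by-m, y is 1-by-m, theta is n-by-1
[n, m] = size(X);
assert(isequal(size(y), [1 m]));
assert(isequal(size(theta), [n 1]));
residual = theta'*X - y;                      % 1-by-m row of residuals
grad = X*residual';                           % n-by-1, same shape as theta
assert(isequal(size(grad), size(theta)));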
Below is the vectorized logistic regression code:
function [f,g] = logistic_regression_vec(theta, X,y)
%
% Arguments:
% theta - A column vector containing the parameter values to optimize.
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The label for each example. y(j) is the j'th example's label.
%
m=size(X,2);
% initialize objective value and gradient.
f = 0;
g = zeros(size(theta));
%
% TODO: Compute the logistic regression objective function and gradient
% using vectorized code. (It will be just a few lines of code!)
% Store the objective function value in 'f', and the gradient in 'g'.
%
%%% YOUR CODE HERE %%%
h = sigmoid(theta'*X);
f = -sum(y.*log(h) + (1-y).*log(1 - h));
g = X*(h - y)';
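The code above calls sigmoid, which is not a base MATLAB built-in in older releases; if the starter code does not already provide a sigmoid.m, a minimal helper would look like this (my own sketch):

function h = sigmoid(z)
% Elementwise logistic function 1./(1+exp(-z)); works on scalars, vectors, and matrices.
h = 1 ./ (1 + exp(-z));
end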
Results:
Optimization took 3.064685 seconds.
Training accuracy: 100.0%
Test accuracy: 100.0%
Gradient Checking
Put simply, we use a numerical approximation of the derivative to check that the gradient we computed from the formulas is correct.
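Concretely, for each randomly chosen component j, the analytic gradient is compared against the central-difference estimate

$$\frac{\partial J(\theta)}{\partial \theta_j} \approx \frac{J(\theta + \delta e_j) - J(\theta - \delta e_j)}{2\delta},$$

where e_j is the j-th standard basis vector and δ is a small step (1e-3 in grad_check.m below).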
We use grad_check.m:
function average_error = grad_check(fun, theta0, num_checks, varargin)
delta=1e-3;
sum_error=0;
fprintf(' Iter i err');
fprintf(' g_est g f\n')
for i=1:num_checks
T = theta0;
j = randsample(numel(T),1); % pick a random index into theta
T0=T; T0(j) = T0(j)-delta; % theta(j-): theta with its j-th element decreased by delta
T1=T; T1(j) = T1(j)+delta; % theta(j+): theta with its j-th element increased by delta
[f,g] = fun(T, varargin{:});
f0 = fun(T0, varargin{:}); % J(theta(j-))
f1 = fun(T1, varargin{:}); % J(theta(j+))
g_est = (f1-f0) / (2*delta);
error = abs(g(j) - g_est);
%iteration, theta index, absolute error, analytic gradient, numerical estimate, objective value
fprintf('% 5d % 6d % 15g % 15f % 15f % 15f\n', ...
i,j,error,g(j),g_est,f);
sum_error = sum_error + error;
end
average_error=sum_error/num_checks;
Add the following to ex1a_linreg.m:
average_error = grad_check(@linear_regression_vec,theta,30,train.X,train.y);
fprintf('The Average error is :%f\n',average_error);
The output:
Iter i err g_est g f
1 14 8.0571e-06 1418640.687110 1418640.687102 14517559.734147
2 3 3.73228e-07 1100385.922200 1100385.922200 14517559.734147
3 4 2.48384e-06 1236106.996470 1236106.996473 14517559.734147
4 13 5.16325e-06 38562142.957593 38562142.957588 14517559.734147
5 14 8.0571e-06 1418640.687110 1418640.687102 14517559.734147
6 10 6.0685e-06 1118680.054414 1118680.054408 14517559.734147
7 13 5.16325e-06 38562142.957593 38562142.957588 14517559.734147
8 10 6.0685e-06 1118680.054414 1118680.054408 14517559.734147
9 14 8.0571e-06 1418640.687110 1418640.687102 14517559.734147
10 11 2.87592e-06 45661592.041328 45661592.041331 14517559.734147
11 13 5.16325e-06 38562142.957593 38562142.957588 14517559.734147
12 2 1.97807e-06 436767.013214 436767.013212 14517559.734147
13 14 8.0571e-06 1418640.687110 1418640.687102 14517559.734147
14 14 8.0571e-06 1418640.687110 1418640.687102 14517559.734147
15 11 2.87592e-06 45661592.041328 45661592.041331 14517559.734147
16 1 3.02999e-06 106041.865458 106041.865461 14517559.734147
17 5 1.42339e-06 6344.599333 6344.599332 14517559.734147
18 9 3.8307e-06 389421.210472 389421.210468 14517559.734147
19 7 3.66173e-06 660532.159808 660532.159812 14517559.734147
20 5 1.42339e-06 6344.599333 6344.599332 14517559.734147
21 4 2.48384e-06 1236106.996470 1236106.996473 14517559.734147
22 9 3.8307e-06 389421.210472 389421.210468 14517559.734147
23 7 3.66173e-06 660532.159808 660532.159812 14517559.734147
24 11 2.87592e-06 45661592.041328 45661592.041331 14517559.734147
25 11 2.87592e-06 45661592.041328 45661592.041331 14517559.734147
26 12 2.83984e-06 1978417.905024 1978417.905027 14517559.734147
27 5 1.42339e-06 6344.599333 6344.599332 14517559.734147
28 12 2.83984e-06 1978417.905024 1978417.905027 14517559.734147
29 5 1.42339e-06 6344.599333 6344.599332 14517559.734147
30 10 6.0685e-06 1118680.054414 1118680.054408 14517559.734147
The Average error is :0.000004
So our gradient computation is correct. Incidentally, grad_check.m itself can still be tightened up a little: a couple of lines inside the loop do not depend on the loop index and can be hoisted out of it, namely T = theta0; and [f,g] = fun(T, varargin{:}); a sketch of this follows below.
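A sketch of that small optimization (only the loop from grad_check.m is shown, with the printing omitted; the behaviour is unchanged because neither hoisted line depends on the loop index i):

T = theta0;
[f,g] = fun(T, varargin{:});      % analytic objective and gradient: computed once, outside the loop
for i=1:num_checks
  j = randsample(numel(T),1);     % pick a random index into theta
  T0=T; T0(j) = T0(j)-delta;      % theta with its j-th element decreased by delta
  T1=T; T1(j) = T1(j)+delta;      % theta with its j-th element increased by delta
  f0 = fun(T0, varargin{:});
  f1 = fun(T1, varargin{:});
  g_est = (f1-f0) / (2*delta);    % central-difference estimate
  sum_error = sum_error + abs(g(j) - g_est);
end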
Softmax Regression
Softmax regression is just the multi-class generalization of logistic regression (as opposed to the binary case). The model and its objective and gradient are written out below, followed by my code.
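With K = num_classes and the last parameter column θ_K fixed to zero (as the starter code assumes), the class probabilities, objective, and gradient that the code implements should be:

$$P\big(y=k \mid x;\theta\big) = \frac{e^{\theta_k^\top x}}{\sum_{l=1}^{K} e^{\theta_l^\top x}}$$

$$J(\theta) = -\sum_{j=1}^{m}\sum_{k=1}^{K} 1\{y^{(j)}=k\}\,\log P\big(y^{(j)}=k \mid x^{(j)};\theta\big), \qquad \nabla_{\theta_k} J = -\sum_{j=1}^{m} x^{(j)}\Big(1\{y^{(j)}=k\} - P\big(y^{(j)}=k \mid x^{(j)};\theta\big)\Big).$$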
function [f,g] = softmax_regression_vec(theta, X,y)
%
% Arguments:
% theta - A vector containing the parameter values to optimize.
% In minFunc, theta is reshaped to a long vector. So we need to
% resize it to an n-by-(num_classes-1) matrix.
% Recall that we assume theta(:,num_classes) = 0.
%
% X - The examples stored in a matrix.
% X(i,j) is the i'th coordinate of the j'th example.
% y - The label for each example. y(j) is the j'th example's label.
%
m=size(X,2); % number of examples
n=size(X,1); % feature dimension
% theta is passed in as a vector; reshape it to n x (num_classes-1).
theta=reshape(theta, n, []);
num_classes=size(theta,2)+1;
% initialize objective value and gradient.
f = 0;
g = zeros(size(theta));
%
% TODO: Compute the softmax objective function and gradient using vectorized code.
% Store the objective function value in 'f', and the gradient in 'g'.
% Before returning g, make sure you form it back into a vector with g=g(:);
%
%%% YOUR CODE HERE %%%
indicator = full(sparse(y, 1:m, 1)); % indicator (one-hot) matrix: indicator(k,j)=1 iff y(j)==k
theta = [theta,zeros(n,1)]; % append the implicit all-zero column for the last class
a = exp(theta'*X);
p = bsxfun(@rdivide,a,sum(a)); % column-wise softmax probabilities
l = log(p);
%f = -sum(indicator*log(p)); % this would build an overly large intermediate matrix, so it is not workable
f = -indicator(:)'*l(:); % sum of log-probabilities of the true classes
g = -X * (indicator-p)';
g = g(:,1:end-1); % drop the all-zero column again
g=g(:); % make gradient a vector for minFunc
Results:
Optimization took 91.072469 seconds.
Training accuracy: 94.4%
Test accuracy: 92.2%
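For completeness, once training finishes the predicted class of each example is just the argmax over the K class scores. A minimal sketch (variable names are my own; theta is the learned n-by-(K-1) matrix, X holds the test examples as columns, and y the corresponding labels):

theta_full = [theta, zeros(size(theta,1),1)];   % append the implicit zero column for class K
[~, pred] = max(theta_full' * X, [], 1);        % pred(j) is the predicted class of example j
accuracy = mean(pred(:) == y(:));               % fraction of examples classified correctly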