from numpy import *

# Number of training samples
m = 20
# Design matrix: a column of ones (for the intercept) plus the feature column 1..m
X0 = ones((m, 1))
X1 = arange(1, m+1).reshape(m, 1)
X = hstack((X0, X1))
# Target values
Y = array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)
# Learning rate
alpha = 0.01
Next, we define the cost function and its gradient in matrix-vector form:
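Written out, these are the mean squared error cost and its gradient for the linear model $Y \approx X\theta$:

$$J(\theta) = \frac{1}{2m}(X\theta - Y)^{T}(X\theta - Y), \qquad \nabla_{\theta} J(\theta) = \frac{1}{m} X^{T}(X\theta - Y)$$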
def cost_function(theta, X, Y):
    # J(theta) = 1/(2m) * (X*theta - Y)^T (X*theta - Y)
    diff = dot(X, theta) - Y
    return (1/(2*m)) * dot(diff.transpose(), diff)

def gradient_function(theta, X, Y):
    # Gradient of the cost: 1/m * X^T (X*theta - Y)
    diff = dot(X, theta) - Y
    return (1/m) * dot(X.transpose(), diff)
Finally, the core of the algorithm: the iterative gradient descent update.
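Each iteration takes a step of size $\alpha$ in the direction opposite to the gradient:

$$\theta := \theta - \alpha \, \nabla_{\theta} J(\theta)$$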
def gradient_descent(X, Y, alpha):
    # Start from an arbitrary initial guess theta = (1, 1)
    theta = array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, Y)
    # Iterate until every component of the gradient is close to zero
    while not all(abs(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, Y)
    return theta
optimal = gradient_descent(X, Y, alpha)
print('optimal:', optimal)
print('cost function:', cost_function(optimal, X, Y)[0][0])
When every component of the gradient falls below 1e-5, the iteration has reached a fairly flat region, much like the valley floor in our analogy; further iterations would change very little, so this is a reasonable point to exit the loop.
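One practical caveat not covered above: if alpha is set too large the iteration can diverge, and the gradient may then never drop below the threshold. A common safeguard is to also cap the number of iterations. Here is a minimal sketch of that idea, with a hypothetical max_iter parameter added on top of the functions defined earlier:

def gradient_descent_capped(X, Y, alpha, max_iter=100000):
    theta = array([1, 1]).reshape(2, 1)
    for _ in range(max_iter):
        gradient = gradient_function(theta, X, Y)
        # Same stopping criterion as before: every gradient component near zero
        if all(abs(gradient) <= 1e-5):
            break
        theta = theta - alpha * gradient
    return theta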
Running the code prints the optimal parameters theta and the corresponding value of the cost function.
We then plot the result with matplotlib:
def plot(X, Y, theta):
    import matplotlib.pyplot as plt
    ax = plt.subplot(111)
    # Scatter plot of the raw data points
    ax.scatter(X, Y, s=30, c="red", marker="s")
    plt.xlabel("X")
    plt.ylabel("Y")
    # Fitted line: y = theta0 + theta1 * x
    x = arange(0, 21, 0.2)
    y = theta[0] + theta[1]*x
    ax.plot(x, y)
    plt.show()
plot(X1, Y, optimal)
The fitted line looks like this:

The complete code is listed below; feel free to copy it and run it to see the result for yourself:
from numpy import *

m = 20
X0 = ones((m, 1))
X1 = arange(1, m+1).reshape(m, 1)
X = hstack((X0, X1))
Y = array([
    3, 4, 5, 5, 2, 4, 7, 8, 11, 8, 12,
    11, 13, 13, 16, 17, 18, 17, 19, 21
]).reshape(m, 1)
alpha = 0.01

def cost_function(theta, X, Y):
    diff = dot(X, theta) - Y
    return (1/(2*m)) * dot(diff.transpose(), diff)

def gradient_function(theta, X, Y):
    diff = dot(X, theta) - Y
    return (1/m) * dot(X.transpose(), diff)

def gradient_descent(X, Y, alpha):
    theta = array([1, 1]).reshape(2, 1)
    gradient = gradient_function(theta, X, Y)
    while not all(abs(gradient) <= 1e-5):
        theta = theta - alpha * gradient
        gradient = gradient_function(theta, X, Y)
    return theta

optimal = gradient_descent(X, Y, alpha)
print('optimal:', optimal)
print('cost function:', cost_function(optimal, X, Y)[0][0])

def plot(X, Y, theta):
    import matplotlib.pyplot as plt
    ax = plt.subplot(111)
    ax.scatter(X, Y, s=30, c="red", marker="s")
    plt.xlabel("X")
    plt.ylabel("Y")
    x = arange(0, 21, 0.2)
    y = theta[0] + theta[1]*x
    ax.plot(x, y)
    plt.show()

plot(X1, Y, optimal)
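As a quick cross-check (this is not part of the original article), the same fit can be obtained in closed form with ordinary least squares via numpy's lstsq; the result should be very close to the theta found by gradient descent:

from numpy.linalg import lstsq

# Closed-form least-squares solution of X * theta ≈ Y
theta_exact, residuals, rank, sv = lstsq(X, Y, rcond=None)
print('least squares solution:', theta_exact)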
5. Summary
That covers the basic idea of gradient descent and the overall flow of the algorithm, along with a simple Python implementation that uses gradient descent to fit a straight line.
Finally, let's return to the scenario proposed at the beginning of the article:
The person walking down the mountain represents the gradient descent algorithm; the path down the mountain corresponds to the parameters Θ that the algorithm keeps searching for; the steepest direction at the current point on the mountain is the direction of the gradient of the cost function at that point; the tool used in the scenario to observe the steepest direction is differentiation; and the time between one observation and the next is defined by the learning rate α in our algorithm.
As you can see, the scenario and the gradient descent algorithm correspond very well!
Most of this article is based on the work of a predecessor; many thanks to them for sharing!