What is (stochastic) gradient descent?

Gradient Descent


At each iteration we update the weights by subtracting the gradient scaled by a learning rate, which sets the pace of adjustment. If f is a multi-variable function, we use the gradient of f instead of the derivative f'. We subtract the gradient because it points in the direction of steepest ascent; stepping against it moves us toward a minimum.

STAT 4400: Statistical Machine Learning; Columbia University
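
For concreteness, here is a minimal Python sketch of the update w ← w − α∇f(w), assuming we already have a function that computes the gradient; the names gradient_descent and grad_f are illustrative, not from the course.

import numpy as np

def gradient_descent(grad_f, w0, learning_rate=0.1, n_steps=100):
    """Minimize f by repeatedly stepping against its gradient."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        # Subtract the gradient: move away from steepest ascent, toward a minimum.
        w = w - learning_rate * grad_f(w)
    return w

# Example: f(w) = ||w||^2 has gradient 2w, so the minimizer is the zero vector.
w_min = gradient_descent(lambda w: 2 * w, w0=[3.0, -4.0])
print(w_min)  # close to [0, 0]

A smaller learning rate takes smaller, safer steps but needs more iterations; too large a rate can overshoot the minimum and diverge.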

Stochastic Gradient Descent

Whereas gradient descent has to scan through the entire training set before taking a single step (a costly operation if the number of training examples m is large), stochastic gradient descent can start making progress right away. We repeatedly run through the training set, and each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only. Often, stochastic gradient descent gets θ "close" to the minimum much faster than batch gradient descent.

CS 229: Machine Learning; Stanford University
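
As a rough sketch of this per-example update, here is stochastic gradient descent for least-squares regression in Python, assuming a squared-error loss of (1/2)(xᵀθ − y)² per example; the function name and toy data are illustrative, not from the course.

import numpy as np

def sgd_least_squares(X, y, learning_rate=0.01, n_epochs=10):
    """Update theta after each individual training example."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):                # one example at a time
            error = x_i @ theta - y_i             # residual on this example only
            theta -= learning_rate * error * x_i  # gradient of (1/2)*error^2
    return theta

# Example: noise-free data generated by y = 2x; theta should approach 2.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(sgd_least_squares(X, y, learning_rate=0.05, n_epochs=200))

Note that each update touches a single example, so the parameters start moving after the very first example rather than after a full pass over all m examples.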