Note: In case the math does not render, please add the Math Anywhere extension to your browser through the link.

Gradient Descent is the most important method we use to find a local minimum of a differentiable function. But it runs into trouble when we deal with a huge dataset, because gradient descent takes all the samples in the training set and performs a single parameter update per iteration while minimizing the loss function. This is inefficient: every parameter update requires a full pass over the whole dataset. Here comes the idea…
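To make the per-update cost concrete, here is the standard (batch) gradient descent update: for parameters $\theta$, learning rate $\eta$, and a training set of $N$ samples $(x_i, y_i)$ with per-sample loss $L$,

$$\theta \leftarrow \theta - \eta \cdot \frac{1}{N}\sum_{i=1}^{N} \nabla_\theta L(\theta; x_i, y_i).$$

Every single update sums gradients over all $N$ samples. The sketch below illustrates this, assuming linear regression with a mean-squared-error loss; the data, model, and hyperparameters are illustrative choices, not taken from this post.

```python
import numpy as np

# A minimal sketch of batch gradient descent, assuming linear regression
# with a (1/2)-MSE loss. Data, model, and hyperparameters are illustrative.

rng = np.random.default_rng(0)
N = 10_000
X = rng.normal(size=(N, 3))                 # N samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=N)   # noisy targets

w = np.zeros(3)   # parameters theta
eta = 0.1         # learning rate

for step in range(100):
    # One update touches the ENTIRE dataset: the gradient is averaged
    # over all N samples before a single change is made to w.
    grad = X.T @ (X @ w - y) / N   # gradient of (1/2N) * ||Xw - y||^2
    w -= eta * grad
```

Note that each of those 100 updates is an $O(N)$ pass over the data, which is exactly the inefficiency described above.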

Yuchenhua