
In a machine learning regression problem, why is the local minimum computed using the derivative of the function instead of the function itself?

Example: http://en.wikipedia.org/wiki/Gradient_descent

The gradient descent algorithm is applied to find a local minimum of the function

$$f(x) = x^4 - 3x^3 + 2, \tag{A}$$

with derivative

$$f'(x) = 4x^3 - 9x^2. \tag{B}$$

Here, to find the local minimum of function (A) using the gradient descent algorithm, they use the derivative of (A), which is function (B).
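For concreteness, here is a minimal Python sketch of that procedure on this example; the starting point, step size `gamma` and stopping tolerance are illustrative choices, not values prescribed by the question:

```python
# Minimal gradient descent sketch for f(x) = x^4 - 3x^3 + 2.
# Only the derivative f'(x) = 4x^3 - 9x^2 is evaluated inside the loop;
# f(x) itself is never needed to take a step.

def f_prime(x):
    return 4 * x**3 - 9 * x**2

x = 6.0          # arbitrary starting point
gamma = 0.01     # step size (learning rate), chosen by hand
tolerance = 1e-8

while True:
    step = gamma * f_prime(x)
    x -= step
    if abs(step) < tolerance:
        break

print(x)  # converges to roughly 2.25, i.e. x = 9/4
```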

2 Answers


The reason is that, because the function is convex (or concave if you're doing maximisation; these problems are equivalent), you know that there is a single minimum (maximum). This means that there is a single point where the gradient is equal to zero. There are techniques that use the function values themselves, but if you can compute the gradient, you can converge much faster, because the gradient gives you information about how far you are from the optimum.

As well as Gradient Descent, there's an optimisation method known as Newton's method, which requires computation of the second derivative (the Hessian in multivariate optimisation). This converges faster still, but requires you to be able to invert the Hessian, which is not feasible if you have a lot of parameters. So there are methods to get around this which compute a limited-memory approximation of the Hessian. These methods converge faster still because they use information about the curvature of the function: it's a simple tradeoff, where the more you know about the function you're trying to optimise, the faster you can find the solution.
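To make the comparison concrete, here is a rough sketch of Newton's method on the same one-dimensional example; in one dimension the Hessian is just the second derivative f''(x) = 12x^2 - 18x, so "inverting" it is an ordinary division. The starting point and tolerance are again arbitrary illustrative choices:

```python
# Newton's method sketch for the same f(x) = x^4 - 3x^3 + 2.
# Each step uses the first and second derivative of f; the extra
# curvature information typically cuts the iteration count drastically.

def f_prime(x):
    return 4 * x**3 - 9 * x**2

def f_double_prime(x):
    return 12 * x**2 - 18 * x

x = 6.0  # arbitrary starting point
for _ in range(50):
    step = f_prime(x) / f_double_prime(x)  # the 1-D "Hessian inverse" is just 1/f''(x)
    x -= step
    if abs(step) < 1e-10:
        break

print(x)  # roughly 2.25, reached in a handful of iterations
```

Starting from the same point, this reaches the minimum in far fewer iterations than the plain gradient step above, at the cost of one extra derivative evaluation per step.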


3 Comments

Sorry, I have a very silly question. With f'(x) = 4x^3 − 9x^2, just by looking at the function we can find the local minimum, i.e. f'(x) = 0 occurs at x = 0, so why do we need gradient descent?
Well, it's not quite just by looking at it: you can solve for x when f'(x) = 0 (worked through just after these comments). This is a simple example: most of the time when you use optimisation methods, you can't get an analytic solution. Take a look at en.wikipedia.org/wiki/Convex_optimization for more details.
@BenAllison So is it because of computational cost? I mean, if we have a loss function f(x), we want to find the x that minimises it. But that means we would have to check all possible values of x. Instead we compute the gradient at some random initial x and move towards the minimum (according to the gradient), saving computational time?
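For this particular toy function the analytic route discussed in the comments is easy to carry out, as a quick check of the working:

$$f'(x) = 4x^3 - 9x^2 = x^2(4x - 9) = 0 \quad\Rightarrow\quad x = 0 \ \text{or} \ x = \tfrac{9}{4},$$

and since f''(x) = 12x^2 - 18x is positive at x = 9/4 but zero at x = 0, the local minimum is at x = 9/4 = 2.25. The answer's point is that for typical machine learning losses, with many parameters, no such closed-form solution is available, so an iterative method like gradient descent is used instead.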

I'm not a mathematician, so I can't give you an exact answer; however, you need to understand what differentiation does, e.g.:

http://en.wikipedia.org/wiki/Derivative http://en.wikipedia.org/wiki/Differential_of_a_function

this is what you need (what differentiation does): http://en.wikipedia.org/wiki/File:Graph_of_sliding_derivative_line.gif

The derivative at a point equals the slope of the tangent line to the graph of the function at that point, and this is exactly what you want when you are looking for a descent direction. Take it as a very informal point of view; the Wikipedia articles will give you much deeper and more precise knowledge.
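As an informal numerical check of that tangent-line picture, here is a small sketch on the question's function; the point x = 3 and the step h are arbitrary choices:

```python
# The derivative as the slope of the tangent line: a numerical check.
# The slope of a secant line over a tiny interval h approaches
# the analytic derivative f'(x) = 4x^3 - 9x^2.

def f(x):
    return x**4 - 3 * x**3 + 2

def f_prime(x):
    return 4 * x**3 - 9 * x**2

x, h = 3.0, 1e-6
secant_slope = (f(x + h) - f(x)) / h

print(secant_slope)  # approximately 27.000027, close to the exact value
print(f_prime(x))    # 27.0
```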

