Minimization

From Overdensity

Minimization is one of the core principles behind neural networks. Many other ideas in the field are easier to understand, because they are inspired by the biological systems of humans or animals.

All networks have a mathematical foundation (except for some heuristic algorithms). Still, Gradient Descent can be understood through a simple intuition.

For me, the best way to understand Gradient Descent is through the Random Hill Climbing algorithm. The idea is simple: you randomly choose a new point (in ANN terms, a new set of weights) and evaluate your network. If the update is an improvement, you keep that point (those weights) for the next iteration; if not, you roll back the change and try again. Gradient Descent follows the same idea, but it selects the new point (weights) on a much stronger mathematical basis.
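A minimal sketch of Random Hill Climbing, assuming a hypothetical one-dimensional toy loss `loss(w) = (w - 3)**2` that stands in for "evaluate your network". Since we are minimizing a loss, "better" here means "lower". The step size, iteration count, and function names are illustrative assumptions, not part of any library.

```python
import random


def loss(w):
    # Hypothetical toy loss standing in for a network evaluation; minimum at w = 3
    return (w - 3.0) ** 2


def random_hill_climbing(loss, w, step=0.5, iterations=1000, seed=0):
    rng = random.Random(seed)
    best = loss(w)
    for _ in range(iterations):
        # Random "jump" to a nearby candidate point (candidate weights)
        candidate = w + rng.uniform(-step, step)
        candidate_loss = loss(candidate)
        if candidate_loss < best:
            # The update is good: keep these weights for the next iteration
            w, best = candidate, candidate_loss
        # Otherwise "roll back" by simply keeping the previous w
    return w, best


w, best = random_hill_climbing(loss, w=0.0)
print(round(w, 2), best)
```

Note that the algorithm never uses any information about the shape of `loss`; it only compares values, which is exactly what Gradient Descent improves on.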

The derivative of a function tells you its slope. Using this information, you can figure out which direction to move in without random "jumps".
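Written as a formula, one common form of the update moves the weights against the derivative. This is a sketch: the loss $L$, the step index $t$, and the learning rate $\eta$ (how far each step goes) are notation assumed here, not taken from the text above. The worked example uses the same hypothetical toy loss $L(w) = (w - 3)^2$:

```latex
w_{t+1} = w_t - \eta \left.\frac{dL}{dw}\right|_{w = w_t}

% Worked example with L(w) = (w - 3)^2 and \eta = 0.1:
\frac{dL}{dw} = 2(w - 3), \qquad
\left.\frac{dL}{dw}\right|_{w_0 = 0} = -6, \qquad
w_1 = 0 - 0.1 \cdot (-6) = 0.6
```

The negative slope at $w_0 = 0$ means the loss decreases as $w$ grows, so the minus sign in the update moves $w$ to the right, toward the minimum at $w = 3$.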

Imagine this situation: you are alone in the mountains and must find a place where you can keep warm at night. You walk in some direction and check whether it has become warmer. If it has, you stay there and try another direction from that spot; if not, you go back to where you were before and try a different way. You can probably see the similarity between this situation and the Random Hill Climbing algorithm.

Now imagine the same situation, but this time you have some extra information about the mountains: the lower you go, the warmer it becomes. However, there is a very thick fog, so you cannot look around to find that place. Even without seeing the surrounding area, you can still tell from the spot where you stand which direction takes you a step closer to sea level (the slope under your feet shows the correct direction).
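The fog scenario can be sketched in Python: the walker never sees the whole valley and only "feels" the slope at the current position. A hypothetical two-dimensional `altitude(x, y)` surface stands in for the loss, and the slope is estimated with central finite differences (a real network would normally compute the exact gradient with backpropagation instead). The learning rate, step count, and function names are illustrative assumptions.

```python
def altitude(x, y):
    # Hypothetical valley: the lowest (warmest) point is at (1, -2)
    return (x - 1.0) ** 2 + 2.0 * (y + 2.0) ** 2


def feel_slope(f, x, y, eps=1e-6):
    # Central differences: estimate the slope using only points right next to us
    dx = (f(x + eps, y) - f(x - eps, y)) / (2 * eps)
    dy = (f(x, y + eps) - f(x, y - eps)) / (2 * eps)
    return dx, dy


def gradient_descent(f, x, y, learning_rate=0.1, steps=200):
    for _ in range(steps):
        gx, gy = feel_slope(f, x, y)
        # Step against the gradient, i.e. downhill
        x -= learning_rate * gx
        y -= learning_rate * gy
    return x, y


x, y = gradient_descent(altitude, x=5.0, y=5.0)
print(round(x, 3), round(y, 3))
```

Unlike the hill-climbing sketch, no move is ever rejected: every step is chosen from local slope information, so there is nothing to roll back.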

This example captures the basic intuition behind the Gradient Descent algorithm. It also reveals some of its problems: How far should you go with each step? And will the place where you stop really be the lowest (warmest) point?
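Both questions show up on a small example. The sketch below assumes a hypothetical 1D loss `loss(w) = w**4 - 3*w**2 + w`, which has a deeper valley near w ≈ -1.30 and a shallower one near w ≈ 1.13. The starting points and learning rates are illustrative choices, not recommendations.

```python
def loss(w):
    # Hypothetical loss with two valleys: a deep one near w = -1.30
    # and a shallower one near w = 1.13
    return w**4 - 3 * w**2 + w


def grad(w):
    # Exact derivative of loss(w)
    return 4 * w**3 - 6 * w + 1


def descend(w, learning_rate, steps=1000):
    for _ in range(steps):
        w -= learning_rate * grad(w)
        if abs(w) > 1e6:
            break  # steps are too large: we keep overshooting and diverge
    return w


# "Will it be the lowest point?" The answer depends on where we start.
local_min = descend(2.0, learning_rate=0.01)    # ends in the shallower valley
global_min = descend(-2.0, learning_rate=0.01)  # ends in the deeper valley

# "How far do you need to go?" Too large a step makes the walk blow up.
diverged = descend(2.0, learning_rate=0.5)

print(round(local_min, 2), round(global_min, 2), abs(diverged) > 1e6)
```

With a small learning rate, the walker from `w = 2.0` settles in the nearer, shallower valley even though a lower point exists; with `learning_rate=0.5`, the very first step already jumps past both valleys.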