A custom gradient descent method I submitted as an assignment for a lecture I'm taking to review deep learning.
Just a small idea…
I've put simple comparison implementations of SGD/Adagrad/Adam/Momentum/Nesterov/RMSProp/CrayonGrad up on GitHub.
Our objective: deal wisely with the unpredictable step sizes across each dimension's gradient, so that we can predict the most suitable direction toward the (local) optimum. The problem mostly occurs during the beginning phase, before the gradients shrink as we get closer to the optimum.
Our idea: assign roughly estimated directions during the initial phase, then let that estimation decay as learning goes on.
Concretely, it's a small trick for getting better directions during the beginning phase:
- Take the gradient averaged over all dimensions, multiply it by a decaying rate, and add it to each dimension's gradient.
- Let this estimation decay further and further as training proceeds.
Like using a crayon, our direction is rather broad at first, and then it gets sharper as we use it more and more.
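The update above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the repo's actual implementation; the function name `crayon_grad_step` and the exponential decay schedule `decay ** step` are my assumptions here.

```python
import numpy as np

def crayon_grad_step(params, grads, lr=0.01, step=0, decay=0.9):
    """One hypothetical CrayonGrad update (illustrative sketch).

    Adds the gradient averaged over all dimensions, scaled by a
    decaying rate, to each dimension's gradient, so early steps
    share a roughly estimated common direction that fades out.
    """
    rate = decay ** step                      # estimation decays as learning goes on
    blended = grads + rate * np.mean(grads)   # broadcast the averaged gradient
    return params - lr * blended

# toy usage: minimize f(x) = sum(x^2)
x = np.array([3.0, -2.0, 1.0])
for t in range(100):
    g = 2 * x                                 # gradient of sum(x^2)
    x = crayon_grad_step(x, g, lr=0.1, step=t)
```

Early on, every coordinate is nudged toward the shared average direction; as `rate` shrinks, the update converges to plain SGD.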