Deep Learning

Conner Davis, Data Scientist at Microsoft

Answered Jun 28, 2017 · Upvoted by Andrew Carr, Research Assistant at Deep Learning (2016-present) and Shashank Prasad, MS Computer Science & Machine Learning, Delft University of Technology (2018)

I wondered the same thing half an hour after learning what a neural network was. My background was an MS in pure math, so everything made perfect sense.

1. Specify a structure and a loss function to optimize.
2. Optimize it using gradient descent: the network feeds forward with just matrix multiplications and point-wise activations, and backpropagates using the multivariate chain rule.
3. Update the weights accordingly.
4. Profit.

Then I started applying it to difficult problems and quickly realized that understanding how deep learning works is very different from applying it successfully. There are tons of pitfalls to watch out for.
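To make the "simple" part concrete, here is a minimal sketch of that loop in NumPy (my choice for illustration; the answer above doesn't prescribe a library): a one-hidden-layer network trained by full-batch gradient descent on a toy regression problem. The architecture, learning rate, and data are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x) from a small sample.
X = rng.uniform(-3, 3, size=(64, 1))
y = np.sin(X)

# Specify a structure (1 -> 16 -> 1) and a loss (mean squared error).
W1 = rng.normal(0, 0.5, size=(1, 16))
b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, size=(16, 1))
b2 = np.zeros(1)
lr = 0.05

for step in range(2000):
    # Feed forward: matrix multiplications plus point-wise activations.
    h = np.tanh(X @ W1 + b1)           # hidden layer
    y_hat = h @ W2 + b2                # output layer
    loss = np.mean((y_hat - y) ** 2)   # MSE loss

    # Backpropagate with the multivariate chain rule.
    d_yhat = 2 * (y_hat - y) / len(X)
    dW2 = h.T @ d_yhat
    db2 = d_yhat.sum(axis=0)
    d_h = d_yhat @ W2.T * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # Update the weights accordingly.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

print(f"final MSE: {loss:.4f}")
```

That really is the whole core algorithm, which is exactly the point: the gap is between this and making it work on a hard problem.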

“But that's just something that takes some practice and experience,” I thought. “Once I have that, it's easy.”

Well, not so much. It turns out that no matter how much experience you have, there are still plenty of difficulties. Getting a deep network to do exactly what you want means:

- Optimizing the structure
- Preventing over- or under-fitting
- Getting it to converge (to a high-quality local minimum)
- Making sure you have the right loss function
- Doing data augmentation correctly (not always easy)
- Proper pre-processing
- Etc.

There is a great deal of work that goes into it, and testing is very computationally expensive. It can take a week to test a single idea; when there are so many different variations to test, that's not practical. Even if it only takes a few minutes to test an idea, that time adds up very quickly.
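Take just one of those pitfalls, pre-processing, in its most common form: standardizing features to zero mean and unit variance. This is an illustrative sketch (the function name and data here are invented for the example), and the detail that trips people up is fitting the statistics on the training set only.

```python
import numpy as np

def standardize(X_train, X_test, eps=1e-8):
    """Fit scaling statistics on the training set only, then apply
    the same transform to the test set (avoiding test-set leakage)."""
    mu = X_train.mean(axis=0)
    sigma = X_train.std(axis=0)
    scale = sigma + eps  # eps guards against constant features
    return (X_train - mu) / scale, (X_test - mu) / scale

# Made-up data with non-zero mean and non-unit scale.
rng = np.random.default_rng(1)
X_train = rng.normal(loc=5.0, scale=3.0, size=(100, 4))
X_test = rng.normal(loc=5.0, scale=3.0, size=(20, 4))

X_train_s, X_test_s = standardize(X_train, X_test)
```

Compute the statistics from the test set too, or forget to scale at all, and nothing crashes; the network just trains worse, which is what makes these pitfalls hard to spot.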

But that's just the applied stuff!

The math is challenging too!

Yes, the basics of how it works are very simple, but the forefront of the field is anything but. As a general rule, the forefront of any field is going to be very difficult. If it weren't, we would have solved it already and we'd have a new, more difficult forefront.

Like I said earlier, my background is an MS in math. Reading some of the recent papers on deep learning, I often encounter math I'm not familiar with or haven't used since that one differential geometry class I don't remember. Some papers I read recently made heavy use of

- Hyperbolic geometry
- Functional analysis
- Dynamical systems
- Random matrix theory

Your average engineering undergrad can probably implement some of these ideas just by following the formulas in the paper, but doesn't have the mathematical background to develop them.

So yes, if you go deeper, you find that all areas of deep learning are challenging once you get beyond the very basics.
