
Mything the point: The AI renaissance is simply expensive hardware and PR thrown at an old idea

DCFusor

Studied this back in the day

And I find it interesting what the current-day workers are reporting as problems, because these same problems (well, most of 'em) were identified, with measures to avoid them, "back in the day": IIRC, the '90s.

We used sigmoid-type activation functions. There's a good reason they're better than ReLU: the squashing of the range prevents one neuron from locking up a network by being insanely "sure of itself". Yes, this also means you need more and better training data, and it takes more time to train. Being smart is hard, get over it. There's more computer power now, but not that much more (the hardness blows up faster than the improvements in hardware have).
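Not from the original comment, just a minimal Python sketch of the range-squashing point: sigmoid caps every neuron's output inside (0, 1), while ReLU passes an insanely large pre-activation straight through to everything downstream.

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1), so no single neuron can dominate.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # Unbounded above: a huge pre-activation goes straight through.
    return np.maximum(0.0, x)

pre = np.array([-5.0, 0.0, 5.0, 50.0, 500.0])
print(sigmoid(pre))  # [0.0067 0.5 0.9933 1. 1.] -- capped, however "sure" the neuron is
print(relu(pre))     # [0. 0. 5. 50. 500.] -- the big value swamps everything downstream
```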

Since we were able to prove that no function required more than two hidden layers to map, we never used more than two hidden layers. Again, this meant more twiddling on the numbers of neurons in each layer, and again, more and better training data, and time.
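As an illustration of that style of network (my shapes and widths, purely hypothetical), the topology stays fixed at two hidden layers and all the tuning is in the layer widths:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Two hidden layers, always. The "twiddling" is picking the widths n_h1 and n_h2.
n_in, n_h1, n_h2, n_out = 4, 16, 8, 1
W1, b1 = rng.normal(size=(n_in, n_h1)) * 0.1, np.zeros(n_h1)
W2, b2 = rng.normal(size=(n_h1, n_h2)) * 0.1, np.zeros(n_h2)
W3, b3 = rng.normal(size=(n_h2, n_out)) * 0.1, np.zeros(n_out)

def forward(x):
    h1 = sigmoid(x @ W1 + b1)   # hidden layer 1
    h2 = sigmoid(h1 @ W2 + b2)  # hidden layer 2
    return h2 @ W3 + b3         # linear output; no third hidden layer needed

print(forward(rng.normal(size=(3, n_in))).shape)  # (3, 1)
```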

There are other mistakes one should avoid, like trying to get a network to predict over more than one time period, or simply trying to do too much in a single network, since that makes it possible for the network to minimize its cost function by being dead wrong on some outputs as long as some of the others are right. There are a lot of things like this; you can't just throw a lot of data and cycles at something and "test in" whether it worked, because there's no foolproof automated test for unexpected data. There are plenty of other mistakes to make, but this is a Reg comment. Suffice it to say, when you think you've reduced a problem to the point monkeys can do it, you get monkey solutions.
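A tiny numeric illustration of the multi-output trap (my numbers, not the poster's): a shared mean-squared-error can prefer a network that nails three outputs and is dead wrong on the fourth over one that is merely mediocre everywhere.

```python
import numpy as np

target = np.array([1.0, 1.0, 1.0, 1.0])
mediocre   = np.array([0.4, 0.4, 0.4, 0.4])  # consistently off on every output
dead_wrong = np.array([1.0, 1.0, 1.0, 0.0])  # perfect on three, dead wrong on one

mse = lambda pred: np.mean((pred - target) ** 2)
print(mse(mediocre))    # 0.36
print(mse(dead_wrong))  # 0.25 -- lower loss, so training happily converges here
```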

Now someone found that a far easier model to train (on your tiny, already-known universe) is one with a stupid activation function and too many layers: you can train horribly oversized networks and sometimes get a fairly amazing result. But the truth is, and any real statistician knows this, that you have ENORMOUSLY OVERFIT your tiny known universe.
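The standard statistician's demo of that point, with a polynomial standing in for the oversized network (my toy data, nothing from the post): give a model as many parameters as you have samples and it memorizes the noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten noisy samples of a simple function: the "tiny already-known universe".
x_train = np.linspace(0.0, 1.0, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=10)

# Oversized model: a degree-9 polynomial has 10 parameters for 10 points.
big   = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)
small = np.polynomial.Polynomial.fit(x_train, y_train, deg=3)

x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)

print(np.mean((big(x_train) - y_train) ** 2))  # ~0: training data memorized
print(np.mean((big(x_test) - y_test) ** 2))    # typically much larger between samples
print(np.mean((small(x_test) - y_test) ** 2))  # far smaller: it generalizes
```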

Which is why you can easily fool the result into thinking a gun is a turtle, a stop sign is a hippo, whatever.
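For the curious, the usual trick behind those fooling examples is the fast-gradient-sign method (Goodfellow et al., 2014). Here is a sketch on a toy linear scorer, my construction rather than anything from the post: a tiny nudge per input dimension, aligned against the gradient, adds up across a thousand dimensions and can flip the decision.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear "classifier" over a 1000-dimensional input: score > 0 means class A.
w = rng.normal(size=1000)
x = rng.normal(size=1000)
eps = 0.05  # per-dimension nudge, tiny next to the inputs themselves

# FGSM-style step: for a linear score w @ x the gradient w.r.t. x is just w,
# so move each coordinate by eps in the direction that lowers |score|.
x_adv = x - eps * np.sign(w @ x) * np.sign(w)

# The score drops by eps * sum(|w|) (roughly 40 here), which flips its sign
# whenever the clean score is smaller than that, i.e. most of the time.
print(w @ x, w @ x_adv)
```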

Bad networks are why GANs are so easy. They'd be possible either way, but...
