4 Comments
Christopher

One technical correction: the Attention paper didn't make everything sublinear; it just made it much more parallelizable and practical.

Bram Cohen

It got rid of the recurrence, which was the last thing that could cause gradient explosion. But fair point: it was sort of groped toward over time rather than happening in one fell swoop; the attention paper is just what finished the job.
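
To make the gradient point concrete (a rough numerical sketch with made-up weights, not anything from the paper): backprop through a recurrent net chains one Jacobian per timestep, so the gradient scales roughly like the spectral radius of the recurrent weights raised to the sequence length, which is what blows up or vanishes. Attention has no such per-timestep chain.

# hypothetical illustration: chaining T Jacobians of a linear RNN
import numpy as np

rng = np.random.default_rng(0)
T, d = 100, 8
W = rng.normal(scale=1.2 / np.sqrt(d), size=(d, d))  # made-up recurrent weights

grad = np.eye(d)
for _ in range(T):
    grad = W.T @ grad  # one Jacobian factor per timestep

print(np.linalg.norm(grad))  # grows roughly like (spectral radius of W)**T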

George Coss

Also, training humans is slower, assuming you have the training data.

California Pirate Party

After twenty years I can finally follow what you say without looking things up.