Minimization

ADAM doesn’t generalize nearly as well as SGDM, and you can really see that when you start using multiple dissimilar data sets. If I’m using a dataset with sharp divergences, ADAM just does not do as well.