Adam doesn’t generalize nearly as well as SGDM (SGD with momentum), and you can really see that once you start training on multiple dissimilar datasets. If I’m using a dataset with sharp divergences, Adam just does not do as well.
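For anyone who wants the two update rules side by side, here is a minimal sketch in plain Python. The function names (`sgdm_step`, `adam_step`) and the default hyperparameters are my own illustrative choices, not taken from any particular library, and the weights are plain lists of floats to keep it dependency-free. It only shows the mechanical difference between the optimizers; it does not by itself demonstrate the generalization gap described above.

```python
import math

def sgdm_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum step on lists of floats (hypothetical helper)."""
    # Accumulate a running velocity from the raw gradient.
    new_velocity = [momentum * v + g for v, g in zip(velocity, grad)]
    # The step size scales directly with the gradient magnitude.
    new_w = [wi - lr * v for wi, v in zip(w, new_velocity)]
    return new_w, new_velocity

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step on lists of floats; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    new_m = [beta1 * mi + (1 - beta1) * g for mi, g in zip(m, grad)]
    new_v = [beta2 * vi + (1 - beta2) * g * g for vi, g in zip(v, grad)]
    # Bias correction for the zero-initialized averages.
    m_hat = [mi / (1 - beta1 ** t) for mi in new_m]
    v_hat = [vi / (1 - beta2 ** t) for vi in new_v]
    # Each coordinate is rescaled by its own gradient history.
    new_w = [wi - lr * mh / (math.sqrt(vh) + eps)
             for wi, mh, vh in zip(w, m_hat, v_hat)]
    return new_w, new_m, new_v

if __name__ == "__main__":
    w = [1.0, 1.0]
    grad = [100.0, 0.01]  # two coordinates with very different gradient scales
    print(sgdm_step(w, grad, [0.0, 0.0]))
    print(adam_step(w, grad, [0.0, 0.0], [0.0, 0.0], t=1))
```

With gradients of very different scale, the SGDM step differs per coordinate in proportion to the gradient, while Adam's first step moves both coordinates by roughly `lr`. That per-coordinate normalization is the main mechanical difference between the two.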