In the arena of deep learning, double descent, or grokking, seems to hint at a moment of sudden realization, a glimpse of wisdom sublimed from tireless learning, and a hope of enlightenment at last.

The learning dynamics of artificial neural networks (ANNs) are necessarily distinct from those of human learning; just think about it for a moment and you will appreciate the difference.

However, the workings of ANNs may more closely resemble those of the human brain, particularly at the physiological level.

1) In the early training phase, ANNs simply absorb information via rote memorization, just as a newborn doesn’t really “know” anything but can memorize.

2) As training goes on, sole reliance on memorization becomes a crisis for ANNs, which approach their memory capacity while constantly battling the confusion and conflicts caused by entanglements and correlations. This perhaps resembles the chaos in the human brain on the way from puberty to maturity: too many things going on, and no way to make sense of any of them.

This also marks the end of the first descent phase, which is characterized by superior learning/memorization ability and excellent performance on tasks demanding sheer computing power or memorization.

3) The second descent phase is signified by sprinkles of eureka moments, distillation of knowledge into comprehension (and a release of brain capacity), and enhanced reasoning ability. All of these attributes are desirable for model generalization on unseen samples. However, they may be accompanied by a loss of memorization ability, degradation of learning capacity, and worse performance on seen samples.

Nonetheless, given a finite training set, this resolves the confusion and conflicts arising at the end of the first descent phase, leading into the second descent. Isn’t this how humans gain wisdom? A toy sketch of this two-phase dynamic follows below.
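As a concrete illustration of the delayed second descent, the sketch below trains a small network on a toy modular-addition task, the kind of setup commonly used in grokking experiments, and prints train versus test accuracy over many epochs. The modulus, architecture, optimizer settings, and epoch count here are illustrative assumptions rather than reported results; with heavy weight decay and a limited training split, one typically sees training accuracy saturate early (memorization) while test accuracy jumps only much later (generalization).

```python
# Toy "grokking" sketch: learn (a + b) mod p from half of all pairs,
# then watch train vs. test accuracy diverge and later reconverge.
# All sizes and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)
P = 97                                              # modulus for the toy task
pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))  # all (a, b)
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Hold out half the pairs: memorizing the train split is easy, but doing
# well on the unseen half requires learning the underlying modular structure.
perm = torch.randperm(len(pairs))
split = len(pairs) // 2
train_idx, test_idx = perm[:split], perm[split:]

model = nn.Sequential(
    nn.Embedding(P, 64),   # shared embedding for both operands
    nn.Flatten(),          # (batch, 2, 64) -> (batch, 128)
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Linear(256, P),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

def accuracy(idx):
    with torch.no_grad():
        preds = model(pairs[idx]).argmax(dim=-1)
        return (preds == labels[idx]).float().mean().item()

for epoch in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if epoch % 1000 == 0:
        # Train accuracy usually saturates early, while test accuracy
        # lags far behind before its late jump.
        print(f"epoch {epoch:6d}  "
              f"train {accuracy(train_idx):.3f}  test {accuracy(test_idx):.3f}")
```

The strong weight decay is the one design choice worth noting: it slowly penalizes the memorization solution until a simpler, more general one takes over, which is one common reading of why the second descent arrives so late.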

Overall, we attempt to draw an analogy between the training process of ANNs and the aging process of the human brain.