Towards Understanding the Terminal Phase of Training of Deep Neural Networks

 
Speaker:
Dr. 张驰浩, The University of Tokyo, Japan
Inviter: Prof. 张世华
Time & Venue:

2021.10.28, 8:00, N625

Abstract:

Modern practice for training classification deepnets involves a Terminal Phase of Training (TPT), which begins at the epoch where the training error first vanishes; during TPT, the training error stays effectively zero while the training loss is pushed toward zero. Papyan et al. characterize the TPT as Neural Collapse (NC), involving four deeply interconnected phenomena: (NC1) cross-example within-class variability of the last-layer training activations collapses to zero, as the individual activations themselves collapse to their class means; (NC2) the class means collapse to the vertices of a Simplex Equiangular Tight Frame (ETF); (NC3) up to rescaling, the last-layer classifiers collapse to the class means, or in other words to the Simplex ETF, i.e., to a self-dual configuration; and (NC4) for a given activation, the classifier's decision collapses to simply choosing whichever class has the closest training class mean, i.e., the Nearest Class-Center (NCC) decision rule. However, the NC described by Papyan et al. concerns only the behavior of the last layer of a deepnet; the behavior of the intermediate layers remains unclear. In this talk, I will briefly introduce the NC phenomena and discuss future directions toward understanding the TPT of deepnets by investigating the behavior of the intermediate layers.
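
To make the four phenomena concrete, below is a minimal NumPy sketch (not from the talk) of how NC1–NC4 diagnostics are often measured in practice. It assumes access to last-layer activations `H`, integer labels `y`, and classifier weights `W` (all hypothetical names), and it omits the bias term and other refinements of the original measurements by Papyan et al.

```python
import numpy as np

def nc_diagnostics(H, y, W):
    """Simple NC1-NC4 diagnostics (a sketch, not the canonical definitions).

    H : (n, d) last-layer activations
    y : (n,) integer class labels in {0, ..., C-1}
    W : (C, d) last-layer classifier weights (bias ignored)
    """
    C = W.shape[0]
    mu_G = H.mean(axis=0)                                        # global mean
    mus = np.stack([H[y == c].mean(axis=0) for c in range(C)])   # class means

    # NC1: within-class variability relative to between-class variability.
    Sigma_W = sum(np.cov((H[y == c] - mus[c]).T, bias=True) for c in range(C)) / C
    Sigma_B = np.cov((mus - mu_G).T, bias=True)
    nc1 = np.trace(Sigma_W) / np.trace(Sigma_B)   # -> 0 under collapse

    # NC2: centered class means approach a Simplex ETF, i.e. equal norms
    # and pairwise cosines of exactly -1/(C-1).
    M = mus - mu_G
    Mn = M / np.linalg.norm(M, axis=1, keepdims=True)
    cosines = Mn @ Mn.T
    off_diag = cosines[~np.eye(C, dtype=bool)]
    nc2 = np.abs(off_diag + 1.0 / (C - 1)).max()  # -> 0 under collapse

    # NC3: self-duality -- classifier rows align with centered class means
    # up to rescaling (compare after Frobenius normalization).
    nc3 = np.linalg.norm(W / np.linalg.norm(W) - M / np.linalg.norm(M))

    # NC4: agreement between the network's decision and the Nearest
    # Class-Center (NCC) rule.
    net_pred = np.argmax(H @ W.T, axis=1)
    ncc_pred = np.argmin(((H[:, None, :] - mus[None]) ** 2).sum(-1), axis=1)
    nc4 = (net_pred == ncc_pred).mean()           # -> 1 under collapse

    return nc1, nc2, nc3, nc4

if __name__ == "__main__":
    # Random data: no collapse expected, so the diagnostics stay far from
    # their limiting values (nc1, nc2, nc3 near 0 and nc4 near 1).
    rng = np.random.default_rng(0)
    H = rng.normal(size=(300, 16))
    y = rng.integers(0, 3, size=300)
    W = rng.normal(size=(3, 16))
    print(nc_diagnostics(H, y, W))
```

During TPT, nc1, nc2, and nc3 as computed above are reported to tend to zero while nc4 tends to one; tracking the same quantities on intermediate-layer activations, rather than only the last layer, is the direction the talk discusses.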
