Abstract: | With big data and big computation, deep learning has achieved breakthroughs in computer vision, natural language processing, speech, and other areas. At the same time, researchers are investigating how to alleviate hyper-parameter tuning efforts, why deep neural networks (DNNs) generalize well, and how to make deep learning perform better on out-of-distribution (OOD) prediction. In this talk, I will introduce our recent research on optimization, generalization, and OOD prediction in deep learning. First, I will present a new group-invariant optimization framework for ReLU neural networks, in which the positive-scaling redundancy is removed; then, I will present our work on the implicit bias of the stochastic optimization algorithms widely used in deep learning; finally, I will discuss how to improve OOD prediction by incorporating "causal" invariance. |