Understanding the loss surface of a neural network is fundamentally important to the understanding of deep learning. This paper studies how piecewise linear activation functions substantially shape the loss surfaces of neural networks. We first prove that the loss surfaces of many neural networks have infinitely many spurious local minima, which are defined as local minima with higher empirical risks than the global minima. Our result demonstrates that the networks with piecewise linear […]

arXiv:2003.12236v1
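As a rough formalization of the definition stated in the abstract (the empirical-risk notation \hat{R} and parameter vector \theta are assumed here, not taken from the paper), a spurious local minimum \theta^{*} can be written as:

% Sketch of the definition of a spurious local minimum (notation assumed).
% \hat{R}: empirical risk of the network as a function of its parameters \theta.
\exists\, \varepsilon > 0:\quad
\hat{R}(\theta^{*}) \le \hat{R}(\theta)
\quad \forall\, \theta \;\text{with}\; \|\theta - \theta^{*}\| < \varepsilon,
\qquad\text{and}\qquad
\hat{R}(\theta^{*}) > \min_{\theta} \hat{R}(\theta).

That is, \theta^{*} minimizes the empirical risk within some neighborhood, yet its risk is strictly higher than that of the global minima.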