In this article, I evaluate the many ways of initializing weights and the current best practices.
Initializing weights to zero DOES NOT WORK. Then why have I mentioned it here? Because to understand the need for proper weight initialization, we first need to understand why initializing weights to zero WON’T work.
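Here is a minimal sketch of the failure in practice. It is a toy setup of my own and not the network from the figure: I assume two inputs, three sigmoid hidden units, one sigmoid output, no bias terms, a cross-entropy loss, and a tiny OR dataset. The function name `train_zero_init` and all hyperparameters are made up for illustration. It uses plain Python so it runs without any libraries.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_zero_init(xs, ys, n_hidden=3, lr=0.5, steps=50):
    """Full-batch gradient descent on a one-hidden-layer net with all weights starting at zero."""
    n_in = len(xs[0])
    w1 = [[0.0] * n_in for _ in range(n_hidden)]  # one row of input weights per hidden unit
    w2 = [0.0] * n_hidden                          # one output weight per hidden unit
    for _ in range(steps):
        g1 = [[0.0] * n_in for _ in range(n_hidden)]
        g2 = [0.0] * n_hidden
        for x, y in zip(xs, ys):
            # Forward pass
            a = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w1]
            y_hat = sigmoid(sum(w * ai for w, ai in zip(w2, a)))
            # Backward pass (sigmoid output + cross-entropy gives y_hat - y)
            d_out = y_hat - y
            for j in range(n_hidden):
                g2[j] += d_out * a[j]
                d_hid = d_out * w2[j] * a[j] * (1.0 - a[j])
                for i in range(n_in):
                    g1[j][i] += d_hid * x[i]
        # Gradient step, averaged over the batch
        for j in range(n_hidden):
            w2[j] -= lr * g2[j] / len(xs)
            for i in range(n_in):
                w1[j][i] -= lr * g1[j][i] / len(xs)
    return w1, w2

# Toy OR dataset (an assumption for this sketch)
xs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
ys = [1.0, 1.0, 1.0, 0.0]

w1, w2 = train_zero_init(xs, ys)
for row in w1:
    print(row)   # every hidden unit ends up with the same input weights
print(w2)        # and the same output weight
```

The weights do move away from zero, but every hidden unit receives exactly the same gradient at every step, so all the rows of `w1` stay identical to each other. However many hidden units you add, the network behaves as if it had only one.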
Let us consider a simple network like the one shown above. Each input is just one scalar X₁…