In this article, I evaluate the main approaches to weight initialization and current best practices.

Zero Initialization

Initializing weights to zero DOES NOT WORK. Then why have I mentioned it here? To understand the need for careful weight initialization, we need to understand why initializing weights to zero WON’T work.

Fig 1. Simple Network. Image by the Author.

Let us consider a simple network like the one shown above. Each input is just one scalar: X₁, X₂, X₃. The weights of the two neurons are W₁ and W₂. Each output is computed as below:

Out₁ = X₁*W₁ + X₂*W₁ + X₃*W₁
Out₂ = X₁*W₂ + X₂*W₂ + X₃*W₂

As you can see by now, if the weight matrix W = [W₁ W₂] is initialized to zero…
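To make the symmetry problem concrete, here is a small sketch (my own, with made-up inputs and a squared-error loss) of the network above with zero weights. Both neurons compute the same output and receive the same gradient, so they can never learn different features:

```python
import numpy as np

# Hypothetical 3-input, 2-neuron layer with all weights initialized to zero.
X = np.array([1.0, 2.0, 3.0])   # inputs X1, X2, X3
W = np.zeros(2)                 # weights W1, W2, one per neuron

# As in the formulas above, each neuron's output is the sum of the
# inputs scaled by its single weight.
out = np.array([X.sum() * W[0], X.sum() * W[1]])
print(out)    # both outputs are zero and identical

# Gradient of a squared-error loss w.r.t. each weight (target y assumed 1):
y = 1.0
grad = np.array([2 * (out[0] - y) * X.sum(),
                 2 * (out[1] - y) * X.sum()])
print(grad)   # identical gradients: the two neurons can never diverge
```

Because every update is identical, the network behaves like it has a single neuron per layer no matter how wide it is.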

In this article, I want to take an in-depth look at regularization.


What is regularization?

Regularization is a method to constrain the model so that it fits our data accurately without overfitting. It can also be thought of as penalizing unnecessary complexity in our model. There are mainly 3 types of regularization techniques deep learning practitioners use. They are:

  1. L1 Regularization or Lasso regularization
  2. L2 Regularization or Ridge regularization
  3. Dropout
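The first two techniques can be sketched as extra terms added to the loss. Here is a minimal illustration (my own, not from any particular library) of what the L1 and L2 penalties look like:

```python
import numpy as np

def regularized_loss(base_loss, weights, l1=0.0, l2=0.0):
    """base_loss plus a Lasso penalty (sum of |w|) and a
    Ridge penalty (sum of w^2), each scaled by its coefficient."""
    return (base_loss
            + l1 * np.abs(weights).sum()       # L1 / Lasso term
            + l2 * np.square(weights).sum())   # L2 / Ridge term

w = np.array([0.5, -2.0, 1.5])
print(regularized_loss(1.0, w, l1=0.1))   # 1.0 + 0.1 * 4.0  = 1.4
print(regularized_loss(1.0, w, l2=0.1))   # 1.0 + 0.1 * 6.5  = 1.65
```

Large weights now cost extra loss, so the optimizer is pushed toward simpler models; L1 tends to drive weights to exactly zero, while L2 merely shrinks them.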

Sidebar: Other techniques can also have a regularizing effect on our model. You can also prevent overfitting by having more data to constrain the search space of our function. This can be done with techniques like data augmentation, which create more data to…

An in-depth look at Cross-Entropy, the intuitions, and reasoning behind its necessity and utility.


For the longest time, I had not completely understood Cross-Entropy loss. Why did we take exponents (softmax)? Why did we then take the log? Why did we take the negative of this log? How did we end up with a positive loss that we have to minimize?
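Those three steps can be walked through in a few lines on made-up logits (my own sketch, for a 3-class problem):

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])   # raw network outputs for 3 classes
target = 0                           # index of the correct class

# 1. Exponentiate and normalize (softmax): turns arbitrary scores
#    into probabilities that are positive and sum to 1.
probs = np.exp(logits) / np.exp(logits).sum()

# 2. Take the log of the probability assigned to the correct class.
#    Since probs[target] < 1, this log is negative.
log_p = np.log(probs[target])

# 3. Negate it: a low probability on the right class becomes a
#    large positive loss that we can minimize.
loss = -log_p
print(loss)
```

The negation is what flips "maximize the probability of the right class" into the "minimize a positive loss" framing that optimizers expect.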

These questions and more boggled my mind, to the point where I simply accepted that I had to use cross-entropy for multiclass classification and didn’t think about it much.

Recently I started going through fastai’s 2020 course, where Jeremy Howard was explaining cross-entropy, and even though I think he did a good job…

In this article, I will discuss what I think are the three most important architectures to be aware of for NLP.

Recurrent Neural Network

Recurrent Neural Network (RNN). Image from Wikipedia under CC BY-SA 4.0 License.

Recurrent neural networks are special architectures that take into account temporal information. The hidden state of an RNN at time t takes in information from both the input at time t and activations from hidden units at time t-1, to calculate outputs for time t. This can be seen in the image above. This gives the RNN memory, or the ability to remember previous inputs and their outputs.
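A single RNN step, as described above, can be sketched in a few lines (sizes, the tanh nonlinearity, and the random weights are illustrative assumptions of mine):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    """Hidden state at time t mixes the input at time t with the
    hidden state from time t-1 -- this is the RNN's 'memory'."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # input -> hidden weights
W_hh = rng.normal(size=(3, 3))   # hidden -> hidden (recurrent) weights
b = np.zeros(3)

h = np.zeros(3)                        # initial hidden state
for x_t in rng.normal(size=(5, 4)):    # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b)
print(h.shape)
```

Note that the same three weight matrices are reused at every time step, which is what lets an RNN handle sequences of any length.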

This is extremely important for natural language processing, as in NLP the input data does not have a fixed size, and the next word is highly dependent on previous words. Context is…

This is a beginner blog post for people who don’t know anything or know very little about the tax system.

I have recently started working in the USA, and I knew nothing about taxes. I did not know when to pay them, or how to pay them. My moment of reckoning came when I came across a YouTube video about how I could save on taxes if I registered as an LLC. Although I am not that well versed, to help you figure out if and why you should register as an LLC, I will explain US tax jargon as best as I have understood it from my research.

In this article, I will…

In this article, I will discuss some of the most widely used decision-tree-based algorithms for machine learning.

Decision Trees

What are they?

Decision trees are tree-shaped algorithms that split the data based on a series of decisions. Look at the image below of a very simple decision tree. We want to decide whether an animal is a cat or a dog based on 2 questions.

  1. Are the ears pointy?
  2. Does the animal bark?

We can answer each question, and depending on the answer, we can classify the animal as either a dog or a cat. The red lines represent the answer “NO” and the green lines “YES”.

This way the decision process can be laid out like a tree. The question nodes are…
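One plausible encoding of the two-question tree described above is just nested conditionals (my own sketch; the exact branch order in the figure may differ):

```python
def classify(pointy_ears: bool, barks: bool) -> str:
    """Classify an animal as 'cat' or 'dog' with two questions."""
    if pointy_ears:            # Question 1: are the ears pointy?
        if barks:              # Question 2: does the animal bark?
            return "dog"       # pointy ears but barks -> dog
        return "cat"
    return "dog"               # floppy ears -> dog

print(classify(pointy_ears=True, barks=False))   # cat
print(classify(pointy_ears=False, barks=True))   # dog
```

Tree-learning algorithms effectively discover which questions to ask, and in what order, directly from the data.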

In this article, I list my top 5 neural network architectures for computer vision, in no particular order.

Convolutional Neural Networks


The idea of convolutions was first introduced by Kunihiko Fukushima in this paper. The neocognitron introduced 2 types of layers: convolutional layers and downsampling layers.

The next key advancement came from Yann LeCun et al., who used back-propagation to learn the coefficients of the convolutional kernel from images. This made learning automatic rather than laboriously handcrafted. According to Wikipedia, this approach became a foundation for modern computer vision.

Then came “ImageNet Classification with Deep Convolutional Neural Networks” in 2012, by Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, which is widely regarded as the most influential paper on convolutional neural…

This is part 1 of a multipart series: The things I love the most about my favourite deep learning library, fastai.

This episode: Learning rate (LR)

LR before fastai

The general consensus on finding the best LR was to fully train the model, until the desired metric was achieved, once for every combination of optimizer and LR under consideration. The combination that worked best in this picking phase was then chosen. This is an OK technique, although computationally expensive.
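The brute-force approach described above can be sketched as a grid search; `train_and_evaluate` here is a hypothetical callable standing in for a full training run:

```python
def pick_lr(train_and_evaluate, optimizers, lrs):
    """Train once per (optimizer, LR) pair and return the best combination."""
    results = {}
    for opt in optimizers:
        for lr in lrs:
            # Each call is a FULL training run -- this is why the
            # approach is so computationally expensive.
            results[(opt, lr)] = train_and_evaluate(opt, lr)
    return max(results, key=results.get)

# Toy stand-in: pretend accuracy peaks at lr=1e-3 with "adam".
fake = lambda opt, lr: (0.9 if opt == "adam" else 0.8) - abs(lr - 1e-3)
print(pick_lr(fake, ["sgd", "adam"], [1e-4, 1e-3, 1e-2]))   # ('adam', 0.001)
```

With 2 optimizers and 3 LRs this already costs 6 full training runs, which is exactly the expense the fastai LR finder avoids.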

Note: As I was introduced to fastai early in my deep learning career, I do not know a lot about how things are done without/before fastai, so please let me know if this is a bit inaccurate, and take this section with a grain of salt.

The fastai way

The easiest way I have found to install fastai on Windows is with Anaconda. Anaconda is a package manager that helps you install and maintain the correct versions of packages, and also allows you to make virtual environments.

Steps to install Fastai

  1. Install Anaconda
  2. Install Cudatoolkit

conda install -c anaconda cudatoolkit

3. Install pytorch; you can find instructions on the PyTorch website. Make sure to select the newest version of CUDA, and select Conda as your package.

4. Install fastai with the git clone + pip install method.

git clone
pip install -e "fastai[dev]"

Note: As of now, num_workers has to be 0 for fastai on Windows. So whenever you’re making a dataloader, make sure to set num_workers=0.

Things to accomplish:
1. Find a picture you want to watermark. This will be referred to as the background.
2. Find a logo or text that will be your watermark. This will be referred to as the watermark.
3. The objective is to paste a translucent watermark over the background.

Here is my approach:

Step 1: Import the PIL library

from PIL import Image, ImageDraw, ImageFont

Step 2: Create your watermark


If your watermark is text, then you need to create an image containing just the text on a transparent background. Here’s how that’s done:
1. Create an ‘RGBA’ image with a transparent background. …
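Putting the steps together, here is a minimal sketch of the final paste (sizes, positions, and the solid-colour stand-ins for the background and watermark are made up by me); the key trick is passing the watermark itself as the mask so its alpha channel controls the blend:

```python
from PIL import Image

# Stand-ins: a white "photo" and a 50%-opaque red box as the watermark.
background = Image.new("RGB", (200, 100), "white")
watermark = Image.new("RGBA", (50, 20), (255, 0, 0, 128))  # alpha 128 of 255

result = background.convert("RGBA")
# Using the watermark as its own mask makes the paste respect its
# transparency, giving a translucent overlay instead of an opaque box.
result.paste(watermark, (10, 10), mask=watermark)
print(result.getpixel((15, 15)))   # white tinted toward red under the watermark
```

Without the `mask` argument, the paste would be fully opaque and the alpha channel would be ignored.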

Akash Shastri

I love anything that makes me think. Check out my GitHub, and get in touch with me on LinkedIn.
