The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the lower layers) to very complex features (at the deeper layers). However, going deeper doesn’t always help. A huge barrier to training very deep networks is vanishing gradients: the gradient signal often shrinks toward zero very quickly, making gradient descent unbearably slow. More specifically, as you backpropagate from the final layer to the first, you multiply by a weight matrix at every step, so the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and “explode” to very large values).
Further reading: Exploding And Vanishing Gradient Problem: Math Behind The Truth (hackernoon.com)
During training, you might therefore see the magnitude (or norm) of the gradient for the earlier layers decrease to zero very rapidly as training proceeds.
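As a quick toy illustration (my own sketch, not from the original material), repeatedly multiplying a gradient by weight matrices whose typical scale is a little below 1 makes its norm shrink exponentially with depth:

```python
import numpy as np

np.random.seed(0)
depth, width = 50, 64
grad = np.random.randn(width)  # gradient arriving at the last layer

for layer in range(depth):
    # "Slightly small" random weights: typical scaling factor per layer ~0.9.
    W = 0.9 * np.random.randn(width, width) / np.sqrt(width)
    grad = W.T @ grad  # one backprop step through a linear layer
    if (layer + 1) % 10 == 0:
        print(f"after {layer + 1} layers, ||grad|| = {np.linalg.norm(grad):.2e}")
```

Scaling the matrices slightly above 1 instead makes the same loop blow up, which is the “exploding” case mentioned above.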
In ResNets, a “shortcut” or a “skip connection” allows the gradient to be directly backpropagated to earlier layers:
Two main types of blocks are used in a ResNet, depending mainly on whether the input and output dimensions are the same or different. You are going to implement both of them in DLS.
We’ll implement ResNet blocks similar to those described in the original research paper, though not an exact reproduction.
The identity block is the standard block used in ResNets, and corresponds to the case where the input activation (say a[l]) has the same dimension as the output activation (say a[l+2]). To flesh out the different steps of what happens in a ResNet’s identity block, here is an alternative diagram showing the individual steps:
Identity block. Skip connection “skips over” 2 layers.
The upper path is the “shortcut path.” The lower path is the “main path.” In this diagram, we have also made explicit the CONV2D and ReLU steps in each layer. To speed up training, we have also added a BatchNorm step. Batch Normalization must be done along the channel axis (mode=1).
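As a reference point, here is a hedged Keras-style sketch of one such main-path layer (CONV2D → BatchNorm → ReLU), assuming channels-last tensors of shape (batch, height, width, channels), so “along the channel axis” means axis=3; the filter count F1 and kernel size f are placeholders:

```python
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

def conv_bn_relu(x, F1, f):
    """One main-path layer: CONV2D -> BatchNorm (channel axis) -> ReLU."""
    x = Conv2D(F1, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)  # normalize each channel separately
    return Activation('relu')(x)
```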
To learn more about DLS:
Iris genus classification | DeepCognition | Azure ML Studio (towardsdatascience.com)
To implement the below, you may choose any dataset from DLS.
ResNet Identity Block
Here are the individual steps.
First component of main path:
Second component of main path:
Final step:
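Putting those steps together, here is a minimal Keras-style sketch of the identity block described above (two main-path layers, with the shortcut added just before the final ReLU). The name identity_block, the f×f kernel sizes, and the filter counts F1, F2 are placeholders I’m assuming, not from the article:

```python
from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def identity_block(x, f, F1, F2):
    # Shortcut path: the input is passed through unchanged.
    x_shortcut = x

    # First component of main path: CONV2D -> BatchNorm -> ReLU
    x = Conv2D(F1, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # Second component of main path: CONV2D -> BatchNorm (no ReLU yet).
    # F2 must equal the number of input channels so the shapes match below.
    x = Conv2D(F2, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)

    # Final step: add the shortcut to the main path, then apply ReLU.
    x = Add()([x, x_shortcut])
    return Activation('relu')(x)
```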
You’ve implemented the ResNet identity block. Next, the ResNet “convolutional block” is the other type of block. You can use this type of block when the input and output dimensions don’t match up. The difference with the identity block is that there is a CONV2D layer in the shortcut path:
Figure 4: Convolutional block
The CONV2D layer in the shortcut path is used to resize the input x to a different dimension, so that the dimensions match up in the final addition that adds the shortcut value back to the main path. For example, to reduce the activation’s height and width by a factor of 2, you can use a 1x1 convolution with a stride of 2. The CONV2D layer on the shortcut path does not use any non-linear activation function. Its main role is simply to apply a (learned) linear function that reduces the dimension of the input, so that the dimensions match up for the later addition step.
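For example (a hedged sketch, with F3 as a placeholder filter count), the shortcut projection can be written as:

```python
from tensorflow.keras.layers import Conv2D

def shortcut_projection(x_shortcut, F3):
    # 1x1 convolution with stride 2: halves height and width, projects to F3
    # channels, and applies no non-linear activation (a learned linear map).
    return Conv2D(F3, (1, 1), strides=(2, 2))(x_shortcut)
```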
The details of the convolutional block are as follows.
(You can choose the values of f, F1, F2, F3 based on your dataset.)
First component of main path:
Second component of main path:
Third component of main path:
Shortcut path:
Final step:
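Here is a minimal Keras-style sketch that assembles those steps into a convolutional block. The name convolutional_block, the stride s, the placement of the f×f kernel and the stride (following the standard bottleneck design from the ResNet paper), and the BatchNorm on the shortcut are my assumptions; f, F1, F2, F3 are placeholders you choose for your dataset:

```python
from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def convolutional_block(x, f, F1, F2, F3, s=2):
    x_shortcut = x

    # First component of main path: 1x1 CONV2D with stride s -> BatchNorm -> ReLU
    x = Conv2D(F1, (1, 1), strides=(s, s))(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # Second component of main path: f x f CONV2D -> BatchNorm -> ReLU
    x = Conv2D(F2, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # Third component of main path: 1x1 CONV2D -> BatchNorm (no ReLU yet)
    x = Conv2D(F3, (1, 1))(x)
    x = BatchNormalization(axis=3)(x)

    # Shortcut path: 1x1 CONV2D with stride s (linear projection) -> BatchNorm
    x_shortcut = Conv2D(F3, (1, 1), strides=(s, s))(x_shortcut)
    x_shortcut = BatchNormalization(axis=3)(x_shortcut)

    # Final step: add the shortcut to the main path, then apply ReLU.
    x = Add()([x, x_shortcut])
    return Activation('relu')(x)
```

Stacking one convolutional block (to change dimensions) followed by several identity blocks is the usual way a ResNet stage is built.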
If you like this article, do 👏 and share 😄. For more articles on Deep Learning, follow me on Medium and LinkedIn.
Do subscribe to my YouTube channel:
AI with MANIK (www.youtube.com)
Thanks for reading 😃
Happy ResNets.