The main benefit of a very deep network is that it can represent very complex functions. It can also learn features at many different levels of abstraction, from edges (at the lower layers) to very complex features (at the deeper layers). However, going deeper doesn’t always help. A huge barrier to training very deep networks is vanishing gradients: the gradient signal often shrinks toward zero very quickly, making gradient descent unbearably slow. More specifically, as you backpropagate from the final layer to the first, you multiply by a weight matrix at every step, so the gradient can decrease exponentially quickly to zero (or, in rare cases, grow exponentially quickly and “explode” to very large values).
Further reading: Exploding And Vanishing Gradient Problem: Math Behind The Truth (hackernoon.com)
During training, you might therefore see the magnitude (or norm) of the gradient for the earlier layers decrease to zero very rapidly as training proceeds.
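As a quick toy illustration (my own sketch, not from the original material), repeatedly multiplying a gradient by weight matrices whose typical scale is a little below 1 makes its norm shrink exponentially with depth:

```python
import numpy as np

np.random.seed(0)
depth, width = 50, 64
grad = np.random.randn(width)  # gradient arriving at the last layer

for layer in range(depth):
    # "Slightly small" random weights: typical scaling factor per layer ~0.9.
    W = 0.9 * np.random.randn(width, width) / np.sqrt(width)
    grad = W.T @ grad  # one backprop step through a linear layer
    if (layer + 1) % 10 == 0:
        print(f"after {layer + 1} layers, ||grad|| = {np.linalg.norm(grad):.2e}")
```

Scaling the matrices slightly above 1 instead makes the same loop blow up, which is the “exploding” case mentioned above.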
In ResNets, a “shortcut” or a “skip connection” allows the gradient to be directly backpropagated to earlier layers:
Two main types of blocks are used in a ResNet, depending mainly on whether the input and output dimensions are the same or different. You are going to implement both of them in DLS.
We’ll implement ResNet blocks similar to those described in the original research paper, though not an exact reproduction.
The identity block is the standard block used in ResNets, and corresponds to the case where the input activation (say a[l]) has the same dimension as the output activation (say a[l+2]). To flesh out the different steps of what happens in a ResNet’s identity block, here is an alternative diagram showing the individual steps:
Identity block. Skip connection “skips over” 2 layers.
The upper path is the “shortcut path.” The lower path is the “main path.” In this diagram, we have also made explicit the CONV2D and ReLU steps in each layer. To speed up training, we have also added a BatchNorm step. Batch Normalization must be done along the channel axis (mode=1).
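As a reference point, here is a hedged Keras-style sketch of one such main-path layer (CONV2D → BatchNorm → ReLU), assuming channels-last tensors of shape (batch, height, width, channels), so “along the channel axis” means axis=3; the filter count F1 and kernel size f are placeholders:

```python
from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D

def conv_bn_relu(x, F1, f):
    """One main-path layer: CONV2D -> BatchNorm (channel axis) -> ReLU."""
    x = Conv2D(F1, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)  # normalize each channel separately
    return Activation('relu')(x)
```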
To learn more about DLS:
Iris genus classification | DeepCognition | Azure ML Studio (towardsdatascience.com)
To implement the below, you may choose any dataset from DLS.
ResNet Identity Block
Here are the individual steps.
First component of main path:
Second component of main path:
Final step:
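Putting those steps together, here is a minimal Keras-style sketch of the identity block described above (two main-path layers, with the shortcut added just before the final ReLU). The name identity_block, the f×f kernel sizes, and the filter counts F1, F2 are placeholders I’m assuming, not from the article:

```python
from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def identity_block(x, f, F1, F2):
    # Shortcut path: the input is passed through unchanged.
    x_shortcut = x

    # First component of main path: CONV2D -> BatchNorm -> ReLU
    x = Conv2D(F1, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # Second component of main path: CONV2D -> BatchNorm (no ReLU yet).
    # F2 must equal the number of input channels so the shapes match below.
    x = Conv2D(F2, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)

    # Final step: add the shortcut to the main path, then apply ReLU.
    x = Add()([x, x_shortcut])
    return Activation('relu')(x)
```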
You’ve implemented the ResNet identity block. Next, the ResNet “convolutional block” is the other type of block. You can use this type of block when the input and output dimensions don’t match up. The difference with the identity block is that there is a CONV2D layer in the shortcut path:
Figure 4: Convolutional block
The CONV2D layer in the shortcut path is used to resize the input x to a different dimension, so that the dimensions match up in the final addition that adds the shortcut value back to the main path. For example, to reduce the activation’s height and width by a factor of 2, you can use a 1x1 convolution with a stride of 2. The CONV2D layer on the shortcut path does not use any non-linear activation function. Its main role is simply to apply a (learned) linear function that reduces the dimension of the input, so that the dimensions match up for the later addition step.
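For example (a hedged sketch, with F3 as a placeholder filter count), the shortcut projection can be written as:

```python
from tensorflow.keras.layers import Conv2D

def shortcut_projection(x_shortcut, F3):
    # 1x1 convolution with stride 2: halves height and width, projects to F3
    # channels, and applies no non-linear activation (a learned linear map).
    return Conv2D(F3, (1, 1), strides=(2, 2))(x_shortcut)
```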
The details of the convolutional block are as follows.
(You can choose the values of f, F1, F2, F3 based on your dataset.)
First component of main path:
Second component of main path:
Third component of main path:
Shortcut path:
Final step:
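Here is a minimal Keras-style sketch that assembles those steps into a convolutional block. The name convolutional_block, the stride s, the placement of the f×f kernel and the stride (following the standard bottleneck design from the ResNet paper), and the BatchNorm on the shortcut are my assumptions; f, F1, F2, F3 are placeholders you choose for your dataset:

```python
from tensorflow.keras.layers import Activation, Add, BatchNormalization, Conv2D

def convolutional_block(x, f, F1, F2, F3, s=2):
    x_shortcut = x

    # First component of main path: 1x1 CONV2D with stride s -> BatchNorm -> ReLU
    x = Conv2D(F1, (1, 1), strides=(s, s))(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # Second component of main path: f x f CONV2D -> BatchNorm -> ReLU
    x = Conv2D(F2, (f, f), padding='same')(x)
    x = BatchNormalization(axis=3)(x)
    x = Activation('relu')(x)

    # Third component of main path: 1x1 CONV2D -> BatchNorm (no ReLU yet)
    x = Conv2D(F3, (1, 1))(x)
    x = BatchNormalization(axis=3)(x)

    # Shortcut path: 1x1 CONV2D with stride s (linear projection) -> BatchNorm
    x_shortcut = Conv2D(F3, (1, 1), strides=(s, s))(x_shortcut)
    x_shortcut = BatchNormalization(axis=3)(x_shortcut)

    # Final step: add the shortcut to the main path, then apply ReLU.
    x = Add()([x, x_shortcut])
    return Activation('relu')(x)
```

Stacking one convolutional block (to change dimensions) followed by several identity blocks is the usual way a ResNet stage is built.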
If you like this article, do 👏 and share 😄. For more articles on Deep Learning, follow me on Medium and LinkedIn.
Do subscribe to my YouTube channel:
AI with MANIK (www.youtube.com)
Thanks for reading 😃
Happy ResNets.