
Convolutional Neural Networks (CNNs) Explained

by Louis Bouchard (@whatsai), February 3rd, 2021

Too Long; Didn't Read

What a convolutional neural network is, and an explanation of one of the best and most used state-of-the-art CNN architectures in 2020: DenseNet.


In this video, I explain what convolutions and convolutional neural networks are, and introduce, in detail, one of the best and most used state-of-the-art CNN architectures in 2020: DenseNet.

Watch the Video Below

If you would like me to cover any other neural network architecture or research paper, please let me know in the comments!


​Follow me for more AI content

Chapters

0:00 - Hey! Tap the Thumbs Up button and Subscribe. You'll learn a lot of cool stuff, I promise.
0:18 - The Convolutional Neural Networks
0:39 - A … convolution?
2:07 - Training a CNN
2:45 - The activation function: ReLU
3:20 - The pooling layers: Max-Pooling
4:40 - The state-of-the-art CNNs: A quick history
5:23 - The most promising CNN architecture: DenseNet
8:39 - Conclusion

Video Transcript

Facial recognition, targeted ads, image recognition, video analysis, animal detection: these are all powerful AI applications you must already have heard of at least once. But do you know what they all have in common? They all use the same type of neural network architecture: the convolutional neural network. CNNs are the most used type of neural network and the best for any computer vision application. Once you understand them, you are ready to dive into the field and become an expert.

Convolutional neural networks are a family of deep neural networks that mainly use convolutions to achieve the expected task. As the name says, a convolution is the process where the original image, which is our input in a computer vision application, is convolved with filters that detect small but important features of the image, such as edges. The network autonomously learns filter values that detect the important features needed to match the output we want, such as the name of the object in a specific image sent as input. These filters are basically squares of size 3x3 or 5x5, so they can detect the direction of an edge: left, right, up, or down, just as you can see in this image. The convolution process takes a dot product between the filter and the pixels it faces, then moves to the right and does it again, convolving the whole image. Once it is done, this gives us the output of the first convolution layer, which is called a feature map. Then we do the same thing with another filter, giving us many feature maps at the end, which are all sent into the next layer as input to produce yet more feature maps, until we reach the end of the network with extremely detailed, general information about what the image contains. There are many filters, and the numbers inside these filters are called the weights, which are the parameters trained during our training phase.
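To make the sliding dot product above concrete, here is a minimal NumPy sketch of a single 2D convolution (implemented, as in most deep learning libraries, as a cross-correlation) between a toy grayscale image and a 3x3 vertical-edge filter. The image values, the filter values, and the convolve2d helper are illustrative assumptions, not code from the video.

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # the pixels the filter currently "faces"
            feature_map[i, j] = np.sum(patch * kernel)  # dot product between filter and patch
    return feature_map

# Toy 5x5 grayscale image (dark left half, bright right half) and a 3x3 vertical-edge filter.
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
vertical_edge = np.array([[-1, 0, 1],
                          [-1, 0, 1],
                          [-1, 0, 1]], dtype=float)

print(convolve2d(image, vertical_edge))  # strongest responses where the dark-to-bright edge sits
```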

Of course, the network is not only composed of convolutions. In order to learn, we also need to add an activation function and a pooling layer between each convolution layer. Basically, these activation functions make possible the use of the backpropagation technique, which calculates the error between our guess and the real answer we were supposed to have, then propagates this error throughout the network, changing the weights of the filters based on this error. Once the propagated error reaches the first layer, another example is fed to the network and the whole learning process is repeated, iteratively improving our algorithm.
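As a rough illustration of that training loop (forward pass, error computation, backpropagation, weight update, next example), here is a minimal PyTorch-style sketch. The model, data_loader, loss, optimizer, and hyperparameters are placeholders chosen for the example, not the setup used in the video.

```python
import torch
import torch.nn as nn

def train(model, data_loader, epochs=5, lr=1e-3):
    criterion = nn.CrossEntropyLoss()                      # error between our guess and the real answer
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in data_loader:                 # feed one batch of examples after another
            predictions = model(images)                    # forward pass through the conv layers
            loss = criterion(predictions, labels)          # how wrong the guess was
            optimizer.zero_grad()
            loss.backward()                                # propagate the error back through the network
            optimizer.step()                               # change the filters' weights based on the error
    return model
```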

This activation function is responsible for determining the output of each convolution computation and for reducing the complexity of our network. The most popular activation function is called the ReLU function, which stands for rectified linear unit. It sets to zero any negative results, which are known to be harmful to the network, and keeps positive values the same. Having all these zeros makes the network much more efficient to train in terms of computation time, since a multiplication with zero will always equal zero.
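A minimal sketch of ReLU itself, assuming NumPy: negatives are set to zero, positives pass through unchanged.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # zero out negative values, keep positive values unchanged

print(relu(np.array([-3.0, -0.5, 0.0, 2.0, 7.0])))  # [0. 0. 0. 2. 7.]
```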

Then, to further simplify our network and reduce the number of parameters, we have the pooling layers. Typically, we use a two-by-two-pixel window and take the maximum value of this window to make the first pixel of our feature map; this is known as max pooling. We then repeat this process over the whole feature map, which reduces the x and y dimensions of the feature map, thus reducing the number of parameters in the network the deeper we get into it. This is all done while keeping the most important information.
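And a short NumPy sketch of 2x2 max pooling with stride 2, assuming a feature map whose height and width are at least 2; each output pixel keeps only the maximum of its 2x2 window, halving the spatial dimensions.

```python
import numpy as np

def max_pool_2x2(feature_map):
    """Keep the maximum of every non-overlapping 2x2 window, halving height and width."""
    h2, w2 = feature_map.shape[0] // 2, feature_map.shape[1] // 2
    windows = feature_map[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return windows.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 0],
               [4, 2, 1, 5],
               [0, 1, 3, 2],
               [2, 2, 4, 1]], dtype=float)
print(max_pool_2x2(fm))  # [[4. 5.]
                         #  [2. 4.]]  -- half the size, most salient values kept
```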

These three layers, convolution, activation, and pooling, can be repeated multiple times in a network, which we call our conv layers, making the network deeper and deeper. Finally, there are the fully connected layers that learn a non-linear function from the last pooling layer's outputs. The network flattens the multi-dimensional volume that results from the pooling layers into a one-dimensional vector with the same total number of parameters. Then we use this vector in a small fully connected neural network with one or more layers for image classification or other purposes, resulting in one output per image, such as the class of the object. Of course, this is the most basic form of convolutional neural networks.
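Putting the pieces together, here is a hedged PyTorch sketch of this most basic form of CNN: two convolution-ReLU-max-pooling stages, followed by flattening and a small fully connected classifier. The channel counts, the 32x32 RGB input size, and the 10 output classes are illustrative assumptions, not values from the video.

```python
import torch
import torch.nn as nn

class BasicCNN(nn.Module):
    """Conv -> ReLU -> MaxPool blocks, then flatten and a small fully connected classifier."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution: 16 learnable 3x3 filters
            nn.ReLU(),                                   # activation
            nn.MaxPool2d(2),                             # pooling: halves height and width
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                # multi-dimensional volume -> 1D vector
            nn.Linear(32 * 8 * 8, num_classes),          # one score per class
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = BasicCNN()
dummy = torch.randn(1, 3, 32, 32)  # one fake 32x32 RGB image
print(model(dummy).shape)          # torch.Size([1, 10])
```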

There have been many different convolutional architectures since LeNet-5 by Yann LeCun in 1998, and, more recently, with the progress of GPUs, the first deep learning network applied in the most popular object recognition competition: the AlexNet network in 2012. This competition is the ImageNet large-scale visual recognition competition, ILSVRC, where the best object detection algorithms competed every year on the biggest computer vision dataset ever created: ImageNet. The field exploded right after that year, with new architectures beating the previous ones and always performing better, until today.

Nowadays, most state-of-the-art architectures perform similarly and have some specific use cases where they are better. You can see here a quick comparison of the most used architectures in 2020. This is why I will only cover my favorite network in this video, which is the one that yields the best results in my research: DenseNet. It is also the most interesting and promising CNN architecture, in my opinion. Please let me know in the comments if you would like me to cover any other type of network architecture.

The DenseNet family first appeared in 2016 in the paper called "Densely Connected Convolutional Networks" by Facebook AI Research. It is a family because it has many versions with different depths, ranging from 121 layers with 0.8 million parameters up to a version with 264 layers and 15.3 million parameters, which is smaller than the 101-layer-deep ResNet architecture, as you can see here. The DenseNet architecture uses the same concepts of convolutions, pooling, and the ReLU activation function to work.

The important detail and innovation in this network architecture are the dense blocks. Here is an example of a five-layer dense block. In these dense blocks, each layer takes all the preceding feature maps as input, thus helping the training process by alleviating the vanishing gradient problem. This vanishing gradient problem appears in really deep networks: they are so deep that when we backpropagate the error into the network, the error is reduced at every step and eventually becomes zero. These connections basically allow the error to be propagated further without being reduced too much. These connections also encourage feature reuse and reduce the number of parameters for the same reason: since the network reuses previous feature maps' information instead of generating more parameters, it accesses the network's collective knowledge and reduces the chance of overfitting, due to this reduction in total parameters. And as I said, this works extremely well, reducing the number of parameters by around 5 times compared to a state-of-the-art ResNet architecture with the same number of layers.
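To show in code what "each layer takes all the preceding feature maps as input" means, here is a simplified PyTorch sketch of a dense block. It captures the concatenation idea (and the BN-ReLU-Conv ordering from the paper) but omits bottleneck layers, transition layers, and other details of the official DenseNet implementation; the growth rate of 12 and the five layers are just example values.

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Each layer receives the concatenation of all preceding feature maps as input."""
    def __init__(self, in_channels, growth_rate=12, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate          # input grows with every preceding layer
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels),
                nn.ReLU(),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))           # reuse every earlier feature map
            features.append(out)
        return torch.cat(features, dim=1)

block = DenseBlock(in_channels=24)
print(block(torch.randn(1, 24, 32, 32)).shape)  # torch.Size([1, 84, 32, 32]) = 24 + 5 * 12 channels
```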

The original DenseNet family is composed of four dense blocks with transition layers, which do convolution and pooling as well, and a final classification layer if we are working on an image classification task, such as the ILSVRC competition. The size of the dense blocks is the only thing that changes between versions of the DenseNet family to make the network deeper.
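If you would rather try the architecture than reimplement it, the DenseNet family also ships with torchvision; a minimal sketch, assuming torchvision is installed (weights are random here, not pretrained):

```python
import torch
from torchvision import models

model = models.densenet121()           # the 121-layer member of the DenseNet family (random weights)
model.eval()

dummy = torch.randn(1, 3, 224, 224)    # one fake ImageNet-sized RGB image
with torch.no_grad():
    logits = model(dummy)
print(logits.shape)                    # torch.Size([1, 1000]) -- one score per ImageNet class
```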

Of course, this was just an introduction to convolutional neural networks and, more precisely, to the DenseNet architecture. I strongly invite you to read further about these architectures if you want to make a well-thought-out choice for your application. The paper and GitHub links for DenseNet are in the description of the video. Please let me know if you would like me to cover any other architecture. Please leave a like if you made it this far in the video, and since over 90% of you watching are not subscribed yet, consider subscribing to the channel so you don't miss any further news, clearly explained. Thank you for watching!