In this video, I explain what convolutions and convolutional neural networks are, and introduce, in detail, one of the best and most used state-of-the-art CNN architectures in 2020: DenseNet.
If you would like me to cover any other neural network architecture or research paper, please let me know in the comments!
0:00 - Hey! Tap the Thumbs Up button and Subscribe. You'll learn a lot of cool stuff, I promise.
0:18 - Convolutional Neural Networks
0:39 - A … convolution?
2:07 - Training a CNN
2:45 - The activation function: ReLU
3:20 - The pooling layers: Max-Pooling
4:40 - The state-of-the-art CNNs: A quick history
5:23 - The most promising CNN architecture: DenseNet
8:39 - Conclusion
Facial recognition, targeted ads, image recognition, video analysis, animal detection: these are all powerful AI applications you must already have heard of at least once. But do you know what they all have in common? They all use the same type of neural network architecture: the convolutional neural network. They are the most used type of neural network and the best for any computer vision application. Once you understand these, you are ready to dive into the field and become an expert.
Convolutional neural networks are a family of deep neural networks that mainly use convolutions to achieve the expected task. As the name says, convolution is the process where the original image, which is our input in a computer vision application, is convolved using filters that detect small but important features of the image, such as edges. The network autonomously learns the filter values that detect the important features needed to match the output we want, such as the name of the object in a specific image sent as input. These filters are basically squares of size 3x3 or 5x5, so they can detect the direction of an edge: left, right, up, or down.
Just like you can see in this image, the process of convolution takes a dot product between the filter and the pixels it faces. Then it moves to the right and does it again, convolving the whole image. Once it's done, this gives us the output of the first convolution layer, which is called a feature map. Then we do the same thing with another filter, giving us many feature maps at the end, which are all sent into the next layer as input to produce again many other feature maps, until we reach the end of the network with extremely detailed, general information about what the image contains. There are many filters, and the numbers inside these filters are called the weights, which are the parameters trained during our training phase.
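To make that sliding dot product concrete, here is a minimal sketch of a single 2D convolution in plain Python with NumPy. The toy image, the 3x3 vertical-edge filter, and the loop bounds are illustrative assumptions for this example, not values from the video:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide the kernel over the image and take a dot product at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # the pixels the filter "faces"
            feature_map[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return feature_map

# A toy 6x6 grayscale image with a vertical edge, and a 3x3 vertical-edge filter.
image = np.array([[0, 0, 0, 10, 10, 10]] * 6, dtype=float)
vertical_edge_filter = np.array([[1, 0, -1],
                                 [1, 0, -1],
                                 [1, 0, -1]], dtype=float)

print(convolve2d(image, vertical_edge_filter))  # 4x4 feature map highlighting the edge
```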
Of course, the network is not only composed of convolutions. In order to learn, we also need to add an activation function and a pooling layer between each convolution layer. Basically, these activation functions make it possible to use the backpropagation technique, which calculates the error between our guess and the real answer we were supposed to get, then propagates this error throughout the network, changing the weights of the filters based on this error. Once the propagated error reaches the first layer, another example is fed to the network and the whole learning process is repeated, thus iteratively improving our algorithm.
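As a rough illustration of that loop (guess, measure the error, backpropagate it, update the filter weights, move to the next example), here is a minimal hypothetical PyTorch training loop; the model, data loader, loss, and hyperparameters are placeholders I am assuming, not the ones used in the video:

```python
import torch
import torch.nn as nn

def train(model, data_loader, epochs=5, lr=1e-3):
    criterion = nn.CrossEntropyLoss()                      # error between our guess and the real answer
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)

    for epoch in range(epochs):
        for images, labels in data_loader:                 # one batch of examples at a time
            optimizer.zero_grad()
            predictions = model(images)                    # forward pass: our guess
            loss = criterion(predictions, labels)          # how wrong the guess is
            loss.backward()                                # backpropagate the error through the network
            optimizer.step()                               # update the filters' weights based on this error
```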
This activation function is responsible for determining the output of each convolution computation and for reducing the complexity of our network. The most popular activation function is called the ReLU function, which stands for Rectified Linear Unit. It puts to zero any negative results, which are known to be harmful to the network, and keeps positive values the same. Having all these zeros makes the network much more efficient to train in terms of computation time, since a multiplication with zero will always equal zero.

Then again, to simplify our network and reduce the number of parameters, we have the pooling layers. Typically, we use a two-by-two-pixel window and take the maximum value of this window to make the first pixel of our new feature map. This is known as max-pooling. Then we repeat this process for the whole feature map, which reduces the x and y dimensions of the feature map, thus reducing the number of parameters in the network the deeper we get into it. This is all done while keeping the most important information.
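Here is a quick sketch of what ReLU and 2x2 max-pooling do to a feature map; the 4x4 tensor below is just a made-up example:

```python
import torch
import torch.nn.functional as F

feature_map = torch.tensor([[[[ 1., -2.,  3., -4.],
                              [-1.,  5., -6.,  2.],
                              [ 0., -3.,  4., -1.],
                              [ 2.,  1., -2.,  6.]]]])   # shape: (batch=1, channels=1, 4, 4)

activated = F.relu(feature_map)          # negative values become 0, positives stay the same
pooled = F.max_pool2d(activated, 2)      # 2x2 window, keep the maximum value in each window

print(activated)
print(pooled)                            # shape: (1, 1, 2, 2), the x/y dimensions are halved
```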
These three layers, convolution, activation, and pooling, which we call our conv layers, can be repeated multiple times in a network, making the network deeper and deeper. Finally, there are the fully connected layers, which learn a non-linear function from the last pooling layer's outputs. The network flattens the multi-dimensional volume that results from the pooling layers into a one-dimensional vector with the same total number of values, then uses this vector in a small fully connected neural network with one or more layers for image classification or other purposes, resulting in one output per image, such as the class of the object. Of course, this is the most basic form of convolutional neural network.
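Putting the pieces together, here is a minimal sketch of such a basic CNN in PyTorch: repeated conv layers (convolution, ReLU, max-pooling), then a flatten step, then a small fully connected classifier. The layer sizes, the 32x32 RGB input, and the 10 classes are assumptions for the example, not something specified in the video:

```python
import torch
import torch.nn as nn

class BasicCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Conv layers: convolution -> ReLU activation -> max-pooling, repeated twice.
        self.conv_layers = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # 3x3 filters on an RGB image
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 16x16 -> 8x8
        )
        # Flatten the volume, then a small fully connected network.
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # (32, 8, 8) -> vector of 2048 values
            nn.Linear(32 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),                  # one output per class
        )

    def forward(self, x):
        return self.classifier(self.conv_layers(x))

model = BasicCNN()
logits = model(torch.randn(1, 3, 32, 32))    # one fake 32x32 RGB image
print(logits.shape)                           # torch.Size([1, 10])
```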
There have been many different convolutional architectures since LeNet-5 by Yann LeCun in 1998, and more recently, with the progress of GPUs, the first deep learning network applied in the most popular object recognition competition: the AlexNet network, in 2012. This competition is the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where the best object detection algorithms competed every year on the biggest computer vision dataset ever created: ImageNet. The field exploded right after that year, with new architectures beating the previous ones and always performing better, up until today.
Nowadays, most state-of-the-art architectures perform similarly and have some specific use cases where they are better. You can see here a quick comparison of the most used architectures in 2020. This is why I will only cover my favorite network in this video, which is the one that yields the best results in my research: DenseNet. It is also the most interesting and promising CNN architecture, in my opinion. Please let me know in the comments if you would like me to cover any other type of network architecture.
The DenseNet family first appeared in 2016 in the paper called Densely Connected Convolutional Networks, from Facebook AI Research. It is a family because it has many versions with different depths, ranging from 121 layers with 0.8 million parameters up to a version with 264 layers and 15.3 million parameters, which is smaller than the 101-layer-deep ResNet architecture, as you can see here. The DenseNet architecture uses the same concepts of convolutions, pooling, and the ReLU activation function to work.
The important detail and innovation in this network architecture are the dense blocks. Here is an example of a five-layer dense block. In these dense blocks, each layer takes all the preceding feature maps as input, thus helping the training process by alleviating the vanishing gradient problem. This vanishing gradient problem appears in really deep networks: they are so deep that when we backpropagate the error into the network, this error is reduced at every step and eventually becomes zero. These connections basically allow the error to be propagated further without being reduced too much.
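Here is a minimal, simplified sketch of that idea in PyTorch: each layer's input is the concatenation of all preceding feature maps. This only shows the concatenation pattern; the real DenseNet layers also use 1x1 bottleneck convolutions and transition layers, which I leave out, and the channel counts below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class SimpleDenseBlock(nn.Module):
    def __init__(self, in_channels, growth_rate=32, num_layers=5):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            # Each layer sees the block input plus every feature map produced so far.
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1),
            ))

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))   # concatenate all preceding feature maps
            features.append(out)
        return torch.cat(features, dim=1)             # the block outputs everything it produced

block = SimpleDenseBlock(in_channels=64)
print(block(torch.randn(1, 64, 28, 28)).shape)        # torch.Size([1, 224, 28, 28])
```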
These connections also encourage feature reuse and reduce the number of parameters for the same reason: since the network reuses the information from previous feature maps instead of generating more parameters, it accesses the network's collective knowledge and reduces the chance of overfitting, thanks to this reduction in total parameters. And as I said, this works extremely well, reducing the number of parameters by around 5 times compared to a state-of-the-art ResNet architecture with the same number of layers.
The original DenseNet family is composed of four dense blocks, with transition layers that do convolution and pooling as well, and a final classification layer if we are working on an image classification task such as the ILSVRC competition. The size of the dense blocks is the only thing that changes for each version of the DenseNet family, to make the network deeper.
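If you just want to try one of these models, torchvision ships DenseNet implementations; here is a minimal usage sketch, where the pretrained weights and the 10-class custom head are assumptions about your setup, not part of the video:

```python
import torch
from torchvision import models

# Load DenseNet-121 from torchvision (optionally with ImageNet-pretrained weights).
model = models.densenet121(pretrained=True)

# Replace the final classification layer for a custom task, e.g. 10 classes.
model.classifier = torch.nn.Linear(model.classifier.in_features, 10)

model.eval()
with torch.no_grad():
    output = model(torch.randn(1, 3, 224, 224))   # one fake 224x224 RGB image
print(output.shape)                                # torch.Size([1, 10])
```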
Of course, this was just an introduction to convolutional neural networks and, more precisely, the DenseNet architecture. I strongly invite you to read further about these architectures if you want to make a well-thought-out choice for your application. The paper and GitHub links for DenseNet are in the description of the video. Please let me know if you would like me to cover any other architecture.
Please leave a like if you made it this far in the video, and since over 90% of you watching are not subscribed yet, consider subscribing to the channel so you don't miss any further news, clearly explained. Thank you for watching!
[Music]