Last year we saw NeRF, NeRV, and other networks able to create 3D models and small scenes from images using artificial intelligence. Now, we are taking a small step further and generating somewhat more complex models: whole cities. Yes, you've heard that right, this week's paper is about generating city-scale 3D scenes with high-quality details at any scale. It works from satellite view to ground level with a single model. How amazing is that?! We went from one object that looked okay to a whole city in a year! What's next!? I can't even imagine.
The model is called CityNeRF and grows from NeRF, which I previously covered on my channel. NeRF is one of the first models using radiance fields and machine learning to construct 3D models out of images. But NeRF is not that efficient and only works at a single scale. Here, CityNeRF is applied to satellite and ground-level images at the same time to produce various 3D model scales for any viewpoint. In simple words, they bring NeRF to city scale. But how? Watch the video to learn more!
Last year, we first saw NeRF, then NeRV, and other networks able to create 3D models and small scenes from images using artificial intelligence. Now, we are taking a small step and generating a bit more complex models: whole cities. Yes, you've heard that right, this week's paper is about generating city-scale 3D scenes with high-quality details at any scale. It works from satellite view to ground level with a single model. How amazing is that?! We went from one object that looked okay to a whole city in a year! What's next!? I can't even imagine.
But I can easily imagine what should be next for you. Your next step as an AI professional or student should be to do like me and try the sponsor of today's episode: Weights & Biases. If you run a lot of experiments, such as playing with GANs or any models like this one, you should be using Weights & Biases. It made my life so much easier, you have no idea, and it takes not even five minutes to set up. Simply install and import it into your code, add a line to initialize and another to say which metric to track, and voilà, you will have all of your future experiments in a project where you can see all of the input hyperparameters, output metrics, and any insights that you and your team have, and easily compare all of them to find out what worked best. You can help out the channel and give it a try with the first link below. It's completely free for personal use, and I promise it will be set up in under five minutes.
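If you're curious, here's roughly what that setup looks like in code. The project name and the logged metric below are placeholders I made up for illustration, not something from the video:

```python
import wandb

# One line to initialize: creates (or resumes) a project for your runs
wandb.init(project="my-nerf-experiments", config={"learning_rate": 5e-4})

# ...and inside your training loop, one line to say which metric to track
for step in range(100):
    loss = 1.0 / (step + 1)  # stand-in for your real training loss
    wandb.log({"loss": loss})
```

Every run then shows up in the project dashboard with its config and logged metrics, ready to compare side by side.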
The model is called CityNeRF and grows from NeRF, which I previously covered on my channel. NeRF is one of the first models using radiance fields and machine learning to construct 3D models out of images. But NeRF is not that efficient and works for a single scale. Here, CityNeRF is applied to satellite and ground-level images at the same time to produce various 3D model scales for any viewpoint. In simple words, they bring NeRF to city scale. But how?

I won't be covering how NeRF works since I've already done this in a video you can see in the top right corner of your screen right now, if you haven't heard of this model yet. Instead, I'll mainly cover the differences and what CityNeRF brings to the initial NeRF approach to make it multiscale. Here, instead of having different pictures a few centimeters apart, they have pictures from thousands of kilometers apart, ranging from satellites to pictures taken on the road. As you can see, NeRF alone fails to use such drastically different pictures to reconstruct the scenes.
In short, using the weights of a multi-layer perceptron, a basic neural network, NeRF will process all images, knowing their viewpoints and positions in advance. NeRF will find each pixel's color and density using a ray from the camera, so it knows the camera's orientation and can understand depth and the corresponding colors using all the rays together. Then, this process is optimized for the convergence of the neural network using a loss function that gets us closer to the ground truth while training, which is the real 3D model that we are aiming to achieve.
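For the curious, here's a tiny NumPy sketch of that ray idea: the network predicts a color and a density at sample points along each camera ray, and those predictions get blended into one pixel. The function name and array shapes are my own illustration of NeRF-style volume rendering, not code from the paper:

```python
import numpy as np

def composite_ray(rgb, sigma, deltas):
    """Blend the network's per-sample predictions along one camera ray.

    rgb:    (N, 3) colors predicted at N sample points along the ray
    sigma:  (N,)   volume densities predicted at the same points
    deltas: (N,)   distances between consecutive sample points
    """
    alpha = 1.0 - np.exp(-sigma * deltas)  # how opaque each segment is
    # Transmittance: how much light survives to reach each sample
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = alpha * trans                # each sample's contribution
    return (weights[:, None] * rgb).sum(axis=0)  # final pixel color
```

Training then boils down to comparing this rendered pixel against the real pixel from the photo, for example with a mean-squared error, and backpropagating through the network.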
As you can see here, the problem is that the quality of the rendered scene is averaged at the most represented distances, which makes specific viewpoints look blurry, especially because we typically have access to much more satellite imagery than close views. We can try to fix this by training the algorithm on different scales independently, but as they explain, it causes significant discrepancies between successive scales, so you will not be able to zoom in and have a fluid, nice-looking 3D scene at all times.
Instead, they train their model in a progressive manner, meaning that they are training their model in multiple steps, where each new step starts from the learned parameters of the previous step. These steps are for specific resolutions based on the camera's distance from the object of interest, here denoted with L. So each step will have its pre-processed pack of images to be trained on and further improved by the following steps. Starting from far satellite images and moving to more and more zoomed-in images, the model can add details and build a better foundation over time. As shown here, they start by training the model on L1, their farthest view, and end up with the ground-level images, always adding to the network and fine-tuning the model from the parameters learned at the previous step as it moves to different scales. So this simple variable L controls the level of detail, and the rest of the model stays the same for each stage, compared to having a pyramid-like architecture for each scale, as we typically see.
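To make the schedule concrete, here is a rough, runnable sketch of the idea with toy stand-ins. The stage names, the number of stages, and the tiny helper functions are all mine for illustration, not the authors' code:

```python
import numpy as np

# --- Toy stand-ins for the real components (hypothetical) ---------------
def init_model_params():
    return np.zeros(8)                    # toy "network weights"

def load_image_pack(stage):
    return np.random.rand(16, 3)          # toy "pre-processed image pack"

def train(params, images):
    # Stand-in for fine-tuning: nudge the weights toward the new data.
    return params + 0.01 * images.mean()

# --- Progressive schedule: farthest scale (L1) to closest ---------------
stages = ["L1_satellite", "L2_aerial", "L3_drone", "L4_ground"]

params = init_model_params()               # train from scratch at L1
for stage in stages:
    images = load_image_pack(stage)        # this stage's image pack
    # Each stage fine-tunes the parameters learned at the previous one,
    # so closer views add detail on top of the coarse foundation.
    params = train(params, images)
```

The key design point is that one shared set of weights passes through every stage, rather than a separate model per scale.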
The rest of the model is basically an improved and adapted version of NeRF for this task. You can learn more about all the details of the implementation and the differences with NeRF in their great paper, linked in the description below, and the code will be available soon for you to try if you are interested. And voilà! This is how they enable NeRF to be applied to city-scale scenes, with amazing results. It has incredible industrial potential, and I hope to see more work in this field soon.
Thank you for watching, and if you are not subscribed, please consider clicking on the little red button. It's free and you will learn a lot, I promise. I will be sharing a couple of special videos for the end of the year, so stay tuned!
►Read the full article: https://www.louisbouchard.ai/citynerf/
►Xiangli, Y., Xu, L., Pan, X., Zhao, N., Rao, A., Theobalt, C., Dai, B. and Lin, D., 2021. CityNeRF: Building NeRF at City Scale. https://arxiv.org/pdf/2112.05504.pdf
►Project link: https://city-super.github.io/citynerf/
►Code (coming soon): https://city-super.github.io/citynerf/
►My Newsletter (A new AI application explained weekly to your emails!): https://www.louisbouchard.ai/newsletter/