The CDC officially recommends wearing face masks (even though not everyone complies). Meanwhile, governments in European countries like Spain, Ukraine, and certain regions of Italy require everyone, young or old, to wear masks at all times: when shopping, walking a dog, or simply going outside. Breaking these requirements can result in a hefty fine.
Here at the development agency Fulcrum (partly because of the quarantine, we had more free time than usual) we came up with a curious idea. We wanted to check whether it was technically possible to recognize if people on the streets are, indeed, wearing masks. For that, we decided to use online webcams located all over the world.
Let me make this clear from the start: this is not a commercial project but a curious experiment. Our goal was to check how viable the approach is. Mass surveillance is not what we pursued, at any point.
So, in just a few weeks we created a neural network that can process images and video footage and recognize people wearing masks. Pretty accurately, I must say.
Technologies inside
When building our neural network, we used several open-source solutions, namely TensorFlow 2 (nightly builds), OpenCV, Keras, and YOLOv3. The project is also available on GitHub.
YOLOv3 is "the brains" behind the system. We paired it with TensorFlow nightly builds and the built-in Keras API; these technologies were used specifically for training the models.
We used OpenCV for processing images and drawing bounding boxes ("squares") on the photos and videos.
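As a taste of what that looks like, here is a minimal Python sketch; the file name, box coordinates, and confidence score are made up for the example:

import cv2

image = cv2.imread("imgs/1.jpg")                 # hypothetical input image
x, y, w, h = 120, 80, 64, 64                     # example detection box
cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)  # green box, 2 px thick
cv2.putText(image, "mask 0.92", (x, y - 6),      # label with a sample score
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cv2.imwrite("imgs/1_boxed.jpg", image)           # save the annotated copy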
Our solution includes two different programs, written in different programming languages.
Program 1.
It's used for creating labels and composing datasets and annotations. The software is written in Node.js and comprises:
• opencv4nodejs
• elementtree
• keras-js
Program 2.
This software is used for training the models. It is written in Python and includes:
• a modified version of the latest YOLOv3
• Python 3.6+
• opencv-python
• tensorflow 2.0.0-beta1 / nightly
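For anyone reproducing the setup, a minimal environment could be installed like this (the exact package pins are our assumption; pick either the beta or the nightly build of TensorFlow):

# TF 2.0.0-beta1 variant
pip install opencv-python tensorflow==2.0.0b1
# or the nightly variant
pip install opencv-python tf-nightly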
Training the Models
As we mentioned before, YOLOv3 is the "brain mechanism" of our entire system. It's an open-source project that we found on GitHub. The program requires the following parameters: anchors, labels, models, sizes, batch size, jitter, and datasets (a sketch of a matching config file follows the list).
- Anchors define the extent to which the detected elements can change their location, widen, or narrow.
- Labels are the objects that we are looking for in the image; in our case, it's a mask.
- Models are what we train. At first it's crucial to start from the pretrained yolov3.weights. That pretrained model isn't trained any further; it's used only for the structure and annotations.
- Sizes define the min size, max size, and net size of the images.
- Batch size is the number of images processed together in one training step.
- Jitter is the value used for cropping images (we typically use false or 0.3).
- Datasets are the actual images and their descriptions.
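For illustration, here is a hedged sketch of how such a config could be put together in Python. The field names and values are our assumptions for the example, not the project's exact schema (the anchor values are the standard YOLOv3 defaults):

import json

# Hypothetical contents of configs/mask.json; field names are assumptions.
config = {
    "model": {
        # standard YOLOv3 anchor boxes (width/height pairs)
        "anchors": [10, 13, 16, 30, 33, 23, 30, 61, 62, 45,
                    59, 119, 116, 90, 156, 198, 373, 326],
        "labels": ["mask"],              # the single object class we look for
    },
    "train": {
        "min_size": 288,                 # smallest training resolution
        "max_size": 448,                 # largest training resolution
        "net_size": 416,                 # network input size
        "batch_size": 2,                 # images processed per training step
        "jitter": 0.3,                   # random-crop strength (or false to disable)
        "pretrained_weights": "yolov3.weights",
        "train_image_folder": "dataset/images/",
        "train_annot_folder": "dataset/Annotations/",
    },
}

with open("configs/mask.json", "w") as f:
    json.dump(config, f, indent=2)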
How to Generate Datasets
At this stage we need to gather images and process them. Initially, we parsed photos from Google using a simple tool, Picture Google Grabber.
Once you have your collection of images, you have to create labels and annotations. For that we used Labelbox, a platform we applied to mark the precise location of the masks. Labelbox is pretty useful, since it generates a file with the needed data (file names, mask locations, time spent). Later on we use this data in our first program.
Yet it has its downsides too: the JSON structure it exports is heavily customized, and it doesn't include image dimensions. Therefore we had to use opencv4nodejs to read the images and get their sizes.
We also used elementtree to compose the XML tree structure and set the needed parameters. Afterward, we simply wrapped this in a loop so that it would process many images in one go.
All the results were saved into the Annotations folder. In the end, we received complete datasets with the needed annotations and a clean structure. All of this is built into our first app (written in Node.js).
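The original converter is written in Node.js; purely for illustration, here is a minimal Python sketch of the same idea. The Labelbox field names ("image_name", "boxes") and the file layout are assumptions:

import json
import cv2
import xml.etree.ElementTree as ET

def labelbox_to_voc(json_path, images_dir="dataset/images", out_dir="dataset/Annotations"):
    # Each export entry is assumed to hold a file name and a list of mask boxes.
    for item in json.load(open(json_path)):
        name = item["image_name"]
        img = cv2.imread(f"{images_dir}/{name}")
        height, width = img.shape[:2]          # the export itself lacks dimensions
        root = ET.Element("annotation")
        ET.SubElement(root, "filename").text = name
        size = ET.SubElement(root, "size")
        ET.SubElement(size, "width").text = str(width)
        ET.SubElement(size, "height").text = str(height)
        for box in item["boxes"]:              # assumed order: xmin, ymin, xmax, ymax
            obj = ET.SubElement(root, "object")
            ET.SubElement(obj, "name").text = "mask"
            bb = ET.SubElement(obj, "bndbox")
            for tag, val in zip(("xmin", "ymin", "xmax", "ymax"), box):
                ET.SubElement(bb, tag).text = str(val)
        ET.ElementTree(root).write(f"{out_dir}/{name.rsplit('.', 1)[0]}.xml")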
Commands
Then we run our second app, written in Python, with all the needed annotations and datasets. It responds to the following commands:
"Read"
python src/pred.py -c configs/mask.json -i imgs/1.jpg
This command runs mask recognition on a single image.
"Test"
python src/eval.py -c configs/mask.json
This one evaluates the quality of the model, reporting three key metrics: F-score, Precision, and Recall.
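For context, these are the standard detection metrics; a tiny Python sketch with hypothetical counts shows how they relate:

tp, fp, fn = 90, 10, 20                  # hypothetical: true positives, false positives, false negatives
precision = tp / (tp + fp)               # share of detections that were real masks (0.90)
recall = tp / (tp + fn)                  # share of real masks that were found (~0.82)
f_score = 2 * precision * recall / (precision + recall)  # harmonic mean (~0.86)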
"Train"
python src/train_eager.py -c configs/mask.json
This command is the one that actually trains our neural network!
"Video"
python video.py -c configs/mask_500.json -i videoplayback.mp4
We use this command for video recognition.
How can this work with online webcams?
Webcam footage is usually stored as short videos that generally last 5-10 minutes. These videos can easily be processed by a neural network like ours. Although it would be hard to deploy the network on the streets, this solution could be helpful at factories that require people to wear masks while working.
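Conceptually, processing one such clip is a per-frame loop. Here is a minimal Python sketch with OpenCV; detect_masks is a hypothetical stand-in for the model's prediction call:

import cv2

def process_clip(path):
    cap = cv2.VideoCapture(path)        # open a 5-10 minute webcam clip
    while True:
        ok, frame = cap.read()
        if not ok:                      # stop at the end of the clip
            break
        # detect_masks is hypothetical: run the trained detector on one frame
        for (x1, y1, x2, y2) in detect_masks(frame):
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        # the annotated frame could now be saved or streamed for review
    cap.release()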
For more details, you can always check out our post on how we built the neural network and a dedicated whitepaper, where we describe the major steps of the development process. We'd be happy to hear your feedback on this experiment; let us know your thoughts!