How to Build a Face Mask Detector With a Jetson Nano 2GB and AlwaysAI

It was exciting when Nvidia announced a new low price point for its dev kits with the $59 Jetson Nano 2GB. I’m a big fan of the Jetson product line, and it is pretty amazing how much software Nvidia has running on them. However, having reviewed the 4GB version, the AGX version, and a DIY Jetbot, I didn’t want to just fire up the standard demos and write something up. I decided to tackle the task of building a DIY face mask detector and connecting it to a set of red, yellow, and green traffic lights.

The project, like so many, had a lot of twists and turns and took roughly 10x the effort I’d hoped. But in the end, it worked. For those curious about what’s possible on a budget of less than $200, here’s what I did.

Needed Hardware, Especially the Impressive Traffic Lights

For starters, you’ll need a Jetson Nano (or something similar that you can use the same way). I used the newest and least expensive model, but any other Jetson should also work. Then, of course, you need a camera. A USB camera, like a webcam, is perfect, as is a ribbon-connected camera like the Raspberry Pi camera. I’m pretty certain you could also use an IP-based camera stream, but I didn’t try it.

For the output, most of the DIY projects on the web for the Nano use some wimpy, tiny LEDs. That didn’t seem worthy of this site, so I decided to go bigger. It turns out that used traffic lights (yes, the ones you see at intersections) are pretty easy to find on eBay. I bought a set of 12-inch lights in red, green, and yellow that had been retired from the Arizona highway system. The shipping was the most expensive part.

Traffic lights, unfortunately for our purposes, are built to run on 120 volts. That’s because, even though they are LED, they are designed to be plug-compatible replacements for more traditional lamps. That meant either surgery on them to bypass the transformer, or switching 120V off and on. Since the traffic lights are carefully glued and weather-sealed, I opted for the latter. I already had two TP-Link/Kasa smart plugs, so one more was all I needed. As far as hardware goes, at this point you’re good to go, other than devising a mounting system for the lights (I’m still working on that), and perhaps adding a power strip.

Building a Face Mask Detecting Neural Network

As is often the case, the hardware was the easy part. My idea for a mask-detecting neural network was to start with a face-detecting object-detection model, which was trained to identify faces and their bounding boxes in a variety of scenes. Then I’d do additional training (aka transfer learning) to teach it to decide whether those faces were wearing masks. In principle, that shouldn’t be super-complicated, especially since Nvidia has a Transfer Learning Toolkit (TLT) built for just this purpose. However, it wound up being the most time-consuming part of the project.

Step 1: Creating Usable Datasets

The sad truth about glamorous-sounding AI projects is that the bulk of the effort often seems to be mundane data collection and formatting. In the case of masks, there are a number of datasets, all in different formats. I picked a medical mask dataset and a maskless face dataset (MAFA and FDDB) to use for training. Fortunately, there was some good Python code on the web to help me get them into the two formats I needed (KITTI for Nvidia’s Transfer Learning Toolkit and Pascal VOC for AlwaysAI). The one annoying thing I ran into is that some of the bounding boxes in the datasets extended past the edge of the image, which the TensorFlow re-training code from Nvidia wouldn’t accept. So that meant more Python code to process the data again and clip the face bounding boxes to the image borders. I wound up with about 1,000 each of masked and maskless faces. For a more serious project, I definitely would have needed to add quite a bit more training data.
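For readers who hit the same bounding-box problem, here is a minimal sketch of that kind of cleanup. It is an illustration rather than the script used for this project: the function names clip_box and kitti_line and the "mask" label are hypothetical, and it assumes boxes arrive as pixel coordinates (xmin, ymin, xmax, ymax) along with the image width and height.

```python
def clip_box(xmin, ymin, xmax, ymax, width, height):
    """Clamp a box to the image; return None if nothing usable is left."""
    x1 = min(max(float(xmin), 0.0), width - 1.0)
    y1 = min(max(float(ymin), 0.0), height - 1.0)
    x2 = min(max(float(xmax), 0.0), width - 1.0)
    y2 = min(max(float(ymax), 0.0), height - 1.0)
    if x2 <= x1 or y2 <= y1:
        return None  # box was entirely outside the image or has no area
    return (x1, y1, x2, y2)


def kitti_line(label, box):
    """Format one object as a 15-field KITTI label line.

    Only the class name and the 2D box are filled in; the truncation,
    occlusion, angle, and 3D fields are written as zeros.
    """
    x1, y1, x2, y2 = box
    return (f"{label} 0.00 0 0.00 {x1:.2f} {y1:.2f} {x2:.2f} {y2:.2f} "
            "0.00 0.00 0.00 0.00 0.00 0.00 0.00")


# Example: a face box that spills off the left and right edges of a 640x480 image.
box = clip_box(-10, 5, 700, 300, width=640, height=480)
if box is not None:
    print(kitti_line("mask", box))
```

Dropping boxes that end up with no area, instead of keeping them as slivers, is a judgment call. With only about 1,000 images per class, you may prefer to inspect those cases by hand.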

Step 2: Using Transfer Learning to Create a Mask/No-Mask Model

At the beginning of the project, I thought using Nvidia’s TLT would be smooth sailing. Unfortunately, once I dug into the details, I realized it wouldn’t be. It only runs on Linux and requires an Nvidia GPU for training. WSL 2 has started to offer some limited support for GPUs, but in this case, it didn’t work. And the Paperspace cloud server that I often use for training doesn’t support the newer version of the Nvidia drivers that TLT needs. So I dual-booted Ubuntu on my old laptop, thinking I’d finally found a solution. But the 750M GPU on it isn’t supported by the TLT. Finally, I looked at doing the re-training myself and then loading the result into TLT for pruning and deployment, but that isn’t supported either. It only works with certain Nvidia-provided base models.

Fortunately, I recently became acquainted with AlwaysAI, which offers a development and runtime framework for AI-enhanced vision projects. They were in the process of releasing a version of their product that supported additional model training and deployment on devices including a Nano. So I converted my data from KITTI to the required Pascal VOC format and gave it a try. I ran into a couple of small glitches, but their support was great and walked me through the right way to set up the needed Docker plus Python environment and use their command-line interface. Now I had a working model that I could see running.
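The KITTI-to-VOC conversion is mostly a matter of reshaping text into XML. The sketch below shows the general idea using only Python’s standard library. The function name kitti_to_voc is hypothetical, and the exact set of XML fields AlwaysAI expects is an assumption here: this writes only the common Pascal VOC elements (filename, size, and one object with a bndbox per label line).

```python
import xml.etree.ElementTree as ET


def kitti_to_voc(kitti_text, filename, width, height, depth=3):
    """Build a Pascal VOC annotation string from the text of a KITTI label file."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)

    for line in kitti_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = fields[0]  # KITTI field 1: class name
        ET.SubElement(obj, "difficult").text = "0"
        box = ET.SubElement(obj, "bndbox")
        # KITTI fields 5-8 are left, top, right, bottom in pixels.
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), fields[4:8]):
            ET.SubElement(box, tag).text = str(int(round(float(value))))

    return ET.tostring(root, encoding="unicode")


sample = "mask 0.00 0 0.00 10.00 20.00 110.00 220.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00"
print(kitti_to_voc(sample, "face_0001.jpg", 640, 480))
```

KITTI label files don’t record the image dimensions, so the width and height have to come from the image itself, which is why they are passed in separately.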

Step 3: Controlling the Traffic Lights

It was easy to decide that a good way to control the lights would be with my Kasa smart plugs, especially since there was a slick little Python wrapper for them. Here, too, there was a snag, as the Kasa library required Python 3.7 while the AlwaysAI Docker image required 3.6. As anyone who uses Docker knows, it isn’t trivial to issue commands outside of the container. I tried to work around it by using ssh back to the same machine, but none of the workarounds for avoiding a password prompt on every command worked out for me. So I set up a small web server (trivial in Python) that would parse commands it received and forward them to the system. Then I used curl from my app to send the commands. That worked out great, although with a slight time delay. Helpfully, the AlwaysAI folks found me a workaround in the meantime, so I now had two solutions!
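For anyone who wants to try the web-server route, here is a minimal sketch of what such a relay could look like. It is not the server from this project. The plug IP addresses, the port, and the color/state query parameters are all made up for illustration, and it assumes the python-kasa command-line tool (kasa) is installed on the host outside the container. Unlike a server that forwards arbitrary commands, this version only accepts a fixed list of lights and states, which is a safer default for anything listening on a network port.

```python
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical mapping from light color to the smart plug's IP address.
PLUGS = {"red": "192.168.1.51", "yellow": "192.168.1.52", "green": "192.168.1.53"}


def build_command(color, state):
    """Return the command for one plug, or None if the request isn't allowed."""
    if color not in PLUGS or state not in ("on", "off"):
        return None
    # Assumes the python-kasa CLI is on the host's PATH.
    return ["kasa", "--host", PLUGS[color], state]


def make_handler(run=subprocess.run):
    """Create a request handler; `run` is injectable so it can be tested offline."""

    class LightHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            query = parse_qs(urlparse(self.path).query)
            color = query.get("color", [""])[0]
            state = query.get("state", [""])[0]
            command = build_command(color, state)
            if command is None:
                self.send_error(400, "unknown light or state")
                return
            run(command)  # executed on the host, outside the container
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"ok\n")

        def log_message(self, *args):
            pass  # keep the console quiet

    return LightHandler


def serve(port=8000):
    """Run the relay until interrupted."""
    HTTPServer(("", port), make_handler()).serve_forever()
```

Calling serve() on the host starts the relay. From inside the container, a request such as curl "http://HOST:8000/?color=red&state=on" (with HOST replaced by the host’s address) would then switch on the red light.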

Here is the face mask detector in action:

Running the Mask Detector

I started with a base model that was trained to recognize faces at a distance, so I was hopeful that my mask detector would also work at a distance, for example on people walking up to our house. However, perhaps because many of my additional training images were closeups, or maybe because I’m using an old webcam, it didn’t detect much until a person was within a few feet. Enough range for a doorbell camera, maybe, but not for a large room. So far, I haven’t bothered adding fancy timers and multiple states to my code, so it simply shows yellow until it detects either an unmasked face, when it shows red, or a masked face, when it shows green. AlwaysAI made the core code pretty easy. Below is the entire frame processing loop. The doit() function is one that I wrote that issues the curl commands to the mini web server:
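The listing below is not the original loop. It is a minimal sketch of the yellow/red/green decision described above, kept free of any AlwaysAI calls so it can be run anywhere. The label names "mask" and "no-mask", the server address, and the function names are all hypothetical. It also swaps curl for Python’s urllib, and it assumes red should win when a frame contains both masked and unmasked faces, which the description above doesn’t specify.

```python
import urllib.request

LIGHT_SERVER = "http://localhost:8000"  # hypothetical address of the mini web server
COLORS = ("red", "yellow", "green")


def choose_light(labels):
    """Pick a color from the labels detected in one frame."""
    if "no-mask" in labels:
        return "red"      # at least one unmasked face (assumed to take priority)
    if "mask" in labels:
        return "green"    # only masked faces
    return "yellow"       # no faces detected


def light_urls(color):
    """URLs that switch the chosen light on and the other two off."""
    return [f"{LIGHT_SERVER}/?color={c}&state={'on' if c == color else 'off'}"
            for c in COLORS]


def http_get(url):
    """Plays the role of the curl call: send one request to the light server."""
    with urllib.request.urlopen(url, timeout=2) as response:
        return response.status


def update_lights(labels, current, fetch=http_get):
    """Send requests only when the color changes; return the new color."""
    color = choose_light(labels)
    if color != current:
        for url in light_urls(color):
            fetch(url)
    return color
```

In a real loop, labels would be the class names from the detector’s predictions for each frame, and the return value of update_lights would be carried into the next iteration. Skipping the requests when the color hasn’t changed keeps the slight delay of the web-server hop out of most frames.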

Even on the $59 Nano, the model was able to analyze a frame in less than 1/10th of a second, giving it an overall frame rate of around 12 fps. Plenty for this application. Overall, aside from the unplanned “learning experiences” of having to try multiple approaches for each step, it shows what’s possible with minimal hardware and off-the-shelf free software (AlwaysAI is currently free, although in the future they’ll be charging for commercial projects, and all the other code I used is free for personal use).