Artificial intelligence (AI) is the field of making computers act intelligently, that is, make decisions in real environments that lead to favorable outcomes.
This is a broad and somewhat vague definition, and many fields fall under this umbrella term. One such field is computer vision, in which computers process images, make inferences about what they contain, and pass that information to programs that use it to make decisions with favorable outcomes.
Artificial intelligence applications such as computer vision are increasingly being integrated into new business models and products. Computer vision has many real-world applications, such as analyzing traffic patterns, detecting changes in posture, and counting the number of people in an area.
Building any computer vision application involves a steep learning curve, and deploying it to the edge adds an extra layer of complexity.
While deploying and running applications in the cloud has become more popular in recent years, this isn’t always the best solution, especially for applications that require heavy processing or that use sensitive data, both of which are often the case with computer vision applications.
In such circumstances, the best option is often to deploy your application to the edge. Deploying computer vision applications on the edge has numerous benefits, which we will go through in this article. We will also describe the challenges of running computer vision on the edge and how to overcome them.
In general, edge devices are small, lightweight devices on which a computer vision application can be deployed and run. Many edge devices today even include a Graphics Processing Unit (GPU) or a Vision Processing Unit (VPU), which makes it possible to run a wider range of models and more complex applications.
In the context of this article, an edge device is a device such as a Raspberry Pi, an NVIDIA Jetson device like the Jetson Nano or Jetson Xavier NX, or an Internet of Things (IoT) device, that is, a device with some capability to sense or assess, and possibly interact with, the environment in which it is used.
The three main benefits of running a computer vision application on the edge are time, security, and cost.
While an internet connection can be used to deploy the application, once the application is on the edge device, it does not need cloud connectivity to function. Any inferencing the application does happens on the edge device itself, not in the cloud, which drastically reduces the time it takes for those inferences to be turned into actions. For certain use cases, such as self-driving vehicles or security cameras, this is imperative. Aside from the risk of data being lost in transit to and from the cloud, the additional time a cloud round trip requires can mean not responding to an event in time, which can be catastrophic for tasks like autonomous driving.
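To make the latency argument concrete, here is a minimal sketch that computes how far a vehicle travels while it waits for an inference result. The 30 ms on-device inference time and the extra 150 ms network round trip are hypothetical numbers chosen for illustration, not measurements of any particular device or cloud service.

```python
def distance_during_delay(speed_kmh: float, delay_ms: float) -> float:
    """Metres a vehicle covers at a constant speed during a processing delay."""
    speed_m_per_s = speed_kmh * 1000 / 3600  # km/h -> m/s
    return speed_m_per_s * delay_ms / 1000   # ms -> s


# Hypothetical latencies for illustration only, not measurements.
ON_DEVICE_MS = 30        # assumed local inference time on the edge device
CLOUD_MS = 30 + 150      # same inference plus an assumed network round trip

if __name__ == "__main__":
    for label, delay in [("edge", ON_DEVICE_MS), ("cloud", CLOUD_MS)]:
        metres = distance_during_delay(100, delay)
        print(f"{label}: {delay} ms at 100 km/h -> {metres:.2f} m travelled")
```

Under these assumed numbers, the cloud path adds a little over four metres of travel at 100 km/h before the application can even begin to react.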
There are also specialized peripheral devices, in particular cameras, that don’t have internet connectivity themselves and are used by edge devices to improve performance or expand application functionality. Such devices take the concept of processing on the edge even further. Instead of a USB or ribbon camera transmitting the image data to an edge device for processing, these devices incorporate a processor into the camera itself, further reducing processing latency.
Secondly, because the data doesn’t need to travel to the cloud with edge deployment, all data can stay in a closed circuit on the device itself. Edge computing distributes the risk of exposure across multiple devices and can perform all processing disconnected from a central server, which makes for a more secure and private architecture. For example, edge devices can be installed inside a person’s home, process real-time data, and act on that data without relying on a shared cloud service that could compromise the privacy of their day-to-day activity. Because computer vision data often includes images of people or other sensitive information, keeping it on the device is a significant security and privacy advantage.
Finally, without the need for cloud processing, deploying a CV application on the edge can be much less expensive, because cloud providers typically charge for inference by usage, for example per endpoint per minute. Pay-as-you-go pricing is a flexible option for those who only need processing occasionally, but it becomes burdensome for organizations that demand large amounts of real-time processing, such as a smart city, hospital, or warehouse with many cameras and sensors running 24 hours a day.
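A rough break-even calculation shows why continuous workloads favor the edge. The sketch below compares a one-off device purchase against an hourly cloud inference charge. The prices and camera count are hypothetical placeholders, not quotes from any provider, and the model deliberately ignores power, connectivity, and maintenance costs.

```python
import math


def breakeven_days(device_cost: float, cloud_cost_per_hour: float,
                   hours_per_day: float = 24.0) -> int:
    """Whole days of use after which buying one edge device is cheaper than
    paying an hourly cloud inference charge (ignores power and upkeep)."""
    daily_cloud_cost = cloud_cost_per_hour * hours_per_day
    return math.ceil(device_cost / daily_cloud_cost)


# Hypothetical prices for illustration only; real pricing varies by provider.
DEVICE_COST = 100.0          # assumed one-off cost of one edge device
CLOUD_COST_PER_HOUR = 0.10   # assumed hourly charge per inference endpoint
CAMERAS = 50                 # assumed number of always-on cameras

if __name__ == "__main__":
    days = breakeven_days(DEVICE_COST, CLOUD_COST_PER_HOUR)
    yearly_cloud = CAMERAS * CLOUD_COST_PER_HOUR * 24 * 365
    print(f"Each edge device pays for itself after about {days} days of 24/7 use")
    print(f"Cloud cost for {CAMERAS} cameras per year: ${yearly_cloud:,.0f}")
```

With these made-up numbers, each device pays for itself in about six weeks of 24/7 operation, while the cloud bill for 50 cameras grows to roughly $43,800 per year. Substituting your own prices shows where the balance lies for your deployment.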
Building a computer vision application entails its own challenges and is often an iterative process, so any pipeline you use must be flexible enough to handle changes to the model, the libraries, or the device the application is deployed on.
Let’s walk through the challenges of deploying computer vision applications on the edge with a story. First, you need a computer vision model, which you can get either by 1) scouring the web for an open-source model that fits your needs, or 2) building an image dataset and training your own model. The second option requires collecting your own dataset, which is labor-intensive, as well as familiarity with image annotation tools and with the training process itself.
Next, you need to build your application. Let’s say you have a working model and you are prototyping your computer vision application. One popular computer vision library is OpenCV.
Developers can use OpenCV through its C++ or Python Application Programming Interfaces (APIs). For example, you can install the Python bindings with `pip install opencv-python`.
You spend some time learning this API and build a working prototype that integrates well with the output of your model. Now that you’re happy with the app, you’d like to deploy it to an edge device, like a Raspberry Pi. The first thing you may find is that the OpenCV build you were using on your laptop isn’t compatible with the Raspberry Pi’s ARM processor. Other dependencies may have the same problem. This leads you to compile OpenCV on the Raspberry Pi.
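The root cause is usually CPU architecture: a package built for a laptop’s x86-64 processor cannot run on the Pi’s ARM processor, so pip has to find a build for the Pi’s architecture or may fall back to compiling from source, which for OpenCV can take a long time on a Pi. A quick first diagnostic is to ask Python which architecture it is running on. The `is_arm()` helper below is a hypothetical convenience written for this example; `platform.machine()` is the standard-library call it relies on.

```python
import platform
from typing import Optional

# Machine names commonly reported by platform.machine() on ARM systems:
# armv6l/armv7l for 32-bit Raspberry Pi OS, aarch64 for 64-bit Linux,
# arm64 for macOS on Apple silicon.
ARM_MACHINES = {"armv6l", "armv7l", "aarch64", "arm64"}


def is_arm(machine: Optional[str] = None) -> bool:
    """Return True if the given (or current) machine string is an ARM CPU."""
    name = (machine or platform.machine()).lower()
    return name in ARM_MACHINES


if __name__ == "__main__":
    print(f"machine={platform.machine()!r}, ARM={is_arm()}")
```

Running this on both your laptop and the Pi before installing dependencies tells you up front whether the packages you prototyped with need an ARM-specific build.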
Next, you need to figure out how to see the results! On your laptop, this was easy with `cv2.imshow()`, but on a headless device (one without a graphical interface), you’ll need to find another way.
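One simple fallback is to detect whether a display is available and, if not, write the latest frame to disk instead of opening a window. Note that if you install the `opencv-python-headless` package, which omits OpenCV’s GUI functions, `cv2.imshow()` cannot be used at all. In the sketch below, the `has_display()` heuristic and the `latest_frame.jpg` filename are assumptions made for this example; a more complete solution would stream frames over the network, for example as MJPEG over HTTP.

```python
import os
import sys
from typing import Mapping, Optional


def has_display(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Heuristic: on Linux, a GUI session usually sets DISPLAY (X11) or
    WAYLAND_DISPLAY. Other platforms are assumed to have a desktop."""
    env = os.environ if environ is None else environ
    if sys.platform.startswith("linux"):
        return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))
    return True


def show_frame(frame, path: str = "latest_frame.jpg") -> None:
    """Show the frame in a window if possible, otherwise write it to disk."""
    import cv2  # imported here so has_display() works without OpenCV installed
    if has_display():
        cv2.imshow("preview", frame)
        cv2.waitKey(1)  # gives the window a chance to process events
    else:
        cv2.imwrite(path, frame)  # copy it off the device, e.g. with scp
```

Calling `show_frame(frame)` inside your processing loop keeps the same code path working on both your laptop and a headless Pi.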
Now that your app is up and running, you notice that the fast performance you saw on your laptop is long gone. Of course, nobody expects a $35 single-board computer to outperform your MacBook, but you still want better performance.
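Before optimizing, it helps to measure. A small rolling frames-per-second counter lets you compare the same processing loop on your laptop and on the Pi. `FPSMeter` is a hypothetical helper written for this example in plain Python; it is not part of OpenCV.

```python
import time
from collections import deque
from typing import Callable, Deque


class FPSMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window: int = 30,
                 clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        # window + 1 timestamps bound `window` frame intervals
        self._stamps: Deque[float] = deque(maxlen=window + 1)

    def tick(self) -> None:
        """Call once per processed frame."""
        self._stamps.append(self._clock())

    @property
    def fps(self) -> float:
        if len(self._stamps) < 2:
            return 0.0
        elapsed = self._stamps[-1] - self._stamps[0]
        return (len(self._stamps) - 1) / elapsed if elapsed > 0 else 0.0


if __name__ == "__main__":
    meter = FPSMeter()
    for _ in range(20):
        time.sleep(0.02)  # stand-in for reading a frame and running inference
        meter.tick()
    print(f"{meter.fps:.1f} FPS")
```

The injectable `clock` parameter makes the class easy to test, and the fixed-size deque keeps the estimate responsive to recent changes rather than averaging over the whole run.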
You’ve read that you can use Intel’s Neural Compute Stick 2 (NCS2) to get better performance out of a Pi, but now you need to build OpenVINO as well. Since OpenVINO’s interface is different from OpenCV’s, you’ll need to decide whether to rewrite your app to take advantage of the NCS2, or rebuild OpenCV so it can use the OpenVINO inference engine as a backend. When it’s all said and done, you may find that your model requires more tweaking, or that its output changes, and you may be looping through these steps again!
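If you take the second route, the change inside your app can be small: OpenCV’s DNN module lets you choose an inference backend and target per network with `setPreferableBackend()` and `setPreferableTarget()`. The sketch below assumes an OpenCV 4.x build compiled with OpenVINO’s Inference Engine and an OpenVINO release that still supports the NCS2 (MYRIAD) device; as far as we are aware, newer OpenVINO releases have dropped NCS2 support, so check your versions. The `DEVICE_SETTINGS` table and `configure_net()` helper are hypothetical names for this example.

```python
from typing import Dict, Tuple

# Names of cv2.dnn backend and target constants (OpenCV 4.x). The "ncs2"
# entry only works with an OpenVINO-enabled OpenCV build.
DEVICE_SETTINGS: Dict[str, Tuple[str, str]] = {
    "cpu": ("DNN_BACKEND_OPENCV", "DNN_TARGET_CPU"),
    "ncs2": ("DNN_BACKEND_INFERENCE_ENGINE", "DNN_TARGET_MYRIAD"),
}


def configure_net(net, device: str = "cpu") -> None:
    """Point an existing cv2.dnn network at the chosen backend and target."""
    import cv2  # imported lazily so the table above can be used without OpenCV
    backend_name, target_name = DEVICE_SETTINGS[device]
    net.setPreferableBackend(getattr(cv2.dnn, backend_name))
    net.setPreferableTarget(getattr(cv2.dnn, target_name))


# Usage sketch (file paths are placeholders):
# net = cv2.dnn.readNetFromCaffe("deploy.prototxt", "model.caffemodel")
# configure_net(net, "ncs2")
# net.setInput(cv2.dnn.blobFromImage(frame, size=(300, 300)))
# detections = net.forward()
```

Keeping the device choice in one place means the rest of the app, including any post-processing of the model output, does not need to change when you switch between CPU and NCS2.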
Overcoming these challenges requires IoT experience in addition to computer vision knowledge. However, a developer platform such as alwaysAI can also help solve these problems. At alwaysAI, we have the singular mission of making the process of building and deploying computer vision apps to edge devices as easy as possible. That includes training your model, building your app, and deploying your app to edge devices such as the Raspberry Pi, Jetson Nano, and many others.
Once you have installed alwaysAI, you only need to know a few basic commands in our command line interface (CLI) to collect data, annotate it, train a model, and deploy and start your applications.
alwaysAI has a catalog of publicly available models that are guaranteed to work with any application built with our pipeline. You can also train your own object detection model using the alwaysAI model training toolkit, which uses TensorFlow as its backend framework and performs transfer learning on a MobileNet SSD model pretrained on the COCO dataset.
The model training toolkit enables users to start working with TensorFlow without learning a new API or installing any extra modules by hand. We also include the Computer Vision Annotation Tool (CVAT) as part of the toolkit, so users can annotate new datasets without needing to download and install CVAT from scratch.
We have our own Python API, called edgeiq, which wraps some of OpenCV’s core functionality. This means you only need to know one API to get up and running with computer vision applications.
Since we ourselves love to use OpenCV, we’ve built it in as a core piece of our edge runtime environment, which means you can add `import cv2` to every alwaysAI application and use OpenCV directly in your app, regardless of your choice of deployment device.
We leverage containerization via Docker images and Python virtual environments, and this makes deploying your application simple no matter what your device is. This also makes it easy to add any necessary requirements your application may need, using either the Dockerfile or requirements.txt files.
Running computer vision on the edge saves both time and cost, and has the additional benefit of being more secure. Developing and deploying any computer vision application comes with its own set of challenges, and deploying on the edge adds more of them.
Our main goal at alwaysAI is to overcome these challenges and make developing and deploying computer vision applications on the edge as easy as possible for users. You can try our platform yourself for free!