Deep Learning on Embedded Platforms

Becks Simpson for Mouser Electronics

As machine vision continues to advance, powered by deep learning (DL) methods, a world of possibilities emerges in the domains of automation and defect detection. Transformative potential comes from integrating DL into embedded systems, especially in scenarios involving robotics, autonomous vehicles, and industrial processes. From everyday devices like smart doorbells to complex tasks like defect detection using drones, incorporating more sophisticated machine vision methods makes these smart devices more capable than was previously possible. However, this fusion comes with its own set of challenges, including latency, power efficiency, security, and privacy. Navigating these requirements means carefully selecting appropriate algorithms, adapting them for embedded systems, and choosing deployment strategies that pave the way for a smarter and more responsive world.


Using DL in Embedded Machine Vision


As machine vision improves and DL-based methods make strides in automation and defect detection, many applications stand to benefit from integrating DL. Industrial and home automation processes (especially those that include arms, vehicles, or other robotic elements) benefit from the ability to view, understand, and respond to their environments to complete a series of actions. Since the environments in which these smart devices and embedded systems must operate are complex and highly variable, the use of DL for machine vision is increasingly necessary. This enhanced intelligence can be used to accomplish autonomous activities like delivery, grounds maintenance, smart doorbells and building access, vacuuming, manufacturing, and warehouse packing. A subset of these applications also involves monitoring and defect detection: Tasks like detecting defects in pipes, powerlines, telecommunications towers, and other assets using drones or other small autonomous vehicles can also be improved with the addition of DL-based machine vision. With enough data, these models are less fragile than traditional machine vision models and perform more robustly under varying conditions.

However, introducing DL to machine vision processes is costly and brings several considerations that affect the computing and hardware requirements for success. Because these applications need to detect, classify, and respond to defects or other objects of interest in real time, there is often no time budget for sending data over a network and waiting for results. Hence, the chosen hardware must deliver low-latency DL inference once a model is deployed to the device. Additionally, depending on the application, power efficiency may also be a consideration, especially for autonomous vehicles like drones or rovers that perform inspections using their own power sources: The longer they can work without needing to be charged, the more efficiently they can do their job. Since DL workloads are compute-intensive, it's key to ensure that they run efficiently and that the hardware used is optimized for low power consumption.

Embedded systems, like the NXP Semiconductors i.MX 8M Plus evaluation kit, are specially designed with these types of requirements in mind, incorporating powerful artificial intelligence acceleration capabilities without compromising power efficiency. Security and privacy are also important requirements in these applications. For example, in home automation or asset defect detection, there is often a need to keep data on the device (rather than sharing it to the cloud) or to ensure that data and communications over networks are fully encrypted. Lastly, where these algorithms are expected to run in real time on an autonomous robot or vehicle, straightforward integration of good-quality, low-latency, robust vision sensor data is also vital, and the choice of hardware should reflect that.


Choosing Algorithms for Embedded Systems


The choice of DL algorithms for these machine vision applications is heavily influenced by the requirements mentioned above. Designers need to pay particular attention to model size on disk, memory footprint, and inference latency. For anything involving object detection from a camera stream (e.g., defects, people, cars, animals), a pretrained or “foundation” model likely exists with a broad understanding of most visual input. These are open-source models trained on vast quantities of data and made available for extension to more nuanced applications. However, when the desired task differs significantly from the tasks used to train the model, testing its performance on representative data is important. For example, identifying defects on power lines or inside pipes involves different objects and data than identifying people or cars.
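
As a rough sketch of such a test, the snippet below loads an off-the-shelf pretrained ImageNet classifier in TensorFlow and inspects its top predictions on a sample domain image; the model choice (MobileNetV2) and the image path are illustrative assumptions, not part of any specific product workflow.

```python
import numpy as np
import tensorflow as tf

# Minimal sketch: sanity-check an off-the-shelf pretrained classifier
# on a domain image before committing to it. The image path is a
# hypothetical placeholder.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

img = tf.keras.utils.load_img("samples/pipe_section.jpg",
                              target_size=(224, 224))
x = tf.keras.applications.mobilenet_v2.preprocess_input(
    np.expand_dims(tf.keras.utils.img_to_array(img), axis=0))

preds = model.predict(x)
# Map the raw scores back to human-readable ImageNet labels
print(tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3))
```

If the top predictions bear little relation to the domain objects of interest, that is a strong signal the model needs adaptation before use.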

Designers should also ensure the pretrained model architecture is optimized for embedded devices. Specifically, one should choose a model architecture that is small enough to live on the device, meaning megabytes (not gigabytes) in size. Additionally, loading and using the model should not be prohibitively slow, which is often the case for bigger models. These architectures and model weights are typically optimized with techniques such as weight pruning and quantization, which reduce size and complexity while retaining performance, as sketched below.
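
As one illustration, weight pruning might look like the following sketch, which uses the TensorFlow Model Optimization toolkit to gradually zero out half of a small model's weights during training; the toy model, sparsity target, and step counts are assumptions chosen only for demonstration.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Minimal sketch: magnitude-based weight pruning. The toy model and
# all schedule values are illustrative placeholders.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu",
                           input_shape=(96, 96, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4),  # e.g., four defect classes
])

schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0,  # start fully dense
    final_sparsity=0.5,    # zero out half of the weights by end_step
    begin_step=0,
    end_step=1000)

pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=schedule)

# After fine-tuning the pruned model (with the UpdatePruningStep
# callback), strip_pruning removes the training-time wrappers so the
# smaller model can be exported.
exportable = tfmot.sparsity.keras.strip_pruning(pruned)
```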

Choosing an embedded DL vision model also requires ensuring that the format is relatively standard (such as TensorFlow Lite) so that converting it to what the device requires is straightforward. Fortunately, the breadth of open-source vision models available means that designing and training a DL model from scratch is largely a thing of the past.
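
For instance, converting a Keras model to TensorFlow Lite with post-training quantization could look like the sketch below; the toy model stands in for one that has already been trained for the target task.

```python
import tensorflow as tf

# Minimal sketch: convert a trained Keras model to TensorFlow Lite
# with post-training quantization. The toy model is a placeholder
# for one already trained on the target data.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu",
                           input_shape=(96, 96, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weight quantization
tflite_bytes = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_bytes)
```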


Adapting Models to Given Applications


Even when an embedded model looks suitable for a machine vision task, its performance may not be ideal for the specific application. This can be due to differences between the original training data distribution (such as acquisition device and conditions) and the real-world data expected in the use case. Alternatively, foundation models may exist for the application, but the types of objects they can detect don't entirely match what is needed; for example, standard object detection models may not include classes for defects or uncommon objects of interest. In both cases, designers should adapt the model via fine-tuning or retraining to improve performance or add the classes needed for the desired task. Doing this requires new data labeled according to the use case's recognition and classification needs. This data can come from open-source datasets that are similar to or aligned with the expected domain, or from custom data acquired from the real assets and labeled as needed. A suitable data labeling platform, like Labelbox or Label Studio, makes the process easier and keeps annotations in a standard format for DL. Once the updated model performs to the level required, it is ready to be deployed.
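
To make the adaptation step concrete, a minimal fine-tuning sketch, assuming TensorFlow and a folder of newly labeled images (the directory name and the three defect classes are hypothetical), might look like this:

```python
import tensorflow as tf

# Minimal sketch: adapt a pretrained backbone to new classes by
# replacing the classification head and training on freshly labeled
# data. Directory name and class count are placeholders.
base = tf.keras.applications.MobileNetV2(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the backbone for initial fine-tuning

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1,
                              input_shape=(224, 224, 3)),
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),  # e.g., crack/corrosion/ok
])

# Labeled images organized in one subfolder per class
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/labeled_defects", image_size=(224, 224), batch_size=32)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```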


Deploying the Embedded Model


With the wealth of available model formats for embedded systems, it's important to choose hardware with integrated software that makes model porting easy. The first step when porting a model is to write any code required for data preprocessing, or to leverage an embedded system that already provides it for image processing applications. For example, the NXP i.MX 8M Plus evaluation kit works with NXP eIQ® machine learning software, which includes the machine vision preprocessing pipeline as part of the DL software deployed onto the board. These preprocessing steps often include converting the raw pixel stream to RGB and resizing or cropping frames. Similarly, after model inference, some post-processing may be required, such as overlaying bounding boxes on the camera input for a viewing device like an LCD screen.
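
Written by hand (for platforms without a bundled pipeline), these pre- and post-processing steps might look like the following OpenCV sketch; the camera index, input resolution, and the stubbed run_inference() function are illustrative placeholders, not any vendor's API.

```python
import cv2

def run_inference(image):
    # Stand-in for the deployed model; returns dummy bounding boxes
    # as (x1, y1, x2, y2) in the resized frame's coordinates.
    return [(50, 50, 150, 150)]

cap = cv2.VideoCapture(0)  # camera index is a placeholder
ok, frame = cap.read()
if ok:
    # Preprocessing: OpenCV captures BGR, while most models expect
    # RGB; resize to the model's expected input resolution.
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (300, 300))

    # Post-processing: overlay each detection on the frame before
    # handing it to a display such as an LCD screen.
    for (x1, y1, x2, y2) in run_inference(resized):
        cv2.rectangle(resized, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.imwrite("annotated.png", cv2.cvtColor(resized, cv2.COLOR_RGB2BGR))
cap.release()
```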

The next step is to convert the model to the required format so that it can run on the embedded device and provide real-time inference. This typically involves converting the model to a standard format like Open Neural Network Exchange (ONNX), which makes it easier to take advantage of hardware optimizations. Next, the model and its adjacent code are compiled into a machine-executable bundle for the target device. Several open-source compilers exist for this task, such as Glow by Facebook. The output of this step can be deployed to the embedded system used for the given application. Because porting a model is complex, with many sub-steps within each of the stages mentioned, choosing an embedded platform that comes with software to make the porting process faster and simpler is often recommended. For example, the NXP i.MX 8M Plus evaluation kit is programmable using the eIQ software package, which handles all of these steps in a matter of clicks. Once the device has the model inference engine running, its outputs can be used in the desired application, whether defect detection, industrial automation, or something else.
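
As a sketch of the conversion step, the snippet below exports a Keras model to ONNX using the open-source tf2onnx package, after which the resulting file could be handed to a compiler such as Glow; the toy model, input shape, and opset number are assumptions for illustration.

```python
import tensorflow as tf
import tf2onnx

# Minimal sketch: export a Keras model to the standard ONNX format.
# The toy model stands in for a trained one.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu",
                           input_shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2),
])

# Fix the input signature so the exported graph has a known shape
spec = (tf.TensorSpec((1, 224, 224, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec, opset=13,
                           output_path="model.onnx")
```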


Conclusion


Integrating deep learning into computer vision for embedded systems opens a realm of possibilities more sophisticated and innovative than traditional methods alone can offer. The potential benefits span home and industrial automation as well as defect detection across various domains. Nevertheless, challenges like latency, power efficiency, security, and privacy necessitate careful consideration, particularly when selecting hardware. Algorithm selection and adaptation are also essential steps in ensuring optimized performance for the given application. Even deployment of embedded models is a meticulous process that requires suitable hardware and software to enable a streamlined conversion. Ultimately, this convergence fosters a new era of efficiency and innovation for embedded applications that require computer vision.

About the Author

Becks is a Machine Learning Lead at AlleyCorp Nord, where developers, product designers, and ML specialists work alongside clients to bring their AI product dreams to life. She has worked across the spectrum of deep learning and machine learning, from investigating novel deep learning methods and applying research directly to solve real-world problems, to architecting pipelines and platforms for training and deploying AI models in the wild, to advising startups on their AI and data strategies.