
AI on the Edge with STM32

By Michael Parks, PE, for Mouser Electronics

Published November 14, 2022

For decades, engineers and technicians have relied on “best guess” models, built on stress testing components, to plan and schedule preventative maintenance for complex systems. While there is nothing inherently wrong with this approach, premature breakdowns could still occur (hardware failures often follow the so-called bathtub curve of failure likelihood over time). Conversely, it can waste resources (money, manpower, materials) to perform maintenance on systems that do not require any. Predictive maintenance (PdM) is changing that by making maintenance decision-making a data-driven process. Key to that change are embedded devices driven by machine learning (ML) algorithms that can sense, process, and communicate the indicators of health for any given system of interest. Historically, though, developing ML-based solutions has been a cumbersome process at best. STMicroelectronics (ST) is tackling the challenges that systems engineers face when developing predictive maintenance capabilities through unique hardware and software solutions.

First, their B-U585I-IOT02A Discovery Kit is a powerful development platform right out of the box. It features the STM32U585AI microcontroller, built around an ARM Cortex-M33 core with 2 Mbytes of Flash memory and 786 Kbytes of SRAM. In addition, because this board is geared toward developers building Internet of Things (IoT) products, it hosts a bevy of easily accessed sensors, including microphones as well as temperature, humidity, magnetometer, accelerometer, gyroscope, pressure, time-of-flight, and gesture-detection sensors. If you have custom hardware you wish to test and integrate, the dev board also features standard hardware interfaces, including Arduino Uno V3 shield, STMod+, and Pmod connectors. Finally, it wouldn’t be an IoT dev kit without the onboard Wi-Fi and Bluetooth modules.

Of course, hardware without software is like a vehicle without fuel. Developing ML software tailored to your STM32 hardware is straightforward thanks to the NanoEdge AI Studio platform, available for both Windows and Ubuntu Linux. Studio features a wizard-like interface that walks you through the five steps needed to tailor ML algorithms to a target STM32-based development kit. Data to train the model can be uploaded from a file or streamed in real time from the target development board. It’s a flexible and straightforward way to train, build, and test anomaly-detection or classification ML algorithms.

Project Materials and Resources

The project is centered around the ST B-U585I-IOT02A Discovery development kit, which has been designed specifically for industrial applications that need to leverage ML technology. It features numerous onboard sensors, convenient both for training ML algorithms and for running the inferencing engine after deployment. We will leverage the onboard ISM330DHCX, an industrial-focused system-in-package 3D accelerometer and gyroscope that offers convenient digital output over a wide range of acceleration and angular-rate values, which is very useful for the project detailed in this article.

Bill of Material (BOM)

You can click this Mouser project share link to access the BOM and the current pricing. As of the date this article was initially written, the BOM cost is about $65 (USD) before applicable taxes and shipping costs. Table 1 lists the items in the BOM.

Table 1: IOT2A NanoEdge AI Studio BOM

  • Quantity: 1
  • Mouser P/N: 511-B-U585I-IOT02A
  • Description: Discovery kit for IoT node with STM32U5 series (Development Boards & Kits - ARM)

Project Resources

All source files for this project are located in this GitHub repository. The repository is divided into two main folders:

Documentation

The documentation folder contains graphic files of schematics and other necessary reference materials.

Software

The Software folder contains the source code. More details about these files can be found in the Software section below. The two main directories are as follows:

  • iot2a_neai_training_application: Contains the codebase for an application that reads the accelerometer data and formats it in a manner that is ready for consumption by the ML training module (aka Signals) inside the NanoEdge AI Studio application.
  • iot2a_inferencing_application: This contains the inferencing engine code generated by the trained ML model. It can be used for a project that responds to the three-axis accelerometer data onboard the B-U585I-IOT02A Discovery development kit.

Tools

This project assumes that you have access to the following tools:

  • Computer with a high-speed Internet connection, running Windows 10, Windows 11, or Ubuntu Linux.

Building the Project

Industrial facilities rely heavily on fans for a variety of purposes. Keeping air moving is critical for maintaining temperature, removing particulate matter from the air, controlling odors, and removing moisture, to name a few. Failure of the motor or a fan blade can have severe consequences for industrial operations. This project will demonstrate how ML can be used to determine the operating status of a fan by detecting changes in the fan's housing vibrations from a controlled baseline. The following section will examine the necessary steps to get your project up and running. This section is broken down into the following subsections:

  1. Connecting the Development Kit to Development Computer
  2. Setting up the Software Development Toolchain
  3. Software Development: Training Application
  4. Software Development: Inferencing Application
  5. Project In Action

From a product engineer's perspective, developing ML-powered devices can be broken into three major steps. First, collecting and classifying training data. Next, using the training data to create the neural network, benchmarking its performance, and tweaking as necessary. Lastly, deploying the code to endpoint devices for inference, which is just a fancy way of saying using the ML model in the real world with live data not used for training or testing/benchmarking.

Connecting the Development Kit to Development Computer

The B-U585I-IOT02A Discovery development kit (Figure 1) has three USB interfaces for different purposes. First, a USB-C interface (CN1) for powering the board during normal operations. Second, a micro-USB connector (CN12) for updating the onboard Bluetooth module. Lastly, a second micro-USB port (CN8) that allows us to program and debug the STM32U585AI microcontroller via the onboard STLINK-V3E in-circuit programmer/debugger. No external hardware is needed!


Figure 1: The B-U585I-IOT02A Discovery development board contains numerous on-board sensors to develop ML applications. Programming and debugging are conducted over the USB port located along the right side of the board. (Source: Green Shoe Garage)

IMPORTANT NOTE FOR WINDOWS USERS: Before connecting the B-U585I-IOT02A Discovery development kit, it is necessary to download drivers from the ST website in order to interact with the STLINK-V3E. Failure to do so might result in the development board not being properly enumerated in the virtual COM ports list.

Once the drivers have been installed, connect a micro-USB cable between the computer and the micro-USB port along the right side of the development board (connector CN8). The power indicator LED and UART communications LED should illuminate, and on a Windows machine, a pop-up should announce that the B-U585I-IOT02A Discovery development board has been detected.

Setting up the Software Development Toolchain

The project consists of two software applications. The first is a C++ application that serves as a datalogger, taking the sensor data from the development board and sending it to the training tools of NanoEdge AI Studio. This application will be written using the ARM Keil Studio Cloud integrated development environment (IDE); a free account can be set up here. For this project, we first need to set up a data logger that reads the three axes of the accelerometer. Then, we will collect three different datasets: the first with the fan off, the second with the fan in a failure mode induced by an object obstructing blade rotation, and the third with the fan under normal operating conditions.

NanoEdge AI Studio is available for Windows 10 and 11 and Ubuntu Linux. As of the date this article was written, the current version of Studio is 3.0, and a 90-day free trial version can be downloaded from here. But, again, download the STLINK drivers first, as discussed in the previous section, "Connecting the Development Kit to Development Computer."

ARM Keil Studio Cloud (Figure 2) is a web-based IDE and should work in any modern browser. We used a Chromium-based browser with all the latest updates installed for this project.


Figure 2: The Keil Cloud IDE offers a convenient way to program your development board from anywhere in the world via a Chromium-based browser. (Source: Green Shoe Garage)

Software Development: Training Application

The codebase for this project is written in C++. First, we must create a small datalogger application that will send the accelerometer data from the development board to the NanoEdge AI Studio training software. For testing purposes, we will produce the following datasets for training:

  • fan_off: This dataset contains the accelerometer's values while the development board is connected to the fan and the fan is not running. The X, Y, and Z acceleration readings will be at or near 0g.
  • fan_normal: This dataset contains the accelerometer's values while the fan is operating normally. The X, Y, and Z acceleration readings will be fairly consistent, with a gradual ramp up and ramp down as the fan turns on and off.
  • fan_failure: This dataset contains the accelerometer's values recorded while inducing a mechanical failure by preventing the fan from rotating normally. The X, Y, and Z acceleration values will change quickly and randomly.

Of course, if you wanted more nuanced outputs (a.k.a. signal classes), you could create additional training datasets. However, for simplicity of explanation, we are keeping with these three states.

Before accessing the onboard ISM330DHCX accelerometer, we have to download the drivers from the ST GitHub repository, as the driver is currently not included in the standard board support package (BSP) folder for the B-U585I-IOT02A Discovery development board. To clone or download the ISM330DHCX library, click here.

AN IMPORTANT NOTE ON PREPARING SIGNAL FILES: Correctly capturing the signal generated by the phenomenon of interest is crucial to training a reliable ML model. A signal file is made by discretely sampling the continuous sensor signal over a short, finite span of time. Recall that the Nyquist criterion dictates that a signal be sampled at at least twice the rate of the highest frequency component of the signal of interest. Failure to do so could result in false positives or negatives due to aliasing. For more information on collecting training data and formatting the training and test data files, please visit the AI:NanoEdge AI Studio page on the STM32 MCU wiki.
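To make the expected format concrete, below is a hypothetical excerpt of a signal file for a 3-axis accelerometer; the numbers are invented purely to illustrate the layout. Each line is one complete signal, with samples interleaved as x y z triplets and separated by spaces, so a buffer of 256 samples would yield 768 values per line. An unrealistically short buffer of four samples is shown here to keep the lines readable:

  120 -34 1015 118 -30 1012 125 -41 1020 119 -33 1016
  -480 912 -205 533 -870 640 -701 455 -990 310 122 -864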

Key Files

The critical files for this project include:

  • main.cpp: Contains the initialization code, the main loop, and the callback code that triggers when an event occurs.
  • ism330dhcx.c/.h: These are the driver files for the onboard ISM330DHCX accelerometer. The functions contained within these files allow the developer to manipulate accelerometer settings and to read the acceleration values while the device is in operation.

Key Variables and Constants

You might want to be aware of a few variables and possibly tweak them depending on your particular design choices. These variables can be found in the file titled main.cpp:

  1. ISM330DHCX_Object_t ism330dhcx_object: An object that allows us to communicate with the onboard accelerometer.
  2. ISM330DHCX_AxesRaw_t acc_values: A structure that holds three 16-bit integer values for the x, y, and z axes.
  3. float x, y, z: A place to store the accelerometer values locally within main.cpp.
  4. static int32_t ISM330DHCX_Probe(void): A helper function, declared alongside these variables, that initializes the onboard accelerometer and applies its settings.

Key Functions

The main.cpp file contains the initialization code, the main loop, and support functions for the project. The support functions include:

  1. ISM330DHCX_ACC_Enable(&ism330dhcx_object): Turns the accelerometer hardware on. Failure to do so will result in garbage data. Must pass it a pointer to the ism330dhcx object.
  2. ISM330DHCX_ACC_GetAxesRaw(&ism330dhcx_object, &acc_values): Mechanism to request the accelerometer data. Must pass it a pointer to the ism330dhcx object and a pointer to the structure that stores the x, y, and z axes data.
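Putting the variables and functions above together, the following is a minimal sketch of the datalogger's main loop. It assumes an Mbed OS target (Keil Studio Cloud's default), the ISM330DHCX driver files listed earlier, and a signal length of 256 samples; treat the sample rate, buffer length, and I/O plumbing as illustrative rather than the exact project code:

  #include "mbed.h"
  #include "ism330dhcx.h"

  static ISM330DHCX_Object_t ism330dhcx_object;  // handle for the onboard accelerometer
  static ISM330DHCX_AxesRaw_t acc_values;        // raw x, y, z samples land here

  // Project helper described above that configures the sensor object
  // (definition omitted here; see the training application's main.cpp).
  static int32_t ISM330DHCX_Probe(void);

  int main()
  {
      // Configure the sensor, then power the accelerometer on; reading
      // before enabling it returns garbage data.
      ISM330DHCX_Probe();
      ISM330DHCX_ACC_Enable(&ism330dhcx_object);

      const int samples_per_signal = 256;  // assumption: match your Studio buffer size

      while (true) {
          // Emit one signal per line, space-delimited, as NanoEdge AI Studio
          // expects. Mbed's console defaults to 9600 baud, matching the
          // serial settings used later in this article.
          for (int i = 0; i < samples_per_signal; ++i) {
              ISM330DHCX_ACC_GetAxesRaw(&ism330dhcx_object, &acc_values);
              printf("%d %d %d ", acc_values.x, acc_values.y, acc_values.z);
              ThisThread::sleep_for(10ms);  // ~100Hz; keep Nyquist in mind for your fan
          }
          printf("\n");
      }
  }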

Software Development: Inferencing Application

Now that the data logger application is complete, it's time to develop the inferencing application that will classify, in real time, the motion sensed onboard the development board. The codebase for this part of the project is also written in C++. This is where NanoEdge AI Studio (Figure 3) shines. It presents a straightforward, five-step process to build, test, and deploy a functional inferencing engine for any supported STM32 hardware platform and sensor. The five steps are as follows:

  1. Project Settings: Declare which development board and sensors you intend to utilize.
  2. Signals: Collect model training data from a file or streamed via a serial connection directly from the development board.
  3. Benchmark: Create the ML model and test its accuracy and memory footprint.
  4. Emulator: Test the model using independent test data.
  5. Deployment: Create the inferencing application that can run on your target device.

Figure 3: NanoEdge AI Studio is a powerful tool to train, test, and deploy ML algorithms to embedded devices. (Source: ST)

To begin, let's launch the Studio application on our development machine. This will present us with four types of ML algorithms we can build. They are:

  1. Anomaly Detection: Used for detecting anomalies in data using a dynamic model.
  2. 1-Class Classification: Used for detecting anomalies in data using a static model.
  3. N-Class Classification: Used for distinguishing among n different states using a static model.
  4. Extrapolation: Used for estimating an unknown target value using other known parameters, using a static model.

For this project, we will select the n-class classification model. Next, select Create New Project. This will begin the workflow, starting with the Project Settings step.


Figure 4: Training an ML model begins with selecting the target devices and preferred sensors. (Source: ST)

Project Settings

The first step in the process is to select our target development board and the onboard sensors from which we will collect data (Figure 4). The following selections should be made:

  1. Name: Assign a title for this particular project.
  2. Target: The target development board; in this case, select the Disco-B-U585I-IOT02A.
  3. Sensor Type: Select Accelerometer 3 axes.

Figure 5: NanoEdge AI Studio provides multiple ways to get training data into the ML model training engine. Here, the data is being streamed in real time from the Discovery kit development board via USB serial. (Source: Green Shoe Garage)

Signals

The second step in developing the ML model is to collect real-world training data from the chosen development board (Figure 5). To do so, we will follow these steps:

  1. Click on ADD SIGNAL.
  2. Select FROM SERIAL (USB).
  3. Select the correct COM port and baud rate, in this case, 9600 bps.
  4. Place a checkmark in the Maximum number of lines checkbox and enter 100 lines.
  5. Click on the START/STOP button. Wait for all 100 data points to be collected. You should see the data coming in via the Serial output text box and charted graphically.
  6. Click CONTINUE to preview the data.
  7. Ensure the delimiter type is space.
  8. Click on IMPORT.
  9. Give the dataset a name in the name text box (e.g., fan_off, fan_normal, fan_failure).
  10. Click on RUN OPTIONAL CHECKS to have Studio verify that there are no common errors in the dataset.

Figure 6: NanoEdge AI Studio provides a benchmarking tool to determine the accuracy of the ML model as well as its memory footprint. (Source: Green Shoe Garage)

Benchmark

Next, Studio will generate a neural network model and then run a benchmark against the datasets to see how well the model tracks against the training data (Figure 6). It will report an accuracy score, a confidence score, and how much RAM and flash memory the model will consume on the target board. To run the benchmark, do the following:

  1. Click RUN NEW BENCHMARK.
  2. Place a checkmark in each of the signal classes you want to benchmark.
  3. Use the slider to adjust the number of CPU cores you wish to dedicate to the benchmark. The more cores, the faster the benchmarking, but it might make your machine sluggish, depending on the size of the dataset.

Figure 7: To verify the performance of the ML model, Studio provides both a browser-based emulator and an emulator that can be run locally to verify performance against test data rather than the training data. (Source: Green Shoe Garage)

Emulator

Now that we have a working model, we can run an emulator to test that model against new testing data and see how it performs against datasets not used for training (Figure 7). In short, we can see how well the model works in the real (albeit emulated) world. Again, test data can be uploaded as a text file or streamed in real time from the target development board via USB. To emulate the ML model, follow these steps:

  1. Select the benchmark you wish to emulate.
  2. Click INITIALIZE EMULATOR. Alternatively, to run the emulator locally on a Linux or Windows machine, select the emulator appropriate for your computer's operating system.
  3. Select the correct COM port and baud rate (again, 9600 bps for our code).
  4. Click the START/STOP button.

Figure 8: Once a developer is satisfied with the performance of the ML inferencing engine, the model can be deployed to target hardware with just a few mouse clicks. (Source: ST)

Deployment

The final step is to generate the firmware that can be run on the target board for inferencing on the actual hardware (Figure 8). Again, there are a few options (compilation flags) that can be selected by placing a check in a checkbox. Those options are:

  • Multi-library: Check this box if you want to integrate more than one NanoEdge AI library in your program, then choose a suffix for each library.
  • Float abi: Specifying soft causes GCC to generate output containing library calls for floating-point operations; hard allows the generation of floating-point instructions and uses FPU-specific calling conventions.
  • fshort-wchar: The -fshort-wchar option can improve memory usage but might reduce performance, because narrow memory accesses can be less efficient than full register-width accesses. The default is -fno-short-wchar, which keeps wchar_t at its default size of 4 bytes.
  • fshort-enums: The -fshort-enums option can likewise improve memory usage at a possible performance cost. The default is -fno-short-enums; that is, the size of an enumeration type is at least 32 bits regardless of the size of the enumerator values.
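To make the enum option concrete, here is a minimal, compiler-dependent illustration (assuming GCC; the class labels are hypothetical). An enumeration whose values all fit in one byte occupies 4 bytes under the default but shrinks to 1 byte when compiled with -fshort-enums:

  #include <cstdio>

  enum FanState { FAN_OFF, FAN_NORMAL, FAN_FAILURE };  // all values fit in one byte

  int main()
  {
      // Prints 4 under the default -fno-short-enums; prints 1 when this
      // translation unit is compiled with -fshort-enums.
      printf("sizeof(FanState) = %zu\n", sizeof(FanState));
      return 0;
  }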

We will leave the options alone for now. When ready to generate the inferencing engine code, follow these steps:

  1. Select the benchmark you wish to generate the code against from the dropdown menu.
  2. Click COMPILE LIBRARY to generate the code and download the .ZIP file containing the inferencing code.
  3. Copy the generated "hello world" application and paste it into your main.cpp inferencing application file.
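For orientation, the following is a hedged sketch of how the generated library typically wires into an application, based on the classification API that NanoEdge AI Studio documents (NanoEdgeAI.h, knowledge.h, and the macros below ship in the downloaded .ZIP). Exact names can vary by Studio version, and fill_acc_buffer() is a hypothetical stand-in for the accelerometer read loop from the training application:

  #include <stdio.h>
  #include <stdint.h>
  #include "NanoEdgeAI.h"   // generated header: declares the neai_* API and macros
  #include "knowledge.h"    // generated trained-model data

  // One signal's worth of interleaved x/y/z samples, sized by the generated macros.
  static float neai_buffer[DATA_INPUT_USER * AXIS_NUMBER];
  static float output_class_buffer[CLASS_NUMBER];  // per-class probabilities
  static uint16_t id_class = 0;                    // most likely class

  void run_inference(void)
  {
      // Initialize the library once with the trained knowledge before classifying.
      static bool initialized = false;
      if (!initialized) {
          neai_classification_init(knowledge);
          initialized = true;
      }

      // fill_acc_buffer() is a hypothetical helper: reuse the accelerometer
      // read loop from the training application to populate neai_buffer.
      // fill_acc_buffer(neai_buffer);

      neai_classification(neai_buffer, output_class_buffer, &id_class);

      printf("class_status: %u\n", (unsigned)id_class);
      for (unsigned i = 0; i < CLASS_NUMBER; ++i) {
          printf("class_probability[%u] = %.3f\n", i, output_class_buffer[i]);
      }
  }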

Project in Action

With the project assembled and firmware installed, it's time to verify the operation of the inferencing engine. Plug the USB cable into the device as you did for programming and debugging earlier, and fire up a serial terminal such as PuTTY. Ensure that the development board is placed on the fan at the same location where the datalogger was attached when collecting the training datasets; failure to do so could result in false positives and negatives. As a general suggestion, placing the accelerometer slightly off the center axis of rotation seems to yield the best results.

Ensure that the correct COM port is chosen and set the baud rate to 9600 bps. Data from the inferencing engine will begin to stream over the serial port within a few seconds. The critical values to look out for include:

  • class_name: The name of the class (e.g., in_motion, at_rest) that the inferencing engine believes is currently occurring.
  • class_status: The numeric identifier of the class.
  • class_probability: The probabilities of all enumerated classes. Use the class_status value as the index into this list to find the probability of the detected class.
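As a purely hypothetical illustration (the exact field layout depends on the generated hello-world application), one cycle of the streamed output might resemble:

  class_name: fan_normal
  class_status: 2
  class_probability: 0.01 99.97 0.02

Here, the class_status value indexes the fan_normal entry in the probability list, which carries the 99.97% figure discussed below.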

For this example, we allow the fan to operate normally; the inferencing engine determines that the accelerometer attached to the fan is detecting normal motion and reports a 99.97% probability that the correct class has been selected (Figure 9). We have also chosen to light a user-definable LED (attached to GPIO_Pin_7) when the most likely class (above 80% probability) is the fan_normal class.
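The LED gating described above amounts to a simple threshold check. A minimal sketch follows, assuming an Mbed-style DigitalOut and a fractional probability value; the pin name is a placeholder for the user LED this article attaches to GPIO_Pin_7:

  #include "mbed.h"

  DigitalOut status_led(LED1);  // placeholder: substitute the user LED on GPIO_Pin_7

  // Light the LED only when fan_normal is the most likely class AND its
  // probability clears the 80% confidence threshold used in this article.
  void update_led(bool fan_normal_is_most_likely, float fan_normal_probability)
  {
      status_led = (fan_normal_is_most_likely && fan_normal_probability > 0.80f) ? 1 : 0;
  }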

Figure 9: The accelerometer inferencing engine in action. Streaming the output of the inferencing engine via USB serial is a convenient way to verify performance and debug your application. (Source: Green Shoe Garage)

Armed with these new insights into our fan’s health, we can begin to establish proactive maintenance procedures that are triggered when the system sends notifications that it is detecting aberrant behavior. In our example (Figure 10), that could be as simple as dispatching a technician if the fan remains in the fan_failure state for more than ten minutes. Of course, there are numerous other use cases for such ML-enabled PdM. Consider the following examples:

  • Measuring current through a breaker to analyze current draw as part of a machinery health analysis or an energy management program.
  • Sensing and analyzing fluid flow rates in a piping system to detect blockages that, if a spill occurred, could lead to costly environmental remediation.
  • Detecting anomalous temperature, humidity, and vibration within a refrigerator to predict imminent failure, which could save restaurants significant money by avoiding spoilage.

Figure 10: Experiment with the location of the accelerometer to optimize results. (L) Placing the dev board on top of the fan motor housing. (R) Placing the dev board on the face of the fan cage. Placing the dev board on the motor housing seemed to work best. (Source: Green Shoe Garage)

Predictive maintenance, powered by smart technology solutions like ST's B-U585I-IOT02A Discovery Kit and NanoEdge AI Studio, is changing how we plan, inspect, maintain, and repair industrial machinery. Reducing downtime while also optimizing resource utilization (technicians’ time, money, materials) means industrial operations can “have their cake and eat it too.” Maintenance does not have to be reactive or based on statistical models, nor does it need to be done “just to be safe.” Machine learning algorithms, when coupled with low-cost yet powerful embedded systems, can truly deliver a world of smart, predictive maintenance. And that world is here, today. For more information, please visit the B-U585I-IOT02A Discovery Kit page at Mouser.

About the Author

Michael Parks, P.E. is the owner of Green Shoe Garage, a custom electronics design studio and technology consultancy located in Southern Maryland. He produces the S.T.E.A.M. Power podcast to help raise public awareness of technical and scientific matters. Michael is also a licensed Professional Engineer in the state of Maryland and holds a Master’s degree in systems engineering from Johns Hopkins University.
