The sound of AI: Voice recognition

By Michael Parks, PE, for Mouser Electronics

This is the fourth and final project in our series exploring the convergence of embedded electronics and artificial intelligence (AI). The possibilities of combining efficient, brain-like algorithms with inexpensive yet powerful microcontrollers and sensors are almost endless. Machine learning is at the forefront of the edge computing revolution, an advance in automation that promises billions of inexpensive embedded electronic systems interacting nearly instantaneously with the physical world without needing to reach back to cloud-based high-end servers.

We will be leveraging Google’s TensorFlow Lite for Microcontrollers software development kit (SDK). For this project, we will attempt to run an audio classifier neural network on an embedded system powered only by a 3V coin-cell battery. To achieve that goal, we will employ the Giant Gecko GG11 development board from Silicon Labs, which is based on their EFM32 microcontroller family. This credit card-sized development board packs an Arm® Cortex®-M4-based EFM32 microcontroller and plenty of peripherals to help you go from idea to prototype quickly.

This project will look at the fundamentals of audio classifier machine-learning (ML) algorithms. Then, we will look at what is needed to get started developing custom ML applications for SiLabs’ Giant Gecko GG11 development board and the featured EFM32 microcontroller. The future of low-power, high-performance edge computing begins now.

Project Materials and Resources

For this project, we will be using Silicon Labs’ SLSTK3701A GG11 development board (Figure 1), a hardware platform for evaluating their EFM32 Giant Gecko S1 microcontroller based on a 72MHz Arm® Cortex®-M4 architecture. It is a feature-rich prototyping platform with hardware that enables functionality such as dual microphones, cryptographic security hardware, a predictive maintenance (PdM) sensor interface, a Controller Area Network (CAN) bus interface, capacitive sensing, Ethernet, a graphics engine, and LESENSE/PCNT enhancements useful for smart-energy-meter applications.

Despite the abundant hardware features, the Giant Gecko is very energy efficient. Operating voltages range between 1.8V and 3.8V. The microcontroller draws 80μA/MHz in active mode and only 2.1μA while sleeping (with the RTCC running and RAM retained). It even features a CR2032 coin-cell battery holder to power the board for mobile applications.
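
As a rough, back-of-the-envelope illustration (assuming a typical CR2032 capacity of roughly 225mAh, a figure not given in the BOM): at the full 72MHz clock, 80μA/MHz works out to about 5.8mA, so continuous active operation would drain the cell in roughly 225mAh ÷ 5.8mA ≈ 39 hours, while a design that spends most of its time asleep at 2.1μA could, in principle, run for years on the same cell.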

 

Figure 1: SiLabs’ EFM32 Giant Gecko S1 GG11 Development Kit. (Source: SiLabs)

We will leverage the EFM32 to build a device that incorporates the onboard microphones as an input device and the two onboard RGB LEDs as user feedback. We will toggle the LEDs’ colors in response to the user’s commands spoken to the device.

Bill of Materials (BOM)

You can click this Mouser project share link to access the BOM along with current pricing. Table 1 lists the items in the BOM.

Table 1: TensorFlow Lite – SiLabs BOM

Quantity | Mouser P/N | Description
1 | 634-SLSTK3701A | SiLabs EFM32 Giant Gecko S1 development kit
1 | 685-MN35 | Digital multimeter, manual ranging, with temperature probe
1 | 81-CR2032 | CR2032 coin-cell battery
1 | 562-3021055-03 | USB 2.0 mini connector

Resources

All source files for this project are located on Mouser’s GitHub repository. The repository is divided into two main folders, one for hardware and the other for the firmware.

Hardware

The Hardware folder contains schematics of the EFM32 development board. Many external peripherals are onboard the development board for quick prototyping (Figure 2).

 

Figure 2: Overview of the external components available on the GG11 development board. (Source: SiLabs)

Software

The Software folder contains the source code and the neural network model for the project. We will have to modify a few files with our custom implementation to interact with the EFM32 Giant Gecko S1 development board and associated components (RGB LED and microphone interface). The files provided include:

  • tfl_silabs.sls
  • microphone.c / .h
  • microphone DMA driver (.c / .h)
  • microphone configuration header (.h)
  • audio_provider.cc
  • command_responder.cc

More details about these files can be found in the Software section below.

Tools

This project assumes that you have access to the following tools:

  • Computer running Windows 10 (64-bit), macOS 10.14+, or Linux (Ubuntu 20.04 LTS) with the following hardware specifications:
    • CPU: 1GHz or better
    • Memory: 1GB minimum, 8GB recommended for wireless protocol development
    • Disk Space: 600MB minimum, 7GB for wireless dynamic protocol support
  • Internet connection
  • Digital multimeter (DMM)

Documentation

A plethora of additional resources and documentation are available regarding the Giant Gecko S1 series of development boards and the TensorFlow Lite for Microcontrollers Software Development Kit (SDK). Some of the recommended documentation to review includes:

  • A wealth of information on TensorFlow Lite, machine learning, and getting started with example models.

  • An overview of the TensorFlow library for Silicon Labs development boards.

  • The user guide for Silicon Labs’ SLSTK3701A GG11 development board. It contains useful information about the onboard hardware components and how to interact with those features via software.

Another resource is Chapter 7 of the book TinyML by Pete Warden and Daniel Situnayake (O’Reilly Media). This chapter contains useful information regarding audio processing algorithms for machine-learning applications.

Peeking Under the Hood: Machine Learning Overview for Audio Applications

This project uses an audio recognition model that leverages the development kit’s I2S functionality to interface with the two onboard microphone modules. The microphones provide the inputs for the TensorFlow keyword-recognition model, and the two onboard RGB LEDs provide user feedback.

In this section, we will break down the phases of work needed to develop a neural network model, prepare the model to run on a low-power platform (such as a microcontroller), and, finally, put machine-learning inferencing to practical use on an edge device.

We group the steps into two phases. First, the training phase includes all the steps from collecting raw training data (in this case, audio samples of the words “yes” and “no”) to uploading the model and firmware to an embedded device. The second phase is inferencing, which uses the neural network in a real-world setting.

Training Phase

The training phase is performed once and requires a lot of data and processing horsepower. Training typically occurs on desktop- or even server-grade hardware, depending on the abundance of sample data and the expected complexity of the resulting neural network model. As more data is received, the model can be retrained to improve its performance and then redeployed to fielded edge devices via a firmware update.

  1. Gather training data: The first step in preparing the model is to collect tens of thousands of audio samples containing the keywords (again, “yes” and “no”) across various ambient environments and speech mannerisms (such as different people with different dialects). Next, those audio samples must be converted into a two-dimensional array known as a spectrogram. A Fast Fourier Transform (FFT) is performed across 30ms chunks of the audio sample to create the spectrogram. Every six entries of the resulting 256-entry FFT output are averaged together, yielding 43 frequency buckets in each slice of the spectrogram. The signal amplitudes are stored as unsigned 8-bit values, with 0 representing a real value of 0 and 255 representing a real value of 127.5 (a minimal sketch of this bucketing and quantization appears after this list). The model is then prepared for inferencing using the steps mentioned below.
  2. Train the neural network: The TensorFlow framework allows a model to be trained on a desktop computer or using the horsepower of cloud-based services such as AWS or Google Compute Engine. The bottom line: the more processing power (CPUs and GPUs) you throw at training a neural network, the faster the training or the more robust the final model will be. The result of training a model with TensorFlow is a file with a .pb extension. Downloading a pre-trained model is another option.
  3. Prepare the model for inferencing hardware: The next step is to take the trained model and prepare it to run on the chosen endpoint device. The model must be made small enough to fit within the memory of the EFM32 after conversion. Additionally, the code can only use the functions that are supported by TensorFlow Lite for Microcontrollers. However, it is possible to use a feature not currently supported if you write your own custom implementations.
    1. Convert the TensorFlow model to a TensorFlow Lite FlatBuffer: You will convert your model into the standard TensorFlow Lite format using the TensorFlow Lite converter. You might wish to output a quantized model because these are smaller and more efficient to execute.
    2. Convert the FlatBuffer to a C byte array: Models are kept in read-only program memory and stored in a C file. The TensorFlow SDK provides tools that can be used to convert the FlatBuffer into the appropriate C byte array format (an illustrative example of the result appears after this list).
  4. Integrate the TensorFlow Lite for Microcontrollers C++ library: Write the value-added code for our particular implementation that allows the project to interact with the onboard microphones to collect audio samples, perform inferencing using the TensorFlow Lite for Microcontrollers C++ library, make a prediction, and then indicate the result to the end user via the onboard RGB LEDs.
  5. Deployment to the Edge: Build and deploy the program to your device using the IDE or command-line tools.
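
To make step 1 above more concrete, here is a minimal sketch of the bucketing and quantization it describes: averaging a 256-entry FFT magnitude slice into 43 buckets and scaling the result into unsigned 8-bit values where a real value of 0 maps to 0 and 127.5 maps to 255. This is illustrative only; the function name and buffer handling are assumptions, not the SDK’s actual feature-generation code.

        #include <stdint.h>
        #include <stddef.h>

        #define FFT_BINS     256  // magnitude entries from one 30ms FFT window
        #define BUCKET_WIDTH 6    // FFT entries averaged into each bucket
        #define NUM_BUCKETS  43   // 256 entries / 6 per bucket -> 43 buckets (the last one is partial)

        // Illustrative sketch: average groups of FFT magnitudes into buckets and
        // quantize so that a real value of 0.0 maps to 0 and 127.5 maps to 255.
        static void make_spectrogram_slice(const float fft_mag[FFT_BINS],
                                           uint8_t slice[NUM_BUCKETS]) {
          for (size_t b = 0; b < NUM_BUCKETS; b++) {
            float sum = 0.0f;
            size_t count = 0;
            for (size_t i = b * BUCKET_WIDTH; i < (b + 1) * BUCKET_WIDTH && i < FFT_BINS; i++) {
              sum += fft_mag[i];
              count++;
            }
            float avg = (count > 0) ? (sum / (float)count) : 0.0f;

            // 127.5 (real) -> 255 (uint8); clamp to the 8-bit range.
            float scaled = avg * 2.0f;
            if (scaled > 255.0f) scaled = 255.0f;
            if (scaled < 0.0f)   scaled = 0.0f;
            slice[b] = (uint8_t)scaled;
          }
        }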

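Step 3 is already done for you in this project, since the repository ships the converted model, but it helps to know what the FlatBuffer-to-C conversion produces. The snippet below is an illustrative example of the generated file (the symbol names and byte values are placeholders, not the project’s actual model data); the TensorFlow documentation typically suggests generating it with a command such as xxd -i model.tflite > model_data.cc.

        // Illustrative example of a converted model file (placeholder bytes only).
        // Typically generated with something like:  xxd -i model.tflite > model_data.cc
        alignas(8) const unsigned char g_model_data[] = {
            0x1c, 0x00, 0x00, 0x00, 0x54, 0x46, 0x4c, 0x33,  // FlatBuffer header ("TFL3" identifier)
            // ... thousands of additional bytes elided ...
        };
        const unsigned int g_model_data_len = sizeof(g_model_data);
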
Inferencing Phase

The following steps are repeated continuously as part of the main loop of the project firmware. These steps represent an audio classifier neural network’s general logic regardless of whether it runs on a desktop computer or an embedded device. No internet access is required in either situation, as all inferencing is performed locally.

  1. Gather audio samples: The microphone continuously listens to the ambient environment, and the raw samples are sent to the model.
  2. Preprocessing: The raw audio samples are then processed to extract features suitable for feeding into the model. This is done in the same way as during training (see step 1 of the Training Phase for additional information). The result is the previously discussed spectrogram.
  3. Inference: A spectrogram is a 2D array (Figure 3) and can be visualized graphically. Because we can represent a spectrogram graphically, it can be analyzed using machine-learning methodologies similar to those used in image analysis. The spectrogram is fed through a specific type of neural network known as a convolutional neural network (CNN). So, the same techniques that analyze groups of adjacent pixels in images to identify patterns and shapes can be used to identify specific spoken words by analyzing the audio sample’s visual representation in the form of a spectrogram.

 

Figure 3: A spectrogram is a visual interpretation of an audio signal. This makes it possible to analyze audio using techniques like those used for image identification. (Source: Google TensorFlow Project)

  4. Post-processing: The inference is run many times per second. The software takes the results of each run, aggregates them, and decides whether, on average, a known keyword was detected. Each keyword of interest is assigned a percentage representing the certainty with which the model believes that word might have been spoken. The developer can further refine this processing and provide a mechanism to return only the keyword that received the highest percentage (a simplified sketch of this aggregation appears after this list).
  5. Take action: TensorFlow’s post-processing results are handed over to a bit of code referred to as the command responder. The result, typically a string of text, is assessed against an if/then ladder of string comparisons (strcmp) to determine what action should be taken. Then, depending on the keyword detected, the microcontroller can manipulate various General Purpose Input/Output (GPIO) pins, allowing voice control over LEDs, motors, displays, buzzers, etc.
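
Here is a simplified, self-contained sketch of the aggregation described in step 4 (it is not the SDK’s actual recognizer class, and the label ordering, window size, and threshold are assumptions): per-inference scores are averaged over a short sliding window, and a keyword is reported only when its average clears a confidence threshold.

        #include <array>
        #include <cstdint>
        #include <deque>

        // Labels in the order the model reports them (the ordering here is an assumption).
        static const char* kLabels[] = {"silence", "unknown", "yes", "no"};
        constexpr int      kNumLabels  = 4;
        constexpr size_t   kWindowSize = 10;   // number of recent inference results to average
        constexpr uint32_t kThreshold  = 200;  // minimum average score (0-255) to accept a keyword

        static std::deque<std::array<uint8_t, kNumLabels>> history;

        // Call once per inference with the model's raw per-label scores (0-255).
        // Returns the index into kLabels of a detected keyword, or -1 if nothing
        // clears the threshold on average.
        int UpdateAndDetect(const std::array<uint8_t, kNumLabels>& scores) {
          history.push_back(scores);
          if (history.size() > kWindowSize) history.pop_front();

          int best = -1;
          uint32_t best_avg = 0;
          for (int label = 0; label < kNumLabels; ++label) {
            uint32_t sum = 0;
            for (const auto& s : history) sum += s[label];
            uint32_t avg = sum / history.size();
            if (avg >= kThreshold && avg > best_avg) {
              best_avg = avg;
              best = label;
            }
          }
          return best;
        }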

Building the project

This section will examine the necessary steps to get your project up and running from both a hardware and software perspective. Along the way, we will point out areas where you can customize the project to meet your specific needs.

The section is broken down into the following subsections:

  1. Setting up the Toolchain
  2. Software development
  3. Deploy to the device

Setting up the Toolchain

Two pieces of software are needed to write the firmware for this project: the Simplicity Studio IDE and the Software Development Kit (SDK) specific to the Giant Gecko S1 microcontroller, which is installed from within the IDE.

After the installer is downloaded, commence installing the Simplicity Studio IDE. Note that you will need to sign up for a free SiLabs account to download and install the application. The IDE supports the entire product line of SiLabs programmable products.

After the IDE is installed, it is time to install the Software Development Kit (SDK) specific for the Giant Gecko S1 microcontroller. SiLabs makes this relatively straightforward. Just make sure you have the GG11 development board and a USB cable handy.

 

Figure 4: After installing Simplicity Studio IDE, simply plug in the GG11 development board to install the SDK containing all required libraries and examples. (Source: SiLabs)

To install the necessary files to support your particular development board, it is a simple matter of plugging the USB cable into your computer and development board when notified to do so by the installer (Figure 4). Be sure to plug the micro USB end of the USB cable into the Debug USB Connector (found along the left side) of the GG11 development board.

After a few seconds, the installer should notice the SEGGER J-Link debugger onboard the GG11 development board (Figure 5). This port will also allow us to upload our firmware when we are ready.

 

Figure 5: Once the installer sees the development board, it will allow you to customize the SDK install. (Source: SiLabs)

After placing a check in the box for the development board, click Next. This will give us the option to select which library packages we wish to install. There are two options, Auto and Advanced (Figure 6).

Figure 6: Select “Auto” to select the default packages required for the Simplicity Studio IDE to interact with the GG11 development board. (Source: SiLabs)

For this project, we will select the Auto option and then click Next. At this point, the installation will commence, and after a few minutes, you will be presented with a message that the installation is complete (Figure 7).

Figure 7: You will be notified once all files have been installed. (Source: SiLabs)

Finally, click Close, and you will be prompted to restart Simplicity Studio (Figure 8).

 

Figure 8: Simplicity Studio must be restarted after installing the SDK. (Source: SiLabs)

Clicking Restart will close Simplicity Studio and relaunch the application. The IDE will then be ready for you to write custom firmware.

Software Development

When Simplicity Studio is launched for the first time, you will be presented with a familiar-looking development environment (Figure 9). There are four main components to the IDE. First, in the top left is the file explorer used to navigate the project file structure. The largest window is the text editor. Right below the editor are the output screen and console window. Lastly, the bottom-left window displays information for any SiLabs development board plugged into the computer.

Figure 9: The main view of the Simplicity Studio IDE with an open project. (Source: SiLabs)

The codebase for this project is written in C/C++. Eight files of interest will be found in the Software folder of the GitHub repository project structure. Files with the .h extension are header files that contain function definitions and variable declarations. The files with the .c and .cc extensions contain the logic defined by the various classes and functions found within each source code file. The descriptions of the critical files are as follows:

  • tfl_silabs.sls: This is the overarching Simplicity Studio project file. It will be found in the /SimplicityStudio folder.

  • microphone.c / .h: These files contain the code needed to interact with and get audio samples from an I2S microphone. These files will be found in the Drivers/microphone folder.

  • Microphone DMA driver (.c / .h): Enables the Direct Memory Access controller so the digitized microphone output can be stored in memory to improve the system’s performance. These files will be found in the Drivers/microphone folder.

  • Microphone configuration header (.h): Defines constants specific to the microphone found onboard the GG11 development board. The constants are used by the microphone.c/.h files. If a different development board is used, this file can be replaced with a different header file specific to that development board. This file will be found in the Drivers/microphone/config folder.

  • audio_provider.cc: This file handles the functionality needed to send the audio samples to the neural network for further processing. This file will be found in the /src folder.

  • command_responder.cc: This file takes the output of the neural network and interacts with the onboard RGB LEDs. This file will be found in the /src folder.

To customize the code, we first need to download it to our computer. To do so, ensure that Git is installed on your computer. After installing Git, clone the code from Mouser’s GitHub repository for this project. This can be accomplished by launching the command prompt (Start>cmd on a Windows machine), navigating to the folder where you wish to download the code, and running the following command:

$ git clone https://github.com/Mouser-Electronics/TensorFlowLite-SiLabs

This will create a subfolder titled TensorFlowLite-SiLabs that contains the source code needed to interact with the GG11 development board. Now it is time to launch Simplicity Studio and import the code from the GitHub repository that has been locally cloned.

From the Simplicity Studio menu bar, select File>Import and navigate to the project directory containing the tfl_silabs.sls project file. Click OK. Ensure that all files are placed into the same folder structure as found in the GitHub repository; otherwise, you will get a compilation error.

Modifying the code

We will focus our edits on the command_responder.cc file, which will need modifications to support the target board and our desired functionality.

The microphone driver in the drivers/microphone subfolder uses the GG11’s onboard Universal Synchronous Asynchronous Receiver Transmitter (USART) and Direct Memory Access (DMA) controller to sample audio from the I2S microphones. The digitized samples are then sent to the TensorFlow model via the audio_provider.cc source file (its interface is sketched below). The TensorFlow code runs preprocessing, inferencing, and postprocessing on the audio samples. Lastly, the TensorFlow model’s inferencing results are handed over to the command_responder.cc file, which controls the onboard RGB LEDs.
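
For orientation, the audio provider in the upstream micro_speech example exposes roughly the interface sketched below; the SiLabs-specific driver code is what fills these calls with samples out of the DMA buffers. The declarations follow the public TensorFlow example, and details may differ slightly in this project’s copy of the file.

        // Sketch of the micro_speech audio provider interface (based on the upstream
        // TensorFlow Lite for Microcontrollers example; this project's file may differ).
        #include <cstdint>
        #include "tensorflow/lite/c/common.h"
        #include "tensorflow/lite/micro/micro_error_reporter.h"

        // Fills *audio_samples with 16-bit PCM covering [start_ms, start_ms + duration_ms),
        // typically copied out of the ring buffer the DMA-driven microphone driver maintains.
        TfLiteStatus GetAudioSamples(tflite::ErrorReporter* error_reporter, int start_ms,
                                     int duration_ms, int* audio_samples_size,
                                     int16_t** audio_samples);

        // Returns the timestamp (in milliseconds) of the most recently captured audio, so the
        // main loop knows when enough new data has arrived to run another inference.
        int32_t LatestAudioTimestamp();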

Figure 10: Block diagram of the connection between the EFM32 microcontroller and the development board's RGB LEDs. (Source: SiLabs)

The GG11 development board has two user-accessible RGB LEDs marked LED0 and LED1. Each LED is controlled by three GPIO pins, one for each color–red, green, and blue. The LEDs are connected in an active-low configuration. To achieve the color red on LED0, the pin PH10 would be set to low while pins PH11 and PH12 are set to high (Figure 10).
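
As a quick illustration of that active-low wiring, LED0 could be driven red directly with the standard EMLIB GPIO calls shown below. The PH10/PH11/PH12 color mapping is taken from the description above, so verify it against the board schematic; this is a sketch, and the project itself uses the BSP helper described next.

        #include "em_cmu.h"
        #include "em_gpio.h"

        // Drive LED0 red by hand. The RGB LEDs are wired active-low, so driving a
        // color's pin low turns that color element on.
        void led0_red(void) {
          CMU_ClockEnable(cmuClock_GPIO, true);                  // GPIO clock must be running
          GPIO_PinModeSet(gpioPortH, 10, gpioModePushPull, 0);   // PH10 = red, on (low)
          GPIO_PinModeSet(gpioPortH, 11, gpioModePushPull, 1);   // PH11 = green, off (high)
          GPIO_PinModeSet(gpioPortH, 12, gpioModePushPull, 1);   // PH12 = blue, off (high)
        }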

To make this easier, the board support package (BSP) library provides a function, BSP_ExtLedSet(int ledNo, int subLedNo), which allows the code to set the individual pins of each RGB LED. The changes will be made in /src/command_responder.cc. The updated file will look something like this:

        #include <string.h>

        #include "tensorflow/lite/micro/examples/micro_speech/command_responder.h"
        #include "bsp.h"
        #include "em_gpio.h"

        // The default implementation writes out the name of the recognized command
        // to the error console. Real applications will want to take some custom
        // action instead, and should implement their own versions of this function.
        void RespondToCommand(tflite::ErrorReporter* error_reporter,
                              int32_t current_time, const char* found_command,
                              uint8_t score, bool is_new_command) {
          static bool initialized = false;

          // Color indices understood by the BSP RGB LED helper
          uint32_t ledColor_Red = 1;
          uint32_t ledColor_Green = 2;
          uint32_t ledColor_Blue = 3;
          uint32_t ledColor_Off = 0;

          if (!initialized) {
            initialized = true;
            GPIO_PinModeSet(BSP_GPIO_LED0_PORT, BSP_GPIO_LED0_PIN, gpioModePushPull, 0);
            GPIO_PinModeSet(BSP_GPIO_LED1_PORT, BSP_GPIO_LED1_PIN, gpioModePushPull, 0);
          }

          // error_reporter->Report("%s %d", found_command, score);

          if (is_new_command) {
            TF_LITE_REPORT_ERROR(error_reporter, "Heard %s (%d) @%dms", found_command,
                                 score, current_time);

            if (strcmp(found_command, "unknown") == 0) {
              BSP_ExtLedSet(0, ledColor_Red);
              BSP_ExtLedSet(1, ledColor_Red);
            } else if (strcmp(found_command, "no") == 0) {
              BSP_ExtLedSet(0, ledColor_Green);
              BSP_ExtLedSet(1, ledColor_Red);
            } else if (strcmp(found_command, "yes") == 0) {
              BSP_ExtLedSet(0, ledColor_Red);
              BSP_ExtLedSet(1, ledColor_Green);
            } else if (strcmp(found_command, "silence") == 0) {
              // LEDs off; the multicolored LEDs on the GG11 work off negative logic
              BSP_ExtLedSet(0, ledColor_Off);
              BSP_ExtLedSet(1, ledColor_Off);
            }
          }
        }

Deploying to the device

Once the code changes have been made, it is time to build and flash the project to the development board. The first step is to build the project. From within Simplicity Studio, from the menu bar, click on Project>Build Project (Figure 11).

Figure 11: The view within Simplicity Studio used to build a project. (Source: M. Parks)

Note any errors in the console located at the bottom of the IDE and make the appropriate fixes to the code. Next, we will flash the firmware to the device. To do so, click on the down arrow to the right of the bug-like icon on the toolbar. Then click on Debug As and select the appropriate project (Figure 12).

Figure 12: Once the project is built, flash it to the development board using the debug tool. (Source: M. Parks)

Lastly, to stop debugging, from the menu bar, click Run>Terminate (Figure 13). Now you can return to editing the code.

Figure 13: Terminate the debugger once you are finished testing the code. (Source: M. Parks)

If you wish to run this project on a different SiLabs board, you will need to add the appropriate microphone and DMA drivers to the /drivers subfolder for that specific board. The audio_provider.cc and command_responder.cc files will also require modifications to support the other components (such as LEDs) of the target board.

Project in action

With the code uploaded onto the development board, unplug the USB cable from the board. Next, insert a CR2032 coin cell battery into the battery holder and ensure the battery’s positive side is facing up.

After a few seconds, the board should be initialized and ready to listen for commands. Speaking the word “yes” should turn LED0 red and LED1 green. Speaking the word “no” should turn LED0 green and LED1 red. If the board hears only silence, both LEDs will turn off. If it is unsure what word has been spoken, both LEDs will glow red (Figure 14).

 

Figure 14: If everything goes well, you will have an audio classifier neural network running on a coin-cell battery-powered microcontroller. (Source: M. Parks)

About the Author

Michael Parks, P.E. is the owner of Green Shoe Garage, a custom electronics design studio and technology consultancy located in Southern Maryland. He produces the S.T.E.A.M. Power podcast to help raise public awareness of technical and scientific matters. Michael is also a licensed Professional Engineer in the state of Maryland and holds a Master’s degree in systems engineering from Johns Hopkins University.
