Build a Voice Assistant with TensorFlow Lite
Image Source: videotrinkets/shutterstock.com
By Michael Parks, PE, for Mouser Electronics
Introduction
Welcome to the third of our four-part project series examining the convergence of embedded electronics and artificial intelligence (AI). In this part, we will explore using Google's TensorFlow Lite for Microcontrollers software development kit (SDK) to build our own voice-controlled assistant. Digital voice assistants are increasingly common in homes and workplaces. However, potential security concerns make many users skeptical. Machine-learning algorithms running on low-cost embedded hardware can give users the convenience of voice control without the security worries. In this project, we build a voice assistant that needs no internet connection and can directly control various types of actuators.
Project Materials and Resources
For this chapter of our TensorFlow Lite for Microcontrollers series, we will be using the Infineon XMC4700 Relax Kit (Figure 1), a hardware platform for evaluating Infineon's XMC4700-F144 microcontroller, which is based on an ARM® Cortex®-M4 running at 144MHz with 2MB of flash and 352KB of RAM. The board features an Arduino Uno shield-compatible header layout and can interact with 3.3V-tolerant shields to add functionality quickly. Note that a 5V-tolerant version of this board is available if you want to use Arduino Uno shields that operate at 5V.
Figure 1: Infineon's XMC4700 Relax Development Kit. (Source: Mouser)
We will leverage the XMC4700 to build a device that uses a microphone as an input and the two onboard LEDs for user feedback. The LEDs will toggle on and off in response to commands spoken to the device by the user.
Bill of Material (BOM)
Click this Mouser project share link to access the bill of materials along with current pricing. Table 1 lists the items in the BOM.
| Quantity | Mouser P/N | Description |
|---|---|---|
| 1 | 726-KITXMC47RELAXV1 | Development Boards & Kits — Infineon XMC4700 Relax Kit |
| 1 | 726-S2GOMEMSMICIM69D | Audio IC Development Tools Board — IM69D130 Microphone Shield2Go |
| 1 | 992-6FX1L-254MM | Headers & Wire Housings 2.54mm (0.1") 6-pin wire wrap female header |
| 1 | 992-8FX1L-254MM | Headers & Wire Housings 2.54mm (0.1") 8-pin wire wrap female header |
| 1 | 474-PRT-11376 | Headers & Wire Housings 2.54mm (0.1") 10-pin wire wrap female header |
| 1 | 854-ZW-MM-10 | Jumper Wires ZipWire Male-Male 40 Unzipped Wires x 10cm |
Resources
All source files for this project are located on Mouser's GitHub repository. The repository is divided into two main folders:
Hardware
The Hardware folder contains schematics of the XMC4700 development board and the IM69D130 Microphone Shield2Go (Figure 2).
Figure 2: Infineon's IM69D130 Microphone Shield2Go. (Source: Mouser)
Software
The Software folder contains the source code and the neural network model for the project. We will have to modify or replace a few files with our custom implementation to interact with the XMC4700 Relax Kit development board and its associated components (the LEDs and the microphone interface). The files provided are:
- main.cc
- main_functions.cc / .h
- audio_provider.cc / .h
- command_responder.cc / .h
- feature_provider.cc / .h
- recognize_commands.cc / .h
- no_1000ms_sample_data.cc / .h
- no_30ms_sample_data.cc / .h
- yes_1000ms_sample_data.cc / .h
- yes_30ms_sample_data.cc / .h
More details about these files can be found in the Software section below.
Tools
This project assumes that you have access to the following tools:
- Computer running Windows, Mac OS X, or Linux
- Internet connection
- Digital Multimeter (DMM)
- Soldering Iron (and associated soldering consumables)
Documentation
A wealth of additional resources and documentation is available for the Relax Kit series of development boards as well as the TensorFlow Lite for Microcontrollers Software Development Kit (SDK). Recommended documentation includes:
- https://www.tensorflow.org/lite/microcontrollers: Information on TensorFlow Lite, machine learning, and getting started with the example models.
- https://www.infineon.com/dgdl/Infineon-IM69D130_Microphone_Shield2Go-GS-v01_00-EN.pdf?fileId=5546d462677d0f4601677f3486ed0941: A quick-start guide on using the IM69D130 microphone shield with the XMC4700 Relax Kit and the Arduino IDE.
- https://www.infineon.com/cms/en/product/evaluation-boards/kit_xmc47_relax_v1/: Links to various resources on developing for the XMC4700 Relax Kit.
Machine Learning Technology Overview
The XMC4700 Relax Kit development board features an XMC4700 microcontroller built on the robust ARM Cortex-M4 architecture. The Relax Kit gives a developer access to additional functionality, including a microSD card slot, a Controller Area Network (CAN) transceiver, Ethernet, and an onboard J-Link debug probe over USB. The debug port supports both Serial Wire Debug (SWD) and a UART-to-USB bridge (virtual COM port).
Google offers several pre-trained neural network models that work with the TensorFlow Lite for Microcontrollers SDK (Figure 3). They are available here.
Figure 3: Google's TensorFlow Lite for Microcontroller website has many resources and pre-trained models ready for use. (Source: Google)
This project uses the micro speech example, an audio recognition model. We will leverage the development kit's I2S functionality to interface with the IM69D130 microphone module, which provides the inputs for the TensorFlow speech recognition model.
Figure 4: Sampling audio files to create the NN models first requires a Fast Fourier Transform (FFT) to create a spectrogram. (Source: Google)
The first step in preparing the model is to convert audio sample files of the keywords into a two-dimensional array known as a spectrogram. To create the spectrogram, a Fast Fourier Transform (FFT) is performed across 30ms chunks of the audio sample. The resulting 256 FFT entries are averaged together in groups of six, which yields 43 frequency buckets in each slice of the spectrogram (Figure 4). The results are stored as unsigned 8-bit values, where 0 represents a real value of 0.0 and 255 represents a real value of 127.5. The model is then prepared for inferencing using the steps below.
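To make the quantization concrete, the following minimal example (our illustration, not project code) maps an 8-bit spectrogram value back to the real value it encodes:

```cpp
// Illustrative only: an unsigned 8-bit spectrogram value in [0, 255]
// encodes a real value in [0.0, 127.5].
#include <cstdint>
#include <cstdio>

static float DequantizeSpectrogramValue(uint8_t q) {
  return q * (127.5f / 255.0f);  // equivalently, q / 2.0f
}

int main() {
  std::printf("%.1f\n", DequantizeSpectrogramValue(0));    // 0.0
  std::printf("%.1f\n", DequantizeSpectrogramValue(255));  // 127.5
  return 0;
}
```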
To deploy a TensorFlow model to a microcontroller, you will need to follow this development workflow: train the Neural Network (NN) model and then convert it to a format that works on a resource-constrained device such as a microcontroller.
- Train the Neural Network: The TensorFlow framework allows a model to be trained on a desktop computer or using the horsepower of cloud-based services such as AWS or Google Compute Engine. The bottom line: the more processing power (CPUs and GPUs) you throw at training a NN, the faster training completes and the more robust the final model can be. The result of training a model with TensorFlow is a file with a .pb extension. Downloading a pre-trained model is another option.
- Prepare the Model for Inferencing Hardware: The next step is to take the trained model and prepare it to run on the chosen endpoint device. The model must be made small enough to fit within the memory of the XMC4700 after conversion. Additionally, the code can only use the functions that are supported by TensorFlow Lite for Microcontrollers. However, it is possible to use a feature that is not currently supported if you write your own custom implementations.
- Convert the TensorFlow model to a TensorFlow Lite FlatBuffer: You will convert your model into the standard TensorFlow Lite format using the TensorFlow Lite converter. You might wish to output a quantized model because these are smaller in size and more efficient to execute.
- Convert the FlatBuffer to a C Byte Array: Models are kept in read-only program memory and stored as a C file. The FlatBuffer can be converted into the appropriate C byte array format with standard tools such as the Unix utility xxd (e.g., xxd -i model.tflite > model_data.cc).
- Integrate the TensorFlow Lite for Microcontrollers C++ Library: Write the value-added code for our particular implementation: collect audio data from the microphone, perform inferencing with the TensorFlow Lite for Microcontrollers C++ library, make a prediction, and then signal the result to the end user via the onboard LEDs (a minimal model-loading sketch follows this list).
- Deployment to the Edge: Build and deploy the program to your device using the Arduino IDE.
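To make the last two steps concrete, here is a minimal sketch of loading a converted model with the TensorFlow Lite for Microcontrollers C++ library. The g_model_data array and its length are hypothetical names for the C byte array produced in the previous step:

```cpp
// Minimal model-loading sketch; g_model_data/g_model_data_len are assumed
// names for the C byte array generated from the FlatBuffer.
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"

extern const unsigned char g_model_data[];
extern const unsigned int g_model_data_len;

const tflite::Model* LoadModel(tflite::ErrorReporter* error_reporter) {
  // Map the read-only byte array into the FlatBuffer model representation.
  const tflite::Model* model = tflite::GetModel(g_model_data);
  // Refuse to run if the model was converted with an incompatible schema.
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    TF_LITE_REPORT_ERROR(error_reporter,
                         "Model schema version %d does not match supported "
                         "version %d.",
                         model->version(), TFLITE_SCHEMA_VERSION);
    return nullptr;
  }
  return model;
}
```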
Building the Project
In this section, we will examine the steps needed to get the project up and running from both a hardware and a software perspective, and we will point out areas where you can tweak the design to meet your specific needs.
The section is broken down into the following subsections:
- Hardware Modifications
- Setting up the Toolchain
- Software Development
Figure 5: Pinout of the XMC4700 Relax Kit development board. (Source: Infineon)
Hardware Modifications
The XMC4700 Relax Kit (Figure 5) can read an audio signal from the IM69D130 microphone shield via I2S (also referred to as IIS, short for Inter-IC Sound), a synchronous serial communication protocol for digital audio devices. One caveat: the current I2S library for XMC devices supports only audio input; the ability to output audio signals is not currently supported.
The steps needed to complete the hardware modifications are as follows:
- Solder four headers onto the XMC4700 Relax Kit: 1x 6-pin header, 2x 8-pin headers, and 1x 10-pin header.
- Solder two 8-pin headers onto the IM69D130 Microphone Shield2Go board (Figure 6).
- Wire the XMC4700 to the IM69D130 Microphone Shield2Go board as described in Table 2.
Figure 6: Pinout of the IM69D130 microphone Shield2Go. (Source: Infineon)
Table 2: XMC4700 Wiring Connections

| XMC4700 Relax Kit (Arduino Pin ID) | IM69D130 Microphone Shield2Go |
|---|---|
| BCLK -- P3.10 (D10) | BCLK |
| DATA -- P3.7 (D12) | DATA |
| CLK -- P3.9 (D13) | CLK |
| GND | GND |
| 3.3V | 3.3V |
The XMC4700 (Figure 7) has two user-controlled onboard LEDs that we will use to provide feedback to the user. The LEDs are tied to the microcontroller's GPIO pins as follows:
- LED1 is tied to P5.9, which can be accessed as GPIO pin D24 via the Arduino IDE.
- LED2 is tied to P5.8, which can be accessed as GPIO pin D25 via the Arduino IDE.
Figure 7: Key components of the XMC4700 Relax Kit, including two user-controlled LEDs. (Source: Infineon)
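Before moving on, it can be worth verifying the LED wiring with a quick test sketch. The following minimal example is our own; it assumes the XMC-for-Arduino core maps LED1 and LED2 to D24 and D25, as noted above:

```cpp
// Minimal LED check: alternately blink LED1 (D24) and LED2 (D25).
constexpr int led1 = 24;  // P5.9
constexpr int led2 = 25;  // P5.8

void setup() {
  pinMode(led1, OUTPUT);
  pinMode(led2, OUTPUT);
}

void loop() {
  digitalWrite(led1, HIGH);  // LED1 on, LED2 off
  digitalWrite(led2, LOW);
  delay(500);
  digitalWrite(led1, LOW);   // LED1 off, LED2 on
  digitalWrite(led2, HIGH);
  delay(500);
}
```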
Setting up the Toolchain
Several development environments can be used to write the firmware for this project. For this article, we will set up and use the Arduino IDE to program the XMC4700 Relax Kit. To begin, grab the latest version of the Arduino IDE from this page on the Arduino website.
Figure 8: If the XMC library is installed correctly, there should be an option for the XMC4700 under Tools > Board > XMC Family. (Source: Infineon)
After the installer is downloaded, run it, and then launch the IDE. The first step in setting up the IDE is to tell the software where to download the support files and libraries for programming the XMC4700 Relax Kit. Those instructions can be found here. If successful, it will be possible to select the XMC4700 Relax Kit from Tools > Board > XMC Family (Figure 8).
Next, it is necessary to grab the libraries needed to interact with the IM69D130 Microphone Shield2Go. To do so, ensure that you have Git installed and then run the following command from the ~/Arduino/libraries directory to clone the IM69D130 library:
$ git clone https://github.com/Infineon/IM69D130-Microphone-Shield2Go/
This will create a subfolder titled IM69D130-Microphone-Shield2Go that contains the source code and examples needed to interact with the microphone.
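To confirm the library installed correctly, a minimal compile check such as the following can be used (our sketch; it assumes the library exposes the I2S interface used later in this article through an I2S.h header):

```cpp
// Compile/initialization check (assumption: the IM69D130 library provides
// the I2S interface via <I2S.h>, with the begin() signature used later).
#include <I2S.h>

void setup() {
  Serial.begin(115200);
  // Mono, Philips-mode I2S input at the sample rate and gain used later.
  I2S.begin(I2S_PHILIPS_MODE, 11000, 20);
  Serial.println("I2S microphone interface initialized");
}

void loop() {}
```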
Lastly, to use the onboard debugger and programmer of the XMC4700, we need to install a piece of software called SEGGER J-Link. Click here and look for the link to the J-Link Software and Documentation Pack for your particular operating system.
Software Development
The codebase for this project is written in C/C++. Nineteen files of interest are found in the Software folder of the GitHub repository's project structure. Files with the .h extension are header files containing function definitions and variable declarations. Files with the .cc extension contain the logic defined by the various classes and functions within each source file. The critical files include:
- main.cc: Contains the main loop, which simply calls the setup and loop functions defined in main_functions.cc/.h. This is done to make the rest of the codebase Arduino-compatible. (A minimal sketch of this pattern follows this list.)
- main_functions.cc/.h: Responsible for setting up parameters for the TensorFlow Lite model, as well as the main loop that runs continuously, listening for a user to speak the words "yes" or "no" and calling the functions to respond accordingly.
- audio_provider.cc/.h: Interacts with the microphone and provides audio samples to the TensorFlow Lite model.
- command_responder.cc/.h: Responds to any command the TensorFlow Lite model determines the user has spoken by toggling the two LEDs on and off as appropriate.
- feature_provider.cc/.h: Contains code that prepares the audio samples for analysis by the neural network.
- recognize_commands.cc/.h: Analyzes the audio sample and determines whether the end user spoke a command.
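As referenced above, the main.cc wrapper is deliberately thin; a sketch of the pattern (mirroring the structure of the upstream micro speech example) looks like this:

```cpp
// Sketch of the main.cc pattern: hand everything off to the Arduino-style
// setup()/loop() pair declared in main_functions.h.
#include "main_functions.h"

int main(int argc, char* argv[]) {
  setup();
  while (true) {
    loop();
  }
}
```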
The next four pairs of files contain the sample data that the TensorFlow Lite neural network uses for comparison against the microphone output. There is a long-duration (1000ms) and a short-duration (30ms) sample for each of the words "yes" and "no":
- no_1000ms_sample_data.cc/.h
- no_30ms_sample_data.cc/.h
- yes_1000ms_sample_data.cc/.h
- yes_30ms_sample_data.cc/.h
These files have been replaced or modified with custom-written code for the XMC4700 development board to handle the IM69D130 Microphone Shield2Go and the GPIO pins connected to the user-controlled LEDs.
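For reference, each sample-data header follows the same declaration pattern. A sketch of what one plausibly contains, using the upstream micro speech naming convention (verify the exact names against the repository files):

```cpp
// Illustrative declarations for no_30ms_sample_data.h; names follow the
// upstream micro speech convention and should be checked against the repo.
#include <cstdint>

extern const int g_no_30ms_sample_data_size;   // number of audio samples
extern const int16_t g_no_30ms_sample_data[];  // 16-bit PCM sample values
```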
Key Variables and Constants
You might want to tweak a few variables depending on your particular design choices:
- constexpr int kTensorArenaSize = 10 * 1024: Creates an area of memory to use for input, output, and intermediate arrays. The required size depends on the model used, and the correct value for your particular application can be determined through experimentation. This variable is found in main_functions.cc.
- constexpr int yes_led = 24 and constexpr int no_led = 25: Define the pins connected to the LEDs. These can be changed if you wish to use different devices as outputs. Just remember that D10, D12, and D13 are reserved for the I2S interface.
A note about the constant expression keyword (constexpr): like const, it defines the variable as a constant, but the value is evaluated at compile time rather than at runtime.
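A quick illustration of the difference (our own example, not project code):

```cpp
#include <cstdint>

constexpr int kTensorArenaSize = 10 * 1024;  // computed by the compiler

void Example(int runtime_value) {
  // Legal: a constexpr value is a compile-time constant, so it can size arrays.
  static uint8_t tensor_arena[kTensorArenaSize];
  // A const variable can still be initialized at runtime...
  const int limit = runtime_value;
  // ...so it cannot size an array in standard C++:
  // uint8_t buffer[limit];  // would not compile
  (void)tensor_arena;
  (void)limit;
}
```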
Modified Code
Some of the files have been modified to account for the project-specific functionality we wish to add as well as tailoring for the XMC4700 development board. The following files will need to be modified:
- audio_provider.cc/.h: These files need to be edited to use the I2S audio interface of the XMC4700 with the IM69D130 Microphone Shield2Go. Specifically, the code for the Pulse-Density Modulation (PDM) implementation is replaced with the I2S library:
```cpp
void CaptureSamples() {
  // This is how many bytes of new data we have each time this is called.
  const int number_of_samples = DEFAULT_PDM_BUFFER_SIZE;
  // Calculate what timestamp the last audio sample represents.
  const int32_t time_in_ms =
      g_latest_audio_timestamp +
      (number_of_samples / (kAudioSampleFrequency / 1000));
  // Determine the index, in the history of all samples, of the last sample.
  const int32_t start_sample_offset =
      g_latest_audio_timestamp * (kAudioSampleFrequency / 1000);
  // Determine the index of this sample in our ring buffer.
  const int capture_index = start_sample_offset % kAudioCaptureBufferSize;
  // Read the data to the correct place in our ring buffer.
  I2S.read(g_audio_capture_buffer + capture_index, DEFAULT_PDM_BUFFER_SIZE);
  // This is how we let the outside world know that new audio data has arrived.
  g_latest_audio_timestamp = time_in_ms;
}

TfLiteStatus InitAudioRecording(tflite::ErrorReporter* error_reporter) {
  // Hook up the callback that will be called with each sample.
  I2S.onReceive(CaptureSamples);
  // Start listening for audio: mono at 11kHz with the gain set to 20.
  I2S.begin(I2S_PHILIPS_MODE, 11000, 20);
  // Block until we have our first audio sample.
  while (!g_latest_audio_timestamp) {
  }
  return kTfLiteOk;
}
```
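One detail worth noting: the sample rate passed to I2S.begin() must agree with the kAudioSampleFrequency constant used in the timestamp arithmetic in CaptureSamples(); if you change one, change the other to keep the timing calculations correct.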
- command_responder.cc/.h: Edit the code to account for the LEDs found onboard the XMC4700 Relax Kit development board:
```cpp
constexpr int yes_led = 24;
constexpr int no_led = 25;

void RespondToCommand(tflite::ErrorReporter* error_reporter,
                      int32_t current_time, const char* found_command,
                      uint8_t score, bool is_new_command) {
  static bool is_initialized = false;
  if (!is_initialized) {
    pinMode(yes_led, OUTPUT);
    pinMode(no_led, OUTPUT);
    digitalWrite(yes_led, LOW);
    digitalWrite(no_led, LOW);
    is_initialized = true;
  }
  static int32_t last_command_time = 0;
  static int certainty = 220;  // Not used in this implementation.
  if (is_new_command) {
    TF_LITE_REPORT_ERROR(error_reporter, "Heard %s (%d) @%dms", found_command,
                         score, current_time);
    // If we hear a command, light up the appropriate LED.
    // Heard "yes"
    if (found_command[0] == 'y') {
      last_command_time = current_time;
      digitalWrite(yes_led, HIGH);  // LED1 on for yes
      digitalWrite(no_led, LOW);    // LED2 off
    }
    // Heard "no"
    if (found_command[0] == 'n') {
      last_command_time = current_time;
      digitalWrite(yes_led, LOW);   // LED1 off
      digitalWrite(no_led, HIGH);   // LED2 on for no
    }
    // Heard an unknown word
    if (found_command[0] == 'u') {
      last_command_time = current_time;
      digitalWrite(yes_led, LOW);   // Both LEDs off for unknown
      digitalWrite(no_led, LOW);
    }
  }
  // If last_command_time is non-zero but was >3 seconds ago, zero it
  // and switch off both LEDs.
  if (last_command_time != 0) {
    if (last_command_time < (current_time - 3000)) {
      last_command_time = 0;
      digitalWrite(yes_led, LOW);
      digitalWrite(no_led, LOW);
    }
    // If it is non-zero but <3 seconds ago, do nothing.
    return;
  }
}
```
Compiling and Uploading
The XMC4700 should be programmed and powered through the debug micro-USB port, which is located on the same side as the RJ45 Ethernet jack and the microSD card reader. Begin by plugging the micro-USB cable into the XMC4700 Relax Kit development board and your computer. Launch the Arduino IDE and ensure that the XMC4700 is selected under Tools > Board > XMC Family.
Figure 9: In Windows, use Device Manager to determine the COM port assigned to your XMC4700 Relax Kit. (Source: MB Parks)
Next, ensure the proper COM port is selected under Tools > Port. If you are unsure which COM port is associated with the XMC4700 Relax Kit development board, open Device Manager and expand the Ports (COM & LPT) entry to reveal the list of available COM ports. There should be an entry for JLink CDC UART Port (COMX), where X is an integer, such as COM9 (Figure 9).
With the correct board and port selected, it is time to program the XMC4700 by clicking the Upload button. If successful, the output window of the IDE will show "…Finished Successfully." (Figure 10).
Figure 10: Output of a successful build in the Arduino IDE. (Source: MB Parks)
Project in Action
With the IM69D130 Microphone Shield2Go and the micro-USB cable plugged into the XMC4700 Relax Kit development board, it is time to power up the board. Note that the USB cable does not have to be plugged into a computer; it can also be plugged into a USB power adapter.
Wait a few seconds and then speak the word "yes." In response, onboard LED1 should illuminate for three seconds. Wait a few seconds and then speak the word "no." In response, onboard LED2 should illuminate for three seconds. If no words are heard, or if an unrecognized word is spoken, the LEDs will remain unlit. Remember to speak slowly and clearly, and pause for a few seconds between each word spoken to the device.
This project can be expanded to control various actuators. Imagine opening your house door by speaking a command while your hands are full of groceries, or a shop vacuum in the woodshop turning on automatically when the table saw starts. Any activity that produces a unique audio signature (voice or otherwise) is a prime candidate as an input to this device. The outputs can be anything from relays and transistors to solenoids and heating coils. Audio commands can control anything that can be driven by electronic signals.
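As a sketch of how such an extension might look, the following hypothetical helpers drive an active-high relay module from pin D4 in place of an LED (the pin choice and relay polarity are our assumptions; check your relay module's documentation):

```cpp
// Hypothetical extension: switch an active-high relay module on "yes".
// The pin choice avoids D10/D12/D13, which are reserved for the I2S bus.
constexpr int relay_pin = 4;

void SetupRelay() {
  pinMode(relay_pin, OUTPUT);
  digitalWrite(relay_pin, LOW);  // relay released at startup
}

// Call from RespondToCommand() in place of (or alongside) the LED writes.
void OnYesHeard() {
  digitalWrite(relay_pin, HIGH);  // energize the relay
}

void OnCommandTimeout() {
  digitalWrite(relay_pin, LOW);   // release the relay after the timeout
}
```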
All of this is possible without an internet connection, thanks to the powerful combination of efficient machine-learning algorithms and highly capable embedded electronics. The ability to process voice commands locally is a plus for responsiveness as well as security. Furthermore, Google provides a mechanism to train the model on additional words using the Python programming language and either a Jupyter notebook (run locally on your computer) or Google Colaboratory online (Figure 11). If you are interested in adding words to the neural network, visit this tutorial.
Figure 11: Google provides online tools to train your model to learn new words. (Source: Google)