Implementing AI in FPGAs
Field Programmable Gate Arrays (FPGAs) are well known for accelerating artificial intelligence / machine learning applications, but how is this implemented in FPGAs and what are the different approaches? Let’s explore the engineers’ design space.
Artificial intelligence (AI) is a hot topic in both cloud and edge applications. In many cases, AI enables safer, more efficient, and more secure systems. Artificial intelligence has been around a long time: the term was first used in 1956 by John McCarthy, when the first conference on artificial intelligence was held. While significant research has been performed across the decades, it is only in the last 5 to 10 years that AI systems have moved from the lab and research into product road maps and products.
Within cloud and edge environments, one of the most widely deployed forms of AI is machine learning (ML). Machine learning is the study of computer algorithms that allow computer programs to improve automatically through experience. An example of this is providing an ML network with a dataset of labelled images. The machine learning algorithm identifies features and elements of the images so that, when a new, unseen, unlabelled image is input, the algorithm determines how likely the image is to contain any of the learned features and elements. Such ML algorithms can be trained to detect objects in images, process keywords in speech, and analyze sensor data for anomalies. Typical applications include vision-guided robotics, autonomous operation of vehicles, and prognostics for industrial and safety-critical systems.
ML algorithms are therefore split into two elements: the first is the training of the network against a training dataset; the second is the deployment of the trained network in the field. These elements are called training and inference, respectively. Training accurate models requires a large, labelled dataset and is often performed on cloud-based GPUs to accelerate the training process. Design engineers can then deploy the trained network across a range of technologies, from MCUs to GPUs and FPGAs. Several very popular frameworks, such as Caffe, TensorFlow, and PyTorch, aid training and deployment of AI/ML systems. These frameworks are used for network definition, training, and inference.
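To illustrate the training/inference split, the short sketch below (a minimal example, assuming PyTorch and using random tensors as stand-ins for a labelled image dataset) trains a small classifier and then runs it in inference mode; in an edge deployment the trained weights would subsequently be quantized and moved onto the target device.

```python
import torch
import torch.nn as nn

# Minimal classifier for illustration only; a real network and a real
# labelled dataset would replace these placeholders.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32 * 32, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: adjust the weights against a labelled dataset (random data here).
images = torch.randn(16, 1, 32, 32)   # stand-in for a batch of training images
labels = torch.randint(0, 10, (16,))  # stand-in for the labels
for _ in range(5):
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# Inference: the trained network classifies previously unseen data.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 1, 32, 32)).argmax(dim=1)
print(prediction)
```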
One of the key elements of many edge-based AI systems is the ability to perform inference within a determined timeframe. For example, autonomous vehicles must detect vehicles, obstacles, and pedestrians quickly to prevent collisions. This requires a solution that is both responsive and deterministic: responsive, because the sensor data must be processed quickly with minimal delay; deterministic, because the response time for each input must be the same and not dependent on system operating conditions or resource usage (for example, contention for shared DDR memory slowing down the response).
Due to these requirements for responsiveness and determinism, developers of edge-based solutions often target FPGA or heterogeneous SoC based solutions. These provide the developer with programmable logic, which is ideal for implementing machine learning networks: its parallel nature enables both a responsive application and a very deterministic solution.
When it comes to implementing machine learning inference in programmable logic, two approaches can be taken. Regardless of which approach is chosen, while neural networks are developed and trained using floating-point mathematics, implementations in an FPGA or heterogeneous SoC typically use fixed-point arithmetic. The process of converting from floating point to fixed point is called quantization and can come with a small reduction in inference accuracy; however, for most applications, additional training can be performed using the quantized weights and activations to recover the accuracy.
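To illustrate the idea of quantization, the sketch below uses a simple symmetric fixed-point scheme (illustrative only, not the algorithm of any particular tool) to map floating-point weights onto 8-bit integers and shows the small rounding error that retraining can later recover.

```python
import numpy as np

def quantize_symmetric(weights, bits=8):
    """Map floating-point weights onto signed fixed-point integers."""
    q_max = 2 ** (bits - 1) - 1                # e.g. 127 for 8 bits
    scale = np.max(np.abs(weights)) / q_max    # one scale factor per tensor
    q = np.clip(np.round(weights / scale), -q_max - 1, q_max).astype(np.int8)
    return q, scale

weights = np.random.randn(4, 4).astype(np.float32)
q_weights, scale = quantize_symmetric(weights)

# Dequantize to measure the quantization error introduced by the conversion.
error = np.abs(weights - q_weights.astype(np.float32) * scale)
print("max quantization error:", error.max())
```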
The first approach implements the neural network directly within the programmable logic. The trained weights for the inference are then loaded into the network, either at run time or during compilation / synthesis of the design.
An example of this approach is the AMD-Xilinx FINN framework, which can be used to implement quantized neural networks in FPGAs. These networks are implemented with binary weights and two-bit activations.
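The sketch below is a purely illustrative NumPy model of one layer of such a network (not FINN's actual implementation): the weights are constrained to ±1 and the activations to two bits, so in the FPGA fabric the multiply-accumulate operations reduce to narrow additions and subtractions that map well onto LUTs rather than DSP multipliers.

```python
import numpy as np

def binarize(weights):
    """Constrain weights to +1/-1, as in a binary-weight layer."""
    return np.where(weights >= 0, 1, -1).astype(np.int8)

def quantize_activation_2bit(x):
    """Clamp and round activations to the four levels of a 2-bit unsigned value."""
    return np.clip(np.round(x), 0, 3).astype(np.int8)

# Hypothetical layer sizes, for illustration only.
rng = np.random.default_rng(0)
w = binarize(rng.standard_normal((8, 16)))                 # binary weights
x = quantize_activation_2bit(rng.uniform(0, 3, size=16))   # 2-bit activations

# With +/-1 weights, each multiply-accumulate becomes an add or subtract of
# a small integer, which is inexpensive in programmable logic.
y = w @ x
print(y)
```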
The alternative to a direct implementation of the neural network within the FPGA logic is the use of a highly specialized neural network accelerator. The accelerator is implemented in the programmable logic and is closely coupled, via high-bandwidth links, to the DDR memory and to the dedicated processors within the heterogeneous SoC. In applications that use a neural network accelerator, the network and its weights, activations, and biases are provided by the software application, which makes the ML inference easier to integrate within the overall application. One example of a neural network accelerator is the AMD-Xilinx Deep Learning Processing Unit (DPU), which can work with networks defined in PyTorch, Caffe, and TensorFlow and perform all of the quantization, retraining, and program generation for the application. This provides for easier integration into the application under development.
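As a sketch of how the software application hands a compiled network to such an accelerator, the snippet below assumes the Vitis AI runtime (VART) Python API and a hypothetical compiled model file, model.xmodel; exact calls and data types vary between tool versions, so treat it as an outline rather than a reference implementation.

```python
import numpy as np
import vart
import xir

# Load a model compiled for the accelerator (hypothetical file name).
graph = xir.Graph.deserialize("model.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = next(s for s in subgraphs
                    if s.has_attr("device") and s.get_attr("device") == "DPU")

# Create a runner bound to the accelerator in the programmable logic.
runner = vart.Runner.create_runner(dpu_subgraph, "run")
input_tensor = runner.get_input_tensors()[0]
output_tensor = runner.get_output_tensors()[0]

# Input/output buffers; the dtype depends on the model's quantization.
input_data = [np.zeros(tuple(input_tensor.dims), dtype=np.int8)]
output_data = [np.zeros(tuple(output_tensor.dims), dtype=np.int8)]

# Submit the inference job to the accelerator and wait for the result.
job_id = runner.execute_async(input_data, output_data)
runner.wait(job_id)
```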
Implementing a quantized neural network directly in the FPGA requires far fewer resources, as no external DDR memory or SoC-based system support is needed. The highest accuracy and performance, however, come with the use of a specialized neural network accelerator, and its ease of integration often provides the better solution overall. Hence this approach is taken by several vendors in their AI solutions.
Final Thoughts
The choice of solution often depends on the end application. While AI may be a dominant marketing element of the product, in the real world AI is often only a small part of the overall solution: sensor interfacing, pre-processing, actuator drive, and the other elements that make up the system also come with their own constraints and requirements.
Programmable logic provides the developer with the ability to implement responsive and deterministic AI/ML solutions for a range of applications. These solutions integrate with industry-standard frameworks, enabling the developer to focus on the value-added activity.