
Creating Programs That Learn


Artificial intelligence (AI) lies at the heart of dramatic advances in automotive, healthcare, and industrial systems, as well as an expanding number of other application areas. As interest continues to rise, the nature of AI has elicited some confusion and even fear about its growing role in everyday life. In fact, the type of AI that enables an increasing number of smart products builds on straightforward but nontrivial engineering methods to deliver capabilities that are far removed from the civilization-ending AI of science fiction.


Definitions of AI range from its most advanced (and still conceptual) form, where machines are human-like in behavior, to a more familiar form, where machines are trained to perform specific tasks. In its most advanced form, a truly artificial intelligence would operate without explicit human direction and control, independently reaching conclusions or taking actions just as a human might. At the more familiar, engineering-oriented end of the AI spectrum, machine-learning (ML) methods typically provide the computational foundation for current AI applications. These methods generate responses to input data with impressive speed and accuracy without using code explicitly written to provide those responses. While software developers write code to process data in conventional systems, ML developers use data to teach machine-learning algorithms such as artificial neural network models how to generate desired responses to data.


How is a basic neural network model built?


Among the most familiar types of machine-learning models, neural networks pass data from their input layer through hidden layers to an output layer (Figure 1). As described below, the hidden layers are trained to perform a series of transformations that extract the features needed to distinguish between different classes of input data. These transformations culminate in values loaded into the output layer, where each output unit provides a value representing the probability that the input data belongs to a particular class. With this approach, developers can classify data such as images or sensor measurements using an appropriate neural network architecture.

Figure 1: Neural networks comprise layers of artificial neurons trained to distinguish between different classes of input data. (Source: adapted from Wikipedia)
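To make that flow concrete, the short sketch below passes a single input vector through one hidden layer to an output layer whose values can be read as class probabilities. The layer sizes, random weights, and softmax output are illustrative assumptions, not details taken from the figure:

```python
# Minimal sketch of the forward pass in a Figure 1-style network:
# input layer -> one hidden layer -> output layer of class probabilities.
# All sizes and the random weights here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

x = rng.random(4)                 # input layer: 4 features
W1 = rng.standard_normal((5, 4))  # hidden layer: 5 neurons
b1 = np.zeros(5)
W2 = rng.standard_normal((3, 5))  # output layer: 3 classes
b2 = np.zeros(3)

hidden = np.maximum(0.0, W1 @ x + b1)          # ReLU activation
logits = W2 @ hidden + b2
probs = np.exp(logits) / np.exp(logits).sum()  # softmax: class probabilities

print(probs, probs.sum())  # three values summing to 1.0
```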


Neural network architectures take many forms, ranging from the simple type of feedforward neural network shown in Figure 1 to deep neural networks (DNNs) built with a large number of hidden layers and individual layers containing hundreds of thousands of neurons. Nevertheless, different architectures typically build on an artificial neuron unit with multiple inputs and a single output (Figure 2).

Figure 2: An artificial neuron produces an output based on an activation function that operates on the sum of the neuron’s weighted inputs. (Source: Wikipedia)


In a feedforward neural network, a particular neuron $n_{ij}$ in hidden layer $j$ sums its inputs $x_i$, each adjusted by an input-specific weight $w_{ij}$, and adds a layer-specific bias factor $b_j$ (not shown in the figure) as follows:

$$S_j = \sum_i w_{ij} x_i + b_j$$

Finally, the summed value $S_j$ is converted to a single output value by a function called an activation function. Depending on requirements, these functions can take many forms, such as a simple step function, an arctangent, or a nonlinear mapping like the rectified linear unit (ReLU), which outputs 0 for $S_j \le 0$ and $S_j$ for $S_j > 0$.
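As a minimal sketch of that computation (the inputs, weights, and bias below are illustrative assumptions), a single neuron's output can be computed directly:

```python
# Single artificial neuron: weighted sum plus bias, passed through
# an activation function. All values here are illustrative assumptions.
import numpy as np

x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.8, 0.1, -0.4])   # input-specific weights w_ij
b = 0.2                          # layer-specific bias b_j

S = np.dot(w, x) + b             # S_j = sum_i(w_ij * x_i) + b_j

step = 1.0 if S > 0 else 0.0     # simple step function
arctan = np.arctan(S)            # arctangent activation
relu = max(0.0, S)               # ReLU: 0 for S <= 0, S otherwise

print(S, step, arctan, relu)
```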


Although they are all designed to extract the distinguishing features of data, different architectures might use significantly different types of transformations to do so. For example, convolutional neural networks (CNNs) used in image-recognition applications rely on kernel convolutions. Here, small matrices of weights, called kernels, are convolved with the input image to transform it into a set of feature maps. Subsequent layers perform more convolutions or other functions, further extracting and transforming features until the CNN model generates the same sort of classification probability output as simpler neural networks.
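The sketch below illustrates the core operation: sliding a 3x3 kernel over a small image and computing a weighted sum at each position to produce one feature map. The image values and the edge-detecting kernel are illustrative assumptions:

```python
# Minimal sketch of a kernel convolution: slide a 3x3 kernel over an
# image, computing a weighted sum at each position, to produce one
# feature map. Image and kernel values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
image = rng.random((8, 8))            # tiny grayscale "image"

kernel = np.array([[-1, 0, 1],        # simple vertical-edge detector
                   [-1, 0, 1],
                   [-1, 0, 1]], dtype=float)

kh, kw = kernel.shape
out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
feature_map = np.zeros((out_h, out_w))

for r in range(out_h):
    for c in range(out_w):
        feature_map[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)

print(feature_map.shape)  # (6, 6): one feature map from one kernel
```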


For developers, however, the underlying math for popular neural network architectures is largely hidden, thanks to the availability of machine-learning development tools [discussed elsewhere in this issue]. Using those tools, developers can fairly easily implement a neural network model and begin training it with a set of data called the training set. This training set includes a representative collection of data observations along with the correct classification for each observation, and creating it represents one of the more challenging aspects of neural network model development.
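For instance, a sketch using the Keras API (one widely used tool; the layer sizes and the random placeholder data standing in for a real labeled training set are assumptions) shows how little of the underlying math a developer needs to write:

```python
# Minimal sketch: defining and training a small classifier with the
# Keras API. The random placeholder data stands in for a real training
# set of labeled observations; all sizes are illustrative assumptions.
import numpy as np
import tensorflow as tf

x_train = np.random.rand(1000, 16).astype("float32")  # 1000 observations
y_train = np.random.randint(0, 4, size=1000)          # labels: 4 classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(32, activation="relu"),     # hidden layer
    tf.keras.layers.Dense(4, activation="softmax"),   # class probabilities
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=5, batch_size=32)
```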


How is a neural network model trained and deployed?


In the past, developers creating training sets had little option but to work through the many thousands of observations required in a typical training set, manually labeling each individual observation with its correct classification. For example, to create a training set for a road sign recognition application, they would need to view images of road signs and label each image with the correct sign name. Public domain sets of prelabeled data let many ML researchers avoid this task and focus on algorithm development. For production ML applications, however, the labeling task can present a significant challenge. Advanced ML developers often use pretrained models in a process called transfer learning to help ease this problem.
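A sketch of that approach using a Keras pretrained model follows; the choice of MobileNetV2, the input size, and the four-class head are illustrative assumptions rather than a prescribed recipe:

```python
# Minimal sketch of transfer learning with Keras: reuse a model
# pretrained on ImageNet as a frozen feature extractor and train only
# a small new classification head on your own labeled data. The model
# choice, input size, and class count are illustrative assumptions.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = False                      # freeze pretrained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # e.g., 4 sign types
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(labeled_images, labels, epochs=5)  # far fewer labels needed
```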


Although emerging tools and services help facilitate data preparation, the characteristics of the training set nevertheless play a critical role in the effectiveness of the neural network model and the overall application. The decisions made in choosing which observations to include and which to exclude have fundamental implications for the model's flexibility, specificity, and fairness, and they require careful consideration. As a result, the level of effort required to create an optimal training set can rival the effort required to implement the machine-learning program itself.


After the training set is created and the neural network model is implemented, the model training process iteratively runs the training data through the model. At each iteration, the training process calculates a loss function that measures the difference between the desired result provided by the data labels and the calculated result generated by the model. Using a method called backpropagation, the training process then uses that error information to adjust the weights and other model parameters for the next iteration. This process continues until the loss function falls within some threshold or fails to improve after some specified number of iterations.
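The sketch below mirrors that loop for the simplest possible model, a single neuron trained by gradient descent. The synthetic data, learning rate, and stopping thresholds are illustrative assumptions:

```python
# Minimal sketch of the training loop: compute a loss, backpropagate
# its gradient to adjust the weights, and stop when the loss is small
# enough or stops improving. Data and hyperparameters are assumptions.
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((200, 3))                                 # observations
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)  # labels

w, b, lr = np.zeros(3), 0.0, 0.5
best_loss, patience, stall = np.inf, 50, 0

for step in range(10000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))      # forward pass
    loss = -np.mean(y * np.log(p + 1e-9) +
                    (1 - y) * np.log(1 - p + 1e-9))
    grad = p - y                                # backpropagated error
    w -= lr * (X.T @ grad) / len(y)             # adjust weights
    b -= lr * grad.mean()                       # adjust bias
    stall = 0 if loss < best_loss - 1e-6 else stall + 1
    best_loss = min(best_loss, loss)
    if loss < 0.05 or stall >= patience:        # stopping criteria
        break

print(step, loss)
```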


When training completes, the model is converted to an inference model by performing a number of optimizations, including removal of unneeded structures such as the backpropagation mechanism, elimination of neurons that contribute little to the classification process, and even merging of layers. Programs implement the inference model by loading a compact representation saved in one of various standard formats by machine-learning tools and frameworks.
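As one example of that last step, TensorFlow Lite can apply such optimizations and save a compact inference representation. The trivial placeholder model here is an assumption standing in for a trained model such as the one sketched earlier:

```python
# Minimal sketch: converting a trained Keras model to a compact
# TensorFlow Lite inference representation. The placeholder model
# stands in for a real trained model.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(16,)),
    tf.keras.layers.Dense(4, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g., quantization
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)  # compact file loaded by the inference runtime
```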


Neural networks are not always the best solution


Although neural networks might be the most recognizable type of machine learning, they are by no means the only choice, or even the best choice, for some applications. Neural networks fall into a category of machine learning called supervised learning, because they rely on labeled data sets to train the algorithm. In sensor-based applications such as the Internet of Things (IoT) or industrial systems, other supervised learning algorithms such as support vector machines (SVMs) or decision trees can provide alternatives that are simpler, more compact, and equally effective.


SVM methods classify data by finding the boundary that best separates the classes in an n-dimensional space defined by the training data, and then determining on which side of that boundary new input data points lie. Decision-tree methods use training data to construct a model that efficiently decomposes input data into a series of optimized decisions. As with the output layer of a neural network, the final leaf nodes of the decision tree provide the probability that the data falls into a particular class. This approach is particularly efficient for classifying sensor data such as simultaneous accelerometer and gyroscope measurements to detect the occurrence of a complex movement. In fact, support for decision trees is integrated into some inertial measurement units from STMicroelectronics.
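A sketch with scikit-learn shows the leaf-node class probabilities described above; the synthetic accelerometer/gyroscope features and the two movement classes are illustrative assumptions:

```python
# Minimal sketch: classifying sensor measurements with a decision
# tree. The synthetic accelerometer/gyroscope features and the two
# movement classes are illustrative assumptions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
# Each observation: [accel_x, accel_y, accel_z, gyro_x, gyro_y, gyro_z]
still = rng.normal(0.0, 0.1, size=(100, 6))
moving = rng.normal(1.0, 0.5, size=(100, 6))
X = np.vstack([still, moving])
y = np.array([0] * 100 + [1] * 100)        # 0 = still, 1 = moving

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)

sample = rng.normal(0.9, 0.5, size=(1, 6))
print(tree.predict_proba(sample))  # leaf node's class probabilities
```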


These supervised learning methods are perhaps the most recognizable form of machine learning, but other types of machine learning, including unsupervised learning and reinforcement learning, are already being applied to practical engineering problems. As the name suggests, unsupervised learning finds relationships within unlabeled data, using clustering techniques to identify data with similar characteristics. These techniques are particularly useful during training set development and for feature engineering, where developers optimize the selection of data characteristics, or features, to be used for training and inference. Reinforcement learning finds applications in robotics systems programming, using measures of utility, or rewards, to optimize behavior, not unlike the use of loss functions in neural network training.
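For example, a k-means clustering sketch with scikit-learn groups similar observations without any labels; the synthetic two-cluster data and the choice of k=2 are illustrative assumptions:

```python
# Minimal sketch of unsupervised learning: k-means clustering groups
# unlabeled observations by similarity. The synthetic two-cluster
# data and the choice of k=2 are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
data = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
                  rng.normal(2.0, 0.3, size=(100, 2))])  # no labels

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:5], kmeans.cluster_centers_)  # discovered groups
```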


Conclusion


Based on sophisticated mathematical concepts, ML methods such as neural networks form the foundation of a growing array of smart products able to recognize and classify specific images or sensor measurements. For developers, implementing neural network models and other ML-based solutions in their applications follows a well-supported development process that is straightforward but by no means simple.

About the Author

Stephen Evanczuk has more than 20 years of experience writing for and about the electronics industry on a wide range of topics including hardware, software, systems, and applications such as the IoT. He received his Ph.D. in neuroscience on neuronal networks and worked in the aerospace industry on massively distributed secure systems and algorithm acceleration methods. Currently, when he's not writing articles on technology and engineering, he's working on applications of deep learning to recognition and recommendation systems.
