
Chatbots: Audio, AI, and Machine Learning

By Paul Golata for Mouser Electronics

Last Christmas, one of my daughters got a family game called Hearing Things™, which we all played with many laughs. It’s a variation of the “Whisper Challenge” that has gone viral: One person wears headphones that not only block hearing but also make random noises. Another person faces the person in the headphones and reads from a card a random phrase, such as “Pigeons like to cuddle.” Then, without other cues or clues, the player wearing the headphones must decipher the phrase by reading lips only.

Much like a computer, the player trying to detect the phrase looks for clues, though they’re likely not blatantly obvious. The player uses their vision to “hear” and decipher what the lips are saying. This is where the fun happens (of course!), because what is heard through visual cues can be very different and random in comparison to what was truly said. Such a game takes advantage of the limitations humans have in aligning our auditory and visual senses.

Could a computer do better? Today’s chatbots are making great headway. These applications use audio technologies, artificial intelligence (AI), and machine learning that, combined, bring about human-like reasoning and response in conversation. In particular, advances in natural language processing and neural networks have converged to create dynamic human-machine interaction with significant potential benefits for companies and end-users.

Technologies Converge

Although advances in AI, machine learning, and audio technologies have been progressing for some time now, only recently have they converged to make human-like, human-machine interaction possible.

Artificial Intelligence and Natural Language Processing

AI is helping to shift the value of computing from automating and scaling processes to generating knowledge through actionable insights. By interpreting data and processes, AI allows companies to gain new insights into their markets, generate new value, and rapidly deploy data-driven decisions.

In the domain of audio, AI can assist humans in all sorts of ways. This is because much of what we employ in human language and communication can now be programmed into machines, which can computationally handle intricate pattern recognition schemes by employing natural language processing (NLP) algorithms. Presently, designers are working to enable machines to use natural human language in a bidirectional manner. For example:

  • Machine hears human language → machine understands human language
  • Machine understands human language → machine responds in human language

One challenge in programming bidirectional communication is ensuring that the algorithms are formatted and arranged to understand the surrounding environment and respond appropriately. This is called the “Frame Problem”—the process of humans ensuring that the computer has the instructions it needs to achieve a certain function. To respond appropriately, machines must be programmed to understand both the explicit language and proper clues regarding the intentions and attitudes behind expressions.

Common programming languages used in the field of AI include Python, Java, Lisp, Prolog, and C++. Python is a very popular programming language for AI applications. Its modular architecture separates functionality into focused areas, and its relatively simple syntax rules keep the structure of a program easy to follow. Its libraries, including NLTK, gensim, and Quepy, are well suited to NLP and text processing. These are defined as follows, with a short example after the list:

  • NLTK (Natural Language Toolkit) is a collection of open-source Python modules. It provides linguistic data and lexical resources for developing NLP and text-analytics applications on commonly used operating systems.
  • gensim is a module designed to extract semantic meaning from documents, without unwarranted complications.
  • Quepy works to transform natural language questions into queries in a database query language.
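
As a small illustration of the kind of text processing these modules support, the sketch below tokenizes a spoken phrase and tags each word’s part of speech with NLTK. It is a minimal example, not a complete NLP pipeline; it assumes the standard NLTK data packages have been downloaded, and the phrase is simply the one from the card game above.

```python
import nltk

# One-time downloads of the tokenizer and part-of-speech models.
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")

phrase = "Pigeons like to cuddle."       # the phrase from the card game
tokens = nltk.word_tokenize(phrase)      # split the phrase into word tokens
tagged = nltk.pos_tag(tokens)            # label each token with a part of speech

print(tokens)   # e.g. ['Pigeons', 'like', 'to', 'cuddle', '.']
print(tagged)   # e.g. [('Pigeons', 'NNS'), ('like', 'VBP'), ('to', 'TO'), ('cuddle', 'VB'), ('.', '.')]
```

Even this tiny step, turning raw audio-derived text into labeled tokens, is the kind of building block a chatbot’s language understanding is assembled from.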

Machine Learning and Neural Networks

For humans, learning is a natural part of our innate intelligence, biological maturing, and experience. Learning may be understood as recursive self-improvement, where feedback is used to adjust for better outcomes. By studying the human brain, in particular its networks of neurons, researchers have conceptualized powerful machine learning approaches and carried them into AI. An artificial neural network consists of many interconnected artificial neurons, and its behavior emerges from the collective effect of those weighted interconnections rather than from any single neuron.
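
The minimal NumPy sketch below (an illustration only, not any particular production network) shows a single layer of artificial neurons. Every output depends on every input through a shared weight matrix, which is what gives the network its collective, distributed behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny fully connected layer: 4 inputs feeding 3 artificial neurons.
# Every output value depends on every input through the weight matrix.
weights = rng.normal(size=(3, 4))   # one row of weights per neuron
bias = np.zeros(3)

def layer(x):
    z = weights @ x + bias           # weighted sum of all inputs for each neuron
    return np.maximum(z, 0.0)        # ReLU activation: a neuron "fires" only if its sum is positive

x = np.array([0.2, -1.0, 0.5, 0.3])  # a hypothetical input feature vector
print(layer(x))                       # the three neuron activations
```

Learning, in this picture, is nothing more than adjusting the weight matrix so that the outputs move closer to the desired ones.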

Coupled with deep reinforcement learning, in which goal-oriented performance is positively or negatively reinforced through feedback, these recent breakthroughs are enabling creative new strategies for tackling a multitude of complex problems, including understanding and responding to natural human language and conversation. Products such as Seeed Studio’s Google AIY Voice Kit and Seeed Studio’s ReSpeaker 2-Mics Pi HAT are a fun and fast way to begin working with a natural language recognizer.
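
The positive or negative reinforcement mentioned above takes the form of a numeric reward signal. The toy sketch below is a stripped-down value update in the spirit of Q-learning, purely illustrative and not the algorithm used by any product named here: an estimate of each action’s value is nudged a little after every outcome, so the better action gradually wins out.

```python
import random

# Two hypothetical chatbot actions and a running estimate of each one's value.
actions = ["greet", "clarify"]
q = {a: 0.0 for a in actions}
alpha = 0.1                          # learning rate: how strongly feedback adjusts the estimate

def reward(action):
    # Hypothetical feedback: "clarify" usually leads to the better outcome here.
    return 1.0 if action == "clarify" and random.random() < 0.8 else 0.0

for step in range(1000):
    # Mostly pick the best-known action, occasionally explore the other one.
    a = max(q, key=q.get) if random.random() > 0.1 else random.choice(actions)
    r = reward(a)
    q[a] += alpha * (r - q[a])       # move the estimate toward the observed reward

print(q)                              # "clarify" should end up with the higher value
```

Deep reinforcement learning replaces this small table with a neural network, but the feedback loop is the same.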

The Rise of Chatbots

Chatbots combine audio technologies, AI, and machine learning by receiving sensor input, using algorithms to determine actionable insights, responding based on those insights, and then recursively learning from subsequent input. Chatbots are one way forward-thinking companies, including Amazon, Apple, Facebook, and Google, are working to engage with their customers. Chatbots continually improve through machine learning and predictive analytics. They sense, think, decide, and then act. Chatbot applications overlap with AI, machine learning, and deep learning (Figure 1).


Figure 1: Chatbots overlap with AI, machine learning, and deep learning.
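
The sense-think-decide-act cycle can be pictured as a simple loop. The skeleton below is a hypothetical, rule-based stand-in for that loop; a real chatbot would replace the keyword lookup with the NLP and learning techniques discussed in this article.

```python
from typing import Optional

# A hypothetical sense -> think -> decide -> act loop for a text chatbot.
# The "intents" table stands in for a trained natural language classifier.
intents = {
    "price": "Our current pricing is listed on the product page.",
    "hours": "Support is available 24 hours a day.",
}

def think(utterance: str) -> Optional[str]:
    """Pick the intent whose keyword appears in the user's text."""
    text = utterance.lower()
    for keyword in intents:
        if keyword in text:
            return keyword
    return None

def chatbot_turn(utterance: str) -> str:
    intent = think(utterance)                    # think: interpret the sensed input
    if intent is None:                           # decide: fall back if nothing matched
        return "Sorry, could you rephrase that?"
    return intents[intent]                       # act: respond with the chosen answer

print(chatbot_turn("What are your hours?"))      # sense: an incoming user message
```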

Companies are wise to use chatbots to increase revenue and provide higher levels of customer service and engagement. Today’s customers are technologically savvy and expect companies to respond quickly and efficiently to their needs. Talking chatbots such as Siri, Alexa, and Cortana are now household names to most of us. While chatbots are still working to emulate, and eventually exceed, human behavior and performance, designers are improving their ability to grow more intelligent through dynamic interaction, making them more human-friendly and inviting. The goal is to make computers that speak with human-like capabilities in the areas of:

  • Text-to-speech and speech-to-text
  • Tone analysis and personality insights
  • Natural language classifier and language translation
  • Virtual agent and conversation programming

Deep learning AI, such as that employed by Google’s DeepMind, is enabling computers to no longer sound like computers. Instead, a computer can converse realistically with a synthesized voice, to the point of passing the Turing test, meaning its responses are so realistic that a human cannot tell they are conversing with a computer. Deep learning achieves these results by processing volumes of textual conversations and learning the patterns of human language and communication they contain. Because the AI can computationally process so many textual conversations, it has a sufficiently large experience base from which to draw conclusions and then make appropriate human-language responses.
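
On a much smaller scale, the sketch below hints at how a system can learn from example conversations: a handful of hypothetical labeled utterances are turned into features, and a classifier learns to map new phrases to intents. It uses scikit-learn rather than a deep network purely for brevity, and the training data is invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training utterances and their intent labels.
utterances = [
    "what does this cost", "how much is it", "price please",
    "when are you open", "what are your hours", "are you open on sunday",
]
labels = ["price", "price", "price", "hours", "hours", "hours"]

# Turn text into term-frequency features and fit a simple classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(utterances, labels)

print(model.predict(["how much do you charge"]))  # likely ['price'], given the overlapping wording
```

Scale the training set up from six lines to millions of real conversations, and swap the linear model for a deep network, and you have the essence of how a conversational AI builds its experience base.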

Similar to the way conversational AI learns how to communicate, it can also be made to respond with a unique human voice and human-like emotions. Beyond merely storing voices into memory, as artificial neural networks process thousands of hours of recorded human voices, the networks can extrapolate specific details to emulate natural human speech. This enables a chatbot to employ AI to select an appropriate voice and manner suitable for the occasion. It only takes a few moments of recorded voice data for the AI to be able to replicate the voice, regardless of the speech pattern it must converse in.

Chatbots can use conversational AI to provide a more personalized service. The chatbot’s ability to employ natural human language-like conversation as well as to ask for and receive high-quality information and feedback is expected to lead to more significant sales opportunities and customer satisfaction. Chatbots are programmed to always deliver the highest level of customer service.

Chatbots are also media agnostic: They do not care if you are talking to them on the telephone, by email, or through social media applications. In each instance, they are programmed to respond appropriately. Chatbots are a digital, customer-interfacing option that takes advantage of the technology and digital world that has transformed much of our business environment. Chatbot technology is appropriate for certain business functions, including sales, marketing, customer service, and other similar roles. Where customers are already using digital technology to interact with a company, chatbots are a compelling way for the company to interact with those customers as well.

One of the most significant issues facing chatbots is preparing them to address specific user needs. Because humans are so complex, chatbots must understand what the user is requesting even when the context is dynamic and fluid. This requires understanding the subtle nuances and distinctions of human language so that mistakes are avoided.

Combined with predictive analytics, well-performing AI chatbots will seem nearly able to read the minds of those they’re interacting with by anticipating where the conversation is headed. This means chatbots will evolve to focus more on making suggestions and predictions, making them ideal candidates for an increased ability to take specific actions. Building increasingly intelligent chatbots is an ongoing challenge, as designers work to make them contextually aware and situationally responsive in positive alignment with human interactions and needs.

Conclusion: The Conversation Emerges

Advances in AI, machine learning, and audio technologies have converged to make human-like, human-machine interaction possible through chatbots. With capabilities to recognize and interpret speech and tone, chatbots are becoming virtual agents in providing basic customer service and similar interactions—with capabilities to interpret, respond to, and learn from speech input and its many subtleties and cues.

However, to respond appropriately, machines must be programmed to understand both the explicit language and the clues regarding the intentions and attitudes behind expressions. This is where advances in natural language processing have improved human-machine interaction and enabled successful bidirectional communication between the two. Neural networks have also been a key advancement, enabling machines to learn from previous interactions.

I am looking forward to the day when my computer helps me write my technical articles. Hopefully, the first words out of its mouth are not, “I think that’s immature, amateurish, and foolish. I suggest you write the following….” If only it had the ability to read my lips right now, perhaps people wouldn’t laugh at me—like when playing the game Hearing Things™.

 

About the Author

Paul Golata joined Mouser Electronics in 2011. As a Senior Technology Specialist, Paul contributes to Mouser’s success through driving strategic leadership, tactical execution, and the overall product-line and marketing directions for advanced technology related products. He provides design engineers with the latest information and trends in electrical engineering by delivering unique and valuable technical content that facilitates and enhances Mouser Electronics as the preferred distributor of choice. Before joining Mouser Electronics, Paul served in various manufacturing, marketing, and sales related roles for Hughes Aircraft Company, Melles Griot, Piper Jaffray, Balzers Optics, JDSU, and Arrow Electronics. He holds a BSEET from the DeVry Institute of Technology (Chicago, IL); an MBA from Pepperdine University (Malibu, CA); an MDiv w/BL from Southwestern Baptist Theological Seminary (Fort Worth, TX); and a PhD from Southwestern Baptist Theological Seminary (Fort Worth, TX).
