Understanding Language Through AI
Natural Language Processing and the GPT Algorithm
Image Source: Murrstock/Stock.adobe.com
By Alex Pluemer for Mouser Electronics
Published February 14, 2023
Introduction
Computers that understand human speech and respond to verbal commands once existed purely in science fiction (for example, HAL in 2001: A Space Odyssey). Attempts to program machines to process human language date back to the 1930s and gave rise to the field known as computational linguistics, but only recent advances in artificial intelligence (AI) have brought that fiction closer to reality.
The different methods and evolving technologies employed in this endeavor fall under the heading of natural language processing (NLP), the field of computer science concerned with implementing AI to better understand human language. Whether you're asking Siri for directions or Alexa to turn up the thermostat, NLP is helping to facilitate that request. NLP includes both natural language understanding (NLU) and natural language generation (NLG), allowing digital assistants, chatbots, etc. to comprehend human speech and produce a response.
Through the 1980s, NLP largely took the form of handwritten instructions that were painstakingly programmed into a computer rule by rule, almost like teaching a language to a complete neophyte. Later advancements were made possible by incorporating machine learning algorithms, which moved beyond the strict "if/then" framework established by handwritten rules (now referred to as symbolic NLP) to a more flexible, probability-based structure for determining which word and syntax choices to make (known as statistical NLP). These algorithms "learned" the rules of grammar and syntax by analyzing large collections of documents and other written material and mimicking the patterns they found.
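The probability-based idea behind statistical NLP can be illustrated with a bigram model, which estimates how likely each word is to follow another purely from counts in example text. This is a minimal sketch, not a description of any particular production system; the three-sentence corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word is followed by each other word."""
    follow_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            follow_counts[current_word][next_word] += 1
    return follow_counts

def next_word_probabilities(follow_counts, word):
    """Convert raw follow-up counts into probabilities for one word."""
    counts = follow_counts.get(word.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # the word never appeared with a successor in training
    return {candidate: n / total for candidate, n in counts.items()}

# Toy training corpus (an assumption for this example)
corpus = [
    "the cat sat on the mat",
    "the cat chased the mouse",
    "the dog sat on the rug",
]

model = train_bigrams(corpus)
probabilities = next_word_probabilities(model, "the")
most_likely = max(probabilities, key=probabilities.get)
print(most_likely, round(probabilities[most_likely], 2))
```

No rule in the code says that "cat" tends to follow "the"; the preference emerges from the counts, which is the essential difference from the handwritten "if/then" approach.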
Another big step in NLP's evolution was the implementation of neural networks. Neural approaches, such as neural machine translation (NMT), take even more of the preliminary programming and rules-based instruction out of the NLP process, allowing deep-learning algorithms to learn the rules of human language largely on their own. NLP is almost ubiquitous now, from speech-to-speech (like digital assistants) to text-to-text (like customer service chatbots) and even speech-to-text implementations (like transcription and translation programs).
This article will offer a deeper look at the operations and tools AI uses to process human language and how NLP can be implemented both now and in the future.
NLP Tasks and Operations
Now that we know what NLP is and what it does, the question becomes: how does it do it? How do AI and machine learning algorithms teach themselves to understand human language? The first step is to determine a language's rules through a process called syntactic analysis, which includes working out where sentences start and stop and how they are punctuated. For example, understanding that a period can end a sentence but can also punctuate an abbreviation is crucial to grasping a sentence's meaning.
A similar operation is performed on individual words in a process called part-of-speech tagging, which determines whether a word is being used as, say, a noun or a verb in context. NLP must differentiate between the noun form ("I'm going for a run") and the verb form ("I can't get this program to run correctly") of the same word to process its meaning. NLP must also distinguish between the meanings of words that have more than one (known as word sense disambiguation), as in the case of the word "bank." A bank can be a place where you invest your savings, the side of a river, or the tilting movement of a plane. AI has made significant improvements in this area of NLP, as it's far simpler to let machine learning tools absorb the different meanings of words by analyzing source material than to program them one by one.
Distinguishing proper nouns from common words (part of a task known as named entity recognition) is also critical: for example, differentiating between a member of the Chicago Bulls NBA team and a male cow.
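To make the period-versus-abbreviation problem concrete, here is a small rule-based sentence splitter. It is only a sketch under simplifying assumptions: the abbreviation list is a tiny, hypothetical one, and the function name is invented for this example. Modern tools typically learn such boundaries from data instead.

```python
# Hypothetical, deliberately tiny abbreviation list; real systems use far
# larger lists or learn abbreviations from data.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "ms.", "prof.", "inc.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split text into sentences, treating known abbreviations as non-final."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        # Ignore closing quotes or brackets that trail the punctuation mark
        stripped = token.rstrip("\"')]")
        ends_with_stop = stripped.endswith((".", "!", "?"))
        if ends_with_stop and stripped.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # keep any trailing text that lacks final punctuation
        sentences.append(" ".join(current))
    return sentences

text = "Dr. Smith joined Acme Inc. in 2019. She leads the NLP team."
result = split_sentences(text)
for sentence in result:
    print(sentence)
```

A known weakness of this approach is that an abbreviation that genuinely ends a sentence (for example, a sentence closing with "etc.") is not treated as a boundary, which is one reason statistical methods overtook handwritten rules for this task.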
When two words refer to the same entity (as is often the case with pronouns), NLP relies on a process called co-reference resolution. For example, in the sentence "The press secretary wanted to respond, but he thought better of it," NLP would determine that the word "he" refers to the press secretary.
The most challenging task NLP performs, and the area in which it still stands to make the most improvement, is sentiment analysis. Sentiment analysis refers to determining what attitudes and feelings a speaker or writer is trying to communicate and what the underlying subtext may be. This is particularly critical in social media and marketing applications, where it is vital to understand not just the words and phrases themselves but also the inclinations and convictions behind them. NLP still has a long way to go here, but AI and machine learning have made considerable strides in recent years.
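A very simple way to approximate sentiment analysis is to count positive and negative words from a lexicon. The sketch below assumes three small, made-up word lists and a crude negation rule; it is not how modern systems work, but it shows why subtext is hard to capture.

```python
# Tiny illustrative lexicons (assumptions for this example, not a real resource)
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "disappointing"}
NEGATIONS = {"not", "never", "no"}

def sentiment_score(text):
    """Return a positive, negative, or zero score from simple word counts."""
    score = 0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"'")
        if word in NEGATIONS:
            negate = True  # flip the polarity of the very next word only
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)  # 1, -1, or 0
        if polarity:
            score += -polarity if negate else polarity
        negate = False
    return score

print(sentiment_score("I love this phone, the camera is great!"))
print(sentiment_score("The battery is not good."))
print(sentiment_score("Oh great, another update that broke everything."))
```

The third sentence is sarcastic, yet the word counter rates it as positive because it sees only the word "great." Closing that gap between literal wording and intended meaning is the hard part of sentiment analysis.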
NLP Use Cases
NLP is implemented in many ways, but nearly everyone is familiar with its most common applications: virtual assistant technologies. As previously mentioned, digital voice-activated assistants utilize NLP to comprehend human requests and generate coherent and topical responses.
Chatbots and virtual customer service representatives are another nearly ubiquitous implementation of NLP in our everyday lives. AI-powered algorithms can learn from previous interactions with customers to improve their conversational capabilities in the future. While chatbots aren't yet great at finding new solutions to problems, they can remember previous solutions. A chatbot is more likely to know the answer to a customer's question if that question has been asked in the past.
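The idea that a chatbot answers best when a question has been asked before can be sketched as a lookup over remembered question-and-answer pairs, matched by word overlap. The class name, threshold value, and sample questions below are all hypothetical, and real chatbots use far richer matching than this.

```python
def tokenize(text):
    """Lowercase a question and return its set of words without punctuation."""
    return {word.strip(".,!?;:\"'").lower() for word in text.split()} - {""}

def jaccard(a, b):
    """Word-overlap similarity: shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

class FaqBot:
    def __init__(self, threshold=0.3):
        self.memory = []            # list of (question word set, answer)
        self.threshold = threshold  # minimum similarity needed to answer

    def remember(self, question, answer):
        self.memory.append((tokenize(question), answer))

    def reply(self, question):
        tokens = tokenize(question)
        best_score, best_answer = 0.0, None
        for known_tokens, answer in self.memory:
            score = jaccard(tokens, known_tokens)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_score >= self.threshold:
            return best_answer
        return "I'm not sure. Let me connect you with a person."

bot = FaqBot()
bot.remember("How do I reset my password?",
             "Use the 'Forgot password' link on the sign-in page.")
bot.remember("Where is my order?",
             "Check the tracking link in your confirmation email.")

print(bot.reply("How can I reset my password?"))
print(bot.reply("Do you sell gift cards?"))
```

The first question closely resembles one the bot has seen, so it returns the stored answer; the second has no close match, so the bot falls back to a handoff message instead of inventing a solution.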
In the last decade, NLP programs have made huge strides in text summarization applications, in which the programs read and process enormous amounts of data, usually in the form of documentation, and present clear and concise recaps. Text summarization applications are especially useful when the documents they're summarizing are highly technical or jargon-heavy. NLP programs can understand the more complex terminology and rephrase it in more straightforward language that's easier for a layperson to understand. Machine translation tools are similarly improving through machine learning and AI.
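One classic, simple form of summarization is extractive: score each sentence by how frequent its words are across the document and keep the top-scoring sentences. Note that this differs from the rephrasing described above, which requires generating new text; the sketch below only selects existing sentences. The stopword list and sample document are assumptions for illustration.

```python
from collections import Counter

# Small illustrative stopword list (an assumption; real lists are longer)
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it",
             "that", "for", "on", "are", "as", "with"}

def words_in(sentence):
    """Return the lowercase content words of a sentence."""
    cleaned = (w.strip(".,!?;:\"'").lower() for w in sentence.split())
    return [w for w in cleaned if w and w not in STOPWORDS]

def summarize(sentences, max_sentences=1):
    """Keep the sentences whose words are most frequent in the document."""
    frequencies = Counter(w for s in sentences for w in words_in(s))

    def score(sentence):
        words = words_in(sentence)
        # Average word frequency, so long sentences are not favored by length
        return sum(frequencies[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in keep]

document = [
    "Neural networks learn language patterns from large text collections.",
    "The weather was pleasant on the day of the conference.",
    "Language models use neural networks to predict text.",
]

print(summarize(document, max_sentences=1))
```

The off-topic sentence about the weather scores lowest because none of its words recur elsewhere in the document.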
While NLP translation applications have long been able to translate individual words from one language into another, they are getting better at deciphering the meaning of more complex thoughts and ideas. For example, asking Google Translate to convert the title of Marcel Proust’s magnum opus À la recherche du temps perdu into English (an infamously tricky translation) elicits the response “In Search of Lost Time,” its commonly recognized English title, though not an exact translation.
NLP has also greatly improved efficiency in spam detection. Machine learning algorithms can detect patterns of language and common phrases frequently found in mass advertising more effectively than older spam detection programs. Sentiment analysis is also a big factor in spam detection and social media harvesting, as determining the emotional responses certain advertisements or social media posts elicit is a large part of developing more targeted advertising campaigns designed for individual consumers. AI-powered NLP applications can process enormous amounts of data and synthesize it into digestible information, a capability that is especially valuable in online and digital advertising.
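Pattern-based spam detection of the kind described above is often introduced through a naive Bayes classifier, which learns from labeled examples which words are more typical of spam. The following is a minimal sketch with a made-up four-message training set; the class name is hypothetical, and real filters train on vastly more data and use additional signals.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase a message and strip surrounding punctuation from words."""
    cleaned = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return [w for w in cleaned if w]

class NaiveBayesSpamFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def train(self, text, label):
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def classify(self, text):
        # Assumes at least one training message exists for each label
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Start from the prior probability of the label
            log_prob = math.log(self.message_counts[label] / total_messages)
            for word in tokenize(text):
                # Add-one (Laplace) smoothing so unseen words are not fatal
                log_prob += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
            scores[label] = log_prob
        return max(scores, key=scores.get)

spam_filter = NaiveBayesSpamFilter()
spam_filter.train("Win a free prize now", "spam")
spam_filter.train("Free money, click now", "spam")
spam_filter.train("Meeting moved to noon tomorrow", "ham")
spam_filter.train("Can you review my report tomorrow", "ham")

print(spam_filter.classify("Claim your free prize now"))
print(spam_filter.classify("Can we review the report at noon"))
```

Working with log probabilities avoids multiplying many small numbers together, which would otherwise underflow on long messages.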
NLP Tools
NLP software programs have been in development since the 1950s, but the relatively recent introduction of statistical NLP and machine learning/neural networks has contributed to significant advancements in the quality of NLP tools. As previously mentioned, digital assistants like Siri and Alexa have advanced NLU and NLG capabilities, and applications like Google Translate have dramatically improved their understanding of context and intent. Options for working with NLP-related software have expanded as well. Programmers working with Python have various tools and libraries available in the Natural Language Toolkit (NLTK), a suite of open-source programs and libraries for building NLP applications. Its resources cover basic operations (e.g., sentence parsing, stemming) as well as higher-level capabilities like semantic reasoning and sentiment analysis, and they are free and accessible to all.
The hottest new NLP tool on the scene is ChatGPT from OpenAI, an AI research and development laboratory based in San Francisco. Powered by a model from OpenAI's GPT-3.5 series, a refinement of the third generation of its language prediction algorithm (GPT-3), ChatGPT produces NLG responses that are sometimes indistinguishable from human writing, bringing it close to meeting the criteria of Alan Turing's eponymous test for artificial intelligence. While ChatGPT's responses aren't always factually sound (for example, when asked if Nigeria is in North America, it has responded "yes"), its grammar and syntax are almost always unassailable. ChatGPT is particularly effective at mimicking standard forms of documentation like rental agreements or simple contracts.
As a simple exercise, ChatGPT was asked, "What is natural language processing?" It formulated the following response:
Natural Language Processing (NLP) is a field of artificial intelligence and computational linguistics that focuses on the interaction between computers and human language, specifically in speech recognition, natural language understanding, and natural language generation. NLP algorithms and techniques are used to process and analyze large amounts of natural language data, such as speech and text, to perform tasks such as language translation, sentiment analysis, text summarization, named entity recognition, and many more. NLP is an interdisciplinary field, drawing on areas such as computer science, linguistics, and cognitive psychology, to develop algorithms and models that enable computers to understand, interpret, and generate human language.
It would be hard for a human being to summarize NLP any more clearly or concisely than an NLP-driven algorithm can already do itself.
Conclusion
NLP has the potential to revolutionize several professions and industries. Translating legal and technical documents from one language to another is a common and well-paid occupation, but translation tools could soon make it obsolete. Writing boilerplate legal documents or drawing up standard contracts is another task that NLP might eventually handle without human involvement. ChatGPT has already been prompted to create similar documents with great success. Customer service might be the most transformed field; call centers staffed with actual human beings may cease to exist because AI speech-to-speech and text-to-text programs can answer customers' questions without taking breaks or going on vacation.
NLP has obvious potential in education, as well. For those learning a language, NLP programs can help them improve their writing and grammar by demonstrating how to better and more coherently construct sentences and paragraphs. Computers that can read, write, and communicate in human languages might seem like something out of science fiction, but they're already here. You may even have interacted with one today without even knowing it.