Artificial Intelligence and the Data that Drives It


Artificial intelligence (AI) is a rapidly evolving problem-solving tool. It is increasingly used to build businesses, help communities grow and stay safe, and manage people and resources in countless ways. The feedstock of AI is data; indeed, data is the centerpiece of AI. Its availability, integrity, sensitivity, and format all shape how AI is used and how effective it can be.

Data and Artificial Intelligence

It helps to have a basic understanding of the different types of data and how they are used in AI.

Structured data, as the name implies, exists within a record in a fixed field; it has an ordered structure. It may include numbers, currency amounts, names, dates, and addresses, and it can be further characterized with additional labels.

A name, for instance, can carry a prefix or a suffix, and a currency amount may include cents or fractions of a value. Labels allow data to be categorized, sorted, and ordered in a spreadsheet or matrix, and related from one data element to another. Data can also be arranged in a hierarchy, much like biological species, the Dewey Decimal system, or demographic data tagged to people. Structured data may exist on paper or be stored digitally, and it can be observed and analyzed to build intelligence: artificial intelligence.

Unstructured data, by contrast, is data that generally is not stored in a fixed field. It may originate from a text narrative, a medical prescription, a PowerPoint presentation, images, videos, and so on.
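
To make the distinction concrete, the short Python sketch below contrasts a structured record of fixed, labeled fields with an unstructured free-text note. The field names and values are invented purely for illustration.

    # A structured record: fixed, labeled fields that can be sorted,
    # filtered, and related to other records directly.
    structured_record = {
        "name": "Jane Doe",
        "prefix": "Dr.",          # a label that further characterizes the name
        "balance": 1024.50,       # a currency amount, including cents
        "date": "2020-05-01",
        "city": "Norfolk",
    }

    # An unstructured note: free text with no fixed fields; it must be
    # parsed or interpreted before a machine can learn from it.
    unstructured_note = (
        "Patient reports a mild cough for three days; advised rest and "
        "fluids. Follow up next week if symptoms persist."
    )

    print(sorted(structured_record))        # the fields can be ordered directly
    print(len(unstructured_note.split()))   # the note is just a run of words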

Machine Learning

Data, be it structured or unstructured, is at the core of artificial intelligence. AI uses scripts or algorithms to process data and "learn" from it, a process known as machine learning.

Machine learning is accomplished by observing patterns in data rather than by following explicitly programmed instructions. Programming is still needed to set the learning process in motion, observe patterns, and enforce rules, but ultimately the system arrives at decisions by iterating over relationships in the data. Facial recognition, for example, employs machine learning trained on thousands of facial characteristics and facial images.
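
To give a rough sense of learning from examples rather than from hand-written rules, the sketch below fits a simple nearest-neighbor classifier to a handful of made-up feature vectors standing in for simplified facial measurements. It assumes Python with scikit-learn is available; the data, labels, and features are invented for illustration.

    from sklearn.neighbors import KNeighborsClassifier

    # Hypothetical feature vectors (stand-ins for facial measurements)
    # and labels naming the person each vector belongs to.
    features = [
        [0.20, 0.90], [0.25, 0.85], [0.30, 0.95],   # person A
        [0.80, 0.10], [0.75, 0.20], [0.85, 0.15],   # person B
    ]
    labels = ["A", "A", "A", "B", "B", "B"]

    # No explicit recognition rules are written; the model infers the
    # pattern from the examples it is shown.
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(features, labels)

    # A new, unseen measurement is classified by its similarity to past data.
    print(model.predict([[0.28, 0.90]]))    # -> ['A']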

Figure 1: Voice recognition uses machine learning built on data pattern observations. (Source: metamorworks/Shutterstock.com)

Voice recognition (Figure 1) employs machine learning to do the same, using sounds, pronunciations, and voice inflections. Deep learning is a subset of machine learning that processes data one layer at a time using neural networks. Just as the brain has complex networks of neurons firing across synapses, neural networks are computing structures modeled on the human brain and nervous system. In a sense, deep learning is where the data is fine-tuned so that artificial intelligence can best mirror actual cognitive thinking and intelligence.
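
The layer-by-layer idea can be sketched as a toy forward pass through a tiny network, shown below using Python and NumPy. The weights are random placeholders rather than values learned from data, so the sketch only illustrates how information flows through successive layers.

    import numpy as np

    def relu(x):
        # A simple activation function, loosely analogous to a neuron firing.
        return np.maximum(0, x)

    x = np.array([0.5, -1.2, 3.0])    # input layer: three features
    w1 = np.random.randn(3, 4)        # weights from the input to a hidden layer
    w2 = np.random.randn(4, 2)        # weights from the hidden layer to the output

    hidden = relu(x @ w1)             # first layer of representation
    output = hidden @ w2              # final layer produces the result
    print(output)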

The Pathway of Data to Create Artificial Intelligence

AI uses data to think. That data follows a pathway that begins with collection and aggregation: it must first be captured. Once captured, it must be stored and organized, moving from its rawest form to a usable one where learning can begin. The learning, whether machine learning or deep learning, is automated using computing processes and languages. That learning produces artificial intelligence.
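
A compressed version of that pathway might look like the Python sketch below: raw records are captured, then organized and cleaned with pandas (assumed to be available) into a form a learning step can use. The in-memory data and field names are hypothetical.

    import pandas as pd

    # 1. Capture: raw, collected data (an in-memory stand-in for real sources).
    raw = pd.DataFrame({
        "age":    [34, None, 51, 29],
        "income": ["52,000", "48,500", None, "61,250"],
    })

    # 2. Store and organize: convert the raw fields into a usable, typed form.
    clean = raw.copy()
    clean["income"] = clean["income"].str.replace(",", "", regex=False).astype(float)
    clean = clean.dropna()            # discard incomplete records

    # 3. Learn: the cleaned table is now ready for a machine-learning step.
    print(clean.describe())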

The culmination of the AI pathway is applying this intelligence to tasks ordinarily handled by our cognitive abilities, such as decision making and prediction. Because computing works fast, around the clock and seven days a week, and can process huge volumes of data, it can accomplish what would otherwise be impossible for a human being. Consequently, AI is now found in nearly every major industry, from financial services to medicine, healthcare, marketing, public safety, and many other parts of our society and lives.

The Data Dilemma

Artificial intelligence relies on data, and specifically on good data. Not all data is good: it may be unreliable, incomplete, sloppy, or inaccurate, and many things can compromise its integrity and usefulness.

McKinsey, in its research report "Catch Them if You Can: How Leaders in Data and Analytics Have Pulled Ahead," surveyed high-performing data and analytics leaders to learn about data best practices. The report states: "We know from experience that a robust data architecture allows organizations to support the rapid collection and sharing of data that enables frontline employees to access and utilize the data they need. It also helps to establish and maintain the high levels of data quality required to support effective data-based decision making. Our results bear out the important role data quality plays in driving analytics adoption: high-performing respondents report better data quality than their peers at other companies, and across respondents, low data quality was the factor most often cited as the biggest impediment to getting employees to use data consistently for decision making."

Additionally, privacy issues can pose a host of dilemmas. Healthcare organizations may rely on unstructured data, such as a physician’s handwritten notes, only to find that using it violates prevailing privacy laws. Sensitive data is sometimes hard to discern and may be gathered for use, only for the organization to learn that doing so violates U.S. or international data privacy laws.

Yet another data dilemma is availability. Some data is more easily retrieved than other data; sometimes it is nonexistent. Even when data is available, it may be woefully incomplete, with missing fields and inconsistent values. For AI to work successfully, these data dilemmas must be reconciled and overcome.
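
A practical first step in confronting these dilemmas is simply to measure them. The sketch below uses Python with pandas and invented records to check a small dataset for missing fields and inconsistently coded values of the kind described above.

    import pandas as pd

    records = pd.DataFrame({
        "patient_id": [101, 102, 103, 104],
        "age":        [45, None, 37, 29],
        "state":      ["VA", "Virginia", "va", None],   # inconsistently coded
    })

    # How complete is each field? (fraction of missing values per column)
    print(records.isna().mean())

    # Are categorical values recorded consistently? Normalizing the text
    # reveals that "VA" and "va" were meant to be the same value.
    print(records["state"].str.strip().str.upper().value_counts(dropna=False))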

With Enough Good Data, Remarkable Achievements Are Possible

There are many great examples of how data is used to feed AI. HealthMap, for instance, is a program that monitors infectious diseases. With AI at its center, it builds a digital map (Figure 2) that tracks the spread of the coronavirus and is updated daily.

Figure 2: Map of coronavirus (COVID-19) cases as of May 2020. (Source: VK Studio/Shutterstock.com)

A programmed script extracts data from multiple sources and feeds it into a mapping program. The map shows virus incidents by location, using color coding as well as hard numbers of how many cases exist in different countries. HealthMap applies machine learning to infectious disease data to create visualizations and provide alerts when anomalies occur. HealthMap, run by Boston Children’s Hospital, caught some of the first signs of the COVID-19 outbreak before it became a pandemic.
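
A drastically simplified version of such a script might look like the Python sketch below: case counts are pulled from multiple (hypothetical) sources, aggregated by country, and compared against a baseline so that an alert fires when an anomaly appears. The source names, counts, and thresholds are all invented for illustration.

    from collections import defaultdict

    # Case counts reported by two hypothetical data sources.
    source_a = {"US": 1200, "IT": 950, "KR": 430}
    source_b = {"US": 1180, "ES": 700, "KR": 445}

    # Aggregate: average the counts reported for each country.
    combined = defaultdict(list)
    for source in (source_a, source_b):
        for country, cases in source.items():
            combined[country].append(cases)
    aggregated = {c: sum(v) / len(v) for c, v in combined.items()}

    # Alert when a country's count jumps well above its prior baseline.
    baseline = {"US": 400, "IT": 300, "ES": 250, "KR": 400}
    for country, cases in aggregated.items():
        if cases > 2 * baseline.get(country, float("inf")):
            print(f"Alert: unusual rise in {country}: {cases:.0f} cases")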

Programmers are also collecting other data fields, such as patient symptoms, age, gender, and other demographic information, that could be used in conjunction with the model to provide even more insight. Once a model is established, it can be tweaked, built upon, or changed. The more we know, the better it gets; the better it gets, the more we know and can learn.

Data is the Currency of AI

Data is the currency of artificial intelligence. It is the feedstock from which machines learn. But aggregating data can be challenging, because it exists in many different forms, some orderly and some not. Its integrity and cleanliness greatly influence its value as the commodity that feeds artificial intelligence. When scientists, programmers, and developers can successfully harvest and use data, however, many AI solutions once thought impossible suddenly become possible.

As Mark Zuckerberg, the founder of Facebook, characterized it: "I think that AI is going to unlock a huge amount of positive things, whether that's helping to identify and cure diseases, to help cars drive more safely, to help keep our communities safe."

About the Author

Jim Romeo is a journalist based in Virginia. He retired from a 30-year career in engineering and now writes about technology and business topics.