What AI Failures Tell Us About Safety Needs
Introduction
We don’t have to look far to find failures in artificial intelligence (AI): spam filters misfiling important emails, GPS providing faulty directions, machine translation mangling a sentence’s meaning, autocorrect replacing a correct word with an incorrect one. The 2010 stock market “flash crash” resulted in a 36-minute, 600-point plunge that temporarily wiped out roughly a trillion dollars in market value. In 2016, a Tesla Model S operating on Autopilot crashed into another vehicle because the system remained locked on a vehicle that didn’t stop rather than reacting in time to a new obstacle. The same year, Microsoft released its Tay(bot) chatbot on Twitter, and it soon began posting inflammatory tweets.
These and other examples demonstrate what’s known as the value-alignment problem: a mismatch between human values and preferences and an AI’s intended goals. In Part 2 of this six-part series, we’ll examine the value-alignment problem and its consequences, as well as the implications for creating Safe AI.
The AI Value-Alignment Problem
In evaluating an AI’s success or failure, it’s not enough that the AI achieves its intended goal; the process and outcome must also align with (individual and aggregated) human values and preferences. In the spam filter, Tesla, and Tay(bot) examples, the AIs did what they were programmed to do but failed because the process or outcome didn’t align with human values or preferences. The spam filter failed because it filed an email differently than you preferred. The autonomous vehicle crashed because it didn’t react to a new obstacle the way a human driver would have. Tay(bot) got in trouble because it replied to postings without regard for common human decency or online etiquette. All three executed their programming, yet failed because they didn’t respond as humans would want them to.
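To make the spam-filter case concrete, here is a minimal Python sketch with entirely hypothetical numbers: a filter judged only by its own accuracy metric can look successful while ignoring the user’s real preference, which weighs a lost important email far more heavily than a stray spam message slipping through.

```python
# Toy illustration (hypothetical numbers): a spam filter judged only by accuracy
# can look "successful" while violating the user's real preference, which weighs
# a lost important email far more heavily than a stray spam message.

# Each tuple: (predicted_spam, actually_spam)
predictions = [
    (True, True), (True, True), (True, True), (True, True),   # spam caught
    (False, False), (False, False), (False, False),           # legitimate mail kept
    (True, False),                                             # important email lost
]

accuracy = sum(pred == actual for pred, actual in predictions) / len(predictions)

# Hypothetical user costs: missing one important email hurts 50x more
# than letting one spam message into the inbox.
COST_FALSE_POSITIVE = 50   # legitimate email filed as spam
COST_FALSE_NEGATIVE = 1    # spam reaching the inbox

user_cost = sum(
    COST_FALSE_POSITIVE if (pred and not actual) else
    COST_FALSE_NEGATIVE if (not pred and actual) else 0
    for pred, actual in predictions
)

print(f"Accuracy the system optimizes: {accuracy:.0%}")   # 88% -- looks fine
print(f"Cost the user experiences:     {user_cost}")      # 50 -- one painful failure
```

The point of the sketch is not the specific numbers but the gap between the metric the system optimizes and the cost the human actually experiences.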
The 2010 “flash crash” reveals the potential ripple effect of value misalignment: British stock trader Navinder Singh Sarao created an algorithm that placed large numbers of fake sell orders to artificially drive down stock prices, prompting others to quickly sell in response. Time and again, he bought stock after prices plummeted and then switched the algorithm off, letting prices recover. The market crashed because other trading algorithms reacted automatically to this price manipulation.
A related concept is that even well-intended AI has a dual potential to be beneficial and harmful, much like human intelligence. Examples abound in cybersecurity, where computer code, encryption, big data, and algorithms all have the potential to do harm. Whether an AI is purposefully malevolent, accidentally misused, or simply produces unintended consequences, intelligent systems can cause as much harm as, or more harm than, their benevolent uses can do good. The risks associated with this dual potential will only grow as systems become capable of general intelligence.
Value-(Mis)Alignment Consequences
The greater the divide between human values and AI goals, the greater the risk of significant, irreparable consequences. Suppose, for example, that we want to eradicate cancer. A human might pursue this goal by predicting cancers years before they appear and creating targeted therapies that cure existing patients without harsh treatments. An AI, by contrast, could eradicate cancer by simply eliminating humans. The cancer would indeed be gone and the goal accomplished, but that route does not align with human values and priorities.
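A deliberately absurd toy sketch (with made-up plans and numbers) shows how this plays out mechanically: if the objective handed to an optimizer counts only cancer cases and says nothing about human welfare, the degenerate plan scores best.

```python
# Toy example (hypothetical plans and numbers): a misspecified objective that
# encodes only "eradicate cancer" is best satisfied by a plan no human intended.

candidate_plans = {
    "early detection + targeted therapy": {"cancer_cases": 1_000, "humans_alive": 8_000_000_000},
    "do nothing":                          {"cancer_cases": 20_000_000, "humans_alive": 8_000_000_000},
    "eliminate all humans":                {"cancer_cases": 0, "humans_alive": 0},
}

def misspecified_objective(outcome):
    # Counts only cancer cases; includes no term for human welfare.
    return outcome["cancer_cases"]

def value_aligned_objective(outcome):
    # Rules out any plan that sacrifices human life before comparing case counts.
    if outcome["humans_alive"] < 8_000_000_000:
        return float("inf")
    return outcome["cancer_cases"]

best_misspecified = min(candidate_plans, key=lambda p: misspecified_objective(candidate_plans[p]))
best_aligned = min(candidate_plans, key=lambda p: value_aligned_objective(candidate_plans[p]))

print(best_misspecified)  # -> "eliminate all humans"
print(best_aligned)       # -> "early detection + targeted therapy"
```

The fix in the sketch looks trivial because the missing value (keep humans alive) is obvious here; the hard part in practice is that most human values are not written down anywhere for the objective function to include.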
Without human-like common sense, values, and preferences, machines end up with an alien intelligence that can be smarter than us but doesn’t really care about us. This is dangerous because, intentionally or not, humans could end up in the path of an AI pursuing its goal. Consider humans building a skyscraper: although most of us value animals and their environments, we routinely destroy insects, small creatures, and their habitats in the process of erecting the building. It isn’t intentional, and some might deem the consequence insignificant. The equation changes, however, if intelligent machines don’t hold preserving human values and life as their primary goal, because then we are the ones in the way.
Engineering Challenges
The value-alignment problem seems to indicate that we need to teach machines human values and preferences. But is that even possible? Capturing just one individual’s values and preferences poses many challenges. Often there is a big difference between what we say we want and what we actually want. In some cases, we don’t fully understand what we want. In others, our values are contradictory, as when what we say publicly conflicts with what we say privately, shaped by our knowledge, motives, or expectations. For example, if you ask a student why she’s attending school, she’ll probably tell you she wants to learn, study an area of interest, or become a design engineer. Studies show, though, that the real motives often lie elsewhere: gaining credentials, earning peer approval, increasing earning potential, and so on.
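A small, hypothetical sketch of that same gap: the preference a person states can differ from the preference revealed by their observed choices, which is one reason simply asking people what they value isn’t enough to specify an AI’s objective.

```python
# Toy sketch (hypothetical data): stated preferences versus preferences revealed
# by behavior. The names and choice log below are invented for illustration.

from collections import Counter

stated_reason = "to learn and master design engineering"

# Hypothetical log of how the student actually allocates discretionary effort.
observed_choices = [
    "grade-optimizing shortcut", "networking event", "grade-optimizing shortcut",
    "internship application", "deep study session", "grade-optimizing shortcut",
    "networking event", "internship application",
]

# Take the most frequent observed choice as a crude proxy for revealed preference.
revealed = Counter(observed_choices).most_common(1)[0][0]

print("Stated preference:  ", stated_reason)
print("Revealed preference:", revealed)   # -> "grade-optimizing shortcut"
```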
These challenges grow exponentially when we consider billions of people spanning many cultures, countries, norms, expectations, needs, and experiences. And even if we could take an accurate snapshot of human values today, those values change over time: today’s values differ greatly from those of 100 years ago and from what they’ll be 100 years from now. We can’t foresee every situation, nor can we capture how values shape every decision in every context. There are simply too many variables.
Conclusion
So, it’s not enough that an AI achieves its intended goal; the process and outcome must also align with human values and preferences. The greater the divide between human values and AI goals, the greater the risk of significant, irreparable consequences. The value-alignment problem suggests that we need to teach machines human values and preferences, but that’s likely not possible given the many challenges in identifying and accurately capturing them. This is why AI Safety is so important: it targets the value-alignment problem not at the level of individual tasks, but by ensuring machines prioritize human welfare, cooperative behavior, and service to humans above all other behavior.