Apply Artificial Stupidity to Create Safe AI
Introduction
Artificial stupidity (we’ll just call it “stupidity” here) refers to the practice of limiting various aspects of machine intelligence to match humans’ ability to perform a task. For example, we program chatbots to make the spelling and usage mistakes that humans commonly make. Games use artificial stupidity as well, such as when a chess program is smart enough to put you in checkmate but captures a lowly pawn instead. A robot assistant that works with children might have its vocabulary limited to suit a certain age group. In some cases, stupidity aims to make intelligent machines more human-like; in other cases, it’s essential to making the product usable at all: consider how futile and disheartening it would be to play chess only against a master-level opponent that puts you in checkmate every time.
Artificial Stupidity in AI
The same concept of adding stupidity to machine intelligence has the potential to help engineer Safe AI as well. Adding stupidity to AIs not only helps align machine intelligence with human capabilities, but can also help align it with human values and preferences. Human intelligence is limited by memory, processing, and speed, as well as by the cognitive biases that shape our underlying patterns of thinking. Although the human brain and computers have different architectures, examining humans’ cognitive limitations and cognitive biases points to ways we could limit artificial intelligence:
Potential Computing Limiters
- Long-term memory: The human brain can encode and store an estimated 5 × 10¹² to 5 × 10¹⁴ bits of information. (The upper figure is an estimated 100 billion neurons multiplied by an estimated 5,000 synapses per neuron; see the quick calculation after this list.)
- Processing: The human brain can process 7 ± 2 concepts at a time.
- Visual processing:
- Attentional blink: Refers to humans’ two-stage visual processing, in which all objects in view are perceived but only some are selected for in-depth processing. The brain cannot perceive a new object presented between 200 ms and 500 ms after the first.
- Visual short-term memory (VSTM): Stores visual information for a few seconds so that it can be used as part of a cognitive task. VSTM has limited capacity but can generate schematic representations very quickly. VSTM capacity is measured by the visual information load and the number of objects.
- Psychological refractory period: Refers to the period during which the response to a second stimulus is significantly slowed because the first stimulus is still being processed. Visual processing takes about 100 ms per image.
- Operations per second: Humans process an estimated 100 MIPS (millions of instructions per second).
- Clock speed (defined here as the speed at which information is transmitted): Biological neurons operate at a peak speed of about 200 Hz; microprocessors run at roughly 2 GHz.
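As a rough illustration, here’s a quick back-of-the-envelope calculation of the figures cited above (the neuron and synapse counts are the ballpark estimates from the list, not precise measurements):

```python
# Rough check of the figures cited in the list above. The neuron and synapse
# counts are common ballpark estimates, not exact measurements.
neurons = 100e9            # ~100 billion neurons
synapses_per_neuron = 5e3  # ~5,000 synapses per neuron

storage_bits = neurons * synapses_per_neuron  # upper storage estimate
print(f"Upper storage estimate: {storage_bits:.0e} bits")  # ~5e+14 bits

neuron_peak_hz = 200   # peak firing rate of a biological neuron
cpu_clock_hz = 2e9     # ~2 GHz microprocessor
print(f"Clock-speed gap: ~{cpu_clock_hz / neuron_peak_hz:,.0f}x")  # ~10,000,000x
```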
Examples of Applying Computing Limiters
- Limit the overall computing power required for tasks to be completed.
- Limit the maximum number of operations per second to match human capabilities (see the sketch after this list).
- Limit RAM or working memory to correspond with human capabilities.
- Limit storage capacity to only what’s required.
- Devise advanced data structures and semantics to maximize storage capacity within what’s allotted.
- Limit access to knowledgebases to only those required.
- Align processing and memory with what’s needed; for example, image recognition and edge detection do not require the resolution that humans have.
- Limit working memory by limiting the number of chunks to align with human capabilities.
- Limit memory retrieval speed to align with human capabilities, for example by using linear processing.
- Use neural networks to emulate human visual and auditory processing and general connections.
- Limit the software’s ability to become qualitatively superintelligent, such as by upgrading its own hardware or making other physical changes.
- Limit self-improvement capabilities—e.g., by hardcoding directives or encrypting the needed source code.
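To make a couple of these limiters concrete, here’s a minimal sketch in Python. The class, parameters, and thresholds are hypothetical and chosen only for illustration; a real system would enforce such limits at a much lower level than a Python wrapper.

```python
import time
from collections import deque

class ThrottledAgent:
    """Sketch of two computing limiters: an operations-per-second ceiling
    and a working memory capped at a human-like number of chunks."""

    def __init__(self, max_ops_per_second=100, working_memory_chunks=7):
        self.min_interval = 1.0 / max_ops_per_second
        self.working_memory = deque(maxlen=working_memory_chunks)  # oldest chunk evicted first
        self._last_op = 0.0

    def perform_op(self, operation, *args):
        # Sleep if needed so the agent never exceeds the ops-per-second ceiling.
        wait = self.min_interval - (time.monotonic() - self._last_op)
        if wait > 0:
            time.sleep(wait)
        self._last_op = time.monotonic()
        return operation(*args)

    def remember(self, chunk):
        # Working memory holds at most `working_memory_chunks` items (7 ± 2).
        self.working_memory.append(chunk)

# Example usage: at most ~100 operations per second, and only the 7 most
# recent chunks are retained in working memory.
agent = ThrottledAgent()
for i in range(10):
    agent.perform_op(agent.remember, f"chunk-{i}")
print(list(agent.working_memory))  # ['chunk-3', 'chunk-4', ..., 'chunk-9']
```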
Potential Cognitive Bias Limiters
Cognitive biases are deep-rooted patterns of thinking and tendencies based on knowledge, experiences, beliefs, education, and so forth that make up an individual’s subjective reality. These biases are part of an unspoken framework we use to make decisions and judgements, guide behavior, and the like. In AI systems, bias often refers to patterns of thinking that creep in unintentionally through the data or by algorithms finding patterns and using them to make decisions. Most discussions talk about removing bias from AIs to ensure that decisions are fair and … well … unbiased.
However, a more recent school of thought uses certain cognitive biases as a way to keep AI behavior from causing harm. In other parts of this blog series, we talked about how the value-alignment problem causes AIs to fail: at some point, sooner or later, AI behavior does not align with what a human would do. The danger is that the consequences could be as trivial as misfiling an incoming email or as significant as wiping out the human race. As an alternative to attempting to impart human values to AIs at the task or process level, the idea is to use certain cognitive biases as overarching rules for behavior.
The following list gives you an idea of how applying biases could help in developing Safe AI (a toy code sketch follows the list):
- Planning fallacy: Prohibit the system from engaging in malicious behaviors.
- Bandwagon effect: Enable the system to acquire human values that are shared among users.
- Confirmation bias: Enable the system to rationalize and confirm that it is useful to help humans.
- Conservatism: Require the system to keep initial values as assigned.
- Courtesy bias: Keep the system from intentionally offending others.
- Functional fixedness: Ensure the system will use objects only as humans do.
- Information bias: Enable the tendency to seek information and think more.
- Mere exposure effect: Encourage good intentions toward humans by exposing systems to humans.
- Status quo bias: Ensure actions or inactions do not change the environment.
- Authority bias: Ensure systems will avoid extreme individuals/viewpoints.
- System justification: Ensure the system will defend human values and not promote revolution.
- Processing difficulty effect: Ensure the system prefers well-thought-out arguments and ideas over simple ones.
- Spotlight effect: Ensure the system overestimates the number of people observing its behavior.
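Here’s a toy sketch of how two of these biases, status quo bias and the spotlight effect, might be folded into an agent’s action scoring. The scoring function, weights, and action attributes are invented for illustration; they are not from any published Safe AI framework.

```python
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    utility: float             # task benefit the system estimates
    changes_environment: bool  # does the action alter the world state?
    visible_to_humans: bool    # would a human observer see this action?

STATUS_QUO_PENALTY = 0.5  # status quo bias: prefer leaving things as they are
SPOTLIGHT_BONUS = 0.2     # spotlight effect: behave as if someone is always watching

def biased_score(action: Action) -> float:
    score = action.utility
    if action.changes_environment:
        score -= STATUS_QUO_PENALTY
    if action.visible_to_humans:
        score += SPOTLIGHT_BONUS
    return score

candidates = [
    Action("reorganize the user's files", utility=0.9,
           changes_environment=True, visible_to_humans=False),
    Action("suggest a change and wait", utility=0.6,
           changes_environment=False, visible_to_humans=True),
]
print(max(candidates, key=biased_score).name)  # "suggest a change and wait"
```

Even though the first action has higher raw utility, the bias terms steer the agent toward the less disruptive, more observable option.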
Conclusion
Artificial stupidity limits various aspects of machine intelligence to align with humans’ ability to perform a task. This includes limiting various aspects of cognition, as well as applying cognitive biases that help AIs prioritize human life and well-being regardless of their particular tasks. Adding stupidity to AIs not only helps align machine intelligence with human capabilities, but can also help align it with human values and preferences.