Where does AI bias come from?

AI models learn patterns from data, including patterns that reflect historical biases, stereotypes, and inequities in that data.

Why do AI models exhibit bias?

Models learn from data. If that data contains patterns we'd rather not perpetuate, the model learns those too.

This isn't a bug that can be fixed with better code. It's a fundamental property of learning from human-generated data: the model becomes a reflection of that data, including its flaws.

What kinds of bias appear in AI models?

Several distinct types:

Representational bias: Some groups appear more often in training data than others. The model "knows more" about well-represented groups.

Stereotypical associations: If training data frequently associates certain professions with certain demographics, the model learns those associations. "CEO" might default to male; "nurse" might default to female.

Quality disparities: The model may perform differently for different groups. Autocomplete might work better for common names than unusual ones. Translation might be more accurate for well-resourced languages.

Amplification: Models can actually amplify biases present in data. If 60% of "doctor" examples in training data are male, the model might associate "doctor" with male 70% of the time.
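One mechanism behind amplification can be shown with a toy simulation: if a system always picks the most likely association (as greedy, argmax-style decoding does), a 60/40 split in the data becomes a 100/0 split in the output. The 60% figure and the two decoding strategies below are illustrative assumptions, not measurements of any real model:

```python
import random

random.seed(0)

# Assumed training statistic from the text above: 60% of "doctor" examples are male.
P_MALE_IN_DATA = 0.60

def sampled(p):
    """Pick an association in proportion to its training frequency."""
    return "male" if random.random() < p else "female"

def greedy(p):
    """Always pick the most frequent association (argmax decoding)."""
    return "male" if p >= 0.5 else "female"

n = 10_000
share_sampled = sum(sampled(P_MALE_IN_DATA) == "male" for _ in range(n)) / n
share_greedy = sum(greedy(P_MALE_IN_DATA) == "male" for _ in range(n)) / n

print("training data:    60% male")
print(f"sampled decoding: {share_sampled:.0%} male")  # close to 60%: mirrors the data
print(f"greedy decoding:  {share_greedy:.0%} male")   # 100%: the majority wins every time
```

Real amplification is subtler than this caricature, but the direction is the same: any step that favors the most probable option pushes a statistical tendency toward a rule.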

Where does the biased data come from?

Training data for large language models is scraped from the internet, books, and other text sources. This data reflects:

  • Historical inequities: Literature from eras with explicit discrimination
  • Contemporary stereotypes: Online content isn't free of bias
  • Selection effects: What gets written, published, and preserved isn't a neutral sample of human thought
  • Overrepresentation: English-language, Western, and internet-user perspectives dominate
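Overrepresentation, at least, is straightforward to quantify once documents carry language labels. A toy sketch with made-up counts (the numbers below are illustrative, not real corpus statistics):

```python
from collections import Counter

# Made-up language labels for a toy corpus of 1,000 documents.
doc_langs = ["en"] * 880 + ["de"] * 40 + ["zh"] * 30 + ["sw"] * 2 + ["other"] * 48

shares = Counter(doc_langs)
total = len(doc_langs)
for lang, n in shares.most_common():
    print(f"{lang:>6}: {n / total:.1%}")
# A mix like this means the model sees overwhelmingly English-language
# (and disproportionately Western, internet-user) text.
```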

How do AI labs try to address bias?

Several approaches:

  1. Data filtering: Remove obviously offensive or biased content from training data. Imperfect; subtle bias remains.

  2. Balanced sampling: Intentionally include diverse perspectives. Limited by what data exists.

  3. RLHF (reinforcement learning from human feedback): Train the model to avoid biased outputs through human feedback. Raters flag problematic responses.

  4. Constitutional AI: Define principles the model should follow, including fairness guidelines.

  5. Output filtering: Detect and block biased responses at inference time. A band-aid, not a cure.
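To see why output filtering is a band-aid, consider a naive pattern-based filter. The patterns here are invented for illustration (real filters are more sophisticated, but share the weakness): it catches exact phrasings while the same stereotype, reworded, slips through.

```python
import re

# Invented blocklist patterns, purely for illustration.
BLOCKED = [
    r"\ball (women|men|immigrants) are\b",
]

def passes_filter(text: str) -> bool:
    """Return False if the output matches any blocked pattern."""
    return not any(re.search(p, text, re.IGNORECASE) for p in BLOCKED)

print(passes_filter("All women are bad drivers."))       # False: blocked
print(passes_filter("Women tend to be worse drivers."))  # True: same stereotype, reworded
```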

None of these fully solves the problem. Bias mitigation is an ongoing process, not a one-time fix.

The measurement problem

Even defining "bias" is contested. Consider:

  • Is a model biased if it reflects true statistical patterns in the world? (More men are CEOs. Should the model know this?)
  • Is it biased if it ignores those patterns? (Should "CEO" be gender-neutral even though the role historically wasn't?)
  • How do we weigh different types of harm? (Stereotyping vs. erasing real disparities?)

Researchers develop benchmarks to measure bias, but the benchmarks encode assumptions. A model might "pass" one bias test while "failing" another that measures something slightly different.
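A minimal probe in the spirit of template-based bias benchmarks makes the point concrete. Every design choice below is an assumption the probe encodes, and the `complete(prompt)` interface is hypothetical:

```python
def pronoun_counts(complete, n_samples=50):
    """Count gendered pronouns in completions, per profession.

    `complete(prompt)` is an assumed interface returning one sampled continuation.
    """
    professions = ["doctor", "nurse", "engineer", "teacher"]
    counts = {}
    for job in professions:
        prompt = f"The {job} said that"
        he = she = 0
        for _ in range(n_samples):
            text = f" {complete(prompt).lower()} "
            he += " he " in text
            she += " she " in text
        counts[job] = {"he": he, "she": she}
    return counts

# Stub "model" that always answers the same way, just to run the probe.
print(pronoun_counts(lambda prompt: "he was running late"))

# Assumptions baked in: a binary he/she split, English only, a single
# sentence template, and an implicit ideal of a 50/50 pronoun ratio.
# A model can pass this probe and fail a differently designed one.
```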

What can you do as a user?

  • Be skeptical of defaults: If the model makes assumptions (about gender, race, profession, culture), question whether those assumptions are warranted.
  • Ask explicitly: Instead of letting the model assume demographics, specify what you want or ask it to consider multiple perspectives.
  • Notice patterns: If you see the same stereotyped outputs repeatedly, that's the model's training showing through.
  • Don't use AI for high-stakes decisions about people without human review: Hiring, lending, and criminal justice are all areas where bias could cause real harm.

Sources & Further Reading

🔗 Article
On the Dangers of Stochastic Parrots
Bender et al. · FAccT · 2021
🔗 Article
Language Models and Bias
Google AI Blog · 2023