Chapter 9 of 13
Bias, Harm, and Fairness: When ‘Neutral’ Algorithms Make Moral Choices
Algorithms often inherit and amplify social biases, even when designers aim for neutrality. This module examines how issues of discrimination, fairness, and harm arise in AI systems that allocate resources, predict risk, or filter information.
From Neutral Code to Biased Outcomes
Why This Matters
Algorithms now help decide who gets loans, jobs, welfare benefits, medical care, or police attention. They may look neutral, but their decisions can be deeply moral.
Our Focus
We will examine AI systems that allocate resources, predict risk, and filter information, and see how they can create or amplify unfairness.
Key Concepts
You will learn about algorithmic bias, structural injustice, and formal fairness metrics, and how these link to ideas of justice and responsibility from ethics.
Where Algorithmic Bias Comes From
Biased Data
Historical data mirrors past discrimination. If a company mostly hired men, a model trained on its records may learn that men are "better" candidates.
Framing and Proxies
Predicting "who will be arrested" instead of "who will commit a crime" bakes in policing bias. Variables like ZIP code can act as hidden stand-ins for race or class.
Design and Deployment
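Optimizing only for overall accuracy can hide poor performance on smaller groups, because their errors barely move the aggregate score. Models are also often reused in contexts they were not built for, and users may over-trust their outputs as objective.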
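The point about overall accuracy can be made concrete with a small calculation. The sketch below uses made-up counts (hypothetical, not drawn from any real system) to show how a model can look strong in aggregate while performing poorly for a smaller group.

```python
# Hypothetical counts chosen only for illustration (not real data).
groups = {
    "majority": {"n": 900, "correct": 855},  # 95% accurate within this group
    "minority": {"n": 100, "correct": 60},   # 60% accurate within this group
}

def overall_accuracy(groups):
    """Accuracy over everyone, ignoring group membership."""
    total = sum(g["n"] for g in groups.values())
    correct = sum(g["correct"] for g in groups.values())
    return correct / total

for name, g in groups.items():
    print(name, g["correct"] / g["n"])
print("overall", overall_accuracy(groups))
```

In this toy case the overall figure is 91.5%, which says little about the 60% accuracy experienced by the smaller group.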
Link to Structural Injustice
Inequalities in housing, education, and policing shape both the data and the goals for the model, so injustice gets encoded into technical systems.
Example: Risk Prediction and Structural Injustice
The Setup
A court uses a risk tool trained on arrest records and demographics to predict "re-arrest within 2 years" and guide bail or sentencing.
Hidden Structure
Over-policed neighborhoods have more arrests, so the label "re-arrest" partly measures police presence, not just an individual’s behavior.
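A toy calculation can illustrate why the label partly measures police presence. The sketch below assumes, purely for illustration, that two neighborhoods have the same underlying re-offense rate and differ only in how likely an offense is to lead to an arrest. All numbers and the function name are hypothetical, and the model ignores wrongful arrests.

```python
def expected_rearrest_rate(reoffense_rate, arrest_prob_given_offense):
    """Expected share of people re-arrested, under the toy assumptions above."""
    return reoffense_rate * arrest_prob_given_offense

# Same behavior in both neighborhoods (20% re-offend); only policing differs.
heavily_policed = expected_rearrest_rate(0.20, 0.60)
lightly_policed = expected_rearrest_rate(0.20, 0.20)

print(f"heavily policed: {heavily_policed:.2f}")
print(f"lightly policed: {lightly_policed:.2f}")
```

Under these assumptions the recorded re-arrest rate is three times higher in the heavily policed neighborhood even though individual behavior is identical.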
Model Behavior
The model learns that certain neighborhoods and prior arrests predict "risk", without understanding that this reflects unequal policing.
Resulting Harm
People from those communities are denied bail more often, which reinforces disadvantage and raises their measured future risk, creating a feedback loop.
Moral Choice in Disguise
Choosing to predict re-arrest using arrest data is already a moral choice: it treats the state’s perspective as ground truth and ignores structural injustice.
Harms from Classification, Prediction, and Personalization
Classification Harms
Classification assigns labels like "high risk" or "fraud". Misclassification, stigmatizing labels, and blocked opportunities are key harms.
Prediction Harms
Prediction estimates probabilities. If predictions are less accurate for some groups, or if they are used mainly to punish rather than to support, harm increases.
Personalization Harms
Personalization tailors content. It can create echo chambers, target vulnerable users with harmful ads, or hide good opportunities from some groups.
Context Matters
The same technical move can help or harm depending on who controls it, how it is used, and what social inequalities already exist.
Fairness Metrics: Demographic Parity vs Equalized Odds
What Are Fairness Metrics?
Fairness metrics are formal rules for judging whether an algorithm treats groups fairly. Two key ones are demographic parity and equalized odds.
Demographic Parity
Demographic parity requires similar approval rates across groups. If 60% of applicants are approved overall, about 60% in each group should be approved.
Pros and Cons of Demographic Parity
It is intuitive and addresses representation, but can require approving more high-risk people in one group or rejecting low-risk people in another.
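One common way to reach similar approval rates is to use a different score cutoff for each group. The sketch below is a minimal illustration under stated assumptions: the scores are invented, `threshold_for_rate` is a hypothetical helper, and scores within a group are distinct so the cutoff approves exactly the intended number of people.

```python
def threshold_for_rate(scores, target_rate):
    """Return the score cutoff that approves roughly target_rate of the scores."""
    ranked = sorted(scores, reverse=True)
    k = round(target_rate * len(ranked))  # number of people to approve
    if k == 0:
        return float("inf")  # approve no one
    return ranked[k - 1]

# Invented model scores (higher = judged more likely to repay).
scores_a = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
scores_b = [0.7, 0.6, 0.5, 0.45, 0.4, 0.3, 0.25, 0.2, 0.1, 0.05]

target = 0.6  # approve about 60% of each group
t_a = threshold_for_rate(scores_a, target)
t_b = threshold_for_rate(scores_b, target)

rate_a = sum(s >= t_a for s in scores_a) / len(scores_a)
rate_b = sum(s >= t_b for s in scores_b) / len(scores_b)
print(t_a, rate_a)
print(t_b, rate_b)
```

Both groups end up with a 60% approval rate, but the cutoff for the second group is lower. Whether that counts as approving "higher-risk" applicants depends on whether the scores themselves are trustworthy, which is exactly what biased training data puts in doubt.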
Equalized Odds
Equalized odds demands similar true positive and false positive rates across groups, focusing on whether error rates (and harms) are evenly shared.
Trade-offs
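When base rates differ between groups, an imperfect model generally cannot satisfy demographic parity, equalized odds, and calibration (predicted probabilities matching actual outcomes within each group) at the same time. Designers must choose which ideal to prioritize.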
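The tension between demographic parity and equalized odds follows from simple arithmetic: a group's approval rate equals TPR × base rate + FPR × (1 − base rate). The sketch below uses hypothetical base rates and error rates to show that holding TPR and FPR equal across groups forces approval rates apart whenever base rates differ (unless TPR equals FPR, which means the model is uninformative).

```python
def selection_rate(base_rate, tpr, fpr):
    """Share of a group that is approved, given its base rate and error rates."""
    return tpr * base_rate + fpr * (1 - base_rate)

# Equalized odds holds by construction: both groups share the same TPR and FPR.
tpr, fpr = 0.8, 0.2

# Hypothetical base rates (share of each group that should be approved).
rate_a = selection_rate(0.5, tpr, fpr)
rate_b = selection_rate(0.3, tpr, fpr)

print(f"group A approval rate: {rate_a:.2f}")
print(f"group B approval rate: {rate_b:.2f}")
```

With these numbers the approval rates are 0.50 and 0.38, so equalized odds is satisfied while demographic parity is not.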
Thought Exercise: Choosing a Fairness Metric
Imagine you are designing a loan approval model for a bank.
Facts:
- Group A has historically had higher incomes and more approved loans.
- Group B has faced discrimination and has fewer past loans on record, but similar true repayment ability today.
You can tune your model toward one of these priorities:
- Maximize overall accuracy
  - You ignore group labels and just minimize total error.
- Demographic parity
  - You adjust thresholds so that both groups have similar approval rates.
- Equalized odds
  - You adjust thresholds so that both groups have similar true positive and false positive rates.
Activity (no single right answer):
- Write down which option you would choose and why.
- For the option you chose, answer:
  - Who benefits most? Who might be harmed?
  - How would you justify this choice to:
    - the bank’s shareholders,
    - regulators concerned with discrimination,
    - an advocacy group representing Group B?
- Connect to ethics:
  - Which ethical theory from the previous module best supports your choice?
    - Utilitarian (overall welfare),
    - Kantian (respecting persons and rights),
    - Rawlsian (maximizing the position of the worst off),
    - Virtue ethics (what a just, fair person would do).
Use this to see how a "technical" choice of metric is also a moral and political decision.
Mini Coding Demo: Measuring Group Fairness
This simple Python example shows how you might compute basic group statistics to check for fairness. You do not need to understand every line; focus on the idea of measuring outcomes by group.
We assume:
- `y_true`: list of true labels (1 = should be approved, 0 = should not)
- `y_pred`: list of model predictions (1 = approved, 0 = rejected)
- `group`: list of group labels (e.g., "A" or "B")
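A minimal sketch of such a check is given here, assuming small hand-made lists. The values are illustrative only, and `group_stats` is a hypothetical helper name rather than a library function.

```python
# Illustrative toy data (not real outcomes).
y_true = [1, 1, 0, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
group = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

def group_stats(y_true, y_pred, group):
    """Return approval rate, true positive rate, and false positive rate per group."""
    stats = {}
    for g in sorted(set(group)):
        idx = [i for i, label in enumerate(group) if label == g]
        tp = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 1)
        fp = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 1)
        fn = sum(1 for i in idx if y_true[i] == 1 and y_pred[i] == 0)
        tn = sum(1 for i in idx if y_true[i] == 0 and y_pred[i] == 0)
        stats[g] = {
            # Demographic parity compares this value across groups.
            "approval_rate": (tp + fp) / len(idx),
            # Equalized odds compares these two values across groups.
            "tpr": tp / (tp + fn) if (tp + fn) else float("nan"),
            "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
        }
    return stats

stats = group_stats(y_true, y_pred, group)
for g, s in stats.items():
    print(g, s)
```

In this toy data, group A is approved 80% of the time and group B 20% of the time, and the true and false positive rates differ as well, so neither demographic parity nor equalized odds holds.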
Run this mentally or in a notebook to see how different metrics look.
Check Understanding: Bias and Fairness
Answer this question to test your understanding of how bias and fairness metrics interact.
A company builds a hiring algorithm that shows the same accuracy for men and women, but far fewer women receive job offers. Which fairness metric is most clearly *not* being satisfied?
- A) Equalized odds
- B) Demographic parity
- C) Calibration within groups
Answer: B) Demographic parity
Equalized odds concerns equal true positive and false positive rates across groups. The scenario says accuracy is the same, which suggests error rates may be similar, although equal accuracy alone does not guarantee equal true and false positive rates. Calibration within groups means that predicted probabilities match actual outcomes for each group, and nothing in the scenario speaks to it. The one fact clearly stated is that far fewer women get offers, so selection rates differ by group. That directly violates demographic parity.
Key Terms Review
Use these flashcards to review the main concepts from this module.
- Algorithmic bias
  - Systematic errors in an algorithm’s outputs that disproportionately disadvantage certain individuals or groups, often reflecting existing social inequalities.
- Structural injustice
  - Long-term, large-scale patterns of social organization (laws, institutions, norms) that systematically disadvantage some groups, even without individual ill will.
- Demographic parity
  - A fairness criterion requiring that the rate of positive decisions (such as approvals) be similar across protected groups, regardless of underlying base rates.
- Equalized odds
  - A fairness criterion requiring that true positive rates and false positive rates be similar across protected groups.
- Classification harm
  - Harm that arises when people are placed into categories (e.g., high risk) that lead to stigma, misclassification, or loss of opportunities.
- Personalization harm
  - Harm from tailored content or recommendations, such as echo chambers, exploitative targeting, or unequal exposure to opportunities.
- Responsibility gap
  - A situation where it is unclear who is morally or legally responsible for an AI system’s decisions, especially when no human directly makes the final choice.
Connecting Fairness to Justice and Responsibility
To close, connect this module to the earlier ones on responsibility gaps and ethical theories.
Reflect briefly (write a few bullet points):
- Responsibility
  - When an algorithm trained on biased data produces discriminatory outcomes, who is responsible?
    - Data collectors?
    - Model designers?
    - The organization that deploys it?
    - Regulators who set (or fail to set) rules?
- Justice perspectives
  - Utilitarian: Which fairness metric or design choice seems to maximize overall well-being, and for whom?
  - Kantian: Which choices best respect people as ends in themselves, not just data points or means to profit?
  - Rawlsian: Which design best improves the situation of the worst-off group?
- Your stance
  - Choose one real-world AI context (hiring, policing, healthcare triage, credit scoring, content recommendation).
  - In 3–4 sentences, describe:
    - A likely source of bias.
    - The main kind of harm (classification, prediction, or personalization).
    - Which fairness metric you would prioritize and why, using one ethical theory to justify your choice.
This exercise helps you see that "neutral" algorithms inevitably make moral choices, and that designers, deployers, and regulators share responsibility for those choices.
Key Terms
- equalized odds
  - A group fairness criterion requiring that true positive rates and false positive rates are similar across protected groups.
- fairness metric
  - A formal, quantitative definition used to evaluate whether an algorithm treats different groups fairly according to a specific criterion.
- prediction harm
  - Harm caused by unequal or misused predictions, such as systematically less accurate risk scores for some groups or using risk estimates mainly to punish rather than support.
- algorithmic bias
  - Systematic patterns in algorithmic outputs that unfairly disadvantage certain individuals or groups, often by reproducing existing social inequalities.
- demographic parity
  - A group fairness criterion requiring that the probability of receiving a positive outcome (such as being hired or approved for a loan) is similar across protected groups.
- responsibility gap
  - A situation in which it is unclear who is morally or legally responsible for the actions and impacts of an AI system.
- classification harm
  - Harm arising from being placed into certain categories by an algorithm, leading to misclassification, stigma, or lost opportunities.
- personalization harm
  - Harm resulting from tailored content or recommendations, including echo chambers, manipulative targeting, or unequal visibility of opportunities.
- structural injustice
  - Persistent and widespread social arrangements that systematically disadvantage some groups, even without explicit discriminatory intent by individuals.