An Example to Remember

Jo Hardin

March 24, 2022

1 / 14

Bias in a model

talent ~ Normal (100, 15)
grades ~ Normal (talent, 15)
SAT ~ Normal (talent, 15)

College wants to admit students with

talent > 115

... but the college only has access to grades and SAT which are noisy estimates of talent.

The example is taken directly (and mostly verbatim) from a blog post by Aaron Roth, "Algorithmic Unfairness Without Any Bias Baked In."
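In code, this setup could be simulated with something like the following minimal sketch (the data frame and variable names are illustrative, not from the original slides):

library(dplyr)

set.seed(4747)
n <- 100000

students <- tibble(
  talent = rnorm(n, mean = 100, sd = 15),
  grades = rnorm(n, mean = talent, sd = 15),
  SAT    = rnorm(n, mean = talent, sd = 15)
)

mean(students$talent > 115)   # about 16% of students clear the talent bar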

2 / 14

Plan for accepting students

  • Run a regression on a training dataset (talent is known for existing students)
  • Find a model that predicts talent based on grades and SAT
  • Choose students for whom predicted talent is above 115 (a sketch of this pipeline follows below)
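
A sketch of that pipeline, reusing the simulated students data frame from the previous slide (names are illustrative):

# fit the admissions model on training data where talent is known
fit <- lm(talent ~ grades + SAT, data = students)

# predict talent for each student and admit those above the bar
students <- students %>%
  mutate(predicted = predict(fit, newdata = students),
         admit     = predicted > 115)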
3 / 14

Flaw in the plan ...

  • there are two populations of students, the Reds and Blues.

    • Reds are the majority population (99%)
    • Blues are a small minority population (1%)
  • the Reds and the Blues are no different when it comes to talent: they both have the same talent distribution, as described above.

  • there is no bias baked into the grading or the exams: both the Reds and the Blues also have exactly the same grade and exam score distributions

4 / 14

Flaw in the plan ...

But there is one difference: the Blues have more money than the Reds, so they each take the SAT twice, and report only the higher of the two scores to the college.

Taking the test twice results in a small but noticeable bump in the average SAT scores of the Blues, compared to the Reds.
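
Extending the earlier sketch, the only change for the Blues is that their reported SAT is the larger of two independent draws (group sizes are illustrative, matching the 99%/1% split):

n_red  <- 99000
n_blue <- 1000

red <- tibble(
  color  = "red",
  talent = rnorm(n_red, 100, 15),
  grades = rnorm(n_red, talent, 15),
  SAT    = rnorm(n_red, talent, 15)          # one attempt
)

blue <- tibble(
  color  = "blue",
  talent = rnorm(n_blue, 100, 15),
  grades = rnorm(n_blue, talent, 15),
  SAT    = pmax(rnorm(n_blue, talent, 15),
                rnorm(n_blue, talent, 15))   # best of two attempts
)

train <- bind_rows(red, blue)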

5 / 14

Key insight:

The value of SAT means something different for the Reds versus the Blues

(They have different feature distributions.)

6 / 14

Let's see what happens ...

7 / 14

Two models:

Red model (SAT taken once):

## # A tibble: 3 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  33.4      0.152        220.       0
## 2 SAT           0.332    0.00149      223.       0
## 3 grades        0.333    0.00150      223.       0

Blue model (SAT is max score of two):

## # A tibble: 3 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)  24.2       1.52        15.9  8.47e- 51
## 2 SAT           0.430     0.0154      27.9  6.53e-127
## 3 grades        0.291     0.0142      20.5  3.15e- 78
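
These two fits could be produced with something along these lines, using the simulated train data frame sketched earlier and broom::tidy() for the summaries:

library(broom)

red_fit  <- lm(talent ~ SAT + grades, data = filter(train, color == "red"))
blue_fit <- lm(talent ~ SAT + grades, data = filter(train, color == "blue"))

tidy(red_fit)    # coefficients for the Red model
tidy(blue_fit)   # coefficients for the Blue model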
8 / 14

New data

  • Generate new data and apply the two models above, separately (see the sketch below).

  • How well do the models predict whether a student has talent > 115?
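
A sketch of that evaluation step, assuming a freshly simulated test data frame built the same way as train:

test <- test %>%
  mutate(predicted = ifelse(color == "red",
                            predict(red_fit,  newdata = test),
                            predict(blue_fit, newdata = test)),
         admit     = predicted > 115)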

9 / 14

New data

color    tpr    fpr  error
blue   0.510  0.044  0.113
red    0.504  0.037  0.109

tpr = true positives / all who should be admitted

fpr = false positives / all who should not be admitted
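
With those definitions, the per-group rates could be computed roughly like this (column names follow the earlier sketches):

test %>%
  group_by(color) %>%
  summarize(
    tpr   = mean(admit[talent > 115]),       # true positive rate
    fpr   = mean(admit[talent <= 115]),      # false positive rate
    error = mean(admit != (talent > 115))    # overall misclassification
  )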

10 / 14

TWO models doesn't seem right????

What if we fit only one model to the entire dataset?

After all, there are laws against using protected classes to make decisions (housing, jobs, credit, loans, college, etc.)

## # A tibble: 3 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)  33.4      0.151        221.       0
## 2 SAT           0.332    0.00148      224.       0
## 3 grades        0.334    0.00149      224.       0

(The coefficients kinda look like the red model...)
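
For reference, the pooled fit is the same regression run on the combined data with color ignored entirely; because 99% of the rows are Red, the fit is dominated by the Red population (hence the resemblance). A sketch:

one_fit <- lm(talent ~ SAT + grades, data = train)   # color is not in the model
tidy(one_fit)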

11 / 14

How do the error rates change?

One model:

color    tpr    fpr  error
blue   0.613  0.063  0.113
red    0.502  0.037  0.109

Two separate models:

color    tpr    fpr  error
blue   0.510  0.044  0.113
red    0.504  0.037  0.109
12 / 14

What did we learn?

With two populations that have different feature distributions, learning a single classifier (that is prohibited from discriminating based on population) will fit the bigger of the two populations.

  • depending on the nature of the distribution difference, it can be either to the benefit or the detriment of the minority population

  • no explicit human bias, either on the part of the algorithm designer or the data gathering process

  • the problem is exacerbated if we artificially force the algorithm to be group blind

  • well-intentioned "fairness" regulations prohibiting decision makers from taking sensitive attributes into account can actually make things less fair and less accurate at the same time

13 / 14

Simulate?

  • varying population proportions
  • effect due to variability
  • effect due to the SAT coefficient
  • different numbers of times the Blues get to take the test
  • etc. (a parameterized version of the simulation is sketched below)
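
One way to explore these variations is to wrap the whole simulation in a function with the knobs as arguments; a hypothetical sketch:

simulate_admissions <- function(n = 100000, prop_blue = 0.01,
                                noise_sd = 15, n_attempts = 2) {
  color   <- ifelse(runif(n) < prop_blue, "blue", "red")
  talent  <- rnorm(n, 100, 15)
  grades  <- rnorm(n, talent, noise_sd)
  # Blues report the best of several attempts; Reds test once
  attempts <- ifelse(color == "blue", n_attempts, 1)
  SAT <- mapply(function(t, k) max(rnorm(k, t, noise_sd)), talent, attempts)
  tibble(color, talent, grades, SAT)
}

sim <- simulate_admissions(prop_blue = 0.05, n_attempts = 3)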
14 / 14
