An Example to Remember

Jo Hardin

March 24, 2022

Bias in a model

talent ~ Normal (100, 15)
grades ~ Normal (talent, 15)
SAT ~ Normal (talent, 15)

College wants to admit students with

talent > 115

... but the college only has access to grades and SAT, which are noisy estimates of talent.

The example is taken directly (and mostly verbatim) from a blog post by Aaron Roth, "Algorithmic Unfairness Without Any Bias Baked In."
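The model on this slide can be simulated in a few lines. This is a minimal sketch, assuming the second argument of Normal(·, 15) is a standard deviation (the slide does not say whether it is a standard deviation or a variance). The function name `simulate_students` and the seed are illustrative choices, not part of the original example.

```python
import random

def simulate_students(n, seed=0):
    """Draw (talent, grades, sat) triples from the model on this slide.

    Assumption: 15 is a standard deviation, not a variance.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        talent = rng.gauss(100, 15)      # talent ~ Normal(100, 15)
        grades = rng.gauss(talent, 15)   # grades ~ Normal(talent, 15)
        sat = rng.gauss(talent, 15)      # SAT ~ Normal(talent, 15)
        students.append((talent, grades, sat))
    return students

students = simulate_students(10_000)
share_talented = sum(t > 115 for t, _, _ in students) / len(students)
print(f"share with talent > 115: {share_talented:.3f}")
```

Under that assumption the cutoff of 115 sits one standard deviation above the mean, so roughly 16% of simulated students should clear it.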


Plan for accepting students

Run a regression on a training dataset (talent is known for existing students)
Find a model which predicts talent based on grades and SAT
Choose students for whom predicted talent is above 115
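The three steps of the plan can be sketched as follows. This is an illustration only: the slides show R output, while this sketch uses plain Python and solves the least-squares normal equations by hand. The helper names (`solve`, `fit_ols`, `predicted_talent`, `admit`), the training-set size, and the seed are all assumptions made for the example.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients for y ~ 1 + columns of rows (normal equations)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Step 1: a training set where talent is known (simulated here; 15 assumed to be an SD).
rng = random.Random(1)
talent = [rng.gauss(100, 15) for _ in range(20_000)]
features = [(rng.gauss(t, 15), rng.gauss(t, 15)) for t in talent]  # (SAT, grades)

# Step 2: a model that predicts talent from SAT and grades.
intercept, b_sat, b_grades = fit_ols(features, talent)

def predicted_talent(sat, grades):
    return intercept + b_sat * sat + b_grades * grades

# Step 3: admit when predicted talent is above the cutoff.
def admit(sat, grades, cutoff=115):
    return predicted_talent(sat, grades) > cutoff

print(round(intercept, 1), round(b_sat, 3), round(b_grades, 3))
```

The rule thresholds predicted talent, not the raw scores, so a student needs scores well above 115 on both measures to be admitted.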

Flaw in the plan ...

there are two populations of students, the Reds and Blues.
- Reds are the majority population (99%)
- Blues are a small minority population (1%)
the Reds and the Blues are no different when it comes to talent: they both have the same talent distribution, as described above.
there is no bias baked into the grading or the exams: both the Reds and the Blues also have exactly the same grade and exam score distributions

But there is one difference: the Blues have more money than the Reds, so they each take the SAT twice, and report only the higher of the two scores to the college.

Taking the test twice results in a small but noticeable bump in the average SAT scores of the Blues, compared to the Reds.
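The size of the bump can be worked out under two assumptions that the slides do not state explicitly: the two attempts are independent draws from Normal(talent, 15), and 15 is a standard deviation. Writing $S_1, S_2$ for the two attempts (notation introduced here), the maximum is the average plus half the absolute difference:

```latex
\max(S_1, S_2) = \frac{S_1 + S_2}{2} + \frac{|S_1 - S_2|}{2},
\qquad S_1 - S_2 \sim \text{Normal}\left(0,\ 15\sqrt{2}\right)

E\left[\,|S_1 - S_2|\,\right] = 15\sqrt{2}\cdot\sqrt{\frac{2}{\pi}} = \frac{30}{\sqrt{\pi}}

E\left[\max(S_1, S_2) \mid \text{talent}\right]
  = \text{talent} + \frac{15}{\sqrt{\pi}} \approx \text{talent} + 8.46
```

So, under these assumptions, a Blue student's reported SAT is on average about 8.5 points higher than a Red student's with the same talent.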


Key insight:

The value of SAT means something different for the Reds versus the Blues

(They have different feature distributions.)


Let's see what happens ...


Two models:

Red model (SAT taken once):

## # A tibble: 3 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)   33.4     0.152        220.       0
## 2 SAT            0.332   0.00149      223.       0
## 3 grades         0.333   0.00150      223.       0

Blue model (SAT is max score of two):

## # A tibble: 3 × 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept)   24.2      1.52        15.9 8.47e- 51
## 2 SAT            0.430    0.0154      27.9 6.53e-127
## 3 grades         0.291    0.0142      20.5 3.15e- 78
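The Red coefficients are close to what the stated model implies. As a sketch, assuming 15 is a standard deviation throughout: talent, grades, and a single SAT score are jointly normal, and the prior on talent has the same variance as each of the two noise terms, so the best predictor weights the prior mean and the two scores equally:

```latex
E\left[\text{talent} \mid \text{grades}, \text{SAT}\right]
  = \frac{\frac{100}{15^2} + \frac{\text{grades}}{15^2} + \frac{\text{SAT}}{15^2}}
         {\frac{1}{15^2} + \frac{1}{15^2} + \frac{1}{15^2}}
  = \frac{100 + \text{grades} + \text{SAT}}{3}
  \approx 33.3 + 0.333\,\text{grades} + 0.333\,\text{SAT}
```

This agrees with the fitted Red intercept (33.4) and slopes (0.332 and 0.333). The derivation does not carry over to the Blue model, because the maximum of two scores is no longer normally distributed around talent.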


New data

Generate new data and use the two models above, separately.
How well do the models predict if a student has talent > 115?


New data

color	tpr	fpr	error
blue	0.510	0.044	0.113
red	0.504	0.037	0.109

$tpr = \frac{\text{true positives}}{\text{all who should be admitted}}$

$fpr = \frac{\text{false positives}}{\text{all who should not be admitted}}$
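The two rates above can be computed as in the sketch below. It assumes that the "error" column in the tables is the overall misclassification rate; the slides do not define it, though that reading is consistent with the reported numbers. The function name `classification_rates` and the six toy students are made up for illustration.

```python
def classification_rates(talent, predicted, cutoff=115):
    """Return tpr, fpr and overall error for the rule 'admit if predicted > cutoff'."""
    should = [t > cutoff for t in talent]        # who truly has talent > cutoff
    admitted = [p > cutoff for p in predicted]   # who the model would admit
    tp = sum(s and a for s, a in zip(should, admitted))
    fp = sum((not s) and a for s, a in zip(should, admitted))
    wrong = sum(s != a for s, a in zip(should, admitted))
    n_should = sum(should)
    n_should_not = len(should) - n_should
    return {
        "tpr": tp / n_should,          # true positives / all who should be admitted
        "fpr": fp / n_should_not,      # false positives / all who should not be admitted
        "error": wrong / len(should),  # assumed meaning of the 'error' column
    }

# Toy data, invented for this example only.
toy_talent = [120, 130, 110, 100, 90, 118]
toy_predicted = [118, 112, 116, 99, 95, 121]
toy_rates = classification_rates(toy_talent, toy_predicted)
print(toy_rates)
```

In the toy data, two of the three talented students are admitted (tpr of 2/3) and one of the three others is admitted by mistake (fpr of 1/3).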


TWO models doesn't seem right????

What if we fit only one model to the entire dataset?

After all, there are laws against using protected classes to make decisions (housing, jobs, credit, loans, college, etc.)

## # A tibble: 3 × 5
##   term        estimate std.error statistic p.value
##   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
## 1 (Intercept)   33.4     0.151        221.       0
## 2 SAT            0.332   0.00148      224.       0
## 3 grades         0.334   0.00149      224.       0

(The coefficients kinda look like the red model...)


How do the error rates change?

One model:

color	tpr	fpr	error
blue	0.613	0.063	0.113
red	0.502	0.037	0.109

Two separate models:

color	tpr	fpr	error
blue	0.510	0.044	0.113
red	0.504	0.037	0.109


What did we learn?

with two populations that have different feature distributions, learning a single classifier (that is prohibited from discriminating based on population) will fit the bigger of the two populations

depending on the nature of the distribution difference, it can be either to the benefit or the detriment of the minority population
no explicit human bias, either on the part of the algorithm designer or the data gathering process
the problem is exacerbated if we artificially force the algorithm to be group blind
well-intentioned "fairness" regulations prohibiting decision makers from taking sensitive attributes into account can actually make things less fair and less accurate at the same time


Simulate?

varying proportions of Reds and Blues
effect due to variability
effect due to SAT coefficient
different number of times the blues get to take the test
etc.
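One way to set up such a simulation is sketched below in plain Python. It is not the code behind the slides: the function names, the parameters `blue_share`, `blue_attempts`, and `noise_sd`, the sample sizes, and the seeds are all assumptions, and 15 is again treated as a standard deviation. The sketch fits one pooled model and one model per color, then compares their rates on fresh data.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients (intercept, SAT, grades) via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

def simulate(n, blue_share, blue_attempts, noise_sd, seed):
    """Rows of (color, talent, reported SAT, grades); Blues report their best attempt."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        color = "blue" if rng.random() < blue_share else "red"
        talent = rng.gauss(100, 15)
        grades = rng.gauss(talent, noise_sd)
        attempts = blue_attempts if color == "blue" else 1
        sat = max(rng.gauss(talent, noise_sd) for _ in range(attempts))
        data.append((color, talent, sat, grades))
    return data

def xy(data):
    return [(s, g) for _, _, s, g in data], [t for _, t, _, _ in data]

def rates(data, coefs, cutoff=115):
    b0, b_sat, b_grades = coefs
    tp = fp = pos = neg = 0
    for _, talent, sat, grades in data:
        admitted = b0 + b_sat * sat + b_grades * grades > cutoff
        if talent > cutoff:
            pos += 1
            tp += admitted
        else:
            neg += 1
            fp += admitted
    return {"tpr": tp / pos, "fpr": fp / neg, "error": (pos - tp + fp) / (pos + neg)}

def experiment(blue_share=0.01, blue_attempts=2, noise_sd=15,
               n_train=100_000, n_test=100_000):
    train = simulate(n_train, blue_share, blue_attempts, noise_sd, seed=1)
    test = simulate(n_test, blue_share, blue_attempts, noise_sd, seed=2)
    pooled = fit_ols(*xy(train))  # one group-blind model
    out = {}
    for color in ("red", "blue"):
        own = fit_ols(*xy([d for d in train if d[0] == color]))  # per-color model
        held_out = [d for d in test if d[0] == color]
        out[color] = {"one model": rates(held_out, pooled),
                      "two models": rates(held_out, own)}
    return out

results = experiment()
for color, by_model in results.items():
    for name, r in by_model.items():
        print(color, name, {k: round(v, 3) for k, v in r.items()})
```

Calling `experiment(blue_share=0.3)` or `experiment(blue_attempts=5)` covers the first and fourth items in the list. With a 1% Blue share the Blue rates rest on only about a thousand test students, so they vary noticeably from seed to seed, and a very small `blue_share` can leave too few Blues to fit or evaluate their own model.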
