Machine Bias
97 points by r0h1n 9 years ago | 57 comments
- kough 9 years agoThis is totally fucked. Morally wrong, deeply unethical, and probably illegal – if you're adding punishment that isn't based on new evidence, isn't that like being treated as guilty without proof? Obviously I'm not a lawyer, but how could anyone, let alone the whole huge set of people who led to these policies, think that applying group statistics to individuals to determine the severity of their punishment is ok?
On the other hand, these biases (most notably the racial ones) exist in the process anyway, and now they're simply being codified and exposed. If these algorithms were published, we could see exactly how much more punishment you get for being black in America versus being white.
Thanks again to ProPublica for an important piece of reporting; hopefully changes get made for the better.
- mikeash 9 years agoPunishment is always considered somewhat separately from the determination of guilt. The judge would already try to account for things like this when determining your sentence. They just do it in a deeply ad hoc and personal manner: they take a stab at it, try to account for things like how sorry you seem to be, apply guidelines, and come up with a number. This means that you might ultimately be punished for the judge not having a good breakfast:
http://www.scientificamerican.com/article/lunchtime-leniency...
And of course it goes without saying that judges will be affected by their biases, racial and otherwise.
I'm not sure what to do about it, though. Handing down the exact same punishment for every single person who commits a particular crime seems too blind. But any variation is going to be problematic.
- pessimizer 9 years agoAll this does is systematize those biases so that they can't be challenged like a judge with a record of bias can. The statistics that they choose to record create bias in and of themselves - by using race in the algorithm, you are building in the possibility that race influences criminality. If you built in favorite foods, some foods would end up resulting in higher sentences than others, just as if you built in phases of the moon when the crime was committed or the astrological sign of the victim.
Even where a variable has absolutely no real effect, about one out of every twenty combinations of it with the other variables would show a statistically significant association with future crime purely by chance.
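A quick simulation makes that concrete. This is only a minimal sketch in Python with made-up "junk" variables and the standard p = 0.05 cutoff; nothing here is from Northpointe's actual formula:

    # With a p < 0.05 cutoff, roughly 1 in 20 completely irrelevant
    # variables will look "predictive" of reoffending purely by chance.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n_people, n_junk_vars = 5000, 200

    # Outcome generated with no relationship to anything.
    reoffended = rng.integers(0, 2, size=n_people)

    false_positives = 0
    for _ in range(n_junk_vars):
        junk = rng.integers(0, 2, size=n_people)  # e.g. "favorite food is pizza"
        table = [[np.sum((junk == a) & (reoffended == b)) for b in (0, 1)]
                 for a in (0, 1)]
        res = stats.chi2_contingency(table, correction=False)
        if res[1] < 0.05:            # res[1] is the p-value
            false_positives += 1

    print(false_positives, "of", n_junk_vars, "irrelevant variables came out 'significant'")
    # Typically around 10 of 200, i.e. about 1 in 20.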
Furthermore, the algorithm would simply extend existing biases in arrest and sentencing, because it simply can't account for crimes that are uncaught and unpunished. Groups that are stopped, searched, arrested, and convicted at greater rates would without fail be sentenced to more time. Just another benefit of being white in America.
You end up using the fact that some groups are punished more often to justify punishing them more harshly.
Even worse, I bet that the reason it rates women as higher risk for recidivism is that somewhere within the algorithm it uses the fact that women in general commit less crime than men to decide that women who do commit crimes are more exceptional (within women), and therefore more deviant. It's disgusting. If you can't legally discriminate against a person on particular grounds, you certainly can't feed those grounds into an algorithm to let it discriminate for you while you shrug and feign innocence.
The algorithm is the innocent one - it's just attempting to reflect the system as it is. It's like an algorithm you would write to predict the winners of horse races, or to run a sports book. And just like one of those algorithms, if you stuff it with garbage (the kind of garbage that makes it wrong 77% of the time), it will produce garbage. If you use the results for something not external to the system, bad variables will feed back into themselves and make the results progressively worse - what's the effect of a longer sentence on recidivism? How profitable is the arbitrage on your sports book algorithm if people use its results to bet, and the distribution of bets shifts the odds?
- argonaut 9 years agoBut they can be challenged. That's why you're reading an article about it. If you have a judge that is biased, it is probably harder to challenge his sentences than if you had an algorithm that you proved was biased.
- argonaut 9 years ago
- pessimizer 9 years ago
- daveguy 9 years agoThat does sound like a good argument against it -- adding punishment without evidence... Could they argue that they're reducing sentences for those less likely to repeat? If they don't see "evidence" that the person will repeat, then they give a reduced sentence (kind of like early parole). Still unethical crap, because it pushes a race-based agenda (consciously or unconsciously). I'd say there's no difference, and would agree with your argument. Also, not a lawyer. Technically they don't ask "are you black". They ask whether or not you had a parent incarcerated -- good for propagating a broken status quo. That question almost seems designed to "increase punishment without evidence". Regardless, there shouldn't be any private algorithm deciding this, and any public algorithm should be well scrutinized and validated for accuracy.
One thing is certain -- the federal government needs to shut these sentencing analysis companies down. At the very least, they should face heavy public audits. I'd say even libertarians would agree this is the definition of something that should be regulated.
- mikeash 9 years ago
- Malarkey73 9 years agoOne of the most mind-boggling sentences in that article was:
"On Sunday, Northpointe gave ProPublica the basics of its future-crime formula — which includes factors such as education levels, and whether a defendant has a job. It did not share the specific calculations, which it said are proprietary."
How on earth can you lock people up based on secret information? That is Kafka meets Minority Report.
- yummyfajitas 9 years agoThis is done regularly. It's called "judicial discretion" - a judge uses a neural network so secret that even he doesn't understand it (in fact the entire scientific field of "neuroscience" exists to try and analyze it).
Variables used in the formula include details of the case, the race/appearance of the defendant, and how recently the judge had lunch at the time of sentencing. Unlike the ProPublica claims of racial bias (which are merely "almost statistically significant" at the p=0.05 level), the lunch bias is statistically significant at the p < 0.01 level.
http://www.pnas.org/content/108/17/6889.full
This system sounds like a huge improvement.
- mattkrause 9 years agoJust FYI: The lunch paper has very serious problems, as described in this reply, also published in PNAS: http://www.pnas.org/content/108/42/E833.full
In particular, the cases are heard in a particular order. For each prison, the prisoners with counsel go before those who are representing themselves. As in the US, those representing themselves typically fare worse. The judges try to finish an entire prison's worth of hearings before a meal, so the least-likely-to-succeed cases are typically assigned to spots right before a break.
There are some other bits of weirdness in the original data too. They found a statistically significant association between the ordinal position (e.g., 1st, 2nd, ..., last) and the parole board's decision, but failed to find any effect of actual time elapsed (e.g., in minutes), even though the latter is much more compatible with a physiological hypothesis like running out of glucose.
- yummyfajitas 9 years agoInteresting, I was unaware. I need to attach more uncertainty to my beliefs about how terrible humans are at making decisions.
- yummyfajitas 9 years ago
- kenjackson 9 years agoAs you note, the algorithm for judicial discretion is unknown. The algorithm for this software is fully known, just kept from the public.
- yummyfajitas 9 years agoThe validity of the algorithm can be - and apparently has been - reliably tested, and it has been found to be useful and mostly unbiased. This analysis has been performed both by the algorithm's creators and by highly adversarial third parties, such as the author of this article. Both found that whatever bias there is is small, and cannot be distinguished from random chance.
For example, the author of this very article has done such an analysis. Here's her R notebook:
https://github.com/propublica/compas-analysis/blob/master/Co...
Her analysis shows (within the limitations of the frequentist paradigm) that:
a) the predictor is useful - score_factorHigh and score_factorMedium both have p-values that are essentially zero.
b) The predictor is not racially biased, or at least not by much - race_factorAfrican-American:score_factorHigh and the other bias terms have p-values > 0.05.
Look, I'd love it if we required such algorithms to be open source. I'm a huge proponent of both open science and open government. Nevertheless, there is an entire discipline devoted to evaluating predictive algorithms without needing to care about their details - it's called "machine learning".
The wonderful thing about statistics is that even a highly biased person (such as the author of this article) can still reach a correct conclusion that goes against their biases.
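For the curious, here is roughly what that kind of interaction test looks like. This is only a minimal sketch in Python/statsmodels against the public data, not the notebook's actual Cox or logistic specifications; the column names (decile_score, race, two_year_recid) follow ProPublica's published csv, and the file path is hypothetical:

    # Fit recidivism ~ score with race interactions. If the score works the
    # same way for every race, the interaction terms should be ~0.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("compas-scores-two-years.csv")  # hypothetical local copy
    df["score_factor"] = pd.cut(df["decile_score"], [0, 4, 7, 10],
                                labels=["Low", "Medium", "High"])

    model = smf.logit("two_year_recid ~ C(score_factor) * C(race)", data=df).fit(disp=0)
    print(model.summary())
    # Near-zero p-values on the score terms mean the score is predictive;
    # small, high-p coefficients on the race:score terms mean no detectable
    # race-specific distortion of the score (the "bias terms" discussed above).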
- yummyfajitas 9 years ago
- mattkrause 9 years ago
- cwilkes 9 years agoWhat if it came out of a neural net or some other system that can't be easily explained? There's no real "specific calculation" to show.
Now if they were using decision trees (e.g., if the person has 3 or more felonies they get a 5 rating), that could be presented.
I'm curious about how much of a feedback loop this process has. The model was probably trained on old data and never updated. Also, how does it take into account features that it doesn't know about (the article mentions one guy turning to Christianity)? I doubt there is a mechanism for people to be asked why they did or did not reoffend. Even if they were asked, how much should it be trusted?
- SixSigma 9 years agoNeural nets may be opaque but they are not secret.
- Malarkey73 9 years agoI have this very concern about using SVM in medical research.
I also worry greatly about diagnostic predictive models that maximise overall prediction success but don't balance the relative consequences of false positives and false negatives.
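One way to handle that imbalance is to stop optimizing raw accuracy and pick the decision threshold by expected cost instead. A minimal sketch; the 10:1 cost ratio and the synthetic data are purely illustrative:

    # Choose the classification threshold that minimizes expected cost when a
    # false negative (e.g. a missed diagnosis) is far costlier than a false positive.
    import numpy as np

    def best_threshold(probs, labels, cost_fp=1.0, cost_fn=10.0):
        thresholds = np.linspace(0.01, 0.99, 99)
        costs = [cost_fp * np.sum((probs >= t) & (labels == 0)) +
                 cost_fn * np.sum((probs < t) & (labels == 1))
                 for t in thresholds]
        return thresholds[int(np.argmin(costs))]

    rng = np.random.default_rng(1)
    labels = rng.integers(0, 2, size=2000)
    probs = np.clip(0.35 * labels + rng.uniform(0.0, 0.65, size=2000), 0, 1)
    print("cost-minimizing threshold:", best_threshold(probs, labels))
    # With false negatives 10x as costly, the threshold lands well below the
    # accuracy-maximizing ~0.5.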
- SixSigma 9 years ago
- carapace 9 years agoI'm leaning towards a Constitutional Amendment against automated law. The wording escapes me (and I'm unqualified anyhow) but the gist would be that only humans can judge humans, no machinery can be allowed to do it.
- yummyfajitas 9 years ago
- pdkl95 9 years agoWeapons of Math Destruction
http://boingboing.net/2016/01/06/weapons-of-math-destruction...
It's easy to hide an agenda behind an algorithm, especially when the details of the algorithm are not publicly visible.
- yummyfajitas 9 years agoIt's far easier to hide an agenda behind verbiage and anecdotes. Go read the author's actual statistical analysis:
https://github.com/propublica/compas-analysis/blob/master/Co...
In the statistical analysis (unlike the verbiage) she is completely unable to hide the lack of bias and the accuracy of the algorithm, both of which are clearly on display in line [36]. In contrast, her verbiage somehow conveys the exact opposite impression.
- zyxley 9 years agoUh... it's all right there in your link, across several sections that analyze specific parts of the data.
> Black defendants are 45% more likely than white defendants to receive a higher score correcting for the seriousness of their crime, previous arrests, and future criminal behavior.
> Women are 19.4% more likely than men to get a higher score.
> Most surprisingly, people under 25 are 2.5 times as likely to get a higher score as middle aged defendants.
> The violent score overpredicts recidivism for black defendants by 77.3% compared to white defendants.
> Defendants under 25 are 7.4 times as likely to get a higher score as middle aged defendants.
> [U]nder COMPAS black defendants are 91% more likely to get a higher score and not go on to commit more crimes than white defendants after two years.
> COMPAS scores misclassify white reoffenders as low risk at 70.4% more often than black reoffenders.
> Black defendants are twice as likely to be false positives for a Higher violent score than white defendants.
> White defendants are 63% more likely to get a lower score and commit another crime than Black defendants.
Calling out one specific section that doesn't show bias doesn't magically exonerate the rest.
- yummyfajitas 9 years agoNone of these things are evidence of bias.
The algorithm is biased if it's giving the wrong score due to race or redundantly encoded race. To show that the algorithm is biased, you need to show that (score, race) pairs are more predictive than (score, ) singletons.
Lines [36] and [46] both attempt to address this question. The only one of these which is statistically significant is "race_factorOther:score_factorHigh" in line [46].
The other things you bring up are interesting, but they do not show bias. At best they show disparate impact, which isn't remotely the same thing.
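Concretely, the "(score, race) vs (score,)" comparison is just a nested-model test. A minimal sketch in Python: a plain likelihood-ratio test on a logistic model, not the notebook's Cox specification; the csv columns follow ProPublica's public data and the path is hypothetical:

    # Does adding race (and race x score) improve prediction beyond the score alone?
    import pandas as pd
    import statsmodels.formula.api as smf
    from scipy import stats

    df = pd.read_csv("compas-scores-two-years.csv")  # hypothetical local copy

    score_only = smf.logit("two_year_recid ~ decile_score", data=df).fit(disp=0)
    with_race  = smf.logit("two_year_recid ~ decile_score * C(race)", data=df).fit(disp=0)

    lr = 2 * (with_race.llf - score_only.llf)
    dof = with_race.df_model - score_only.df_model
    print("LR statistic:", round(lr, 2), " p =", stats.chi2.sf(lr, dof))
    # A large p-value here means (score, race) is no more predictive than
    # (score,) alone -- the definition of "unbiased" being used in this thread.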
- yummyfajitas 9 years ago
- pc86 9 years agoThe data analysis you link to is by Jeff Larson, while the primary author of the article is Julia Angwin.
Larson is still the second author, so it is certainly a big question how he can present data showing no statistical correlation between race and score and then have his name on an article saying the exact opposite, one that is clearly pushing an agenda. And, as noted, one where the owners of the publication are also involved in a competing risk assessment product.
- yummyfajitas 9 years agoIt's not quite right that he shows no correlation between race and score. There is a strong correlation between race and score. This correlation is caused by the fact that blacks have a high recidivism rate (p = 4.52e-6).
What the analysis shows is that once you know the predicted score of the algorithm, using race doesn't give you extra information. If the scores were biased then you could correct them by using racial information to undo the bias.
For more detail on that last bit, read the "What if measurements are biased?" section of my blog post: https://www.chrisstucchio.com/blog/2016/alien_intelligences_...
(The details differ a bit - I describe linear regression rather than cox models. But the basic idea is the same.)
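To see why that works, here is a minimal simulation of the argument on entirely synthetic data (nothing to do with the real COMPAS model): deliberately inflate a score for one group, then check that a regression on (score, group) picks up a corrective group coefficient.

    # If the score were inflated for one group, a (score, group) regression
    # would recover a negative group coefficient that undoes the inflation.
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 20000
    group = rng.integers(0, 2, size=n)            # 1 = hypothetical penalized group
    true_risk = rng.uniform(0.05, 0.95, size=n)
    reoffend = (rng.uniform(size=n) < true_risk).astype(int)

    biased_score = true_risk + 0.2 * group        # score inflated by 0.2 for group 1

    X = sm.add_constant(np.column_stack([biased_score, group]))
    fit = sm.Logit(reoffend, X).fit(disp=0)
    print(fit.params)
    # The group coefficient comes out clearly negative: group membership
    # "corrects" the inflated score. In ProPublica's own regression no such
    # corrective race term showed up as significant.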
- yummyfajitas 9 years ago
- zyxley 9 years ago
- yummyfajitas 9 years ago
- yummyfajitas 9 years agoAccording to ProPublica's own analysis, the claim of bias cannot be shown to be statistically significant. https://www.propublica.org/article/how-we-analyzed-the-compa...
This article is terrible data journalism and probably deliberately misleading.
Step 1: write down conclusion.
Step 2: do analysis.
Step 3: if analysis doesn't support conclusion, write down a bunch of anecdotes.
Really, here's her R script: https://github.com/propublica/compas-analysis/blob/master/Co...
Just read that. It's vastly better than this nonsensical article.
- daveguy 9 years agoThey analyzed what they could -- the outcomes of the algorithm (its recommendations) and the accuracy of those recommendations. They picked out specific examples, but the analysis was over the whole data set. I think you missed these relevant parts from the article (see the sketch after these quotes):
> We obtained the risk scores assigned to more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014 and checked to see how many were charged with new crimes over the next two years, the same benchmark used by the creators of the algorithm.
> The score proved remarkably unreliable in forecasting violent crime: Only 20 percent of the people predicted to commit violent crimes actually went on to do so.
> The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants.
> Could this disparity be explained by defendants’ prior crimes or the type of crimes they were arrested for? No. We ran a statistical test that isolated the effect of race from criminal history and recidivism, as well as from defendants’ age and gender.
> Black defendants were still 77 percent more likely to be pegged as at higher risk of committing a future violent crime and 45 percent more likely to be predicted to commit a future crime of any kind.
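For anyone who wants to reproduce the headline error rates rather than argue from the regression table, the comparison is a few lines. A minimal sketch; the column names follow ProPublica's published csv, and the decile-5 cutoff for "medium/high" is an assumption here:

    # Among people who did NOT reoffend, how often were they flagged as
    # medium/high risk, broken out by race? (The "false positive" comparison.)
    import pandas as pd

    df = pd.read_csv("compas-scores-two-years.csv")  # hypothetical local copy
    df["flagged"] = df["decile_score"] >= 5          # assumed medium/high cutoff

    for race, grp in df.groupby("race"):
        fpr = grp.loc[grp["two_year_recid"] == 0, "flagged"].mean()
        fnr = 1 - grp.loc[grp["two_year_recid"] == 1, "flagged"].mean()
        print(f"{race:20s} false positive rate {fpr:.2f}   false negative rate {fnr:.2f}")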
- yummyfajitas 9 years agoGo read the description of the statistical analysis or just view their R notebook:
https://github.com/propublica/compas-analysis/blob/master/Co...
Their own analysis shows that (p ~= 0) that high and medium risk factors are predictive. They also showed that the racial bias terms (race_factorAfrican-American:score_factorHigh, etc) are probably not predictive (p > 0.05).
Your quotes are not evidence of bias, though I see how they might confuse an innumerate reader. It's interesting how good a job this article is doing confusing the innumerate - it's almost as if it was written to mislead without technically lying.
For example, black defendants being pegged as more likely to commit crimes can be caused by one of two things: bias, or perhaps black defendants actually being more likely to commit crimes. According to ProPublica's own analysis (see race_factorAfrican-American), the latter is actually the case. This is true with p = 4.52e-06 - see line [36].
- daveguy 9 years agoI read through the entire analysis. It appears that you stopped reading after you saw a p-value that supported your bias. That is bias in the sense of a pre-conceived notion. You then proceeded to pedantically argue that the well-demonstrated bias of the algorithm (more false positives for blacks than whites, about 40% vs 20%) does not exist, because a p-value came in between 0.05 and 0.1 instead of below 0.05.
Please let me know when your reading comprehension catches up with your mediocre statistics comprehension.
Maybe you just didn't realize that the 20-20 hindsight data -- prediction vs recidivism -- is included right there in the analysis. Or maybe you did realize it later and just decided you'd dug in so much that you didn't want to admit your ignorance.
Or maybe you still haven't comprehended the difference between the meanings of the word bias.
- daveguy 9 years ago
- yummyfajitas 9 years ago
- kenjackson 9 years ago(From my above reply too, as it applies here also):
Let's be clear -- if the null hypothesis in this case is true (that there is no bias), and all other assumptions made are true, there is a slightly greater than 5.7% chance of obtaining this result (or something even more skewed). That's a great bar for publication of SCIENCE. It's not a great bar for hiding behind a proprietary algorithm used in sentencing. People talk about misuse of p-values, but this takes the cake.
- yummyfajitas 9 years agoIf you want to criticize the details of her analysis, go ahead. I'm solidly in the Bayesian camp and I agree with you 100%. What I'd have done is compute posteriors on all these coefficients and then compute Bayes factors / the probability of bias (sketched below).
I'm confused, though; the mood affiliation of your post somehow suggests that her less-than-perfect choice of statistical methodology somehow supports her claims. Could you explain that? Or am I simply misunderstanding what you are trying to say?
Also, let's suppose we just take her own analysis at face value, and don't view it through the p-value lens. The maximum likelihood estimate suggests that even if this effect is not random chance, it's not very big. I.e., the "score factor high" estimate is >8x larger than the "score factor high, race = black" estimate. Isn't this really good? Do you really think the human biases that this algorithm mitigates are smaller than this?
Lastly, what specific analysis would convince you that this algorithm is predictive and non-biased (or, more realistically, not very biased)?
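To make the Bayesian version concrete: a minimal sketch using a normal approximation to the posterior for a single bias coefficient (flat prior; the estimate and standard error are placeholders, not the notebook's actual numbers):

    # Report P(bias coefficient > 0) instead of a significant/not-significant verdict.
    from scipy import stats

    coef, se = 0.18, 0.10                       # e.g. a race:score interaction term
    posterior = stats.norm(loc=coef, scale=se)  # normal approximation, flat prior

    print("P(bias coefficient > 0)   =", round(posterior.sf(0.0), 3))
    print("P(bias coefficient > 0.5) =", round(posterior.sf(0.5), 3))
    # This answers "how likely is there any bias, and how likely is it large?"
    # directly, rather than leaning on a p = 0.05 cutoff.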
- pdkl95 9 years ago> maximum likelihood
That may be grounds for a mistrial. Decisions about crimes are not judged by the "maximum likelihood".
> what specific analysis would convince you that this algorithm is predictive and non-biased
What is it going to take to convince you that the choice of model and which data to use as input is just as important as the analysis itself?
> race_factor
Depending on the situation, using race or other protected classes is illegal. One of the reasons we have a right to face our accusers is to provide an opportunity to challenge those accusations. Racial (or any other protected class) discrimination doesn't become legal when it is hidden behind an equation or algorithm. If the government wants to keep the method secret, then anything derived from those methods should be excluded.
> human biases
...are off topic. An algorithm needs to justify its own existence.
> it's not very big
So you're fine with racial bias, as long as it only affects what you consider a "small" number of people.
> or perhaps black defends actually are more likely to commit crimes
/sigh/
- pdkl95 9 years ago
- yummyfajitas 9 years ago
- daveguy 9 years ago
- thejefflarson 9 years agoThanks for posting this. I encourage this crowd to take a look at the methodology too: https://www.propublica.org/article/how-we-analyzed-the-compa...
- gleb 9 years agoAre you sure what you found is not just Simpson's paradox?
When I look at the two KM plots for whites/blacks, they are mostly the same. It's pretty clear that the model is not prejudiced against blacks; in fact, it's somewhat prejudiced against whites. [1]
Your main editorial claim is that whites tend to be misclassified as "good" and blacks as "bad."
But I think what's actually happening is that the algorithm is more likely to misclassify low_risk as "good", and high_risk as "bad".[2] Combine that with vastly more whites than blacks being low_risk (as you show earlier) and you get the observed "injustice". (See the simulation after the footnotes.)
I'll also note that the KM curve for whites flattens out at 2 years, unlike for blacks. This is actually a big deal if statistically significant. But that's a separate conversation.
Footnotes:
1 - this is acknowledged in methodology page "black defendants who scored higher did recidivate slightly more often than white defendants (63 percent vs. 59 percent)."
2 - why that is I don't yet fully understand (and I'd like to), but it looks to be simple math that follows from low risk mostly not recidivating, and high risk mostly recidivating
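A small simulation of footnote 2's point, with entirely synthetic numbers: a score can be equally well calibrated for two groups and still produce very different false positive rates if the groups' underlying risk distributions differ.

    # Same calibrated score, same threshold, different base rates -> different
    # false positive rates among non-reoffenders.
    import numpy as np

    rng = np.random.default_rng(3)

    def false_positive_rate(mean_risk, n=100000, cutoff=0.5):
        risk = np.clip(rng.normal(mean_risk, 0.2, size=n), 0.01, 0.99)
        reoffend = rng.uniform(size=n) < risk    # outcomes match the score exactly
        flagged = risk >= cutoff
        return flagged[~reoffend].mean()         # flagged, but did not reoffend

    print("lower-base-rate group FPR: ", round(false_positive_rate(0.35), 3))
    print("higher-base-rate group FPR:", round(false_positive_rate(0.55), 3))
    # The group with more genuinely high-risk members also accumulates far more
    # false positives, even though the score treats every individual identically.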
- gleb 9 years agoThanks for posting a link to the methodology.
Does this sentence "Northpointe does offer a custom test for women, but it is not in use in Broward County. " imply that the base COMPAS model does not take gender into account?
- gleb 9 years ago
- wyager 9 years agoI don't have an issue with using statistical analysis to direct crime prevention efforts. I think it's unconscionable to use statistical analysis for sentencing. We don't want Minority Report in real life.
- michaelbuddy 9 years agoI think the problem is that the amount of crime is causing the legal system to buckle. So there is a search for solutions to make the process more efficient. There may be some value to sentencing standards that serve as a deterrent.
In other words, if you are a repeat offender, in some cases you think you know what your lawyer can do for you. But a system replacing that, one that is overly harsh, may deter you. All things being equal in a system of punishment, I think I want the one that's got some deterrence in it. So this is worth exploring.
If every criminal knew that getting caught meant being put into a meat grinder of sorts, I wonder how that would change their thinking about how to navigate the world and problem solve.
- michaelbuddy 9 years ago
- Dowwie 9 years agoConsistent with the theme of this story is the content of discussions held at a conference at NYU School of Law, featuring human rights and legal scholars. Coincidentally, I submitted a link on this yesterday.
- thisisdave 9 years agoIn block [37] of the ipython notebook, are racial main effects missing? I only see interactions.
https://github.com/propublica/compas-analysis/blob/master/Co...