Showing posts from 2016

Weapons of Math Destruction by Cathy O'Neil

Weapons of Math Destruction (2016) by Cathy O'Neil

Solid work of practical ethics, covering the rights and wrongs of applied statistics in general, but particularly our mass, covert automation of business logic in schools, universities, policing, sentencing, recruitment, health, voting... Much to admire: she is a quant (an expert high-stakes modeller) herself, understands the deep potential of modelling, and prefaces her negative examples with examples of great models, methods of math construction - the moneyball wonks, the FICO men.

"In mid-2011, when Occupy Wall Street sprang to life in Lower Manhattan, I saw that we had work to do among the broader public. Thousands had gathered to demand economic justice and accountability. And yet when I heard interviews with the Occupiers, they often seemed ignorant of basic issues related to finance..."

They are lucky to have her!

A 'Weapon of Math Destruction' is a model which is unaccountably damaging to many people's lives.…

what I said to you in 2016

Checklist for toxic algorithms

Based on comments in O'Neil's Weapons of Math Destruction. Full review here.

Is the subject aware they are being modelled?
Is the subject aware of the model's outputs?
Is the subject aware of the model's predictors and weights?
Is the data the model uses open?
Is it dynamic - does it update on its failed predictions?

Scale
Does the model make decisions about many thousands of people?
Is the model famous enough to change incentives in its domain?
Does the model cause vicious feedback loops?
Does the model assign high-variance population estimates to individuals?

Damage
Does the model work against the subject's interests?
If yes, does the model do so in the social interest?
Is the model fully automated, i.e. does it make decisions as well as predictions?
Does the model take into account things it shouldn't?
Do its false positives do harm? Do its true positives?
Is the harm of false positives symmetric with the good of true positives?

Note that "Inaccuracy" i…

notable wordwordword

dragon-king (n.): An extreme event among extreme events: roughly, an outlier of a Pareto distribution, even. An elaboration on Taleb's black swan metaphor for unforeseeable extreme events. Not sure if it adds much, since the black swan is distribution-independent and Taleb doesn't fixate on power laws iirc.
chef's arse (n.): Painful chafing of the buttocks against each other; attends exercise in hot environments.
groufie (n.): group selfie, obvs. No less contemptible for the awkward swerve around "groupie".
detaliate (mangled v.): To explain. Seen in this Quora answer by a non-native English speaker (possibly Romanian). I want to appropriate it: to detaliate is to respond to casual comments with a fisking.
consing (n.): Strictly, hash consing: saving on memory allocation by comparing new values to existing allocations and just storing a reference to the existing one if it's a hit. From Lisp's cons cells, the basic pair data structure.
sadcore (n.): Slow indie. Journo term: avoided…

Notes from Effective Altruism "Global" "x" in Oxford, in 2016

(This is about this thing. The following would work better as a bunch of tweets but seriously screw that: )


Single lines which do much of the work of a whole talk:

"Effective altruism is to the pursuit of the good as science is to the pursuit of the truth." (Toby Ord)

"If the richest gave just the interest on their wealth for a year they could double the income of the poorest billion." (Will MacAskill)

"If you use a computer the size of the sun to beat a human at chess, either you are confused about programming or chess." (Nate Soares)

"Evolution optimised very, very hard for one goal - genetic fitness - and produced an AGI with a very different goal: roughly, fun." (Nate Soares)

"The goodness of outcomes cannot depend on other possible outcomes. You're thinking of optimality." (Derek Parfit)


data jobs, tautologies, bullshit, $$$

(c) Tom Gauld (2014)
When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math. If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics... You may not like what some statisticians do. You may feel they don’t share your values. They may embarrass you. But that shouldn’t lead us to abandon the term “statistics”.

Karl Broman

what makes data science special and distinct from statistics is that this data product gets incorporated back into the real world, and users interact with that product, and that generates more data: a feedback loop. This is very different from predicting the weather...

– Cathy O'Neil / Rachel Schutt

"Data science" is the latest name for an old pursuit: the attempt to make computers give us new knowledge. * In computing's short history, there have already been about 10 words for this activity (and god knows how many deri…

notable oral noises

Strine (Oz proper n.): that thick Australian accent. Onomatopoeic: just say "Striiine" - "(Au)stralian" - with a long ɒ sound.
curioso (C17th It. n.): Brilliant enthusiast of unusual things. Originally synonymous with virtuoso; a word for a proto-scientist / Renaissance man.
sockdolager (American n.): A finisher; an exceptional thing. Probably from "sock" (punch) and "doxology" (final hymn). Was the last word heard in the theatre before Lincoln was shot amidst laughter.
gunsel (originally Yiddish n.): 1) hoodlum; Player. 2) catamite - from the Yiddish גענדזל, gosling. <3. The derived term "gunselism" has exactly 1 hit and how often do you see that?
green ink letter (n.): A lunatic rant sent in to the Letters page.
cromulent (adj.): blameless; fine. Made up by a Simpsons writer to demonstrate Frege's Context Principle (or Springfield's inbreeding).
Taco Bell Programming (n.): the discipline of solving software engineering problem…

feel for data

"This isn't right. Imagine: we give them a loss function, without a utility function. They can't feel good; only less bad."
"It's the same with us, tho. What we call utility is just the absence of loss."
"I'm not sure that's true. Pride feels to be more than the absence of shame; love is more than absence of loneliness."
"There's a fairly big gap between your two examples. And it's hard to think clearly when strong pleasure or pain is implicated."
"Nevertheless, yours is the view requiring a mass redefinition of natural language to make two entities become one."
"I don't mind. Even if they're not identical, we can still capture most of all value by reducing harm."
"I don't see how you can know that."
"Obvs I don't know it infallibly, but anyway it can't hurt."
"You might be more ambitious than such moral hedging."
"Yes, as soon as possible: tha…

Highlighted passages from Ronson's So You've Been Publicly Shamed

Something of real consequence was happening. We were at the start of a great renaissance of public shaming. After a lull of almost 180 years (public punishments were phased out in 1837 in the United Kingdom and in 1839 in the United States), it was back in a big way. When we deployed shame, we were utilizing an immensely powerful tool. It was coercive, borderless, and increasing in speed and influence. Hierarchies were being leveled out. The silenced were getting a voice. It was like the democratization of justice. And so I made a decision. The next time a great modern shaming unfolded against some significant wrongdoer—the next time citizen justice prevailed in a dramatic and righteous way—I would leap into the middle of it. I’d investigate it close up and chronicle how efficient it was in righting wrongs.

After the interview was over, I staggered out into the London afternoon. I dreaded uploading the footage onto YouTube because I’d been so screechy. I steeled myself for comments…

mandatory personal development module blues

After these sessions, we often see people start to notice their Myers-Briggs type coming into play in everyday life, and being more analytic about the types of those around them. Really using it, really thinking about it.
(The birth of tragedy.)


"So, what do you think of your test results?"
"I would prefer not to."
"I don't agree with it."
"Ah sure - people often find something a bit off with it, at first. Have you added in your epicycles?"
"Yes, but twenty more badly conceptualised variables don't really help matters. It's not the particular type that's the problem, but the typology. Two of the dichotomies are simply false; they do not trade-off in my mind, nor in the population's minds; they feign the use of a single interval scale without actually picking out a single real variable; and you forc…

notable nonjargon jargon

Technical books often use seemingly nontechnical, apparently normative terms: you're marching through your dense and spidery notation, and suddenly you tread in a gob of ordinary language. Some of the most important concepts in the formal sciences are of this sort, in fact:

well-behaved. "not weird; having all properties suitable for the present study; not in violation of any of the assumptions we just made". One of the big offenders, used everywhere and never defined truly, only by context. Usually "well-behaved compared to an unrestricted superset we don't want to handle right now".

well-defined. "unambiguous; blessed with just one interpretation". One of the core differences between the formal sciences and other enquiry. Terminology in other fields is nowhere near as clear as this (not even ones which seem highly formalised, like Spinoza's Ethics or Wittgenstein's Tractatus or half of Spencer-Brown's Laws of Form*).

Why is well-def…

Highlighted passages from MacFarquhar's Strangers Drowning

Some people try to help one person at a time, and other people try to change the whole world. There's a seductive intimacy in the first kind of work, but it can also be messy and unpredictable. People may resent help that is so intimate, and if it goes badly, the blunder is personal. Even when the help succeeds, the victories are small and don't really change anything. The second kind of work is more ambitious, and also cleaner, more abstract. But success is distant and unlikely, so it’s helpful to have a taste for noble failure, and for the camaraderie of the angry few...

[Dorothy]: "They were people you did not want to be around. They were so sharp. Everything was a matter of life and death: we've got to do this action because the world depends on it."

In 1967, a long-term study of living, unrelated kidney donors was initiated, with the aim of helping transplant centers form policies on these confounding individuals. The study subjected the donors to free-as…

notable words of all seasons

to curry (equestrian v.): to groom, firmly brush all over. (Don't freak out if someone tells you they're off to curry a horse.) See also "to decompose into univariate functions" and "shout at". People really like this word.
GLEE (adj.): "Gay, Lesbian, and Everyone Else". I like this; the current bien-pensant name has grown to "LGBTQIA"; not pronounceable, nor even anagrammable. But GLEE probably can't catch on, since people will see abstraction as erasure, and also won't like it tacitly including majority people.
septentrional (poncey adj.): Northern. Used in the clumsy retroactive Latin for the US, "Civitatibus Foederatis Americae septentrionalis".
boodles (n.): Butternut squash noodles.
avi (n.): Profile picture ("avatar"). Annoying; optimised for tweeting, not speaking.
OC (adj.): Original character; in fanfiction, a new protagonist added to the existing cast. Pejorative?
procursive (adj.): forward-running. Use…
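
The "decompose into univariate functions" sense of curry above, as a small Python sketch. `curry` and `volume` are illustrative names, not anything from a library, and the sketch assumes a function with a fixed number of positional parameters.

```python
from inspect import signature


def curry(f):
    """Turn f(x, y, z) into a chain of one-argument functions f(x)(y)(z)."""
    n = len(signature(f).parameters)

    def step(*args_so_far):
        if len(args_so_far) == n:            # all arguments collected: call f
            return f(*args_so_far)
        return lambda x: step(*args_so_far, x)  # otherwise wait for one more

    return step()


def volume(length, width, height):
    return length * width * height


print(curry(volume)(2)(3)(4))  # 24
```

Each call takes exactly one argument and hands back the next function in the chain, which is all currying is.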

preacher or engineer

If your software only uses 8-bit characters, if it does not set an explicit charset, then it cannot handle non-English languages. This excludes 80% of the world - mostly nonwhite people. So developers who don't handle different character encodings are racist. And we do not associate with racists. So we need our own, non-racist versions of all ASCII software; yes, this may take all our lives, but the cause is just, and when it comes to justice there is no calculus, no compromise possible. Are you with me?


If your software only uses 8-bit characters, if it does not set an explicit charset, then it cannot handle non-English languages. It's silly and extremely inefficient to limit your software's reach so much for the sake of two missing functions. The cost is an hour or two of development; the payoff is increasing your potential userbase by a factor of 6. This will also expand the pool of potential contributors to your project enormously. And besides, glyph encoding is an …
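
A quick Python sketch of the technical claim both speakers share. Latin-1 stands in here for "an 8-bit charset" (an assumption; the post doesn't name one), and `fits_in_charset` is a made-up helper, not a standard function.

```python
def fits_in_charset(text, charset="latin-1"):
    """True if every character of text can be encoded in the given charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


samples = ["hello", "na\u00efve", "日本語", "Ελληνικά"]
for s in samples:
    # UTF-8 encodes all of these; the byte count grows with the script.
    print(s, fits_in_charset(s), len(s.encode("utf-8")))

# Bytes written as UTF-8 but read back with the wrong charset
# (assumed here to default to Latin-1) come out as mojibake.
garbled = "\u00e9".encode("utf-8").decode("latin-1")
print(garbled)
```

Latin-1 does cover some Western European text, so the failure only shows up once other scripts arrive; the two "missing functions" amount to encoding and decoding with an explicit charset.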