Goodhart

NOTE: This page should really be named "Goodhart's Curse," but we don't currently have a cheap way to rename pages without breaking links. Needs follow-up.

Pattern

Our brains condition us, often without our noticing, through tiny flinches and nudges. These accumulate and compound until they meaningfully affect our behavior, and they are often miscalibrated for our goals and values.

There are four important pieces to the puzzle of this pattern.

  1. When you use a proxy to track your progress toward a goal, you eventually go astray, because optimizing the proxy is not the same as achieving the goal. Think of students practicing test-taking skills rather than the core material they're supposed to be learning. Or consider how our tongues evolved to like sugar because it was a marker for things like fruit, which had lots of vitamins and micronutrients along with the raw calories; that same preference now draws us toward sweets that supply the sugar without the nutrients.
  2. Very small rewards and punishments can have an outsized conditioning effect, as long as they are delivered quickly. For example, if you can get a dog to associate a "click" sound with some kind of positive outcome (such as a treat), then you will be able to train the dog much more rapidly than you could by merely tossing treats, because the click can be delivered much faster than the treat.
  3. Our brains tend to condition us (this is a slightly nonsensical sentence but it gestures in the right direction). The brain, aggregating information about past experience and seeking to "nudge" the body in good directions, delivers small stimuli such as flinches, fleeting emotions, quick physiological sensations like a tightening of the throat, and little snippets of verbal thought, based on implicit models of what sorts of things it would be beneficial to approach or avoid.
  4. These little flinches and nudges tend to be based on surface-level characteristics, because on a quick/implicit/reflexive level, our brains are not super good at tracking nuance and complexity. For example, our brains will produce a feeling of uneasiness if someone who looks like a person we previously had trouble with enters our visual field, even if they are not the same person.

Because these little flinches and nudges arrive near-instantaneously, they are meaningfully effective at shaping our behavior. But because they are hooked up to proxies rather than to our actual goal, they are meaningfully misleading, and tend to end up sending us in the wrong direction.

Therefore:

Deliberately practice noticing the flinches and nudges, so that you can be aware of the shaping effect they are having on your behavior (and pump against that effect if it seems misaligned).

Going further