
Decision theories, LW-style

There is a lot of confusion about Newcomb’s paradox, various decision theories discussed on LessWrong, free will, determinism, and so on. In this post, I try to make all of this less confusing.

I am not going to talk about the domain of decision theory in general. This post is purely about the parts LW is interested in.


Popular decision paradoxes are inherently contradictory – e.g. “what if you had free will but were still completely predictable” (Newcomb’s paradox), or “what if you really wanted agents to cooperate but also wanted them to be ruthless” (prisoner’s dilemma). Framed like this, useful questions become useless mind-benders.

Decision theories are expressly for agents that don’t have free will – choosing a decision theory for an agent literally means answering “what algorithm should an agent blindly follow?”. Furthermore, a lot of interesting questions about decision theories only apply in settings where you have more than one agent following similar theories and you want something from them (like “cooperate without prior communication”).

Trying to design algorithms for those scenarios is much more productive than trying to spot contradictions in decision paradoxes where the roles of the algorithm designer and the algorithm executor coincide.

I. Newcomb’s paradox: what if you had free will but didn’t?

To mess with you, The Ultimate Predictor, also known as “Omega”, has maybe put an iPad into a box. Or maybe not.

On the top of the box, there is a note: if you punch yourself before opening the box, the box will have an iPad in it. You can take it and go home and brag to everyone and waste the next week watching funny British panel shows, especially Would I Lie To You?, alone, in the dark. (I would like to preemptively note that nothing of the sort has ever happened to me, except for the whole second part.)

Omega is not a god, but they are never wrong and cannot possibly ever be wrong. This you know for sure. Do you obey and punch yourself before opening the box?

It seems like you definitely should. However! Omega can not change what’s inside the box, so let’s be Smart™. If there is an iPad, you should not punch yourself, because then you will have both the iPad and your dignity. If there isn’t an iPad, you should not punch yourself either, because why would you? So in both cases you can just skip that bit and open the box. Surely, it sounds like the proponents of this point of view – affectionately called “two-boxers” for complicated reasons – have a smart argument on their side.

However, those who obey Omega – “one-boxers” – have a good argument too, which is that they have iPads and the other side does not, despite being so very smart. So what should you do?
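
Both arguments fit in a few lines of Python. This is only a sketch: the numbers IPAD_VALUE and PUNCH_COST are made up, and modelling "Omega is never wrong" as "the box is full exactly when you punch" is my reading of the setup, not something the paradox spells out.

```python
IPAD_VALUE = 500   # hypothetical worth of the iPad
PUNCH_COST = 10    # hypothetical cost of punching yourself

def payoff(punch: bool, box_full: bool) -> int:
    """Payoff of one visit, given what you do and what is in the box."""
    return (IPAD_VALUE if box_full else 0) - (PUNCH_COST if punch else 0)

# Two-boxer view: the contents are fixed, so compare actions box by box.
# Not punching wins in both cases.
for box_full in (True, False):
    assert payoff(False, box_full) > payoff(True, box_full)

# One-boxer view: Omega is never wrong, so the box is full iff you punch.
def payoff_with_omega(punch: bool) -> int:
    return payoff(punch, box_full=punch)

print(payoff_with_omega(True), payoff_with_omega(False))  # 490 0
```

The two-boxer's comparison is valid line by line, but it ranges over combinations (no punch, full box) that Omega never lets happen.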

II. A rule of thumb: decision theories are for your kids, not for you

You are now living on the home planet of Omega. Not a day passes without being offered a box, or two boxes, or three boxes, and it gets old really fast. You resolve to never leave the house, and instead you have coaxed your kid to do your errands. Naturally, all iPads accumulated by the kid during those errands are your rightful property.

Now the question becomes: what should you teach the kid? And it’s a much easier question. You can teach the kid Evidential Decision Theory and off you go, while still believing Causal Decision Theory is smart and the previous one is dumb.

This is the deal with decision theories. Treat them as “what is the most useful behavior for a stupid agent in some stupidly convoluted world?”. They are not about you, the Mastermind Plotter, a free agent who is impossible to predict, yet somehow also possible to predict. They are about a kid, or maybe a self-driving car, or a religious community – i.e. an agent or set of agents that can be influenced.

III. Prisoner’s dilemma: what if you really wanted agents to cooperate but also wanted them to be ruthless?

Let’s apply this principle to another paradox, the prisoner’s dilemma. I don’t feel like inventing a silly framing for it, so you can read about it on Wikipedia.

Two players are playing a game:

  • If they both choose to cooperate, each gets a reward.
  • If one of them defects, the cheater gets a bigger reward and the nice player is, counterintuitively, punished.
  • If both of them defect, nobody gets anything.

The Smart™ reasoning goes like this: if the other player cooperates, you should defect and get a big reward. If the other player defects, you should also defect – to avoid punishment. Therefore, you should defect, period.

The dilemma lies in the fact that when the players are smart, the outcome is not. So being nice turns out to be better than being smart. Huh.
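
Here is the same reasoning as a small Python sketch. The payoff numbers are invented for illustration; only their ordering (big reward > reward > nothing > punishment) comes from the rules above.

```python
C, D = "cooperate", "defect"

# Hypothetical payoffs: (my move, their move) -> my reward.
PAYOFF = {
    (C, C): 3,   # both cooperate: each gets a reward
    (D, C): 5,   # I defect on a cooperator: bigger reward
    (C, D): -1,  # I cooperate with a defector: punished
    (D, D): 0,   # both defect: nobody gets anything
}

def best_response(their_move: str) -> str:
    """The move that pays me most, holding the other player's move fixed."""
    return max((C, D), key=lambda mine: PAYOFF[(mine, their_move)])

print(best_response(C), best_response(D))  # defect defect
print(PAYOFF[(C, C)], PAYOFF[(D, D)])      # 3 0
```

Defecting is the best response to everything, and yet two players who both follow it end up with 0 each instead of 3 each.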

What is the right decision algorithm? To figure this out, again, think about a kid.

If you have a kid, and you only care about their success in life, and nothing else in the world, you should teach them to be smart. Maybe even psychopathic, though it is debatable.

If you have two kids, however, you should teach them to be smart but be nice to each other, so that they will get rewards whenever they happen to play with each other. (Or, if the defector’s reward is much bigger than the cooperator’s punishment, they should take turns at defecting and exploit the system.)

Why? Because you care about both kids! If you only care about one of them, you can teach one of them to be ruthless and the other – to be nice and turn the other cheek. If you care about both of them but want them to be ruthless, teach them to cooperate with each other and no one else. If you want them to be maximally ruthless and then you say “oh but why don’t they cooperate”, well, you want a contradiction.
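
To make "it depends on whom you care about" concrete, here is a sketch where the parent picks the pair of moves that maximizes a weighted sum of the kids' rewards. The weights w1 and w2, the helper names, and all payoff numbers are my own assumptions for illustration.

```python
from itertools import product

C, D = "cooperate", "defect"

def make_payoff(reward, temptation, punishment):
    """(my move, their move) -> my reward; mutual defection pays 0."""
    return {(C, C): reward, (D, C): temptation, (C, D): punishment, (D, D): 0}

def best_joint_moves(payoff, w1, w2):
    """Pair of moves maximizing w1 * kid 1's reward + w2 * kid 2's reward."""
    def parent_utility(moves):
        m1, m2 = moves
        return w1 * payoff[(m1, m2)] + w2 * payoff[(m2, m1)]
    return max(product((C, D), repeat=2), key=parent_utility)

mild = make_payoff(reward=3, temptation=5, punishment=-1)
steep = make_payoff(reward=3, temptation=10, punishment=-1)

print(best_joint_moves(mild, 1, 1))   # ('cooperate', 'cooperate')
print(best_joint_moves(mild, 1, 0))   # ('defect', 'cooperate')
print(best_joint_moves(steep, 1, 1))  # one cooperates, one defects
```

In the last case the two mixed pairs tie, which is the "take turns at defecting" option: the family total is 9 per round either way, versus 6 for mutual cooperation.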

Again, this is the deal with decision theories. I’m not going to use a fancy decision theory to decide how to live my life, but I am very interested in a fancy decision theory that I can instill into the malleable minds of my kids, readers, self-driving cars, whatever. And being Smart™ doesn’t quite cut it here – this is how you get, for instance, a fleet of murderous cars. This is why we need something better, and this is why thinking about decision theories is worth spending time on.

IV. XOR blackmail problem

In the bottomless chest of decision theory edge cases, there is another wacky one that we have to deal with.

An agent hears a rumor that their house has been infested by termites, at a repair cost of $1,000,000. The next day, the agent receives a letter from the trustworthy predictor Omega saying:

“I know whether or not you have termites, and I have sent you this letter if and only if exactly one of the following is true: (i) the rumor is false, and you are going to pay me $1,000 upon receiving this letter; or (ii) the rumor is true, and you will not pay me upon receiving this letter.”

Thinking “if I do X, it must be Y” does not work here. Evidential Decision Theory wants you to pay up to somehow magically end up in the universe where the rumor is false, because according to the problem statement, deciding to pay must mean that the rumor is false.

“But didn’t the same thing happen in Newcomb’s paradox and you said exactly the opposite thing?” Once more, think of a kid to make it easier.

Do you want to teach your kid to pay upon receiving a letter like this? Then Omega will send them letters when someone spreads false rumors about them, and they will be bleeding money. Furthermore, they will not get any letters when the rumors are true.

Do you want to teach your kid to not pay? Then they will get the letters only when the rumors are true, and won’t have to pay anything. Awesome! They don’t bleed money and they get to know when their house is infested with termites, straight from the infallible Omega.
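
The comparison between the two kids can be checked mechanically. In this sketch the dollar amounts come from the problem statement; the function names and the 50% prior on termites are assumptions of mine.

```python
TERMITE_COST = 1_000_000
BRIBE = 1_000

def letter_sent(termites: bool, pays_on_letter: bool) -> bool:
    """Omega's rule: the letter goes out iff exactly one of (i), (ii) holds."""
    case_i = (not termites) and pays_on_letter
    case_ii = termites and (not pays_on_letter)
    return case_i != case_ii

def cost(termites: bool, pays_on_letter: bool) -> int:
    """Total money lost by a kid who follows the given policy."""
    gets_letter = letter_sent(termites, pays_on_letter)
    paid = BRIBE if (gets_letter and pays_on_letter) else 0
    return (TERMITE_COST if termites else 0) + paid

def expected_cost(pays_on_letter: bool, p_termites: float) -> float:
    return (p_termites * cost(True, pays_on_letter)
            + (1 - p_termites) * cost(False, pays_on_letter))

# Hypothetical prior: the rumor is true half of the time.
print(expected_cost(True, 0.5))   # 500500.0
print(expected_cost(False, 0.5))  # 500000.0
```

Whatever prior you plug in, the termite bill is the same for both policies; the paying kid just loses an extra $1,000 every time the rumor is false.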

The difference between the two problems is that in Newcomb’s paradox, the contents of the box depend on your decision. In this problem, whether you have termites or not (i.e. what you care about) does not depend on your decision. The only thing that changes is whether you get a letter about it.

This way of thinking is called Functional Decision Theory. In particular, when you are confronted with entities that supposedly know exactly how you think, just go “What should I be thinking to get the best outcome? Okay, then I will think that.”

Note: I suspect that in real life it translates to “if you notice that people reward genuine kindness, try to figure out how to be genuinely kind and at the same time still screw people over, so that you get both the benefits of being kind and the benefits of screwing people over”. And it works!

V. Simulation as a causal mechanism

Going back to Newcomb’s paradox, it is still irritating that there is no causal link between “you decide to disobey Omega” and “the box is empty”. Maybe a kid can’t decide anything, but you definitely can, right?

The simulation argument could provide such a causal link.

If we know Omega is never ever wrong, they are probably simulating you to figure out what you will decide, kinda like in Black Mirror (e.g. Hang the DJ). So when you are deciding to take the box, you don’t know if it’s actually you, or you-in-the-simulation. By making the right decision while in the simulation, you can help out your non-simulation version.

This even works with otherwise mind-boggling variants of Newcomb’s paradox, like a variant where everything is the same except that the box is made of glass. You literally look at the box, see the iPad in it, and yet somehow you still have to obey Omega to get the iPad. Why? Because you’re in the simulation, and by choosing to obey Omega, you will ensure that the real-world version of you will be presented with a full box, instead of an empty box.
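
A toy version of the glass-box variant, assuming (and it is only an assumption) that Omega works by running your decision procedure once on a simulated full box before the real visit. The agents and function names are hypothetical.

```python
from typing import Callable

# An agent maps what it sees in the glass box to a decision: punch or not.
Agent = Callable[[bool], bool]

def obedient(sees_ipad: bool) -> bool:
    return True   # punches no matter what is visible

def two_boxer(sees_ipad: bool) -> bool:
    return False  # iPad visible or not, never punches

def omega_fills_box(agent: Agent) -> bool:
    """Assumed mechanism: simulate the agent looking at a full box and
    put in an iPad only if the simulated agent punches."""
    return agent(True)

def real_visit(agent: Agent) -> tuple[bool, bool]:
    box_full = omega_fills_box(agent)  # the simulated run
    punched = agent(box_full)          # the real run, same code
    return box_full, punched

print(real_visit(obedient))   # (True, True)
print(real_visit(two_boxer))  # (False, False)
```

The agent is an ordinary function that gets called twice and cannot tell the calls apart, which is the whole trick: the punch that "causes" the iPad is the one thrown inside the simulation.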

A possible objection is: but what if Omega is just really good at psychology and statistics and so on, but doesn’t actually simulate anything? In this case...

VI. Determinism is a great answer to everything

There is no free will, it’s all an illusion, “what should you decide” is not a meaningful question. In fact, if Omega can look at your past life and predict what box you will choose, you personally don’t have much free will, sorry. “Omega probably just noticed that I always two-box when I’m having a grumpy day”. So why are you asking what you should choose, then? Are you having a grumpy day or not? It’s settled then.

Like, okay, you are staring at a glass box with an iPad in it. “Should” you obey Omega and punch yourself anyway? Or for people who have skipped my variant of Newcomb’s paradox entirely: should you one-box even when both boxes are transparent? The answer is: if you find yourself in this situation, you have learned something about yourself. Specifically, that you are a one-boxer. Or, to quote The Last Psychiatrist:

If some street hustler challenges you to a game of three card monte you don’t need to bother to play, just hand him the money, not because you’re going to lose but because you owe him for the insight: he selected you. Whatever he saw in you everyone sees in you, from the dumb blonde at the bar to your elderly father you’ve dismissed as out of touch, the only person who doesn’t see it is you, which is why you fell for it.

Note that this does not mean thinking about decision theories is meaningless – the question of “how should you indoctrinate your kid?” or “what should the self-driving car do?” is still relevant. The difference between you and the self-driving car is that the self-driving car does not have free will, but you supposedly do. Of course the question “what algorithm should I use?” becomes maddening then – you cannot, at the same time, (a) follow an algorithm and (b) have free will, aka the ability to overrule the algorithm whenever you feel like it.

VII. The psychopath button

Here is another illustration: the psychopath button problem.

Paul is debating whether to press the ‘kill all psychopaths’ button. It would, he thinks, be much better to live in a world with no psychopaths. Unfortunately, Paul is quite confident that only a psychopath would press such a button. Paul very strongly prefers living in a world with psychopaths to dying. Should Paul press the button?

If he does, he’s a psychopath and he shouldn’t have pressed it. If he doesn’t, he’s not a psychopath and he should have pressed it.

If you treat the button press as a choice between being a psychopath and not being one, the answer is clear: Paul should not press the button, i.e. should not be a psychopath.

If you assume that Paul does not have a choice, the question disappears completely – he will press the button if he’s a psychopath, he won’t if he’s not, in both cases the consequences won’t be good, but that’s how life is sometimes.

The question is only a conundrum when you insist on it being a choice and not being a choice at the same time. Well, good luck with that.
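
For the "it is a choice" reading, the arithmetic is short. This sketch assumes made-up utilities that respect Paul's stated preferences (no psychopaths > psychopaths > dead, with death much worse), and a parameter q for how confident Paul is that only a psychopath would press.

```python
# Hypothetical utilities, ordered as in the problem statement.
UTILITY_DEAD = -100
UTILITY_LIVE_WITH = 0       # alive, psychopaths still around
UTILITY_LIVE_WITHOUT = 10   # alive, psychopaths gone

def expected_utility_of_pressing(q: float) -> float:
    """q = Paul's confidence that pressing means he is a psychopath."""
    return q * UTILITY_DEAD + (1 - q) * UTILITY_LIVE_WITHOUT

def should_press(q: float) -> bool:
    # Not pressing leaves Paul alive in a world with psychopaths.
    return expected_utility_of_pressing(q) > UTILITY_LIVE_WITH

print(should_press(0.9))   # False
print(should_press(0.05))  # True
```

With these numbers Paul only presses if q drops below roughly 0.09, and "quite confident" is nowhere near that.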

VIII. Conclusion

This is how I recommend approaching decision problems.

If you want to figure out how your robots/kids/agents/cars should behave, mostly drop the philosophy. Look at the history of e.g. cooperation tournaments and what tends to work well there. Do your own experiments. Think about whether you care about the agents, or about the world that the agents are in, and in what proportion. Think about whether you can build a reliable way for agents to read each other’s intentions – e.g. humans can’t hide being angry because their faces get red, stuff like that. For machines, look at trusted computing and remote attestation. Vitalik Buterin’s vision for Ethereum is ultimately a cooperation platform: inspectable agents, non-forgeable identities, zero-knowledge proofs.

If you want to figure out how you should behave, there are usually two separate questions: “what kind of behavior will win in this implausible scenario?” and “how do I justify this to my/someone’s intuition?”. The first one is often straightforward, and the second one is often resolvable with a combination of the determinism hypothesis and the simulation hypothesis.

Finally, if the problem happens to lie along the lines of “you will do X, but doing X is bad for you, so what should you do, huh?”, just reduce it to this form explicitly and banish it from your mind forever. There are more interesting things to think about.

2020   rationality