This is a sequel to Asian Bayesian

I saw a really cool probability problem posted by Kartik Chandra on the MIT Mastodon instance:

Suppose two Bayesians, Alice and Bob, decide to put on a variety show where they take turns tossing a biased coin and announcing outcomes to a live studio audience. (Bayesians love this kind of thing–it keeps them entertained for hours…)

Unfortunately, just as Alice goes on stage, she realizes with dread that she forgot to bring the coin. Thinking on her feet, she mimes pulling a tiny imaginary coin out of her pocket, and says “This is a biased coin!” It works–the audience buys it and the crowd goes wild.

Breathing a sigh of relief, Alice mimes tossing the pretend coin and randomly announces “heads” or “tails” to the audience. Then, she hands the “coin” to Bob, who catches on and also mimes a toss. (This has just turned into a Bayesian improv show.)

But now Bob has a problem. Should he say “heads” or “tails”? He could choose uniformly at random, but the audience will get suspicious if the coin appears to be fair, or if his tosses don’t match the statistics of Alice’s. How can he keep up the charade of a biased coin?

Here’s what Bob does. In the spirit of “yes, and…,” he infers the coin’s bias based on Alice’s reported outcome (say, with a uniform prior) and samples a fresh outcome with that inferred bias. So if Alice said “heads,” Bob would be a bit likelier to say “heads” as well.

Then Alice takes the coin back and does the same, freshly inferring the coin’s bias from the past two tosses. In this way, the two actors take turns announcing simulated outcomes to the oblivious audience, while building a shared understanding of the coin’s bias.

Okay, so what happens? How can we characterize the sequence of outcomes? Intuitively, we might expect either a “rich-get-richer” effect where they end up repeating heads or tails. Or we might expect a “regression-to-the-mean” where they converge to simulating a fair coin.

Honestly, I prefer Bayesian variety shows to Asian variety shows. OK, bad puns aside, let’s dig into this problem!

First off, let’s try an example. At the beginning, Alice just chooses $H$ or $T$ with probability $\frac{1}{2}$ each. Let’s say Alice chooses $H$. Now Bob should be more inclined to say $H$, but by how much? The problem states that Bob has a uniform prior, which means that initially he believes the coin’s bias toward heads could be anything between 0 and 1, uniformly at random. Once he sees Alice choose $H$, he has to update his beliefs. Now his probability of choosing another $H$ is

\[P(HH|H) = \frac{P(HH)}{P(H)}.\]

You might be tempted to just say $P(H) = \frac{1}{2}$ and $P(HH) = \frac{1}{4}$ but Bob doesn’t actually know for sure what the bias of the coin is. To correctly compute these values, imagine we have the unit interval $[0, 1]$ and Bob first chooses uniformly at random a point in the interval, which corresponds to the bias of the coin for heads. Now, for each flip, Bob chooses another point uniformly at random on the interval and checks if the point is less than the bias point. If so, the flip is $H$ and otherwise, $T$.

Let’s compute $P(H)$. This corresponds to choosing a bias point and then a flip point. Imagine that we chose the two point locations first, and then assigned one at random to be the bias point and one to be the flip point. Then obviously there’s a $\frac{1}{2}$ chance that the bias point is larger than the flip point, so $P(H) = \frac{1}{2}$. For $P(HH)$, we will choose a bias point and then two flip points. This is equivalent to choosing the three points, and then assigning one at random to be the bias point. There’s only a $\frac{1}{3}$ chance that the bias point will be the highest, so $P(HH) = \frac{1}{3}$. Therefore, we have

\[P(HH|H) = \frac{1/3}{1/2} = \frac{2}{3}.\]
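Before generalizing, we can sanity-check this with a quick Monte Carlo simulation (a hypothetical sketch in Python; the function name is my own):

```python
import random

def estimate_p_hh_given_h(trials=200_000, seed=0):
    """Estimate P(second flip is H | first flip is H) under a uniform prior:
    draw a bias uniformly, flip twice, and condition on the first flip."""
    rng = random.Random(seed)
    first_h = both_h = 0
    for _ in range(trials):
        bias = rng.random()          # uniform prior on the heads bias
        flip1 = rng.random() < bias  # True means heads
        flip2 = rng.random() < bias
        if flip1:
            first_h += 1
            if flip2:
                both_h += 1
    return both_h / first_h

print(estimate_p_hh_given_h())  # close to 2/3
```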

The general case

Cool. Now say that the game goes on for a while, and there have been $x$ heads and $y$ tails in some order. Now it’s someone’s turn, and they’ve updated their beliefs about the bias of the coin based on the previous $x+y$ flips. Note that they end up with the same posterior whether they learn the outcomes of the $x+y$ flips one at a time or all at once. And if they learn everything at once, the order of the $x+y$ flips no longer matters. The only thing we care about is that there were $x$ heads and $y$ tails. Therefore, the probability of choosing another $H$ based on this information is

\[P(\#H=x+1,\#T=y|\#H=x,\#T=y) = \frac{P(\#H=x+1,\#T=y)}{P(\#H=x,\#T=y)}.\]

Alright, now to calculate the two probabilities on the right, we can use a similar method as above. Let’s find $P(\#H=x,\#T=y)$. Imagine choosing a bias point, and then $x+y$ flip points. We’d like the bias point to be greater than $x$ of the flip points and smaller than the other $y$, because that would correspond to $x$ heads and $y$ tails. Again, we can view this instead as choosing $x+y+1$ point locations, and assigning the bias point and the $x+y$ flip points to those locations. There are a total of $(x+y+1)!$ ways to do this, but only $x!y!$ of them place the $x$ head points below the bias point and the $y$ tail points above it. Thus, $P(\#H=x,\#T=y) = \frac{x!y!}{(x+y+1)!}$. Similarly, we can compute that $P(\#H=x+1,\#T=y) = \frac{(x+1)!y!}{(x+y+2)!}$. Thus, we get

\[P(\#H=x+1,\#T=y|\#H=x,\#T=y) = \frac{(x+1)!y!(x+y+1)!}{x!y!(x+y+2)!} = \frac{x+1}{x+y+2}.\]
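(This update rule, $\frac{x+1}{x+y+2}$ under a uniform prior, is Laplace’s rule of succession.) It’s simple enough to turn directly into a simulation of the improv show; here’s a hypothetical sketch, with function names of my own choosing:

```python
import random
from fractions import Fraction

def next_heads_prob(x, y):
    """Chance the next announced flip is H, after x heads and y tails."""
    return Fraction(x + 1, x + y + 2)

def improv_show(n, seed=0):
    """Simulate n turns of the fake coin toss game."""
    rng = random.Random(seed)
    x = y = 0
    outcomes = []
    for _ in range(n):
        if rng.random() < next_heads_prob(x, y):
            outcomes.append('H')
            x += 1
        else:
            outcomes.append('T')
            y += 1
    return ''.join(outcomes)
```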

Now let’s say Alice and Bob play this game for $n = 6$ flips and obtain the sequence $H, T, H, H, H, T$. What’s the probability of this happening? Using the rule we just derived, the probability of $H$ on a given turn is $\frac{x+1}{x+y+2}$ given that we’ve seen $x$ heads and $y$ tails so far. Thus, the probability of that sequence is

\[\frac{1}{2} \frac{1}{3} \frac{2}{4} \frac{3}{5} \frac{4}{6} \frac{2}{7} = \frac{2!4!}{7!}.\]

In general, let $k$ be the total number of heads in a sequence and $n-k$ be the total number of tails in the sequence. Then the probability of obtaining that sequence is

\[\frac{k!(n-k)!}{(n+1)!}.\]
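We can double-check this closed form by multiplying out the step-by-step rule in exact arithmetic (a sketch; the function names are mine):

```python
from fractions import Fraction
from math import factorial

def sequence_prob(seq):
    """Probability of an announced H/T sequence, multiplying the
    (x+1)/(x+y+2) rule one flip at a time."""
    x = y = 0
    p = Fraction(1)
    for c in seq:
        if c == 'H':
            p *= Fraction(x + 1, x + y + 2)
            x += 1
        else:
            p *= Fraction(y + 1, x + y + 2)
            y += 1
    return p

def closed_form(k, n):
    """k!(n-k)!/(n+1)! for a sequence with k heads out of n flips."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

# The example sequence from above, and a reordering of it:
assert sequence_prob('HTHHHT') == closed_form(4, 6)
assert sequence_prob('HHHHTT') == closed_form(4, 6)
```

Note that the reordered sequence gets the same probability; this hints at the exchangeability we’ll see below.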

Awesome!

Pólya’s urn

Now let’s consider a different problem, which is a special case of the Pólya urn model. Alice and Bob have an urn, which contains 1 hazel ball and 1 teal ball. Alice and Bob will take turns drawing a ball from the urn. If they draw a hazel ball, they will return the hazel ball to the urn and additionally add another hazel ball. If they draw a teal ball, they will return the teal ball to the urn and add another teal ball. So what happens over $n$ drawings?

Assume there are $x$ hazel balls and $y$ teal balls in the urn. Then the probability of drawing a hazel ball is obviously $\frac{x}{x+y}$. Let’s say Alice and Bob play this $n = 6$ times and obtain $H, T, H, H, H, T$ (writing $H$ for hazel and $T$ for teal). We can easily compute the probability of this, which is

\[\frac{1}{2} \frac{1}{3} \frac{2}{4} \frac{3}{5} \frac{4}{6} \frac{2}{7} = \frac{2!4!}{7!}.\]

Whoa! The same thing as before? This game gives the exact same distribution of sequences as the fake coin toss game!
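One way to see the match concretely is to multiply out the urn’s step probabilities, just as we did for the coin (a sketch, starting from one ball of each color):

```python
from fractions import Fraction

def urn_sequence_prob(seq):
    """Probability of a hazel/teal draw sequence from a Polya urn that
    starts with 1 hazel (H) and 1 teal (T) ball."""
    hazel = teal = 1
    p = Fraction(1)
    for c in seq:
        if c == 'H':
            p *= Fraction(hazel, hazel + teal)
            hazel += 1
        else:
            p *= Fraction(teal, hazel + teal)
            teal += 1
    return p

# Same answer as the fake coin toss game:
assert urn_sequence_prob('HTHHHT') == Fraction(1, 105)  # = 2!4!/7!
```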

A third equivalent game

Those two answers share a very interesting property: the probability of a sequence doesn’t depend on the order of the heads and tails (or hazel balls and teal balls). The only thing that matters is the number of $H$s: all sequences of $k$ $H$s and $n-k$ $T$s are equally likely. In the coin game, from the point of view of the audience, it looks like Alice and Bob chose a bias first, and then flipped the coin $n$ times without changing the bias. But what bias does it appear they picked initially?

The answer is that it looks like they chose the bias of the coin uniformly at random, and then proceeded to flip it $n$ times. Let’s compute the probability of obtaining a specific sequence. Actually, we’ve already done that above! The probability of obtaining a sequence with $x$ heads and $y$ tails is $\frac{x!y!}{(x+y+1)!}$, or equivalently, the probability of obtaining $k$ heads and $n-k$ tails is $\frac{k!(n-k)!}{(n+1)!}$. Nice, this matches the answer that we got in the previous two games. So yes, Alice and Bob can successfully use their fake coin process to deceive the audience!
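We can also check this third game numerically without any integral: a specific sequence with $k$ heads has probability $p^k(1-p)^{n-k}$ given the bias $p$, so we just average that over uniformly drawn biases (a Monte Carlo sketch):

```python
import random

def mc_sequence_prob(k, n, trials=200_000, seed=0):
    """Estimate the probability of one specific sequence with k heads out
    of n flips when the bias p is drawn uniformly first: average
    p**k * (1-p)**(n-k) over many uniform draws of p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        p = rng.random()
        total += p**k * (1 - p)**(n - k)
    return total / trials

print(mc_sequence_prob(4, 6))  # close to 2!*4!/7! = 1/105 ≈ 0.00952
```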

All these games are connected to the beta distribution, which showed up in the previous Asian Bayesian post too. (Somehow, I managed to get through this entire post without doing a single integral!)

Further reading