Friday, January 09, 2009
1/2 a girl vs. 2/3 a boy; or—I suck at stats
Listen, here's the thing. If you can't spot the sucker in the first half hour at the table, then you are the sucker.
Back in my college days, I was invited to a poker game, and I'm sure it was by sheer coincidence that the day in question just happened to be payday. Now, while I knew (and still know) what the various hands are (a “flush” is five cards of the same suit, a “full house” is three of a kind with a pair, a “royal flush” is the ace, king, queen, jack and 10 of a single suit, etc.), I didn't know (and still don't) the ranking of the hands: which hands beat which. I was assured that wouldn't matter and that I could have a “cheat sheet.” So I arrived at the game with a huge pocket full of money and an attitude of “how hard can this be?” Said attitude was reinforced as I won a few early rounds.
The end of the night came with the end of my money.
I learned two lessons that night:
- Never, ever play poker again, and
- I am bad at statistics.
While the first lesson sank in (to this day I haven't played another game of poker, so my record stands at a rather dismal 0–1), I forgot the second lesson: that I suck at statistics.
Monday, I wrote about pairs of kids and the odds of a particular pairing, given some information.
Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?
Coding Horror: The Problem of the Unfinished Game
I read the explanation for the 2/3 result, said “Okay, I can see that,” accepted it as gospel and went about my business, which involved going back and forth with someone over this issue, with both of us firm on our respective viewpoints (me: 2/3; Vorlath: 1/2).
Wanting to settle this once and for all, I wrote a program (verbose, and written for clarity rather than speed, since this is a very tricky problem) that picks a bazillion pairs of kids and brute-forces the results so that I can figure out who's right and who's wrong.
Category | Value | Percentage |
---|---|---|
Total # of kids | 20000000 | 100.0 |
Boys | 10002254 | 50.0 |
Girls | 9997746 | 50.0 |
I ran this program for 10,000,000 pairs. 20,000,000 virtual kids were created for this. 50% boys, 50% girls. No controversy here.
Category | Value | Percentage |
---|---|---|
Total # of pairs | 10000000 | 100.0 |
Boy/Boy | 2501203 | 25.0 |
Boy/Girl | 2499753 | 25.0 |
Girl/Boy | 2500095 | 25.0 |
Girl/Girl | 2498949 | 25.0 |
At least one Boy | 7501051 | 75.0 |
At least one Girl | 7498797 | 75.0 |
Again, nothing unexpected here either. Four possible pairings, 25% of each pairing. 75% of the pairings will have at least one girl, and 75% will have at least one boy. Again, straight from the numbers. So far, so good.
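For readers who want to reproduce these first two tables, here is a minimal Python sketch of the pair-generation step. It is not the program used for the numbers above; the function name `tally_pairs`, the fixed seed, and the smaller sample size are all assumptions made for illustration.

```python
import random
from collections import Counter

def tally_pairs(n_pairs, seed=0):
    """Draw n_pairs of kids; each kid is independently 'B' or 'G' with equal odds."""
    rng = random.Random(seed)  # fixed seed so reruns give the same tallies
    counts = Counter()
    for _ in range(n_pairs):
        first = rng.choice("BG")
        second = rng.choice("BG")
        counts[first + second] += 1  # keys are "BB", "BG", "GB", "GG"
    return counts

N = 100_000  # far fewer than 10,000,000, but enough to see the 25% split
counts = tally_pairs(N)
at_least_one_girl = counts["BG"] + counts["GB"] + counts["GG"]
at_least_one_boy = counts["BB"] + counts["BG"] + counts["GB"]

for pairing in ("BB", "BG", "GB", "GG"):
    print(pairing, counts[pairing], f"{100 * counts[pairing] / N:.1f}")
print("At least one Boy", at_least_one_boy, f"{100 * at_least_one_boy / N:.1f}")
print("At least one Girl", at_least_one_girl, f"{100 * at_least_one_girl / N:.1f}")
```

Each of the four pairings should land near 25%, and each "at least one" row near 75%, up to sampling noise.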
Category | Value | Percentage |
---|---|---|
Total # of pairs | 10000000 | 100.0 |
Disclosed First Kid | 5000671 | 50.0 |
Disclosed Second Kid | 4999329 | 50.0 |
Disclosed Girl | 4999440 | 50.0 |
Disclosed Boy | 5000560 | 50.0 |
Nothing seems wrong here; half the kids being disclosed are the first ones; independently, half of the kids being disclosed are boys. There is a problem here, though; for now, I'll leave it to the reader to spot the issue (and it is an issue with this problem). I didn't spot it until later myself.
Category | Value | Percentage |
---|---|---|
Disclosed Girl | 4999440 | 100.0 |
First kid | 2499547 | 50.0 |
Second kid | 2499893 | 50.0 |
Disclosed Girl, other girl | 2498949 | 50.0 |
First kid | 1249211 | 25.0 |
Second kid | 1249738 | 25.0 |
Disclosed Girl, other boy | 2500491 | 50.0 |
First kid | 1250336 | 25.0 |
Second kid | 1250155 | 25.0 |
Disclosed Girl, pick girl, correct | 2498949 | 50.0 |
First kid | 1249211 | 25.0 |
Second kid | 1249738 | 25.0 |
Disclosed Girl, pick girl, wrong | 2500491 | 50.0 |
First kid | 1250336 | 25.0 |
Second kid | 1250155 | 25.0 |
Disclosed Girl, pick boy, correct | 2500491 | 50.0 |
First kid | 1250336 | 25.0 |
Second kid | 1250155 | 25.0 |
Disclosed Girl, pick boy, wrong | 2498949 | 50.0 |
First kid | 1249211 | 25.0 |
Second kid | 1249738 | 25.0 |
[The first three lines of this particular table can be read as:
- a girl was disclosed
- the disclosed girl was the first kid in the pair
- the disclosed girl was the second kid in the pair
The line labeled “Disclosed Girl, pick girl, correct” can be read as: a girl was disclosed, we picked the other kid as being a girl, and we were correct. - Editor]
Well … XXXX! I was wrong! The odds are 50/50. I was all set to start posting this when I noticed Vorlath conceding the 2/3 position on this follow-up post.
I must have missed something in the program.
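The disclosure rule used so far (the parent mentions one of the two kids at random, and we only look at the cases where that kid happens to be a girl) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the original program; `random_disclosure` and its parameters are hypothetical names.

```python
import random

def random_disclosure(n_pairs, seed=0):
    """Estimate P(other kid is a boy | a randomly disclosed kid is a girl)."""
    rng = random.Random(seed)
    disclosed_girl = 0
    other_is_boy = 0
    for _ in range(n_pairs):
        kids = [rng.choice("BG"), rng.choice("BG")]
        shown = rng.randrange(2)   # parent mentions the first or second kid at random
        if kids[shown] != "G":
            continue               # a boy was disclosed: not the case being tallied
        disclosed_girl += 1
        if kids[1 - shown] == "B":
            other_is_boy += 1
    return other_is_boy / disclosed_girl

p = random_disclosure(200_000)
print(f"P(other kid is a boy | girl disclosed) ~ {p:.3f}")
```

Under this rule the estimate should hover around 0.5, which is the 50/50 split the table shows.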
Okay, what if I exclude from consideration the boy/boy pairs entirely? How do the odds change then? One two-line patch later and …
Category | Value | Percentage |
---|---|---|
Total # of kids | 15000398 | 100.0 |
Boys | 4998619 | 33.3 |
Girls | 10001779 | 66.7 |
Okay, numbers are 75% of what we had … so far so good.
Category | Value | Percentage |
---|---|---|
Total # of pairs | 7500199 | 100.0 |
Boy/Boy | 0 | 0.0 |
Boy/Girl | 2500052 | 33.3 |
Girl/Boy | 2498567 | 33.3 |
Girl/Girl | 2501580 | 33.4 |
At least one Boy | 4998619 | 66.6 |
At least one Girl | 7500199 | 100.0 |
Yes, that's what would be expected by dropping a quarter of all pairings.
Category | Value | Percentage |
---|---|---|
Total # of pairs | 7500199 | 100.0 |
Disclosed First Kid | 3750492 | 50.0 |
Disclosed Second Kid | 3749707 | 50.0 |
Disclosed Girl | 5002113 | 66.7 |
Disclosed Boy | 2498086 | 33.3 |
Category | Value | Percentage |
---|---|---|
Disclosed Girl | 5002113 | 100.0 |
First kid | 2500888 | 50.0 |
Second kid | 2501225 | 50.0 |
Disclosed Girl, other girl | 2501580 | 50.0 |
First kid | 1250803 | 25.0 |
Second kid | 1250777 | 25.0 |
Disclosed Girl, other boy | 2500533 | 50.0 |
First kid | 1250085 | 25.0 |
Second kid | 1250448 | 25.0 |
Disclosed Girl, pick girl, correct | 2501580 | 50.0 |
First kid | 1250803 | 25.0 |
Second kid | 1250777 | 25.0 |
Disclosed Girl, pick girl, wrong | 2500533 | 50.0 |
First kid | 1250085 | 25.0 |
Second kid | 1250448 | 25.0 |
Disclosed Girl, pick boy, correct | 2500533 | 50.0 |
First kid | 1250085 | 25.0 |
Second kid | 1250448 | 25.0 |
Disclosed Girl, pick boy, wrong | 2501580 | 50.0 |
First kid | 1250803 | 25.0 |
Second kid | 1250777 | 25.0 |
And it's still 50/50! Am I missing anything else?
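Why dropping the boy/boy pairs changes nothing can be seen in a small Python sketch. It assumes, as the tables suggest, that boy/boy pairs are simply discarded and that the parent still discloses a random kid; the function name `disclosure_without_boy_boy` is made up for this example.

```python
import random

def disclosure_without_boy_boy(n_pairs, seed=0):
    """Discard boy/boy pairs, then disclose a random kid from each remaining pair."""
    rng = random.Random(seed)
    kept = disclosed_girl = other_is_boy = 0
    for _ in range(n_pairs):
        kids = [rng.choice("BG"), rng.choice("BG")]
        if kids == ["B", "B"]:
            continue               # the "two-line patch": ignore boy/boy pairs
        kept += 1
        shown = rng.randrange(2)   # disclosure is still random
        if kids[shown] == "G":
            disclosed_girl += 1
            if kids[1 - shown] == "B":
                other_is_boy += 1
    # share of kept pairs where a girl was disclosed, and share of those with a boy
    return disclosed_girl / kept, other_is_boy / disclosed_girl

girl_share, boy_given_girl = disclosure_without_boy_boy(200_000)
print(f"girl disclosed in {girl_share:.3f} of kept pairs")
print(f"P(other kid is a boy | girl disclosed) ~ {boy_given_girl:.3f}")
```

A girl is disclosed in about two thirds of the remaining pairs, but given that a girl was disclosed, the other kid is still a boy about half the time. A boy/boy pair could never have produced a disclosed girl in the first place, so removing those pairs does not touch the cases being counted.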
Okay, re-reading even more comments and looking closer at the original problem statement:
Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?
Coding Horror: The Problem of the Unfinished Game
Oh, there's an unstated assumption going on—namely, what gender the hypothetically speaking parent will reveal! So far, I've had the hypothetically speaking parent disclosing a randomly picked child (first or second), which could be either a girl or a boy. Add some more lines to force the child to be disclosed as a girl (if there is a girl) and …
Category | Value | Percentage |
---|---|---|
Disclosed Girl | 7500174 | 100.0 |
First kid | 4999692 | 66.7 |
Second kid | 2500482 | 33.3 |
Disclosed Girl, other girl | 2501019 | 33.3 |
First kid | 2501019 | 33.3 |
Second kid | 0 | 0.0 |
Disclosed Girl, other boy | 4999155 | 66.7 |
First kid | 2498673 | 33.3 |
Second kid | 2500482 | 33.3 |
Disclosed Girl, pick girl, correct | 2501019 | 33.3 |
First kid | 2501019 | 33.3 |
Second kid | 0 | 0.0 |
Disclosed Girl, pick girl, wrong | 4999155 | 66.7 |
First kid | 2498673 | 33.3 |
Second kid | 2500482 | 33.3 |
Disclosed Girl, pick boy, correct | 4999155 | 66.7 |
First kid | 2498673 | 33.3 |
Second kid | 2500482 | 33.3 |
Disclosed Girl, pick boy, wrong | 2501019 | 33.3 |
First kid | 2501019 | 33.3 |
Second kid | 0 | 0.0 |
That's what I'm looking for! That's the unstated assumption being made by the 2/3 camp! And my original summation of the whole problem: “The odds are 1/2, except, of course, when it's 2/3,” is correct (so to speak).
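The forced-disclosure rule can be sketched in Python as follows. This is a hedged reconstruction rather than the original patch: it assumes the parent always mentions a girl whenever there is one, and mentions the first kid when both are girls (which is what the "First kid" and "Second kid" rows in the table suggest). The name `forced_girl_disclosure` is hypothetical.

```python
import random

def forced_girl_disclosure(n_pairs, seed=0):
    """Estimate P(other kid is a boy) when the parent always discloses a girl if possible."""
    rng = random.Random(seed)
    disclosed_girl = 0
    other_is_boy = 0
    for _ in range(n_pairs):
        kids = [rng.choice("BG"), rng.choice("BG")]
        if "G" not in kids:
            continue               # boy/boy: no girl to disclose
        shown = kids.index("G")    # assumption: disclose the first girl in the pair
        disclosed_girl += 1
        if kids[1 - shown] == "B":
            other_is_boy += 1
    return other_is_boy / disclosed_girl

p_forced = forced_girl_disclosure(200_000)
print(f"P(other kid is a boy | girl disclosed) ~ {p_forced:.3f}")
```

Here every pair with at least one girl counts as "girl disclosed", so the boy/girl and girl/boy pairs together outnumber the girl/girl pairs two to one, and the estimate settles near 0.667.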
Sheesh!
So, I suck at statistics, and statistical word problems are hard to write properly.
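For what it's worth, the two answers can also be worked out by hand. The following is a sketch under the usual assumptions (each kid is independently a boy or a girl with probability 1/2); the event names are mine. Write $M$ for "one boy and one girl", $D_G$ for "the disclosed kid is a girl", and $A_G$ for "at least one kid is a girl".

If the parent discloses a randomly chosen kid:

$$
P(M \mid D_G) = \frac{P(M)\,P(D_G \mid M)}{P(D_G)} = \frac{\tfrac{1}{2}\cdot\tfrac{1}{2}}{\tfrac{1}{2}} = \frac{1}{2}
$$

If the parent always discloses a girl whenever there is one, being told "girl" only tells you that $A_G$ holds:

$$
P(M \mid A_G) = \frac{P(M)}{P(A_G)} = \frac{\tfrac{1}{2}}{\tfrac{3}{4}} = \frac{2}{3}
$$

Same families, same sentence from the parent, different rule for choosing what to say, different answer.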
And now I can put this problem to rest.