The Boston Diaries

The ongoing saga of a programmer who doesn't live in Boston, nor does he even like Boston, but yet named his weblog/journal “The Boston Diaries.”

Go figure.

Friday, January 09, 2009

1/2 a girl vs. 2/3 a boy; or—I suck at stats

Listen, here's the thing. If you can't spot the sucker in the first half hour at the table, then you are the sucker.

Matt Damon, “Rounders”

Back in my college days, I was invited to a poker game and, I'm sure by sheer coincidence, said day just happened to be payday. Now, while I knew (and still know) what the various hands are (a “flush” is five cards of the same suit, a “full house” is three of a kind with a pair, a “royal flush” is the ace, king, queen, jack and 10 of a single suit, etc.), I didn't know (and still don't) the ranking of the hands, that is, which hands beat which. I was assured that wouldn't matter and that I could have a “cheat sheet.” So I arrived at the game with a huge pocket full of money and an attitude of “how hard can this be?” Said attitude was reinforced as I won a few early rounds.

The end of the night came with the end of my money.

I learned two lessons that night:

  1. Never, ever play poker again, and
  2. I am bad at statistics.

While the first lesson sank in (to this day I haven't played another game of poker, so my record stands at a rather dismal 0–1), I forgot the second lesson: that I suck at statistics.

Monday, I wrote about pairs of kids and the odds of a particular pairing, given some information.

Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?

Coding Horror: The Problem of the Unfinished Game

I read the explanation for the 2/3 result, said “Okay, I can see that,” accepted it as gospel and went about my business, which involved going back and forth with someone over this issue, with both of us firm in our respective viewpoints (me: 2/3; Vorlath: 1/2).

Wanting to settle this once and for all, I wrote a very verbose program (it's written for clarity, not speed, because this is a very tricky problem) that picks a bazillion pairs of kids and brute-forces the results so that I can figure out who's right and who's wrong.
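For readers who want to try this at home, here is a minimal sketch of that kind of brute-force run. It is not the program described above; the function name `simulate_random_disclosure` and its counters are made up for illustration. It assumes each kid is independently a boy or a girl with equal probability and that the kid being disclosed is picked at random from the pair, which is how the disclosure tables below were generated.

```python
import random

def simulate_random_disclosure(n_pairs, seed=None):
    """Generate n_pairs two-kid families; one kid per pair is disclosed at random."""
    rng = random.Random(seed)
    disclosed_girl = 0   # times the disclosed kid was a girl
    other_is_girl = 0    # ... and the undisclosed sibling was also a girl
    other_is_boy = 0     # ... and the undisclosed sibling was a boy
    for _ in range(n_pairs):
        pair = (rng.choice("BG"), rng.choice("BG"))  # (first kid, second kid)
        which = rng.randrange(2)                     # index of the disclosed kid
        if pair[which] == "G":
            disclosed_girl += 1
            if pair[1 - which] == "G":
                other_is_girl += 1
            else:
                other_is_boy += 1
    return disclosed_girl, other_is_girl, other_is_boy

if __name__ == "__main__":
    total, girl, boy = simulate_random_disclosure(200_000, seed=2009)
    print(f"disclosed girl: {total}")
    print(f"  other is girl: {girl} ({100 * girl / total:.1f}%)")
    print(f"  other is boy:  {boy} ({100 * boy / total:.1f}%)")
```

Passing a `seed` makes a run repeatable; leaving it out gives a fresh set of virtual kids each time.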

Number of kids
  Value Percentage
Total # of kids 20000000 100.0
Boys 10002254 50.0
Girls 9997746 50.0

I ran this program for 10,000,000 pairs. 20,000,000 virtual kids were created for this. 50% boys, 50% girls. No controversy here.

Pair Stats
  Value Percentage
Total # of pairs 10000000 100.0
Boy/Boy 2501203 25.0
Boy/Girl 2499753 25.0
Girl/Boy 2500095 25.0
Girl/Girl 2498949 25.0
At least one Boy 7501051 75.0
At least one Girl 7498797 75.0

Again, nothing unexpected here either. Four possible pairings, 25% of each pairing. 75% of the pairings will have at least one girl, and 75% will have at least one boy. Again, straight from the numbers. So far, so good.

Disclosure table #1—Overview
  Value Percentage
Total # of pairs 10000000 100.0
Disclosed First Kid 5000671 50.0
Disclosed Second Kid 4999329 50.0
Disclosed Girl 4999440 50.0
Disclosed Boy 5000560 50.0

Nothing seems wrong here; half the kids being disclosed are the first ones and, independently, half of the kids being disclosed are boys. There is a problem here, though. For now, I'll leave it to the reader to spot the issue (and it is an issue with this problem); I didn't spot it until later myself.

Disclosure table #2—disclosed a Girl
  Value Percentage
Disclosed Girl 4999440 100.0
  First kid 2499547 50.0
  Second kid 2499893 50.0
Disclosed Girl, other girl 2498949 50.0
  First kid 1249211 25.0
  Second kid 1249738 25.0
Disclosed Girl, other boy 2500491 50.0
  First kid 1250336 25.0
  Second kid 1250155 25.0
Disclosed Girl, pick girl, correct 2498949 50.0
  First kid 1249211 25.0
  Second kid 1249738 25.0
Disclosed Girl, pick girl, wrong 2500491 50.0
  First kid 1250336 25.0
  Second kid 1250155 25.0
Disclosed Girl, pick boy, correct 2500491 50.0
  First kid 1250336 25.0
  Second kid 1250155 25.0
Disclosed Girl, pick boy, wrong 2498949 50.0
  First kid 1249211 25.0
  Second kid 1249738 25.0

[The first three lines of this particular table can be read as:

  1. a girl was disclosed
  2. the disclosed girl was the first kid in the pair
  3. the disclosed girl was the second kid in the pair

The line labeled “Disclosed Girl, pick girl, correct” can be read as: “a girl was disclosed, we picked the other kid as being a girl, and we were correct.” —Editor]

Well … XXXX! I was wrong! The odds are 50/50. I was all set to start posting this when I noticed Vorlath conceding the 2/3 position on this follow-up post.

I must have missed something in the program.

Okay, what if I exclude from consideration the boy/boy pairs entirely? How do the odds change then? One two-line patch later and …

Number of kids
  Value Percentage
Total # of kids 15000398 100.0
Boys 4998619 33.3
Girls 10001779 66.7

Okay, numbers are 75% of what we had … so far so good.

Pair Stats
  Value Percentage
Total # of pairs 7500199 100.0
Boy/Boy 0 0.0
Boy/Girl 2500052 33.3
Girl/Boy 2498567 33.3
Girl/Girl 2501580 33.4
At least one Boy 4998619 66.6
At least one Girl 7500199 100.0

Yes, that's what would be expected by dropping a quarter of all pairings.
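As a quick check of the arithmetic behind those two tables (a sketch that assumes the four pairings start out equally likely, as the first run showed):

```latex
\begin{align*}
P(\text{pairing} \mid \text{not Boy/Boy}) &= \frac{1/4}{3/4} = \frac{1}{3}
  \quad \text{for each of Boy/Girl, Girl/Boy, Girl/Girl} \\
\text{girls per remaining pair} &= \tfrac{1}{3}(1) + \tfrac{1}{3}(1) + \tfrac{1}{3}(2) = \tfrac{4}{3}
  \quad\Rightarrow\quad \frac{4/3}{2} = \frac{2}{3} \approx 66.7\% \text{ girls} \\
\text{boys per remaining pair} &= \tfrac{1}{3}(1) + \tfrac{1}{3}(1) + \tfrac{1}{3}(0) = \tfrac{2}{3}
  \quad\Rightarrow\quad \frac{2/3}{2} = \frac{1}{3} \approx 33.3\% \text{ boys}
\end{align*}
```

Three quarters of 10,000,000 pairs is 7,500,000 pairs, or 15,000,000 kids, which is where the totals above land.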

Disclosure table #1—Overview
  Value Percentage
Total # of pairs 7500199 100.0
Disclosed First Kid 3750492 50.0
Disclosed Second Kid 3749707 50.0
Disclosed Girl 5002113 66.7
Disclosed Boy 2498086 33.3

Disclosure table #2—disclosed a Girl
  Value Percentage
Disclosed Girl 5002113 100.0
  First kid 2500888 50.0
  Second kid 2501225 50.0
Disclosed Girl, other girl 2501580 50.0
  First kid 1250803 25.0
  Second kid 1250777 25.0
Disclosed Girl, other boy 2500533 50.0
  First kid 1250085 25.0
  Second kid 1250448 25.0
Disclosed Girl, pick girl, correct 2501580 50.0
  First kid 1250803 25.0
  Second kid 1250777 25.0
Disclosed Girl, pick girl, wrong 2500533 50.0
  First kid 1250085 25.0
  Second kid 1250448 25.0
Disclosed Girl, pick boy, correct 2500533 50.0
  First kid 1250085 25.0
  Second kid 1250448 25.0
Disclosed Girl, pick boy, wrong 2501580 50.0
  First kid 1250803 25.0
  Second kid 1250777 25.0

And it's still 50/50! Am I missing anything else?
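To rule out a sampling fluke, the same question can be answered exactly by enumerating every equally likely combination of pair and disclosed kid. This is only a sketch, not the program used above; the function name `p_other_is_girl` and its flag are made up for illustration, and it assumes the disclosed kid is picked at random, as in the tables so far.

```python
from fractions import Fraction
from itertools import product

def p_other_is_girl(exclude_boy_boy):
    """Exact P(other kid is a girl | a randomly picked kid was disclosed as a girl)."""
    pairs = [p for p in product("BG", repeat=2)
             if not (exclude_boy_boy and p == ("B", "B"))]
    # Every (pair, disclosed index) combination is equally likely.
    weight = Fraction(1, len(pairs) * 2)
    disclosed_girl = Fraction(0)
    both_girls = Fraction(0)
    for pair in pairs:
        for which in (0, 1):
            if pair[which] == "G":
                disclosed_girl += weight
                if pair[1 - which] == "G":
                    both_girls += weight
    return both_girls / disclosed_girl

if __name__ == "__main__":
    print(p_other_is_girl(exclude_boy_boy=False))
    print(p_other_is_girl(exclude_boy_boy=True))
```

Both calls print 1/2, so dropping the boy/boy pairs changes how many pairs there are but not the odds once a girl has been disclosed this way.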

Okay, after re-reading even more comments and looking closer at the original problem statement:

Let's say, hypothetically speaking, you met someone who told you they had two children, and one of them is a girl. What are the odds that person has a boy and a girl?

Coding Horror: The Problem of the Unfinished Game

Oh, there's an unstated assumption going on—namely, what gender the hypothetically speaking parent will reveal! So far, I've had the hypothetically speaking parent disclosing a randomly picked child (first or second), which could be either a girl or a boy. Add some more lines to force the child to be disclosed as a girl (if there is a girl) and …

Disclosure table #2—disclosed a Girl
  Value Percentage
Disclosed Girl 7500174 100.0
  First kid 4999692 66.7
  Second kid 2500482 33.3
Disclosed Girl, other girl 2501019 33.3
  First kid 2501019 33.3
  Second kid 0 0.0
Disclosed Girl, other boy 4999155 66.7
  First kid 2498673 33.3
  Second kid 2500482 33.3
Disclosed Girl, pick girl, correct 2501019 33.3
  First kid 2501019 33.3
  Second kid 0 0.0
Disclosed Girl, pick girl, wrong 4999155 66.7
  First kid 2498673 33.3
  Second kid 2500482 33.3
Disclosed Girl, pick boy, correct 4999155 66.7
  First kid 2498673 33.3
  Second kid 2500482 33.3
Disclosed Girl, pick boy, wrong 2501019 33.3
  First kid 2501019 33.3
  Second kid 0 0.0

That's what I'm looking for! That's the unstated assumption being made by the 2/3 camp! And my original summation of the whole problem: “The odds are 1/2, except, of course, when it's 2/3,” is correct (so to speak).
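The difference between the two camps can be sketched in a few lines. This is a hypothetical reconstruction, not the actual program or patch: the function `simulate` and its `always_disclose_girl` flag are names invented here. With the flag off, the parent discloses either kid at random; with it on, the parent discloses a girl whenever the pair has one.

```python
import random

def simulate(n_pairs, always_disclose_girl, seed=None):
    """Fraction of 'a girl was disclosed' cases in which the other kid is a boy."""
    rng = random.Random(seed)
    disclosed_girl = 0
    other_is_boy = 0
    for _ in range(n_pairs):
        pair = (rng.choice("BG"), rng.choice("BG"))  # (first kid, second kid)
        if always_disclose_girl and "G" in pair:
            which = pair.index("G")    # parent points at a girl whenever there is one
        else:
            which = rng.randrange(2)   # parent points at either kid at random
        if pair[which] == "G":
            disclosed_girl += 1
            if pair[1 - which] == "B":
                other_is_boy += 1
    return other_is_boy / disclosed_girl

if __name__ == "__main__":
    print("forced girl disclosure:", simulate(200_000, True, seed=2009))
    print("random disclosure:     ", simulate(200_000, False, seed=2009))
```

With enough pairs, the first figure should hover near 2/3 and the second near 1/2, which is the whole argument in two numbers.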

Sheesh!

So, I suck at statistics, and statistical word problems are hard to write properly.

And now I can put this problem to rest.

Obligatory Miscellaneous

You have my permission to link freely to any entry here. Go ahead, I won't bite. I promise.

The dates are the permanent links to that day's entries (or entry, if there is only one entry). The titles are the permanent links to that entry only. The format for the links is simple: start with the base link for this site, http://boston.conman.org/, then add the date you are interested in, say 2000/08/01, so that would make the final URL:

http://boston.conman.org/2000/08/01

You can also specify the entire month by leaving off the day portion. You can even select an arbitrary portion of time.

You may also note subtle shading of the links and that's intentional: the “closer” the link is (relative to the page) the “brighter” it appears. It's an experiment in using color shading to denote the distance a link is from here. If you don't notice it, don't worry; it's not all that important.

It is assumed that every brand name, slogan, corporate name, symbol, design element, et cetera mentioned in these pages is a protected and/or trademarked entity, the sole property of its owner(s), and acknowledgement of this status is implied.

Copyright © 1999-2020 by Sean Conner. All Rights Reserved.