{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove_input"
]
},
"outputs": [],
"source": [
"path_data = '../../../../data/'\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Monty Hall Problem ###\n",
"This [problem](https://en.wikipedia.org/wiki/Monty_Hall_problem) has flummoxed many people over the years, [mathematicians included](https://web.archive.org/web/20140413131827/http://www.decisionsciences.org/DecisionLine/Vol30/30_1/vazs30_1.pdf). Let's see if we can work it out by simulation.\n",
"\n",
"The setting is derived from a television game show called \"Let's Make a Deal\". Monty Hall hosted this show in the 1960's, and it has since led to a number of spin-offs. An exciting part of the show was that while the contestants had the chance to win great prizes, they might instead end up with \"zonks\" that were less desirable. This is the basis for what is now known as *the Monty Hall problem*.\n",
"\n",
"The setting is a game show in which the contestant is faced with three closed doors. Behind one of the doors is a fancy car, and behind each of the other two there is a goat. The contestant doesn't know where the car is, and has to attempt to find it under the following rules.\n",
"\n",
"- The contestant makes an initial choice, but that door isn't opened.\n",
"- At least one of the other two doors must have a goat behind it. Monty opens one of these doors to reveal a goat, displayed in all its glory in [Wikipedia](https://en.wikipedia.org/wiki/Monty_Hall_problem):\n",
"\n",
"![Monty Hall goat](monty_hall_goat.png) \n",
"\n",
"\n",
"\n",
"- There are two doors left, one of which was the contestant's original choice. One of the doors has the car behind it, and the other one has a goat. The contestant now gets to choose which of the two doors to open.\n",
"\n",
"The contestant has a decision to make. Which door should she choose to open, if she wants the car? Should she stick with her initial choice, or switch to the other door? That is the Monty Hall problem."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Solution ###\n",
"\n",
"In any problem involving chances, the assumptions about randomness are important. It's reasonable to assume that there is a 1/3 chance that the contestant's initial choice is the door that has the car behind it. \n",
"\n",
"The solution to the problem is quite straightforward under this assumption, though the straightforward solution doesn't convince everyone. Here it is anyway.\n",
"\n",
"- The chance that the car is behind the originally chosen door is 1/3.\n",
"- The car is behind either the originally chosen door or the door that remains. It can't be anywhere else.\n",
"- Therefore, the chance that the car is behind the door that remains is 2/3.\n",
"- Therefore, the contestant should switch.\n",
"\n",
"That's it. End of story. \n",
"\n",
"Not convinced? Then let's simulate the game and see how the results turn out."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Simulation ###\n",
"The simulation will be more complex that those we have done so far. Let's break it down.\n",
"\n",
"### Step 1: What to Simulate ###\n",
"For each play we will simulate what's behind all three doors:\n",
"- the one the contestant first picks\n",
"- the one that Monty opens\n",
"- the remaining door\n",
"\n",
"So we will be keeping track of three quantitites, not just one."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 2: Simulating One Play ###\n",
"The bulk of our work consists of simulating one play of the game. This involves several pieces.\n",
"\n",
"#### The Goats ####\n",
"We start by setting up an array `goats` that contains unimaginative names for the two goats."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"goats = np.array(['first goat', 'second goat'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To help Monty conduct the game, we are going to have to identify which goat is selected and which one is revealed behind the open door. The function `other_goat` takes one goat and returns the other."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def other_goat(x):\n",
" if x == 'first goat':\n",
" return 'second goat'\n",
" elif x == 'second goat':\n",
" return 'first goat'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's confirm that the function works."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"('second goat', 'first goat', None)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"other_goat('first goat'), other_goat('second goat'), other_goat('watermelon')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The string `'watermelon'` is not the name of one of the goats, so when `'watermelon'` is the input then `other_goat` does nothing."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The Options ####\n",
"The array `hidden_behind_doors` contains the set of things that could be behind the doors."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"hidden_behind_doors = np.array(['car', 'first goat', 'second goat'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are now ready to simulate one play. To do this, we will define a function `monty_hall_game` that takes no arguments. When the function is called, it plays Monty's game once and returns a list consisting of:\n",
"\n",
"- the contestant's guess\n",
"- what Monty reveals when he opens a door\n",
"- what remains behind the other door\n",
"\n",
"The game starts with the contestant choosing one door at random. In doing so, the contestant makes a random choice from among the car, the first goat, and the second goat.\n",
"\n",
"If the contestant happens to pick one of the goats, then the other goat is revealed and the car is behind the remaining door.\n",
"\n",
"If the contestant happens to pick the car, then Monty reveals one of the goats and the other goat is behind the remaining door."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def monty_hall_game():\n",
" \"\"\"Return \n",
" [contestant's guess, what Monty reveals, what remains behind the other door]\"\"\"\n",
" \n",
" contestant_guess = np.random.choice(hidden_behind_doors)\n",
" \n",
" if contestant_guess == 'first goat':\n",
" return [contestant_guess, 'second goat', 'car']\n",
" \n",
" if contestant_guess == 'second goat':\n",
" return [contestant_guess, 'first goat', 'car']\n",
" \n",
" if contestant_guess == 'car':\n",
" revealed = np.random.choice(goats)\n",
" return [contestant_guess, revealed, other_goat(revealed)]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's play! Run the cell several times and see how the results change."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['car', 'first goat', 'second goat']"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"monty_hall_game()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Step 3: Number of Repetitions ###\n",
"To gauge the frequency with which the different results occur, we have to play the game many times and collect the results. Let's run 10,000 repetitions.\n",
"\n",
"### Step 4: Coding the Simulation ###\n",
"It's time to run the whole simulation. \n",
"\n",
"We will play the game 10,000 times and collect the results in a table. Each row of the table will contain the result of one play. \n",
"\n",
"One way to grow a table by adding a new row is to use the `append` method. If `my_table` is a table and `new_row` is a list containing the entries in a new row, then `my_table.append(new_row)` adds the new row to the bottom of `my_table`. \n",
"\n",
"Note that `append` does not create a new table. It changes `my_table` to have one more row than it did before.\n",
"\n",
"First let's create a table `games` that has three empty columns. We can do this by just specifying a list of the column labels, as follows."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Guess
\n",
"
Revealed
\n",
"
Remaining
\n",
"
\n",
" \n",
" \n",
" \n",
"
\n",
"
"
],
"text/plain": [
"Empty DataFrame\n",
"Columns: [Guess, Revealed, Remaining]\n",
"Index: []"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"games = pd.DataFrame(columns=['Guess', 'Revealed', 'Remaining'])\n",
"\n",
"games"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Notice that we have chosen the order of the columns to be the same as the order in which `monty_hall_game` returns the result of one game.\n",
"\n",
"Now we can add 10,000 rows to `trials`. Each row will represent the result of one play of Monty's game."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Guess
\n",
"
Revealed
\n",
"
Remaining
\n",
"
\n",
" \n",
" \n",
"
\n",
"
0
\n",
"
second goat
\n",
"
first goat
\n",
"
car
\n",
"
\n",
"
\n",
"
1
\n",
"
car
\n",
"
first goat
\n",
"
second goat
\n",
"
\n",
"
\n",
"
2
\n",
"
first goat
\n",
"
second goat
\n",
"
car
\n",
"
\n",
"
\n",
"
3
\n",
"
car
\n",
"
second goat
\n",
"
first goat
\n",
"
\n",
"
\n",
"
4
\n",
"
second goat
\n",
"
first goat
\n",
"
car
\n",
"
\n",
"
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
...
\n",
"
\n",
"
\n",
"
9995
\n",
"
car
\n",
"
first goat
\n",
"
second goat
\n",
"
\n",
"
\n",
"
9996
\n",
"
second goat
\n",
"
first goat
\n",
"
car
\n",
"
\n",
"
\n",
"
9997
\n",
"
second goat
\n",
"
first goat
\n",
"
car
\n",
"
\n",
"
\n",
"
9998
\n",
"
second goat
\n",
"
first goat
\n",
"
car
\n",
"
\n",
"
\n",
"
9999
\n",
"
car
\n",
"
second goat
\n",
"
first goat
\n",
"
\n",
" \n",
"
\n",
"
10000 rows × 3 columns
\n",
"
"
],
"text/plain": [
" Guess Revealed Remaining\n",
"0 second goat first goat car\n",
"1 car first goat second goat\n",
"2 first goat second goat car\n",
"3 car second goat first goat\n",
"4 second goat first goat car\n",
"... ... ... ...\n",
"9995 car first goat second goat\n",
"9996 second goat first goat car\n",
"9997 second goat first goat car\n",
"9998 second goat first goat car\n",
"9999 car second goat first goat\n",
"\n",
"[10000 rows x 3 columns]"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Play the game 10000 times and \n",
"# record the results in the table games\n",
"\n",
"games = []\n",
"for i in np.arange(10000):\n",
" games.append(monty_hall_game())\n",
" \n",
"games = pd.DataFrame(games)\n",
"\n",
"games = games.rename(columns={0:'Guess', 1:'Revealed', 2:'Remaining'})\n",
"\n",
"games"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The simulation is done. Notice how short the code is. The majority of the work was done in simulating the outcome of one game."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualization ###\n",
"\n",
"To see whether the contestant should stick with her original choice or switch, let's see how frequently the car is behind each of her two options."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" Remaining count\n",
"0 car 6605\n",
"1 first goat 1705\n",
"2 second goat 1690"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"remaining_door = games.groupby([\"Remaining\"]).agg(\n",
" count=pd.NamedAgg(column=\"Remaining\", aggfunc=\"count\")\n",
")\n",
"\n",
"remaining_door1 = remaining_door.copy()\n",
"\n",
"remaining_door1.reset_index(inplace=True)\n",
"\n",
"remaining_door2 = remaining_door1.copy()\n",
"\n",
"remaining_door2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As our earlier solution said, the car is behind the remaining door two-thirds of the time, to a pretty good approximation. The contestant is twice as likely to get the car if she switches than if she sticks with her original choice.\n",
"\n",
"To see this graphically, we can join the two tables above and draw overlaid bar charts."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"