{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "path_data = '../../../../data/'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from scipy import stats\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Variability of the Sample Mean ###\n", "By the Central Limit Theorem, the probability distribution of the mean of a large random sample is roughly normal. The bell curve is centered at the population mean. Some of the sample means are higher, and some lower, but the deviations from the population mean are roughly symmetric on either side, as we have seen repeatedly. Formally, probability theory shows that the sample mean is an *unbiased* estimate of the population mean.\n", "\n", "In our simulations, we also noticed that the means of larger samples tend to be more tightly clustered around the population mean than means of smaller samples. In this section, we will quantify the variability of the sample mean and develop a relation between the variability and the sample size." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's start with our table of flight delays. The mean delay is about 16.7 minutes, and the distribution of delays is skewed to the right." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | Delay |
|---|---|
| 0 | 257 |
| 1 | 28 |
| 2 | -3 |
| 3 | 0 |
| 4 | 64 |
| ... | ... |
| 13820 | -4 |
| 13821 | 8 |
| 13822 | 3 |
| 13823 | -1 |
| 13824 | -2 |

13825 rows × 1 columns

| | Sample Size n | SD of 10,000 Sample Means | pop_sd/sqrt(n) |
|---|---|---|---|
| 0 | 25 | 7.890499 | 7.896040 |
| 1 | 50 | 5.544631 | 5.583343 |
| 2 | 75 | 4.521931 | 4.558781 |
| 3 | 100 | 3.959873 | 3.948020 |
| 4 | 125 | 3.520749 | 3.531216 |
| 5 | 150 | 3.238875 | 3.223545 |
| 6 | 175 | 2.966418 | 2.984423 |
| 7 | 200 | 2.779681 | 2.791672 |
| 8 | 225 | 2.613087 | 2.632013 |
| 9 | 250 | 2.491289 | 2.496947 |
| 10 | 275 | 2.358647 | 2.380746 |
| 11 | 300 | 2.285085 | 2.279390 |
| 12 | 325 | 2.165918 | 2.189967 |
| 13 | 350 | 2.076965 | 2.110305 |
| 14 | 375 | 2.031482 | 2.038749 |
| 15 | 400 | 1.970609 | 1.974010 |
| 16 | 425 | 1.926096 | 1.915071 |
| 17 | 450 | 1.878757 | 1.861114 |
| 18 | 475 | 1.802092 | 1.811476 |
| 19 | 500 | 1.765423 | 1.765608 |
| 20 | 525 | 1.713650 | 1.723057 |
| 21 | 550 | 1.675659 | 1.683441 |
| 22 | 575 | 1.648763 | 1.646438 |
| 23 | 600 | 1.619837 | 1.611772 |
| 24 | 625 | 1.575032 | 1.579208 |
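
The comparison in the table above can be reproduced in miniature. The following is only a sketch under stated assumptions: the population is a synthetic right-skewed array (an exponential draw standing in for the delay data, which is not loaded here), samples are drawn with replacement, and the names `population`, `pop_sd`, and `sd_of_sample_means` are hypothetical helpers introduced for illustration. It uses fewer repetitions than the 10,000 in the table to keep the run short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical right-skewed population; NOT the flight delay data.
population = rng.exponential(scale=20, size=10_000)
pop_sd = np.std(population)


def sd_of_sample_means(sample_size, repetitions=2_000):
    """Empirical SD of the means of `repetitions` random samples.

    Each sample has `sample_size` values drawn with replacement
    from `population` (an assumption of this sketch).
    """
    samples = rng.choice(population, size=(repetitions, sample_size))
    return np.std(samples.mean(axis=1))


# Compare the simulated SD of the sample means with pop_sd / sqrt(n).
for n in (25, 100, 400):
    empirical = sd_of_sample_means(n)
    predicted = pop_sd / np.sqrt(n)
    print(f"n={n}: empirical={empirical:.3f}, pop_sd/sqrt(n)={predicted:.3f}")
```

If the relation holds, the two printed values should be close for each sample size, and both should shrink by roughly half each time the sample size is multiplied by four.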