{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove_input"
]
},
"outputs": [],
"source": [
"path_data = '../../data/'\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"plt.style.use('fivethirtyeight')\n",
"\n",
"import warnings\n",
"warnings.filterwarnings('ignore')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Decisions and Uncertainty\n",
"We have seen several examples of assessing models that involve chance, by comparing observed data to the predictions made by the models. In all of our examples, there has been no doubt about whether the data were consistent with the model's predictions. The data were either very far away from the predictions, or very close to them.\n",
"\n",
"But outcomes are not always so clear cut. How far is \"far\"? Exactly what does \"close\" mean? While these questions don't have universal answers, there are guidelines and conventions that you can follow. In this section we will describe some of them.\n",
"\n",
"But first let us develop a general framework of decision making, into which all our examples will fit.\n",
"\n",
"What we have developed while assessing models are some of the fundamental concepts of statistical tests of hypotheses. Using statistical tests as a way of making decisions is standard in many fields and has a standard terminology. Here is the sequence of the steps in most statistical tests, along with some terminology and examples. You will see that they are consistent with the sequence of steps we have used for assessing models."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 1: The Hypotheses\n",
"\n",
"All statistical tests attempt to choose between two views of the world. Specifically, the choice is between two views about how the data were generated. These two views are called *hypotheses*.\n",
"\n",
"**The null hypothesis.** This is a clearly defined model about chances. It says that the data were generated at random under clearly specified assumptions about the randomness. The word \"null\" reinforces the idea that if the data look different from what the null hypothesis predicts, the difference is due to *nothing* but chance.\n",
"\n",
"From a practical perspective, **the null hypothesis is a hypothesis under which you can simulate data.**\n",
"\n",
"In the example about Mendel's model for the colors of pea plants, the null hypothesis is that the assumptions of his model are good: each plant has a 75% chance of having purple flowers, independent of all other plants. \n",
"\n",
"Under this hypothesis, we were able to simulate random samples, by using `sample_proportions(929, [0.75, 0.25])`. We used a sample size of 929 because that's the number of plants Mendel grew.\n",
"\n",
"**The alternative hypothesis.** This says that some reason other than chance made the data differ from the predictions of the model in the null hypothesis.\n",
"\n",
"In the example about Mendel's plants, the alternative hypothesis is simply that his model isn't good."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 2: The Test Statistic ###\n",
"\n",
"In order to decide between the two hypothesis, we must choose a statistic that we can use to make the decision. This is called the **test statistic**.\n",
"\n",
"In the example of Mendel's plants, our statistic was the absolute difference between the sample percent and 75% which was predicted by his model.\n",
"\n",
"$$\n",
"\\big{\\vert} \\text{sample percent of purple-flowering plants} - 75 \\big{\\vert}\n",
"$$\n",
"\n",
"To see how to make the choice in general, look at the alternative hypothesis. What values of the statistic will make you think that the alternative hypothesis is a better choice than the null? \n",
"- If the answer is \"big values,\" you might have a good choice of statistic. \n",
"- So also if the answer is \"small values.\" \n",
"- But if the answer is \"both big values and small values,\" we recommend that you look again at your statistic and see if taking an absolute value can change the answer to just \"big values\".\n",
"\n",
"In the case of the pea plants, a sample percent of around 75% will be consistent with the model, but percents much bigger or much less than 75 will make you think that the model isn't good. This indicates that the statistic should be the *distance* between the sample percent and 75, that is, the absolute value of the difference between them. Big values of the distance will make you lean towards the alternative.\n",
"\n",
"The **observed value of the test statistic** is the value of the statistic you get from the data in the study, not a simulated value. Among Mendel's 929 plants, 705 had purple flowers. The observed value of the test statistic was therefore"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.8880516684607045"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"abs ( 100 * (705 / 929) - 75)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 3: The Distribution of the Test Statistic, Under the Null Hypothesis\n",
"\n",
"The main computational aspect of a test of hypotheses is figuring out *what the values of the test statistic might be if the null hypothesis were true*. \n",
"\n",
"The test statistic is simulated based on the assumptions of the model in the null hypothesis. That model involves chance, so the statistic comes out differently when you simulate it multiple times.\n",
"\n",
"By simulating the statistic repeatedly, we get a good sense of its possible values and which ones are more likely than others. In other words, we get a good approximation to the probability distribution of the statistic, as predicted by the model in the null hypothesis.\n",
"\n",
"As with all distributions, it is very useful to visualize this distribution by a histogram. We have done so in all our examples."
]
},
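{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a minimal sketch of this step for Mendel's example, the code below simulates the test statistic 10,000 times under the null hypothesis. It is an illustration under stated assumptions, not code from the original analysis: NumPy's `multinomial` stands in for `sample_proportions`, and the names `rng` and `one_simulated_statistic`, the seed, and the number of repetitions are all choices made here.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Assumption: a seeded generator, used only so the sketch is reproducible.\n",
"rng = np.random.default_rng(seed=0)\n",
"\n",
"def one_simulated_statistic(sample_size=929, model=(0.75, 0.25)):\n",
"    # Draw counts of purple and white plants under the null model.\n",
"    counts = rng.multinomial(sample_size, model)\n",
"    purple_percent = 100 * counts[0] / sample_size\n",
"    # Distance between the simulated percent and the predicted 75.\n",
"    return abs(purple_percent - 75)\n",
"\n",
"# Repeat the simulation to approximate the distribution of the statistic.\n",
"simulated_statistics = np.array(\n",
"    [one_simulated_statistic() for _ in range(10000)]\n",
")\n",
"```\n",
"\n",
"Each entry of `simulated_statistics` is one value the statistic could take if the null hypothesis were true. A histogram of the array approximates the distribution described above."
]
},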
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Step 4. The Conclusion of the Test\n",
"\n",
"The choice between the null and alternative hypotheses depends on the comparison between what you computed in Steps 2 and 3: the observed value of the test statistic and its distribution as predicted by the null hypothesis. \n",
"\n",
"If the two are consistent with each other, then the observed test statistic is in line with what the null hypothesis predicts. In other words, the test does not point towards the alternative hypothesis; the null hypothesis is better supported by the data. This was the case with the assessment of Mendel's model.\n",
"\n",
"But if the two are not consistent with each other, as is the case in our example about Alameda County jury panels, then the data do not support the null hypothesis. That is why we concluded that the jury panels were not selected at random. Something other than chance affected their composition.\n",
"\n",
"If the data do not support the null hypothesis, we say that the test *rejects* the null hypothesis."
]
},
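{
"cell_type": "markdown",
"metadata": {},
"source": [
"One way to put a number on this comparison for Mendel's example is to find the proportion of simulated statistics that are at least as large as the observed one. The sketch below is only an illustration: it assumes 10,000 repetitions, uses NumPy's `binomial` to count purple-flowering plants under the null hypothesis, and the variable names and the seed are choices made here.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# Assumption: a seeded generator, used only so the sketch is reproducible.\n",
"rng = np.random.default_rng(seed=0)\n",
"\n",
"# Simulate the number of purple-flowering plants among 929, many times,\n",
"# assuming each plant is purple with chance 75% independently of the others.\n",
"purple_counts = rng.binomial(929, 0.75, size=10000)\n",
"simulated_statistics = np.abs(100 * purple_counts / 929 - 75)\n",
"\n",
"# Observed value of the test statistic from Mendel's data.\n",
"observed_statistic = abs(100 * (705 / 929) - 75)\n",
"\n",
"# Proportion of simulated values at least as large as the observed one.\n",
"proportion_as_large = np.mean(simulated_statistics >= observed_statistic)\n",
"```\n",
"\n",
"A large proportion means that values like the observed one are common under the null hypothesis, which is in line with the conclusion about Mendel's model. How small the proportion has to be before the two count as inconsistent remains a judgment call."
]
},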
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The Meaning of \"Consistent\"\n",
"\n",
"In the example about Alameda County juries, it was apparent that our observed test statistic was far from what was predicted by the null hypothesis. In the example about pea flowers, it is just as clear that the observed statistic is consistent with the distribution that the null predicts. So in both of the examples, it is clear which hypothesis to choose.\n",
"\n",
"But sometimes the decision is not so clear. Whether the observed test statistic is consistent with its predicted distribution under the null hypothesis is a matter of judgment. We recommend that you provide your judgment along with the value of the test statistic and a graph of its predicted distribution under the null. That will allow your reader to make his or her own judgment about whether the two are consistent.\n",
"\n",
"Here is an example where the decision requires judgment."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### The GSI's Defense\n",
"A Berkeley Statistics class of about 350 students was divided into 12 discussion sections led by Graduate Student Instructors (GSIs). After the midterm, students in Section 3 noticed that their scores were on average lower than the rest of the class. \n",
"\n",
"In such situations, students tend to grumble about the section's GSI. Surely, they feel, there must have been something wrong with the GSI's teaching. Or else why would their section have done worse than others?\n",
"\n",
"The GSI, typically more experienced about statistical variation, often has a different perspective: if you simply draw a section of students at random from the whole class, their average score could resemble the score that the students are unhappy about, just by chance.\n",
"\n",
"The GSI's position is a clearly stated chance model. We can simulate data under this model. Let's test it out. \n",
"\n",
"**Null Hypothesis.** The average score of the students in Section 3 is like the average score of the same number of students picked at random from the class. \n",
"\n",
"**Alternative Hypothesis.** No, it's too low.\n",
"\n",
"A natural statistic here is the average of the scores. Low values of the average will make us lean towards the alternative.\n",
"\n",
"Let's take a look at the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table `scores` contains the section number and midterm score for each student in the class. The midterm scores were integers in the range 0 through 25; 0 means that the student didn't take the test."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
"
],
"text/plain": [
" Midterm\n",
"Section \n",
"1 15.593750\n",
"2 15.125000\n",
"3 13.666667\n",
"4 14.766667\n",
"5 17.454545\n",
"6 15.031250\n",
"7 16.625000\n",
"8 16.310345\n",
"9 14.566667\n",
"10 15.235294\n",
"11 15.807692\n",
"12 15.733333"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"section_averages = scores.groupby(['Section']).mean()\n",
"\n",
"section_averages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The average score of Section 3 is 13.667, which does look low compared to the other section averages. But is it lower than the average of a section of the same size selected at random from the class? \n",
"\n",
"To answer this, we can select a section at random from the class and find its average. To select a section at random to we need to know how big Section 3 is, which we can by once again using `group`."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Midterm
\n",
"
\n",
"
\n",
"
Section
\n",
"
\n",
"
\n",
" \n",
" \n",
"
\n",
"
1
\n",
"
32
\n",
"
\n",
"
\n",
"
2
\n",
"
32
\n",
"
\n",
"
\n",
"
3
\n",
"
27
\n",
"
\n",
"
\n",
"
4
\n",
"
30
\n",
"
\n",
"
\n",
"
5
\n",
"
33
\n",
"
\n",
"
\n",
"
6
\n",
"
32
\n",
"
\n",
"
\n",
"
7
\n",
"
24
\n",
"
\n",
"
\n",
"
8
\n",
"
29
\n",
"
\n",
"
\n",
"
9
\n",
"
30
\n",
"
\n",
"
\n",
"
10
\n",
"
34
\n",
"
\n",
"
\n",
"
11
\n",
"
26
\n",
"
\n",
"
\n",
"
12
\n",
"
30
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Midterm\n",
"Section \n",
"1 32\n",
"2 32\n",
"3 27\n",
"4 30\n",
"5 33\n",
"6 32\n",
"7 24\n",
"8 29\n",
"9 30\n",
"10 34\n",
"11 26\n",
"12 30"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"scores.groupby('Section').count()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Section 3 had 27 students. \n",
"\n",
"Now we can figure out how to create one simulated value of our test statistic, the random sample average.\n",
"\n",
"First we have to select 27 scores at random without replacement. Since the data are already in a table, we will use the Table method `sample`.\n",
"\n",
"Remember that by default, `sample` draws with replacement. The optional argument `with_replacement = False` produces a random sample drawn without replacement."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"scores_only = scores.drop(columns=['Section'])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
"
\n",
"
\n",
"
Midterm
\n",
"
\n",
" \n",
" \n",
"
\n",
"
154
\n",
"
25
\n",
"
\n",
"
\n",
"
344
\n",
"
0
\n",
"
\n",
"
\n",
"
313
\n",
"
16
\n",
"
\n",
"
\n",
"
151
\n",
"
23
\n",
"
\n",
"
\n",
"
122
\n",
"
17
\n",
"
\n",
"
\n",
"
180
\n",
"
18
\n",
"
\n",
"
\n",
"
282
\n",
"
15
\n",
"
\n",
"
\n",
"
281
\n",
"
24
\n",
"
\n",
"
\n",
"
8
\n",
"
8
\n",
"
\n",
"
\n",
"
166
\n",
"
11
\n",
"
\n",
"
\n",
"
236
\n",
"
20
\n",
"
\n",
"
\n",
"
330
\n",
"
8
\n",
"
\n",
"
\n",
"
57
\n",
"
12
\n",
"
\n",
"
\n",
"
295
\n",
"
12
\n",
"
\n",
"
\n",
"
326
\n",
"
21
\n",
"
\n",
"
\n",
"
164
\n",
"
13
\n",
"
\n",
"
\n",
"
272
\n",
"
9
\n",
"
\n",
"
\n",
"
195
\n",
"
13
\n",
"
\n",
"
\n",
"
234
\n",
"
20
\n",
"
\n",
"
\n",
"
49
\n",
"
24
\n",
"
\n",
"
\n",
"
316
\n",
"
17
\n",
"
\n",
"
\n",
"
325
\n",
"
12
\n",
"
\n",
"
\n",
"
132
\n",
"
17
\n",
"
\n",
"
\n",
"
140
\n",
"
23
\n",
"
\n",
"
\n",
"
235
\n",
"
12
\n",
"
\n",
"
\n",
"
277
\n",
"
17
\n",
"
\n",
"
\n",
"
244
\n",
"
10
\n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" Midterm\n",
"154 25\n",
"344 0\n",
"313 16\n",
"151 23\n",
"122 17\n",
"180 18\n",
"282 15\n",
"281 24\n",
"8 8\n",
"166 11\n",
"236 20\n",
"330 8\n",
"57 12\n",
"295 12\n",
"326 21\n",
"164 13\n",
"272 9\n",
"195 13\n",
"234 20\n",
"49 24\n",
"316 17\n",
"325 12\n",
"132 17\n",
"140 23\n",
"235 12\n",
"277 17\n",
"244 10"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"random_sample = scores_only.sample(27, replace=False)\n",
"random_sample"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The average of these 27 randomly selected scores is"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"15.444444444444445"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.average(random_sample['Midterm'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's the average of 27 randomly selected scores. The cell below collects the code necessary for generating this random average. \n",
"\n",
"Now we can simulate the random sample average by repeating the calculation multple times."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def random_sample_average():\n",
" random_sample = scores_only.sample(27, replace=False)\n",
" return np.average(random_sample['Midterm'])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([16.85185185, 17.03703704, 16.77777778, ..., 16.77777778,\n",
" 14.66666667, 14.44444444])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"sample_averages = np.array([])\n",
"\n",
"repetitions = 10000\n",
"for i in np.arange(repetitions):\n",
" sample_averages = np.append(sample_averages, random_sample_average())\n",
" \n",
"sample_averages"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the histogram of the simulated averages. It shows the distribution of what the Section 3 average might have been, if Section 3 had been selected at random from the class. \n",
"\n",
"The observed Section 3 average score of 13.667 is shown as a red dot on the horizontal axis. You can ignore the last line of code; it just draws the dot."
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"