  1. ---
  2. redirect_from:
  3. - "/chapters/13/4/using-confidence-intervals"
  4. interact_link: content/chapters/13/4/Using_Confidence_Intervals.ipynb
  5. kernel_name: Python [Root]
  6. has_widgets: false
  7. title: |-
  8. Using Confidence Intervals
  9. prev_page:
  10. url: /chapters/13/3/Confidence_Intervals.html
  11. title: |-
  12. Confidence Intervals
  13. next_page:
  14. url: /chapters/14/Why_the_Mean_Matters.html
  15. title: |-
  16. Why the Mean Matters
  17. comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /content***"
  18. ---
  19. <div class="jb_cell tag_remove_input">
  20. <div class="cell border-box-sizing code_cell rendered">
  21. </div>
  22. </div>
  23. <div class="jb_cell">
  24. <div class="cell border-box-sizing code_cell rendered">
  25. <div class="input">
  26. <div class="inner_cell">
  27. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">bootstrap_median</span><span class="p">(</span><span class="n">original_sample</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">replications</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Returns an array of bootstrapped sample medians:</span>
<span class="sd">    original_sample: table containing the original sample</span>
<span class="sd">    label: label of column containing the variable</span>
<span class="sd">    replications: number of bootstrap samples</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">just_one_column</span> <span class="o">=</span> <span class="n">original_sample</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
    <span class="n">medians</span> <span class="o">=</span> <span class="n">make_array</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">replications</span><span class="p">):</span>
        <span class="n">bootstrap_sample</span> <span class="o">=</span> <span class="n">just_one_column</span><span class="o">.</span><span class="n">sample</span><span class="p">()</span>
        <span class="n">resampled_median</span> <span class="o">=</span> <span class="n">percentile</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="n">bootstrap_sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
        <span class="n">medians</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">medians</span><span class="p">,</span> <span class="n">resampled_median</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">medians</span>
</pre></div>
  42. </div>
  43. </div>
  44. </div>
  45. </div>
  46. </div>
  47. <div class="jb_cell">
  48. <div class="cell border-box-sizing code_cell rendered">
  49. <div class="input">
  50. <div class="inner_cell">
  51. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">bootstrap_mean</span><span class="p">(</span><span class="n">original_sample</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">replications</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Returns an array of bootstrapped sample means:</span>
<span class="sd">    original_sample: table containing the original sample</span>
<span class="sd">    label: label of column containing the variable</span>
<span class="sd">    replications: number of bootstrap samples</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">just_one_column</span> <span class="o">=</span> <span class="n">original_sample</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
    <span class="n">means</span> <span class="o">=</span> <span class="n">make_array</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">replications</span><span class="p">):</span>
        <span class="n">bootstrap_sample</span> <span class="o">=</span> <span class="n">just_one_column</span><span class="o">.</span><span class="n">sample</span><span class="p">()</span>
        <span class="n">resampled_mean</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">bootstrap_sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
        <span class="n">means</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">means</span><span class="p">,</span> <span class="n">resampled_mean</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">means</span>
</pre></div>
  66. </div>
  67. </div>
  68. </div>
  69. </div>
  70. </div>
  71. <div class="jb_cell">
  72. <div class="cell border-box-sizing code_cell rendered">
  73. <div class="input">
  74. <div class="inner_cell">
  75. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">bootstrap_proportion</span><span class="p">(</span><span class="n">original_sample</span><span class="p">,</span> <span class="n">label</span><span class="p">,</span> <span class="n">replications</span><span class="p">):</span>
    <span class="sd">&quot;&quot;&quot;Returns an array of bootstrapped sample proportions:</span>
<span class="sd">    original_sample: table containing the original sample</span>
<span class="sd">    label: label of column containing the Boolean variable</span>
<span class="sd">    replications: number of bootstrap samples</span>
<span class="sd">    &quot;&quot;&quot;</span>
    <span class="n">just_one_column</span> <span class="o">=</span> <span class="n">original_sample</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="n">label</span><span class="p">)</span>
    <span class="n">proportions</span> <span class="o">=</span> <span class="n">make_array</span><span class="p">()</span>
    <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="n">replications</span><span class="p">):</span>
        <span class="n">bootstrap_sample</span> <span class="o">=</span> <span class="n">just_one_column</span><span class="o">.</span><span class="n">sample</span><span class="p">()</span>
        <span class="n">resample_array</span> <span class="o">=</span> <span class="n">bootstrap_sample</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span>
        <span class="n">resampled_proportion</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">count_nonzero</span><span class="p">(</span><span class="n">resample_array</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">resample_array</span><span class="p">)</span>
        <span class="n">proportions</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">proportions</span><span class="p">,</span> <span class="n">resampled_proportion</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">proportions</span>
</pre></div>
  91. </div>
  92. </div>
  93. </div>
  94. </div>
  95. </div>
  96. <div class="jb_cell">
  97. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  98. <div class="text_cell_render border-box-sizing rendered_html">
  99. <h3 id="Using-Confidence-Intervals">Using Confidence Intervals<a class="anchor-link" href="#Using-Confidence-Intervals"> </a></h3><p>A confidence interval has a single purpose – to estimate an unknown parameter based on data in a random sample. In the last section, we said that the interval (36%, 42%) was an approximate 95% confidence interval for the percent of smokers among mothers in the population. That was a formal way of saying that by our estimate, the percent of smokers among the mothers in the population was somewhere between 36% and 42%, and that our process of estimation is correct about 95% of the time.</p>
  100. <p>It is important to resist the impulse to use confidence intervals for other purposes. For example, recall that we calculated the interval (26.9 years, 27.6 years) as an approximate 95% confidence interval for the average age of mothers in the population. A dismayingly common misuse of the interval is to conclude that about 95% of the women were between 26.9 years and 27.6 years old. You don't need to know much about confidence intervals to see that this can't be right – you wouldn't expect 95% of mothers to all be within a few months of each other in age. Indeed, the histogram of the sampled ages shows quite a bit of variation.</p>
  101. </div>
  102. </div>
  103. </div>
  104. </div>
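<p>The misuse described above can be checked numerically. The following is a minimal sketch that assumes simulated, roughly bell-shaped ages; the center and spread are chosen for illustration only and are not taken from the <code>baby</code> table. It counts what fraction of individual ages fall inside the interval (26.9, 27.6) quoted in the text.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated ages (an assumption for illustration, NOT the baby.csv data).
ages = rng.normal(loc=27.2, scale=5.8, size=1000)

# Approximate 95% confidence interval for the average age, as quoted in the text.
left, right = 26.9, 27.6

# Fraction of individual ages that happen to lie inside the interval for the mean.
fraction_inside = np.mean((ages >= left) & (ages <= right))
print(f"Fraction of individual ages in ({left}, {right}): {fraction_inside:.1%}")
```

<p>Under these assumptions the fraction comes out far below 95%, because the interval describes the uncertainty in one number, the average, and not the spread of individual ages.</p>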
  105. <div class="jb_cell">
  106. <div class="cell border-box-sizing code_cell rendered">
  107. <div class="input">
  108. <div class="inner_cell">
  109. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">baby</span> <span class="o">=</span> <span class="n">Table</span><span class="o">.</span><span class="n">read_table</span><span class="p">(</span><span class="n">path_data</span> <span class="o">+</span> <span class="s1">&#39;baby.csv&#39;</span><span class="p">)</span>
</pre></div>
  112. </div>
  113. </div>
  114. </div>
  115. </div>
  116. </div>
  117. <div class="jb_cell">
  118. <div class="cell border-box-sizing code_cell rendered">
  119. <div class="input">
  120. <div class="inner_cell">
  121. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">baby</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">&#39;Maternal Age&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">hist</span><span class="p">()</span>
</pre></div>
  124. </div>
  125. </div>
  126. </div>
  127. <div class="output_wrapper">
  128. <div class="output">
  138. <div class="jb_output_wrapper }}">
  139. <div class="output_area">
  140. <div class="output_png output_subarea ">
  141. <img src="../../../images/chapters/13/4/Using_Confidence_Intervals_6_1.png"
  142. >
  143. </div>
  144. </div>
  145. </div>
  146. </div>
  147. </div>
  148. </div>
  149. </div>
  150. <div class="jb_cell">
  151. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  152. <div class="text_cell_render border-box-sizing rendered_html">
  153. <p>A small percent of the sampled ages are in the (26.9, 27.6) interval, and you would expect a similar small percent in the population. The interval just estimates one number: the <em>average</em> of all the ages in the population.</p>
  154. </div>
  155. </div>
  156. </div>
  157. </div>
  158. <div class="jb_cell">
  159. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  160. <div class="text_cell_render border-box-sizing rendered_html">
  161. <p>However, estimating a parameter by confidence intervals does have an important use besides just telling us roughly how big the parameter is.</p>
  162. </div>
  163. </div>
  164. </div>
  165. </div>
  166. <div class="jb_cell">
  167. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  168. <div class="text_cell_render border-box-sizing rendered_html">
  169. <h3 id="Using-a-Confidence-Interval-to-Test-Hypotheses">Using a Confidence Interval to Test Hypotheses<a class="anchor-link" href="#Using-a-Confidence-Interval-to-Test-Hypotheses"> </a></h3><p>Our approximate 95% confidence interval for the average age in the population goes from 26.9 years to 27.6 years. Suppose someone wants to test the following hypotheses:</p>
  170. <p><strong>Null hypothesis.</strong> The average age in the population is 30 years.</p>
  171. <p><strong>Alternative hypothesis.</strong> The average age in the population is not 30 years.</p>
  172. <p>Then, if you were using the 5% cutoff for the P-value, you would reject the null hypothesis. This is because 30 is not in the 95% confidence interval for the population average. At the 5% level of significance, 30 is not a plausible value for the population average.</p>
  173. <p>This use of confidence intervals is the result of a <em>duality</em> between confidence intervals and tests: if you are testing whether or not the population mean is a particular value <em>x</em>, and you use the 5% cutoff for the P-value, then you will reject the null hypothesis if <em>x</em> is not in your 95% confidence interval for the mean.</p>
  174. <p>This can be established by statistical theory. In practice, it just boils down to checking whether or not the value specified in the null hypothesis lies in the confidence interval.</p>
  175. <p>If you were using the 1% cutoff for the P-value, you would have to check if the value specified in the null hypothesis lies in a 99% confidence interval for the population mean.</p>
  176. <p>To a rough approximation, these statements are also true for population proportions, provided the sample is large.</p>
  177. </div>
  178. </div>
  179. </div>
  180. </div>
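<p>In code, the duality reduces to a single comparison. The sketch below is illustrative only: the helper name <code>reject_null</code> is hypothetical, and the interval endpoints are the ones quoted in the text.</p>

```python
def reject_null(null_value, left, right):
    """Return True when null_value lies outside the confidence interval [left, right].

    With a 95% interval this corresponds to a 5% cutoff for the P-value;
    with a 99% interval, to a 1% cutoff.
    """
    return not (left <= null_value <= right)


# Approximate 95% interval for the average age of mothers, from the text.
print(reject_null(30, 26.9, 27.6))    # True: 30 is outside the interval
print(reject_null(27.2, 26.9, 27.6))  # False: 27.2 is inside the interval
```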
  181. <div class="jb_cell">
  182. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  183. <div class="text_cell_render border-box-sizing rendered_html">
  184. <p>While we now have a way of using confidence intervals to test a particular kind of hypothesis, you might wonder about the value of testing whether or not the average age in a population is equal to 30. Indeed, the value isn't clear. But there are some situations in which a test of this kind of hypothesis is both natural and useful.</p>
  185. </div>
  186. </div>
  187. </div>
  188. </div>
  189. <div class="jb_cell">
  190. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  191. <div class="text_cell_render border-box-sizing rendered_html">
<p>We will study this in the context of data that are a subset of the information gathered in a randomized controlled trial about treatments for Hodgkin's disease. Hodgkin's disease is a cancer that typically affects young people. The disease is curable but the treatment can be very harsh. The purpose of the trial was to come up with a dosage that would cure the cancer but minimize the adverse effects on the patients.</p>
<p>The table <code>hodgkins</code> contains data on the effect that the treatment had on the lungs of 22 patients. The columns are:</p>
  194. <ul>
  195. <li>Height in cm</li>
  196. <li>A measure of radiation to the mantle (neck, chest, under arms)</li>
  197. <li>A measure of chemotherapy</li>
  198. <li>A score of the health of the lungs at baseline, that is, at the start of the treatment; higher scores correspond to more healthy lungs</li>
  199. <li>The same score of the health of the lungs, 15 months after treatment</li>
  200. </ul>
  201. </div>
  202. </div>
  203. </div>
  204. </div>
  205. <div class="jb_cell">
  206. <div class="cell border-box-sizing code_cell rendered">
  207. <div class="input">
  208. <div class="inner_cell">
  209. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">hodgkins</span> <span class="o">=</span> <span class="n">Table</span><span class="o">.</span><span class="n">read_table</span><span class="p">(</span><span class="n">path_data</span> <span class="o">+</span> <span class="s1">&#39;hodgkins.csv&#39;</span><span class="p">)</span>
</pre></div>
  212. </div>
  213. </div>
  214. </div>
  215. </div>
  216. </div>
  217. <div class="jb_cell">
  218. <div class="cell border-box-sizing code_cell rendered">
  219. <div class="input">
  220. <div class="inner_cell">
  221. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">hodgkins</span>
</pre></div>
  224. </div>
  225. </div>
  226. </div>
  227. <div class="output_wrapper">
  228. <div class="output">
  229. <div class="jb_output_wrapper }}">
  230. <div class="output_area">
  231. <div class="output_html rendered_html output_subarea output_execute_result">
  232. <table border="1" class="dataframe">
  233. <thead>
  234. <tr>
  235. <th>height</th> <th>rad</th> <th>chemo</th> <th>base</th> <th>month15</th>
  236. </tr>
  237. </thead>
  238. <tbody>
  239. <tr>
  240. <td>164 </td> <td>679 </td> <td>180 </td> <td>160.57</td> <td>87.77 </td>
  241. </tr>
  242. <tr>
  243. <td>168 </td> <td>311 </td> <td>180 </td> <td>98.24 </td> <td>67.62 </td>
  244. </tr>
  245. <tr>
  246. <td>173 </td> <td>388 </td> <td>239 </td> <td>129.04</td> <td>133.33 </td>
  247. </tr>
  248. <tr>
  249. <td>157 </td> <td>370 </td> <td>168 </td> <td>85.41 </td> <td>81.28 </td>
  250. </tr>
  251. <tr>
  252. <td>160 </td> <td>468 </td> <td>151 </td> <td>67.94 </td> <td>79.26 </td>
  253. </tr>
  254. <tr>
  255. <td>170 </td> <td>341 </td> <td>96 </td> <td>150.51</td> <td>80.97 </td>
  256. </tr>
  257. <tr>
  258. <td>163 </td> <td>453 </td> <td>134 </td> <td>129.88</td> <td>69.24 </td>
  259. </tr>
  260. <tr>
  261. <td>175 </td> <td>529 </td> <td>264 </td> <td>87.45 </td> <td>56.48 </td>
  262. </tr>
  263. <tr>
  264. <td>185 </td> <td>392 </td> <td>240 </td> <td>149.84</td> <td>106.99 </td>
  265. </tr>
  266. <tr>
  267. <td>178 </td> <td>479 </td> <td>216 </td> <td>92.24 </td> <td>73.43 </td>
  268. </tr>
  269. </tbody>
  270. </table>
  271. <p>... (12 rows omitted)</p>
  272. </div>
  273. </div>
  274. </div>
  275. </div>
  276. </div>
  277. </div>
  278. </div>
  279. <div class="jb_cell">
  280. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  281. <div class="text_cell_render border-box-sizing rendered_html">
<p>We will compare the baseline and 15-month scores. As each row corresponds to one patient, we say that the sample of baseline scores and the sample of 15-month scores are <em>paired</em>: they are not just two sets of 22 values each, but 22 pairs of values, one for each patient.</p>
<p>At a glance, you can see that the 15-month scores tend to be lower than the baseline scores – the sampled patients' lungs seem to be doing worse 15 months after the treatment. This is confirmed by the mostly positive values in the column <code>drop</code>, computed below as the amount by which each patient's score dropped from baseline to 15 months.</p>
  284. </div>
  285. </div>
  286. </div>
  287. </div>
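<p>As a NumPy-only sketch of what pairing means, the snippet below uses just the ten rows displayed above (the remaining 12 rows are omitted in the display, so these numbers describe only part of the sample). The array names are chosen here for illustration.</p>

```python
import numpy as np

# Baseline and 15-month lung scores for the ten displayed patients, in row order.
base = np.array([160.57, 98.24, 129.04, 85.41, 67.94,
                 150.51, 129.88, 87.45, 149.84, 92.24])
month15 = np.array([87.77, 67.62, 133.33, 81.28, 79.26,
                    80.97, 69.24, 56.48, 106.99, 73.43])

# Because the samples are paired, subtraction is done patient by patient.
drop = base - month15

print(np.round(drop, 2))
print("patients whose score dropped:", np.count_nonzero(drop > 0), "of", len(drop))

# The mean of the paired differences equals the difference of the two means.
print(np.isclose(drop.mean(), base.mean() - month15.mean()))
```

<p>The per-patient differences are only meaningful because each baseline score is matched with the 15-month score of the same patient.</p>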
  288. <div class="jb_cell">
  289. <div class="cell border-box-sizing code_cell rendered">
  290. <div class="input">
  291. <div class="inner_cell">
  292. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">hodgkins</span> <span class="o">=</span> <span class="n">hodgkins</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span>
    <span class="s1">&#39;drop&#39;</span><span class="p">,</span> <span class="n">hodgkins</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;base&#39;</span><span class="p">)</span> <span class="o">-</span> <span class="n">hodgkins</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;month15&#39;</span><span class="p">)</span>
<span class="p">)</span>
</pre></div>
  297. </div>
  298. </div>
  299. </div>
  300. </div>
  301. </div>
  302. <div class="jb_cell">
  303. <div class="cell border-box-sizing code_cell rendered">
  304. <div class="input">
  305. <div class="inner_cell">
  306. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">hodgkins</span>
</pre></div>
  309. </div>
  310. </div>
  311. </div>
  312. <div class="output_wrapper">
  313. <div class="output">
  314. <div class="jb_output_wrapper }}">
  315. <div class="output_area">
  316. <div class="output_html rendered_html output_subarea output_execute_result">
  317. <table border="1" class="dataframe">
  318. <thead>
  319. <tr>
  320. <th>height</th> <th>rad</th> <th>chemo</th> <th>base</th> <th>month15</th> <th>drop</th>
  321. </tr>
  322. </thead>
  323. <tbody>
  324. <tr>
  325. <td>164 </td> <td>679 </td> <td>180 </td> <td>160.57</td> <td>87.77 </td> <td>72.8 </td>
  326. </tr>
  327. <tr>
  328. <td>168 </td> <td>311 </td> <td>180 </td> <td>98.24 </td> <td>67.62 </td> <td>30.62 </td>
  329. </tr>
  330. <tr>
  331. <td>173 </td> <td>388 </td> <td>239 </td> <td>129.04</td> <td>133.33 </td> <td>-4.29 </td>
  332. </tr>
  333. <tr>
  334. <td>157 </td> <td>370 </td> <td>168 </td> <td>85.41 </td> <td>81.28 </td> <td>4.13 </td>
  335. </tr>
  336. <tr>
  337. <td>160 </td> <td>468 </td> <td>151 </td> <td>67.94 </td> <td>79.26 </td> <td>-11.32</td>
  338. </tr>
  339. <tr>
  340. <td>170 </td> <td>341 </td> <td>96 </td> <td>150.51</td> <td>80.97 </td> <td>69.54 </td>
  341. </tr>
  342. <tr>
  343. <td>163 </td> <td>453 </td> <td>134 </td> <td>129.88</td> <td>69.24 </td> <td>60.64 </td>
  344. </tr>
  345. <tr>
  346. <td>175 </td> <td>529 </td> <td>264 </td> <td>87.45 </td> <td>56.48 </td> <td>30.97 </td>
  347. </tr>
  348. <tr>
  349. <td>185 </td> <td>392 </td> <td>240 </td> <td>149.84</td> <td>106.99 </td> <td>42.85 </td>
  350. </tr>
  351. <tr>
  352. <td>178 </td> <td>479 </td> <td>216 </td> <td>92.24 </td> <td>73.43 </td> <td>18.81 </td>
  353. </tr>
  354. </tbody>
  355. </table>
  356. <p>... (12 rows omitted)</p>
  357. </div>
  358. </div>
  359. </div>
  360. </div>
  361. </div>
  362. </div>
  363. </div>
  364. <div class="jb_cell">
  365. <div class="cell border-box-sizing code_cell rendered">
  366. <div class="input">
  367. <div class="inner_cell">
  368. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">hodgkins</span><span class="o">.</span><span class="n">select</span><span class="p">(</span><span class="s1">&#39;drop&#39;</span><span class="p">)</span><span class="o">.</span><span class="n">hist</span><span class="p">(</span><span class="n">bins</span><span class="o">=</span><span class="n">np</span><span class="o">.</span><span class="n">arange</span><span class="p">(</span><span class="o">-</span><span class="mi">20</span><span class="p">,</span> <span class="mi">81</span><span class="p">,</span> <span class="mi">20</span><span class="p">))</span>
</pre></div>
  371. </div>
  372. </div>
  373. </div>
  374. <div class="output_wrapper">
  375. <div class="output">
  385. <div class="jb_output_wrapper }}">
  386. <div class="output_area">
  387. <div class="output_png output_subarea ">
  388. <img src="../../../images/chapters/13/4/Using_Confidence_Intervals_17_1.png"
  389. >
  390. </div>
  391. </div>
  392. </div>
  393. </div>
  394. </div>
  395. </div>
  396. </div>
  397. <div class="jb_cell">
  398. <div class="cell border-box-sizing code_cell rendered">
  399. <div class="input">
  400. <div class="inner_cell">
  401. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">(</span><span class="n">hodgkins</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;drop&#39;</span><span class="p">))</span>
</pre></div>
  404. </div>
  405. </div>
  406. </div>
  407. <div class="output_wrapper">
  408. <div class="output">
  409. <div class="jb_output_wrapper }}">
  410. <div class="output_area">
  411. <div class="output_text output_subarea output_execute_result">
  412. <pre>28.615909090909096</pre>
  413. </div>
  414. </div>
  415. </div>
  416. </div>
  417. </div>
  418. </div>
  419. </div>
  420. <div class="jb_cell">
  421. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  422. <div class="text_cell_render border-box-sizing rendered_html">
<p>In the sample, the average drop is about 28.6. But could this be the result of chance variation? It really doesn't seem so, but the data are from a random sample. Could it be that in the entire population of patients, the average drop is just 0?</p>
  424. <p>To answer this, we can set up two hypotheses:</p>
  425. <p><strong>Null hypothesis.</strong> In the population, the average drop is 0.</p>
  426. <p><strong>Alternative hypothesis.</strong> In the population, the average drop is not 0.</p>
  427. <p>To test this hypothesis with a 1% cutoff for the P-value, let's construct an approximate 99% confidence interval for the average drop in the population.</p>
  428. </div>
  429. </div>
  430. </div>
  431. </div>
  432. <div class="jb_cell">
  433. <div class="cell border-box-sizing code_cell rendered">
  434. <div class="input">
  435. <div class="inner_cell">
  436. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">bstrap_means</span> <span class="o">=</span> <span class="n">bootstrap_mean</span><span class="p">(</span><span class="n">hodgkins</span><span class="p">,</span> <span class="s1">&#39;drop&#39;</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
<span class="n">left</span> <span class="o">=</span> <span class="n">percentile</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span> <span class="n">bstrap_means</span><span class="p">)</span>
<span class="n">right</span> <span class="o">=</span> <span class="n">percentile</span><span class="p">(</span><span class="mf">99.5</span><span class="p">,</span> <span class="n">bstrap_means</span><span class="p">)</span>
<span class="n">make_array</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">)</span>
</pre></div>
  442. </div>
  443. </div>
  444. </div>
  445. <div class="output_wrapper">
  446. <div class="output">
  447. <div class="jb_output_wrapper }}">
  448. <div class="output_area">
  449. <div class="output_text output_subarea output_execute_result">
  450. <pre>array([17.22636364, 40.54045455])</pre>
  451. </div>
  452. </div>
  453. </div>
  454. </div>
  455. </div>
  456. </div>
  457. </div>
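<p>For readers without the <code>datascience</code> library, the same percentile bootstrap can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code used to produce the interval above: it uses only the ten drops displayed earlier (12 rows are omitted there), the function name <code>bootstrap_means</code> is hypothetical, and <code>np.percentile</code> interpolates between values, so its endpoints can differ slightly from those of the <code>percentile</code> function used in this text.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

# Drops for the ten displayed patients only; the full sample has 22.
drops = np.array([72.80, 30.62, -4.29, 4.13, -11.32,
                  69.54, 60.64, 30.97, 42.85, 18.81])


def bootstrap_means(values, replications):
    """Return the means of `replications` resamples drawn with replacement."""
    n = len(values)
    # Each row of `resamples` is one bootstrap sample of the same size as the data.
    resamples = rng.choice(values, size=(replications, n), replace=True)
    return resamples.mean(axis=1)


means = bootstrap_means(drops, 10000)

# Middle 99% of the bootstrap means: cut 0.5% from each tail.
left, right = np.percentile(means, [0.5, 99.5])
print("approximate 99% interval:", left, right)
```

<p>Drawing all resamples in one call avoids growing an array inside a loop, at the cost of holding a <code>replications</code> by <code>n</code> array in memory.</p>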
  458. <div class="jb_cell">
  459. <div class="cell border-box-sizing code_cell rendered">
  460. <div class="input">
  461. <div class="inner_cell">
  462. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="n">resampled_means</span> <span class="o">=</span> <span class="n">Table</span><span class="p">()</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span>
    <span class="s1">&#39;Bootstrap Sample Mean&#39;</span><span class="p">,</span> <span class="n">bstrap_means</span>
<span class="p">)</span>
<span class="n">resampled_means</span><span class="o">.</span><span class="n">hist</span><span class="p">()</span>
<span class="n">plots</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">make_array</span><span class="p">(</span><span class="n">left</span><span class="p">,</span> <span class="n">right</span><span class="p">),</span> <span class="n">make_array</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">),</span> <span class="n">color</span><span class="o">=</span><span class="s1">&#39;yellow&#39;</span><span class="p">,</span> <span class="n">lw</span><span class="o">=</span><span class="mi">8</span><span class="p">);</span>
</pre></div>
  469. </div>
  470. </div>
  471. </div>
  472. <div class="output_wrapper">
  473. <div class="output">
  483. <div class="jb_output_wrapper }}">
  484. <div class="output_area">
  485. <div class="output_png output_subarea ">
  486. <img src="../../../images/chapters/13/4/Using_Confidence_Intervals_21_1.png"
  487. >
  488. </div>
  489. </div>
  490. </div>
  491. </div>
  492. </div>
  493. </div>
  494. </div>
  495. <div class="jb_cell">
  496. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  497. <div class="text_cell_render border-box-sizing rendered_html">
  498. <p>The 99% confidence interval for the average drop in the population goes from about 17 to about 40. The interval doesn't contain 0. So we reject the null hypothesis.</p>
  499. <p>But notice that we have done better than simply concluding that the average drop in the population isn't 0. We have estimated how big the average drop is. That's a more useful result than just saying, "It's not 0."</p>
  500. <p><strong>A note on accuracy.</strong> Our confidence interval is quite wide, for two main reasons:</p>
  501. <ul>
  502. <li>The confidence level is high (99%).</li>
  503. <li>The sample size is relatively small compared to those in our earlier examples.</li>
  504. </ul>
  505. <p>In the next chapter, we will examine how the sample size affects accuracy. We will also examine how the empirical distributions of sample means so often come out bell shaped even though the distributions of the underlying data are not bell shaped at all.</p>
  506. </div>
  507. </div>
  508. </div>
  509. </div>
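<p>Both effects listed in the note on accuracy can be seen in a small simulation. The sketch below assumes a simulated bell-shaped population (not the Hodgkin's data); the sample sizes, the seed, and the helper name <code>interval_widths</code> are choices made here for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(42)


def interval_widths(sample, levels=(95, 99), replications=2000):
    """Widths of percentile-bootstrap intervals for the mean, one per level.

    All levels are read off the same set of bootstrap means, so a higher
    level can never give a narrower interval.
    """
    n = len(sample)
    means = rng.choice(sample, size=(replications, n), replace=True).mean(axis=1)
    widths = {}
    for level in levels:
        tail = (100 - level) / 2
        left, right = np.percentile(means, [tail, 100 - tail])
        widths[level] = right - left
    return widths


# Simulated population (an assumption for illustration only).
population = rng.normal(loc=30, scale=30, size=100_000)
small_sample = rng.choice(population, size=22, replace=False)
large_sample = rng.choice(population, size=1000, replace=False)

widths_small = interval_widths(small_sample)
widths_large = interval_widths(large_sample)

print("n = 22:  ", widths_small)
print("n = 1000:", widths_large)
```

<p>In this simulation the 99% interval is wider than the 95% interval for the same sample, and the intervals from the sample of 22 are much wider than those from the sample of 1000.</p>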
  510. <div class="jb_cell">
  511. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  512. <div class="text_cell_render border-box-sizing rendered_html">
  513. <h3 id="Endnote">Endnote<a class="anchor-link" href="#Endnote"> </a></h3><p>The terminology of a field usually comes from the leading researchers in that field. <a href="https://en.wikipedia.org/wiki/Bradley_Efron">Brad Efron</a>, who first proposed the bootstrap technique, used a term that has <a href="https://en.wikipedia.org/wiki/Bootstrapping">American origins</a>. Not to be outdone, Chinese statisticians have <a href="http://econpapers.repec.org/article/eeestapro/v_3a37_3ay_3a1998_3ai_3a4_3ap_3a321-329.htm">proposed their own method</a>.</p>
  514. </div>
  515. </div>
  516. </div>
  517. </div>