  1. ---
  2. redirect_from:
  3. - "/chapters/15/3/method-of-least-squares"
  4. interact_link: content/chapters/15/3/Method_of_Least_Squares.ipynb
  5. kernel_name: python3
  6. has_widgets: false
  7. title: |-
  8. The Method of Least Squares
  9. prev_page:
  10. url: /chapters/15/2/Regression_Line.html
  11. title: |-
  12. The Regression Line
  13. next_page:
  14. url: /chapters/15/4/Least_Squares_Regression.html
  15. title: |-
  16. Least Squares Regression
  17. comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /content***"
  18. ---
  19. <div class="jb_cell tag_remove_input">
  20. <div class="cell border-box-sizing code_cell rendered">
  21. </div>
  22. </div>
  23. <div class="jb_cell tag_remove_input">
  24. <div class="cell border-box-sizing code_cell rendered">
  25. </div>
  26. </div>
  27. <div class="jb_cell">
  28. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  29. <div class="text_cell_render border-box-sizing rendered_html">
<h3 id="The-Method-of-Least-Squares">The Method of Least Squares<a class="anchor-link" href="#The-Method-of-Least-Squares"> </a></h3><p>We have retraced the steps that Galton and Pearson took to develop the equation of the regression line that runs through a football-shaped scatter plot. But not all scatter plots are football-shaped, not even linear ones. Does every scatter plot have a "best" line that goes through it? If so, can we still use the formulas for the slope and intercept developed in the previous section, or do we need new ones?</p>
  31. <p>To address these questions, we need a reasonable definition of "best". Recall that the purpose of the line is to <em>predict</em> or <em>estimate</em> values of $y$, given values of $x$. Estimates typically aren't perfect. Each one is off the true value by an <em>error</em>. A reasonable criterion for a line to be the "best" is for it to have the smallest possible overall error among all straight lines.</p>
  32. <p>In this section we will make this criterion precise and see if we can identify the best straight line under the criterion.</p>
  33. </div>
  34. </div>
  35. </div>
  36. </div>
  37. <div class="jb_cell">
  38. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  39. <div class="text_cell_render border-box-sizing rendered_html">
<p>Our first example is a dataset that has one row for every chapter of the novel "Little Women." The goal is to estimate the number of characters (that is, letters, spaces, punctuation marks, and so on) based on the number of periods. Recall that we attempted to do this in the very first lecture of this course.</p>
  41. </div>
  42. </div>
  43. </div>
  44. </div>
  45. <div class="jb_cell">
  46. <div class="cell border-box-sizing code_cell rendered">
  47. <div class="input">
  48. <div class="inner_cell">
  49. <div class="input_area">
  50. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">little_women</span> <span class="o">=</span> <span class="n">Table</span><span class="o">.</span><span class="n">read_table</span><span class="p">(</span><span class="n">path_data</span> <span class="o">+</span> <span class="s1">&#39;little_women.csv&#39;</span><span class="p">)</span>
  51. <span class="n">little_women</span> <span class="o">=</span> <span class="n">little_women</span><span class="o">.</span><span class="n">move_to_start</span><span class="p">(</span><span class="s1">&#39;Periods&#39;</span><span class="p">)</span>
  52. <span class="n">little_women</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="mi">3</span><span class="p">)</span>
  53. </pre></div>
  54. </div>
  55. </div>
  56. </div>
  57. <div class="output_wrapper">
  58. <div class="output">
  59. <div class="jb_output_wrapper }}">
  60. <div class="output_area">
  61. <div class="output_html rendered_html output_subarea ">
  62. <table border="1" class="dataframe">
  63. <thead>
  64. <tr>
  65. <th>Periods</th> <th>Characters</th>
  66. </tr>
  67. </thead>
  68. <tbody>
  69. <tr>
  70. <td>189 </td> <td>21759 </td>
  71. </tr>
  72. <tr>
  73. <td>188 </td> <td>22148 </td>
  74. </tr>
  75. <tr>
  76. <td>231 </td> <td>20558 </td>
  77. </tr>
  78. </tbody>
  79. </table>
  80. <p>... (44 rows omitted)</p>
  81. </div>
  82. </div>
  83. </div>
  84. </div>
  85. </div>
  86. </div>
  87. </div>
  88. <div class="jb_cell">
  89. <div class="cell border-box-sizing code_cell rendered">
  90. <div class="input">
  91. <div class="inner_cell">
  92. <div class="input_area">
  93. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">little_women</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="s1">&#39;Periods&#39;</span><span class="p">,</span> <span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
  94. </pre></div>
  95. </div>
  96. </div>
  97. </div>
  98. <div class="output_wrapper">
  99. <div class="output">
  100. <div class="jb_output_wrapper }}">
  101. <div class="output_area">
  102. <div class="output_png output_subarea ">
  103. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_5_0.png"
  104. >
  105. </div>
  106. </div>
  107. </div>
  108. </div>
  109. </div>
  110. </div>
  111. </div>
  112. <div class="jb_cell">
  113. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  114. <div class="text_cell_render border-box-sizing rendered_html">
  115. <p>To explore the data, we will need to use the functions <code>correlation</code>, <code>slope</code>, <code>intercept</code>, and <code>fit</code> defined in the previous section.</p>
  116. </div>
  117. </div>
  118. </div>
  119. </div>
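<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For readers who do not have the previous section at hand, here is a minimal sketch of what those four functions compute, written for plain NumPy arrays instead of tables. The names <code>standard_units</code>, <code>correlation_arrays</code>, <code>slope_arrays</code>, <code>intercept_arrays</code>, and <code>fit_arrays</code> are hypothetical stand-ins, and we are assuming the standard-units definitions of the correlation, slope, and intercept. The two arrays hold only the ten chapters that appear in the error table later in this section, so the numbers will not match those for the full dataset.</p>

```python
import numpy as np

def standard_units(values):
    """Convert an array to standard units: (value - mean) / SD."""
    values = np.asarray(values, dtype=float)
    return (values - np.mean(values)) / np.std(values)

def correlation_arrays(x, y):
    """Correlation r: the mean of the products of x and y in standard units."""
    return float(np.mean(standard_units(x) * standard_units(y)))

def slope_arrays(x, y):
    """Slope of the regression line: r * SD(y) / SD(x)."""
    return correlation_arrays(x, y) * np.std(y) / np.std(x)

def intercept_arrays(x, y):
    """Intercept of the regression line: mean(y) - slope * mean(x)."""
    return np.mean(y) - slope_arrays(x, y) * np.mean(x)

def fit_arrays(x, y):
    """Height of the regression line at each value of x."""
    x = np.asarray(x, dtype=float)
    return slope_arrays(x, y) * x + intercept_arrays(x, y)

# Ten chapters only (assumption: a small subset used for illustration)
periods = np.array([189, 188, 231, 195, 255, 140, 131, 214, 337, 185])
characters = np.array([21759, 22148, 20558, 25526, 23395,
                       14622, 14431, 22476, 33767, 18508])

r = correlation_arrays(periods, characters)
print('Correlation on the ten-chapter subset:', r)
print('Slope:', slope_arrays(periods, characters))
print('Intercept:', intercept_arrays(periods, characters))
```

<p>The functions used in the rest of this section take a table and two column labels instead of two arrays, but the arithmetic sketched here is assumed to be the same.</p>
</div>
</div>
</div>
</div>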
  120. <div class="jb_cell">
  121. <div class="cell border-box-sizing code_cell rendered">
  122. <div class="input">
  123. <div class="inner_cell">
  124. <div class="input_area">
  125. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">correlation</span><span class="p">(</span><span class="n">little_women</span><span class="p">,</span> <span class="s1">&#39;Periods&#39;</span><span class="p">,</span> <span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
  126. </pre></div>
  127. </div>
  128. </div>
  129. </div>
  130. <div class="output_wrapper">
  131. <div class="output">
  132. <div class="jb_output_wrapper }}">
  133. <div class="output_area">
  134. <div class="output_text output_subarea output_execute_result">
  135. <pre>0.9229576895854816</pre>
  136. </div>
  137. </div>
  138. </div>
  139. </div>
  140. </div>
  141. </div>
  142. </div>
  143. <div class="jb_cell">
  144. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  145. <div class="text_cell_render border-box-sizing rendered_html">
  146. <p>The scatter plot is remarkably close to linear, and the correlation is more than 0.92.</p>
  147. </div>
  148. </div>
  149. </div>
  150. </div>
  151. <div class="jb_cell">
  152. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  153. <div class="text_cell_render border-box-sizing rendered_html">
  154. <h3 id="Error-in-Estimation">Error in Estimation<a class="anchor-link" href="#Error-in-Estimation"> </a></h3><p>The graph below shows the scatter plot and line that we developed in the previous section. We don't yet know if that's the best among all lines. We first have to say precisely what "best" means.</p>
  155. </div>
  156. </div>
  157. </div>
  158. </div>
  159. <div class="jb_cell">
  160. <div class="cell border-box-sizing code_cell rendered">
  161. <div class="input">
  162. <div class="inner_cell">
  163. <div class="input_area">
  164. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_with_predictions</span> <span class="o">=</span> <span class="n">little_women</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span><span class="s1">&#39;Linear Prediction&#39;</span><span class="p">,</span> <span class="n">fit</span><span class="p">(</span><span class="n">little_women</span><span class="p">,</span> <span class="s1">&#39;Periods&#39;</span><span class="p">,</span> <span class="s1">&#39;Characters&#39;</span><span class="p">))</span>
  165. <span class="n">lw_with_predictions</span><span class="o">.</span><span class="n">scatter</span><span class="p">(</span><span class="s1">&#39;Periods&#39;</span><span class="p">)</span>
  166. </pre></div>
  167. </div>
  168. </div>
  169. </div>
  170. <div class="output_wrapper">
  171. <div class="output">
  172. <div class="jb_output_wrapper }}">
  173. <div class="output_area">
  174. <div class="output_png output_subarea ">
  175. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_10_0.png"
  176. >
  177. </div>
  178. </div>
  179. </div>
  180. </div>
  181. </div>
  182. </div>
  183. </div>
  184. <div class="jb_cell">
  185. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  186. <div class="text_cell_render border-box-sizing rendered_html">
  187. <p>Corresponding to each point on the scatter plot, there is an error of prediction calculated as the actual value minus the predicted value. It is the vertical distance between the point and the line, with a negative sign if the point is below the line.</p>
  188. </div>
  189. </div>
  190. </div>
  191. </div>
  192. <div class="jb_cell">
  193. <div class="cell border-box-sizing code_cell rendered">
  194. <div class="input">
  195. <div class="inner_cell">
  196. <div class="input_area">
  197. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">actual</span> <span class="o">=</span> <span class="n">lw_with_predictions</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
  198. <span class="n">predicted</span> <span class="o">=</span> <span class="n">lw_with_predictions</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;Linear Prediction&#39;</span><span class="p">)</span>
  199. <span class="n">errors</span> <span class="o">=</span> <span class="n">actual</span> <span class="o">-</span> <span class="n">predicted</span>
  200. </pre></div>
  201. </div>
  202. </div>
  203. </div>
  204. </div>
  205. </div>
  206. <div class="jb_cell">
  207. <div class="cell border-box-sizing code_cell rendered">
  208. <div class="input">
  209. <div class="inner_cell">
  210. <div class="input_area">
  211. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_with_predictions</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span><span class="s1">&#39;Error&#39;</span><span class="p">,</span> <span class="n">errors</span><span class="p">)</span>
  212. </pre></div>
  213. </div>
  214. </div>
  215. </div>
  216. <div class="output_wrapper">
  217. <div class="output">
  218. <div class="jb_output_wrapper }}">
  219. <div class="output_area">
  220. <div class="output_html rendered_html output_subarea output_execute_result">
  221. <table border="1" class="dataframe">
  222. <thead>
  223. <tr>
  224. <th>Periods</th> <th>Characters</th> <th>Linear Prediction</th> <th>Error</th>
  225. </tr>
  226. </thead>
  227. <tbody>
  228. <tr>
  229. <td>189 </td> <td>21759 </td> <td>21183.6 </td> <td>575.403 </td>
  230. </tr>
  231. <tr>
  232. <td>188 </td> <td>22148 </td> <td>21096.6 </td> <td>1051.38 </td>
  233. </tr>
  234. <tr>
  235. <td>231 </td> <td>20558 </td> <td>24836.7 </td> <td>-4278.67</td>
  236. </tr>
  237. <tr>
  238. <td>195 </td> <td>25526 </td> <td>21705.5 </td> <td>3820.54 </td>
  239. </tr>
  240. <tr>
  241. <td>255 </td> <td>23395 </td> <td>26924.1 </td> <td>-3529.13</td>
  242. </tr>
  243. <tr>
  244. <td>140 </td> <td>14622 </td> <td>16921.7 </td> <td>-2299.68</td>
  245. </tr>
  246. <tr>
  247. <td>131 </td> <td>14431 </td> <td>16138.9 </td> <td>-1707.88</td>
  248. </tr>
  249. <tr>
  250. <td>214 </td> <td>22476 </td> <td>23358 </td> <td>-882.043</td>
  251. </tr>
  252. <tr>
  253. <td>337 </td> <td>33767 </td> <td>34056.3 </td> <td>-289.317</td>
  254. </tr>
  255. <tr>
  256. <td>185 </td> <td>18508 </td> <td>20835.7 </td> <td>-2327.69</td>
  257. </tr>
  258. </tbody>
  259. </table>
  260. <p>... (37 rows omitted)</p>
  261. </div>
  262. </div>
  263. </div>
  264. </div>
  265. </div>
  266. </div>
  267. </div>
  268. <div class="jb_cell">
  269. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  270. <div class="text_cell_render border-box-sizing rendered_html">
  271. <p>We can use <code>slope</code> and <code>intercept</code> to calculate the slope and intercept of the fitted line. The graph below shows the line (in light blue). The errors corresponding to four of the points are shown in red. There is nothing special about those four points. They were just chosen for clarity of the display. The function <code>lw_errors</code> takes a slope and an intercept (in that order) as its arguments and draws the figure.</p>
  272. </div>
  273. </div>
  274. </div>
  275. </div>
  276. <div class="jb_cell">
  277. <div class="cell border-box-sizing code_cell rendered">
  278. <div class="input">
  279. <div class="inner_cell">
  280. <div class="input_area">
  281. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_reg_slope</span> <span class="o">=</span> <span class="n">slope</span><span class="p">(</span><span class="n">little_women</span><span class="p">,</span> <span class="s1">&#39;Periods&#39;</span><span class="p">,</span> <span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
  282. <span class="n">lw_reg_intercept</span> <span class="o">=</span> <span class="n">intercept</span><span class="p">(</span><span class="n">little_women</span><span class="p">,</span> <span class="s1">&#39;Periods&#39;</span><span class="p">,</span> <span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
  283. </pre></div>
  284. </div>
  285. </div>
  286. </div>
  287. </div>
  288. </div>
  289. <div class="jb_cell tag_remove_input">
  290. <div class="cell border-box-sizing code_cell rendered">
  291. </div>
  292. </div>
  293. <div class="jb_cell">
  294. <div class="cell border-box-sizing code_cell rendered">
  295. <div class="input">
  296. <div class="inner_cell">
  297. <div class="input_area">
  298. <div class=" highlight hl-ipython3"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Slope of Regression Line: &#39;</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">lw_reg_slope</span><span class="p">),</span> <span class="s1">&#39;characters per period&#39;</span><span class="p">)</span>
  299. <span class="nb">print</span><span class="p">(</span><span class="s1">&#39;Intercept of Regression Line:&#39;</span><span class="p">,</span> <span class="n">np</span><span class="o">.</span><span class="n">round</span><span class="p">(</span><span class="n">lw_reg_intercept</span><span class="p">),</span> <span class="s1">&#39;characters&#39;</span><span class="p">)</span>
  300. <span class="n">lw_errors</span><span class="p">(</span><span class="n">lw_reg_slope</span><span class="p">,</span> <span class="n">lw_reg_intercept</span><span class="p">)</span>
  301. </pre></div>
  302. </div>
  303. </div>
  304. </div>
  305. <div class="output_wrapper">
  306. <div class="output">
  307. <div class="jb_output_wrapper }}">
  308. <div class="output_area">
  309. <div class="output_subarea output_stream output_stdout output_text">
  310. <pre>Slope of Regression Line: 87.0 characters per period
  311. Intercept of Regression Line: 4745.0 characters
  312. </pre>
  313. </div>
  314. </div>
  315. </div>
  316. <div class="jb_output_wrapper }}">
  317. <div class="output_area">
  318. <div class="output_png output_subarea ">
  319. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_17_1.png"
  320. >
  321. </div>
  322. </div>
  323. </div>
  324. </div>
  325. </div>
  326. </div>
  327. </div>
  328. <div class="jb_cell">
  329. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  330. <div class="text_cell_render border-box-sizing rendered_html">
<p>Had we used a different line to create our estimates, the errors would have been different. The first graph below shows how big the errors would be if we were to use another line for estimation. The second graph shows the even larger errors obtained by using a line that is downright silly.</p>
  332. </div>
  333. </div>
  334. </div>
  335. </div>
  336. <div class="jb_cell">
  337. <div class="cell border-box-sizing code_cell rendered">
  338. <div class="input">
  339. <div class="inner_cell">
  340. <div class="input_area">
  341. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_errors</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
  342. </pre></div>
  343. </div>
  344. </div>
  345. </div>
  346. <div class="output_wrapper">
  347. <div class="output">
  348. <div class="jb_output_wrapper }}">
  349. <div class="output_area">
  350. <div class="output_png output_subarea ">
  351. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_19_0.png"
  352. >
  353. </div>
  354. </div>
  355. </div>
  356. </div>
  357. </div>
  358. </div>
  359. </div>
  360. <div class="jb_cell">
  361. <div class="cell border-box-sizing code_cell rendered">
  362. <div class="input">
  363. <div class="inner_cell">
  364. <div class="input_area">
  365. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_errors</span><span class="p">(</span><span class="o">-</span><span class="mi">100</span><span class="p">,</span> <span class="mi">50000</span><span class="p">)</span>
  366. </pre></div>
  367. </div>
  368. </div>
  369. </div>
  370. <div class="output_wrapper">
  371. <div class="output">
  372. <div class="jb_output_wrapper }}">
  373. <div class="output_area">
  374. <div class="output_png output_subarea ">
  375. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_20_0.png"
  376. >
  377. </div>
  378. </div>
  379. </div>
  380. </div>
  381. </div>
  382. </div>
  383. </div>
  384. <div class="jb_cell">
  385. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  386. <div class="text_cell_render border-box-sizing rendered_html">
  387. <h3 id="Root-Mean-Squared-Error">Root Mean Squared Error<a class="anchor-link" href="#Root-Mean-Squared-Error"> </a></h3><p>What we need now is one overall measure of the rough size of the errors. You will recognize the approach to creating this – it's exactly the way we developed the SD.</p>
  388. <p>If you use any arbitrary line to calculate your estimates, then some of your errors are likely to be positive and others negative. To avoid cancellation when measuring the rough size of the errors, we will take the mean of the squared errors rather than the mean of the errors themselves.</p>
<p>The mean squared error of estimation is a measure of roughly how big the squared errors are, but as we have noted earlier, its units are hard to interpret. Taking the square root yields the root mean squared error (rmse), which is in the same units as the variable being predicted and therefore much easier to understand.</p>
  390. </div>
  391. </div>
  392. </div>
  393. </div>
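<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To see the cancellation problem concretely, here is a small sketch using four made-up errors (they are not taken from the Little Women data). The errors are clearly not small, yet their plain average is zero; squaring before averaging removes the signs, and the square root brings the answer back to the original units.</p>

```python
import numpy as np

# Made-up errors for illustration: two positive, two negative
errors = np.array([3.0, -3.0, 4.0, -4.0])

mean_error = np.mean(errors)   # positives and negatives cancel
mse = np.mean(errors ** 2)     # mean squared error: (9 + 9 + 16 + 16) / 4
rmse = mse ** 0.5              # root mean squared error, in the original units

print('Mean of the errors:', mean_error)    # 0.0
print('Mean squared error:', mse)           # 12.5
print('Root mean squared error:', rmse)     # about 3.54
```

<p>The rmse of about 3.54 is a sensible summary of errors whose sizes are 3 and 4, whereas the mean error of 0 suggests, misleadingly, that there is no error at all.</p>
</div>
</div>
</div>
</div>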
  394. <div class="jb_cell">
  395. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  396. <div class="text_cell_render border-box-sizing rendered_html">
  397. <h3 id="Minimizing-the-Root-Mean-Squared-Error">Minimizing the Root Mean Squared Error<a class="anchor-link" href="#Minimizing-the-Root-Mean-Squared-Error"> </a></h3><p>Our observations so far can be summarized as follows.</p>
  398. <ul>
  399. <li>To get estimates of $y$ based on $x$, you can use any line you want.</li>
  400. <li>Every line has a root mean squared error of estimation.</li>
  401. <li>"Better" lines have smaller errors.</li>
  402. </ul>
  403. <p>Is there a "best" line? That is, is there a line that minimizes the root mean squared error among all lines?</p>
  404. <p>To answer this question, we will start by defining a function <code>lw_rmse</code> to compute the root mean squared error of any line through the Little Women scatter diagram. The function takes the slope and the intercept (in that order) as its arguments.</p>
  405. </div>
  406. </div>
  407. </div>
  408. </div>
  409. <div class="jb_cell">
  410. <div class="cell border-box-sizing code_cell rendered">
  411. <div class="input">
  412. <div class="inner_cell">
  413. <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">lw_rmse</span><span class="p">(</span><span class="n">slope</span><span class="p">,</span> <span class="n">intercept</span><span class="p">):</span>
    <span class="n">lw_errors</span><span class="p">(</span><span class="n">slope</span><span class="p">,</span> <span class="n">intercept</span><span class="p">)</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">little_women</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;Periods&#39;</span><span class="p">)</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">little_women</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
    <span class="n">fitted</span> <span class="o">=</span> <span class="n">slope</span> <span class="o">*</span> <span class="n">x</span> <span class="o">+</span> <span class="n">intercept</span>
    <span class="n">mse</span> <span class="o">=</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">((</span><span class="n">y</span> <span class="o">-</span> <span class="n">fitted</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
    <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Root mean squared error:&quot;</span><span class="p">,</span> <span class="n">mse</span> <span class="o">**</span> <span class="mf">0.5</span><span class="p">)</span>
  421. </pre></div>
  422. </div>
  423. </div>
  424. </div>
  425. </div>
  426. </div>
  427. <div class="jb_cell">
  428. <div class="cell border-box-sizing code_cell rendered">
  429. <div class="input">
  430. <div class="inner_cell">
  431. <div class="input_area">
  432. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_rmse</span><span class="p">(</span><span class="mi">50</span><span class="p">,</span> <span class="mi">10000</span><span class="p">)</span>
  433. </pre></div>
  434. </div>
  435. </div>
  436. </div>
  437. <div class="output_wrapper">
  438. <div class="output">
  439. <div class="jb_output_wrapper }}">
  440. <div class="output_area">
  441. <div class="output_subarea output_stream output_stdout output_text">
  442. <pre>Root mean squared error: 4322.167831766537
  443. </pre>
  444. </div>
  445. </div>
  446. </div>
  447. <div class="jb_output_wrapper }}">
  448. <div class="output_area">
  449. <div class="output_png output_subarea ">
  450. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_24_1.png"
  451. >
  452. </div>
  453. </div>
  454. </div>
  455. </div>
  456. </div>
  457. </div>
  458. </div>
  459. <div class="jb_cell">
  460. <div class="cell border-box-sizing code_cell rendered">
  461. <div class="input">
  462. <div class="inner_cell">
  463. <div class="input_area">
  464. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_rmse</span><span class="p">(</span><span class="o">-</span><span class="mi">100</span><span class="p">,</span> <span class="mi">50000</span><span class="p">)</span>
  465. </pre></div>
  466. </div>
  467. </div>
  468. </div>
  469. <div class="output_wrapper">
  470. <div class="output">
  471. <div class="jb_output_wrapper }}">
  472. <div class="output_area">
  473. <div class="output_subarea output_stream output_stdout output_text">
  474. <pre>Root mean squared error: 16710.11983735375
  475. </pre>
  476. </div>
  477. </div>
  478. </div>
  479. <div class="jb_output_wrapper }}">
  480. <div class="output_area">
  481. <div class="output_png output_subarea ">
  482. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_25_1.png"
  483. >
  484. </div>
  485. </div>
  486. </div>
  487. </div>
  488. </div>
  489. </div>
  490. </div>
  491. <div class="jb_cell">
  492. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  493. <div class="text_cell_render border-box-sizing rendered_html">
  494. <p>Bad lines have big values of rmse, as expected. But the rmse is much smaller if we choose a slope and intercept close to those of the regression line.</p>
  495. </div>
  496. </div>
  497. </div>
  498. </div>
  499. <div class="jb_cell">
  500. <div class="cell border-box-sizing code_cell rendered">
  501. <div class="input">
  502. <div class="inner_cell">
  503. <div class="input_area">
  504. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_rmse</span><span class="p">(</span><span class="mi">90</span><span class="p">,</span> <span class="mi">4000</span><span class="p">)</span>
  505. </pre></div>
  506. </div>
  507. </div>
  508. </div>
  509. <div class="output_wrapper">
  510. <div class="output">
  511. <div class="jb_output_wrapper }}">
  512. <div class="output_area">
  513. <div class="output_subarea output_stream output_stdout output_text">
  514. <pre>Root mean squared error: 2715.5391063834586
  515. </pre>
  516. </div>
  517. </div>
  518. </div>
  519. <div class="jb_output_wrapper }}">
  520. <div class="output_area">
  521. <div class="output_png output_subarea ">
  522. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_27_1.png"
  523. >
  524. </div>
  525. </div>
  526. </div>
  527. </div>
  528. </div>
  529. </div>
  530. </div>
  531. <div class="jb_cell">
  532. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  533. <div class="text_cell_render border-box-sizing rendered_html">
  534. <p>Here is the root mean squared error corresponding to the regression line. By a remarkable fact of mathematics, no other line can beat this one.</p>
  535. <ul>
  536. <li><strong>The regression line is the unique straight line that minimizes the mean squared error of estimation among all straight lines.</strong></li>
  537. </ul>
  538. </div>
  539. </div>
  540. </div>
  541. </div>
  542. <div class="jb_cell">
  543. <div class="cell border-box-sizing code_cell rendered">
  544. <div class="input">
  545. <div class="inner_cell">
  546. <div class="input_area">
  547. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_rmse</span><span class="p">(</span><span class="n">lw_reg_slope</span><span class="p">,</span> <span class="n">lw_reg_intercept</span><span class="p">)</span>
  548. </pre></div>
  549. </div>
  550. </div>
  551. </div>
  552. <div class="output_wrapper">
  553. <div class="output">
  554. <div class="jb_output_wrapper }}">
  555. <div class="output_area">
  556. <div class="output_subarea output_stream output_stdout output_text">
  557. <pre>Root mean squared error: 2701.690785311856
  558. </pre>
  559. </div>
  560. </div>
  561. </div>
  562. <div class="jb_output_wrapper }}">
  563. <div class="output_area">
  564. <div class="output_png output_subarea ">
  565. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_29_1.png"
  566. >
  567. </div>
  568. </div>
  569. </div>
  570. </div>
  571. </div>
  572. </div>
  573. </div>
  574. <div class="jb_cell">
  575. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  576. <div class="text_cell_render border-box-sizing rendered_html">
  577. <p>The proof of this statement requires abstract mathematics that is beyond the scope of this course. On the other hand, we do have a powerful tool – Python – that performs large numerical computations with ease. So we can use Python to confirm that the regression line minimizes the mean squared error.</p>
  578. </div>
  579. </div>
  580. </div>
  581. </div>
  582. <div class="jb_cell">
  583. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  584. <div class="text_cell_render border-box-sizing rendered_html">
<h3 id="Numerical-Optimization">Numerical Optimization<a class="anchor-link" href="#Numerical-Optimization"> </a></h3><p>First note that a line that minimizes the root mean squared error is also a line that minimizes the mean squared error. Taking the square root preserves the ordering of non-negative numbers, so it makes no difference to the minimization. We will therefore save ourselves a step of computation and just minimize the mean squared error (mse).</p>
  586. <p>We are trying to predict the number of characters ($y$) based on the number of periods ($x$) in chapters of Little Women. If we use the line
  587. $$
  588. \mbox{prediction} ~=~ ax + b
  589. $$
  590. it will have an mse that depends on the slope $a$ and the intercept $b$. The function <code>lw_mse</code> takes the slope and intercept as its arguments and returns the corresponding mse.</p>
  591. </div>
  592. </div>
  593. </div>
  594. </div>
  595. <div class="jb_cell">
  596. <div class="cell border-box-sizing code_cell rendered">
  597. <div class="input">
  598. <div class="inner_cell">
  599. <div class="input_area">
  600. <div class=" highlight hl-ipython3"><pre><span></span><span class="k">def</span> <span class="nf">lw_mse</span><span class="p">(</span><span class="n">any_slope</span><span class="p">,</span> <span class="n">any_intercept</span><span class="p">):</span>
    <span class="n">x</span> <span class="o">=</span> <span class="n">little_women</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;Periods&#39;</span><span class="p">)</span>
    <span class="n">y</span> <span class="o">=</span> <span class="n">little_women</span><span class="o">.</span><span class="n">column</span><span class="p">(</span><span class="s1">&#39;Characters&#39;</span><span class="p">)</span>
    <span class="n">fitted</span> <span class="o">=</span> <span class="n">any_slope</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">any_intercept</span>
    <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">mean</span><span class="p">((</span><span class="n">y</span> <span class="o">-</span> <span class="n">fitted</span><span class="p">)</span> <span class="o">**</span> <span class="mi">2</span><span class="p">)</span>
  605. </pre></div>
  606. </div>
  607. </div>
  608. </div>
  609. </div>
  610. </div>
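<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>As a small self-contained illustration of why the square root does not change which line is best, the sketch below ranks a few candidate lines by mse and then by rmse. The data points and candidate lines are made up for this example (they are not the Little Women data), and <code>toy_mse</code> is a hypothetical stand-in for <code>lw_mse</code> that works on plain Python lists instead of a table.</p>

```python
import math

# Made-up data for illustration only (not the Little Women table).
toy_x = [1, 2, 3, 4, 5]
toy_y = [3, 5, 4, 8, 9]

def toy_mse(any_slope, any_intercept):
    """Mean squared error of the line any_slope * x + any_intercept on the toy data."""
    errors = [y - (any_slope * x + any_intercept) for x, y in zip(toy_x, toy_y)]
    return sum(e ** 2 for e in errors) / len(errors)

# A few hypothetical candidate lines, written as (slope, intercept) pairs.
candidates = [(1.5, 1.3), (2.0, 0.0), (-1.0, 10.0), (0.0, 6.0)]

# Rank the candidates from lowest to highest error, first by mse, then by rmse.
by_mse = sorted(candidates, key=lambda line: toy_mse(*line))
by_rmse = sorted(candidates, key=lambda line: math.sqrt(toy_mse(*line)))

print(by_mse == by_rmse)  # the two rankings agree
print(by_mse[0])          # the candidate with the smallest error under either measure
```

<p>Because the rankings agree, whichever candidate has the smallest mse also has the smallest rmse, which is why it is enough to minimize the mse.</p>
</div>
</div>
</div>
</div>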
  611. <div class="jb_cell">
  612. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  613. <div class="text_cell_render border-box-sizing rendered_html">
  614. <p>Let's check that <code>lw_mse</code> gets the right answer for the root mean squared error of the regression line. Remember that <code>lw_mse</code> returns the mean squared error, so we have to take the square root to get the rmse.</p>
  615. </div>
  616. </div>
  617. </div>
  618. </div>
  619. <div class="jb_cell">
  620. <div class="cell border-box-sizing code_cell rendered">
  621. <div class="input">
  622. <div class="inner_cell">
  623. <div class="input_area">
  624. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_mse</span><span class="p">(</span><span class="n">lw_reg_slope</span><span class="p">,</span> <span class="n">lw_reg_intercept</span><span class="p">)</span><span class="o">**</span><span class="mf">0.5</span>
  625. </pre></div>
  626. </div>
  627. </div>
  628. </div>
  629. <div class="output_wrapper">
  630. <div class="output">
  631. <div class="jb_output_wrapper }}">
  632. <div class="output_area">
  633. <div class="output_text output_subarea output_execute_result">
  634. <pre>2701.690785311856</pre>
  635. </div>
  636. </div>
  637. </div>
  638. </div>
  639. </div>
  640. </div>
  641. </div>
  642. <div class="jb_cell">
  643. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  644. <div class="text_cell_render border-box-sizing rendered_html">
  645. <p>That's the same as the value we got by using <code>lw_rmse</code> earlier:</p>
  646. </div>
  647. </div>
  648. </div>
  649. </div>
  650. <div class="jb_cell">
  651. <div class="cell border-box-sizing code_cell rendered">
  652. <div class="input">
  653. <div class="inner_cell">
  654. <div class="input_area">
  655. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_rmse</span><span class="p">(</span><span class="n">lw_reg_slope</span><span class="p">,</span> <span class="n">lw_reg_intercept</span><span class="p">)</span>
  656. </pre></div>
  657. </div>
  658. </div>
  659. </div>
  660. <div class="output_wrapper">
  661. <div class="output">
  662. <div class="jb_output_wrapper }}">
  663. <div class="output_area">
  664. <div class="output_subarea output_stream output_stdout output_text">
  665. <pre>Root mean squared error: 2701.690785311856
  666. </pre>
  667. </div>
  668. </div>
  669. </div>
  670. <div class="jb_output_wrapper }}">
  671. <div class="output_area">
  672. <div class="output_png output_subarea ">
  673. <img src="../../../images/chapters/15/3/Method_of_Least_Squares_36_1.png"
  674. >
  675. </div>
  676. </div>
  677. </div>
  678. </div>
  679. </div>
  680. </div>
  681. </div>
  682. <div class="jb_cell">
  683. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  684. <div class="text_cell_render border-box-sizing rendered_html">
  685. <p>You can confirm that <code>lw_mse</code> returns the correct value for other slopes and intercepts too. For example, here is the rmse of the extremely bad line that we tried earlier.</p>
  686. </div>
  687. </div>
  688. </div>
  689. </div>
  690. <div class="jb_cell">
  691. <div class="cell border-box-sizing code_cell rendered">
  692. <div class="input">
  693. <div class="inner_cell">
  694. <div class="input_area">
  695. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_mse</span><span class="p">(</span><span class="o">-</span><span class="mi">100</span><span class="p">,</span> <span class="mi">50000</span><span class="p">)</span><span class="o">**</span><span class="mf">0.5</span>
  696. </pre></div>
  697. </div>
  698. </div>
  699. </div>
  700. <div class="output_wrapper">
  701. <div class="output">
  702. <div class="jb_output_wrapper }}">
  703. <div class="output_area">
  704. <div class="output_text output_subarea output_execute_result">
  705. <pre>16710.11983735375</pre>
  706. </div>
  707. </div>
  708. </div>
  709. </div>
  710. </div>
  711. </div>
  712. </div>
  713. <div class="jb_cell">
  714. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  715. <div class="text_cell_render border-box-sizing rendered_html">
  716. <p>And here is the rmse for a line that is close to the regression line.</p>
  717. </div>
  718. </div>
  719. </div>
  720. </div>
  721. <div class="jb_cell">
  722. <div class="cell border-box-sizing code_cell rendered">
  723. <div class="input">
  724. <div class="inner_cell">
  725. <div class="input_area">
  726. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">lw_mse</span><span class="p">(</span><span class="mi">90</span><span class="p">,</span> <span class="mi">4000</span><span class="p">)</span><span class="o">**</span><span class="mf">0.5</span>
  727. </pre></div>
  728. </div>
  729. </div>
  730. </div>
  731. <div class="output_wrapper">
  732. <div class="output">
  733. <div class="jb_output_wrapper }}">
  734. <div class="output_area">
  735. <div class="output_text output_subarea output_execute_result">
  736. <pre>2715.5391063834586</pre>
  737. </div>
  738. </div>
  739. </div>
  740. </div>
  741. </div>
  742. </div>
  743. </div>
  744. <div class="jb_cell">
  745. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  746. <div class="text_cell_render border-box-sizing rendered_html">
  747. <p>If we experiment with different values, we can find a low-error slope and intercept through trial and error, but that would take a while. Fortunately, there is a Python function that does all the trial and error for us.</p>
<p>The <code>minimize</code> function finds the arguments at which a function returns its minimum value. It uses a similar trial-and-error approach, following the changes that lead to incrementally lower output values.</p>
  749. <p>The argument of <code>minimize</code> is a function that itself takes numerical arguments and returns a numerical value. For example, the function <code>lw_mse</code> takes a numerical slope and intercept as its arguments and returns the corresponding mse.</p>
  750. <p>The call <code>minimize(lw_mse)</code> returns an array consisting of the slope and the intercept that minimize the mse. These minimizing values are excellent approximations arrived at by intelligent trial-and-error, not exact values based on formulas.</p>
  751. </div>
  752. </div>
  753. </div>
  754. </div>
  755. <div class="jb_cell">
  756. <div class="cell border-box-sizing code_cell rendered">
  757. <div class="input">
  758. <div class="inner_cell">
  759. <div class="input_area">
  760. <div class=" highlight hl-ipython3"><pre><span></span><span class="n">best</span> <span class="o">=</span> <span class="n">minimize</span><span class="p">(</span><span class="n">lw_mse</span><span class="p">)</span>
  761. <span class="n">best</span>
  762. </pre></div>
  763. </div>
  764. </div>
  765. </div>
  766. <div class="output_wrapper">
  767. <div class="output">
  768. <div class="jb_output_wrapper }}">
  769. <div class="output_area">
  770. <div class="output_text output_subarea output_execute_result">
  771. <pre>array([ 86.97784117, 4744.78484535])</pre>
  772. </div>
  773. </div>
  774. </div>
  775. </div>
  776. </div>
  777. </div>
  778. </div>
  779. <div class="jb_cell">
  780. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  781. <div class="text_cell_render border-box-sizing rendered_html">
<p>These values are essentially the same as the values we calculated earlier by using the <code>slope</code> and <code>intercept</code> functions. The small deviations are due to the inexact nature of <code>minimize</code>.</p>
  783. </div>
  784. </div>
  785. </div>
  786. </div>
  787. <div class="jb_cell">
  788. <div class="cell border-box-sizing code_cell rendered">
  789. <div class="input">
  790. <div class="inner_cell">
  791. <div class="input_area">
  792. <div class=" highlight hl-ipython3"><pre><span></span><span class="nb">print</span><span class="p">(</span><span class="s2">&quot;slope from formula: &quot;</span><span class="p">,</span> <span class="n">lw_reg_slope</span><span class="p">)</span>
  793. <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;slope from minimize: &quot;</span><span class="p">,</span> <span class="n">best</span><span class="o">.</span><span class="n">item</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
  794. <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;intercept from formula: &quot;</span><span class="p">,</span> <span class="n">lw_reg_intercept</span><span class="p">)</span>
  795. <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;intercept from minimize: &quot;</span><span class="p">,</span> <span class="n">best</span><span class="o">.</span><span class="n">item</span><span class="p">(</span><span class="mi">1</span><span class="p">))</span>
  796. </pre></div>
  797. </div>
  798. </div>
  799. </div>
  800. <div class="output_wrapper">
  801. <div class="output">
  802. <div class="jb_output_wrapper }}">
  803. <div class="output_area">
  804. <div class="output_subarea output_stream output_stdout output_text">
  805. <pre>slope from formula: 86.97784125829821
  806. slope from minimize: 86.97784116615884
  807. intercept from formula: 4744.784796574928
  808. intercept from minimize: 4744.784845352655
  809. </pre>
  810. </div>
  811. </div>
  812. </div>
  813. </div>
  814. </div>
  815. </div>
  816. </div>
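<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>To make the idea of intelligent trial-and-error concrete, here is a minimal sketch of one such strategy, a simple pattern search: nudge the slope and the intercept up or down, keep any change that lowers the mse, and halve the step size when no change helps. This is not the algorithm that <code>minimize</code> actually uses. The function <code>pattern_search</code>, its parameters, and the toy data are all assumptions made for this illustration. The result is compared with the slope and intercept computed from the regression formulas.</p>

```python
# Made-up data for illustration only (not the Little Women table).
toy_x = [1, 2, 3, 4, 5]
toy_y = [3, 5, 4, 8, 9]

def toy_mse(any_slope, any_intercept):
    """Mean squared error of the line any_slope * x + any_intercept on the toy data."""
    return sum((y - (any_slope * x + any_intercept)) ** 2
               for x, y in zip(toy_x, toy_y)) / len(toy_x)

def pattern_search(f, start, step=1.0, tolerance=1e-6):
    """Hypothetical helper: minimize f by trial and error.

    Try moving each argument up and down by `step`, keep any move that
    lowers f, and halve `step` whenever a full round of moves fails.
    """
    best = list(start)
    best_value = f(*best)
    while step > tolerance:
        improved = False
        for i in range(len(best)):
            for delta in (step, -step):
                trial = list(best)
                trial[i] += delta
                trial_value = f(*trial)
                if trial_value < best_value:
                    best, best_value = trial, trial_value
                    improved = True
        if not improved:
            step /= 2
    return best

def mean(values):
    return sum(values) / len(values)

# Slope and intercept from the regression formulas, for comparison.
# This ratio of sums is algebraically the same as r * SD(y) / SD(x).
x_bar, y_bar = mean(toy_x), mean(toy_y)
formula_slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(toy_x, toy_y))
                 / sum((x - x_bar) ** 2 for x in toy_x))
formula_intercept = y_bar - formula_slope * x_bar

search_slope, search_intercept = pattern_search(toy_mse, start=(0.0, 0.0))

print("slope from formula:     ", formula_slope)
print("slope from search:      ", search_slope)
print("intercept from formula: ", formula_intercept)
print("intercept from search:  ", search_intercept)
```

<p>On this toy data the two answers should agree to several decimal places, with small deviations of the same kind as those seen above. Lowering <code>tolerance</code> tightens the agreement at the cost of more trials.</p>
</div>
</div>
</div>
</div>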
  817. <div class="jb_cell">
  818. <div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
  819. <div class="text_cell_render border-box-sizing rendered_html">
<h3 id="The-Least-Squares-Line">The Least Squares Line<a class="anchor-link" href="#The-Least-Squares-Line"> </a></h3><p>We have now seen not only that the regression line minimizes mean squared error, but also that minimizing mean squared error gives us the regression line. The regression line is the only line that minimizes mean squared error.</p>
  821. <p>That is why the regression line is sometimes called the "least squares line."</p>
  822. </div>
  823. </div>
  824. </div>
  825. </div>
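<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>For readers who want the statement in symbols, here is a summary sketch (a restatement, not a proof). The notation $n$ for the number of points and $(x_i, y_i)$ for the individual points is introduced here only for illustration. The mean squared error of the line with slope $a$ and intercept $b$ is
$$
mse(a, b) ~=~ \frac{1}{n} \sum_{i=1}^{n} \big( y_i - (a x_i + b) \big)^2
$$
and the least squares line is the line whose slope and intercept make $mse(a, b)$ as small as possible. The result of this section is that the minimizing values are the slope and intercept of the regression line:
$$
a ~=~ r \cdot \frac{\mbox{SD of } y}{\mbox{SD of } x}, ~~~~~~ b ~=~ \mbox{average of } y ~-~ a \cdot \mbox{average of } x
$$
where $r$ is the correlation coefficient.</p>
</div>
</div>
</div>
</div>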