---
redirect_from:
  - "/chapters/01/3/plotting-the-classics"
interact_link: content/chapters/01/3/Plotting_the_Classics.ipynb
kernel_name: python3
has_widgets: false
title: |-
  Plotting the Classics
prev_page:
  url: /chapters/01/2/why-data-science.html
  title: |-
    Why Data Science?
next_page:
  url: /chapters/01/3/1/Literary_Characters.html
  title: |-
    Literary Characters
comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /content***"
---
<div class="jb_cell tag_remove_input">
<div class="cell border-box-sizing code_cell rendered">
</div>
</div>
<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Plotting-the-classics">Plotting the classics<a class="anchor-link" href="#Plotting-the-classics"> </a></h1><p>In this example, we will explore statistics for two classic novels: <em>The Adventures of Huckleberry Finn</em> by Mark Twain, and <em>Little Women</em> by Louisa May Alcott. The text of any book can be read by a computer at great speed. Books published before 1923 are currently in the <em>public domain</em>, meaning that everyone has the right to copy or use the text in any way. <a href="http://www.gutenberg.org/">Project Gutenberg</a> is a website that publishes public domain books online. Using Python, we can load the text of these books directly from the web.</p>
<p>This example is meant to illustrate some of the broad themes of this text. Don't worry if the details of the program don't yet make sense. Instead, focus on interpreting the images generated below. Later sections of the text will describe most of the features of the Python programming language used below.</p>
<p>First, we read the text of both books into lists of chapters, called <code>huck_finn_chapters</code> and <code>little_women_chapters</code>. In Python, a name cannot contain any spaces, and so we will often use an underscore <code>_</code> to stand in for a space. The <code>=</code> in the lines below gives a name on the left to the result of some computation described on the right. A <em>uniform resource locator</em> or <em>URL</em> is an address on the Internet for some content; in this case, the text of a book. The <code>#</code> symbol starts a comment, which is ignored by the computer but helpful for people reading the code.</p>
</div>
</div>
</div>
</div>
<div class="jb_cell">
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Read two books, fast!</span>
<span class="n">huck_finn_url</span> <span class="o">=</span> <span class="s1">&#39;https://www.inferentialthinking.com/data/huck_finn.txt&#39;</span>
<span class="n">huck_finn_text</span> <span class="o">=</span> <span class="n">read_url</span><span class="p">(</span><span class="n">huck_finn_url</span><span class="p">)</span>
<span class="n">huck_finn_chapters</span> <span class="o">=</span> <span class="n">huck_finn_text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;CHAPTER &#39;</span><span class="p">)[</span><span class="mi">44</span><span class="p">:]</span>
<span class="n">little_women_url</span> <span class="o">=</span> <span class="s1">&#39;https://www.inferentialthinking.com/data/little_women.txt&#39;</span>
<span class="n">little_women_text</span> <span class="o">=</span> <span class="n">read_url</span><span class="p">(</span><span class="n">little_women_url</span><span class="p">)</span>
<span class="n">little_women_chapters</span> <span class="o">=</span> <span class="n">little_women_text</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s1">&#39;CHAPTER &#39;</span><span class="p">)[</span><span class="mi">1</span><span class="p">:]</span>
</pre></div>
</div>
</div>
</div>
</div>
</div>
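<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The cell above relies on a helper named <code>read_url</code> whose definition is not shown on this page. The sketch below is one plausible way to write such a helper with Python's standard library; it is an assumption for illustration, not necessarily the definition used here. The name <code>normalize_whitespace</code> and the sample text are made up. The last lines show how <code>split</code> followed by the slice <code>[1:]</code> discards everything that comes before the first chapter marker.</p>

```python
import re
from urllib.request import urlopen


def normalize_whitespace(text):
    """Collapse every run of spaces, tabs, and newlines into a single space."""
    return re.sub(r'\s+', ' ', text)


def read_url(url):
    """Hypothetical helper: download the content at url as one clean string."""
    with urlopen(url) as response:
        return normalize_whitespace(response.read().decode())


# A tiny made-up text standing in for a downloaded book.
sample_text = 'Front matter CHAPTER I. First words.\nCHAPTER II. More words.'

# Splitting on the marker yields the front matter followed by one piece per
# chapter; the slice [1:] drops the front matter.
sample_chapters = normalize_whitespace(sample_text).split('CHAPTER ')[1:]
print(sample_chapters)
```

<p>The larger slice <code>[44:]</code> used for <em>Huckleberry Finn</em> follows the same idea: presumably the marker also appears in material before the first chapter, such as a table of contents, so more leading pieces have to be skipped.</p>
</div>
</div>
</div>
</div>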
<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>While a computer cannot understand the text of a book, it can provide us with some insight into the structure of the text. The name <code>huck_finn_chapters</code> is currently bound to a list of all the chapters in the book. We can place them into a table to see how each chapter begins.</p>
</div>
</div>
</div>
</div>
<div class="jb_cell">
<div class="cell border-box-sizing code_cell rendered">
<div class="input">
<div class="inner_cell">
<div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># Display the chapters of Huckleberry Finn in a table.</span>
<span class="n">Table</span><span class="p">()</span><span class="o">.</span><span class="n">with_column</span><span class="p">(</span><span class="s1">&#39;Chapters&#39;</span><span class="p">,</span> <span class="n">huck_finn_chapters</span><span class="p">)</span>
</pre></div>
</div>
</div>
</div>
<div class="output_wrapper">
<div class="output">
<div class="jb_output_wrapper">
<div class="output_area">
<div class="output_html rendered_html output_subarea output_execute_result">
<table border="1" class="dataframe">
<thead>
<tr>
<th>Chapters</th>
</tr>
</thead>
<tbody>
<tr>
<td>I. YOU don't know about me without you have read a book ...</td>
</tr>
<tr>
<td>II. WE went tiptoeing along a path amongst the trees bac ...</td>
</tr>
<tr>
<td>III. WELL, I got a good going-over in the morning from o ...</td>
</tr>
<tr>
<td>IV. WELL, three or four months run along, and it was wel ...</td>
</tr>
<tr>
<td>V. I had shut the door to. Then I turned around and ther ...</td>
</tr>
<tr>
<td>VI. WELL, pretty soon the old man was up and around agai ...</td>
</tr>
<tr>
<td>VII. "GIT up! What you 'bout?" I opened my eyes and look ...</td>
</tr>
<tr>
<td>VIII. THE sun was up so high when I waked that I judged ...</td>
</tr>
<tr>
<td>IX. I wanted to go and look at a place right about the m ...</td>
</tr>
<tr>
<td>X. AFTER breakfast I wanted to talk about the dead man a ...</td>
</tr>
</tbody>
</table>
<p>... (33 rows omitted)</p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Each chapter begins with a chapter number in Roman numerals, followed by the first sentence of the chapter. Project Gutenberg has printed the first word of each chapter in upper case.</p>
</div>
</div>
</div>
</div>
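<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Because every chapter string starts with a Roman numeral, a period, and a space, ordinary string methods are enough to separate the numeral from the text and to pick out the first word. The following is a minimal sketch of that idea, using three chapter openings copied from the table above. The names <code>split_numeral</code> and <code>first_word</code> are hypothetical and do not come from any library.</p>

```python
# Openings of the first three chapters, copied from the table above.
chapter_openings = [
    "I. YOU don't know about me without you have read a book ...",
    "II. WE went tiptoeing along a path amongst the trees bac ...",
    "III. WELL, I got a good going-over in the morning from o ...",
]


def split_numeral(chapter):
    """Return the Roman numeral and the text that follows it."""
    numeral, _, rest = chapter.partition('. ')
    return numeral, rest


def first_word(chapter):
    """Return the first word of the chapter text, without punctuation."""
    rest = split_numeral(chapter)[1]
    return rest.split(' ')[0].strip('.,;:!?"\'')


numerals = [split_numeral(chapter)[0] for chapter in chapter_openings]
first_words = [first_word(chapter) for chapter in chapter_openings]

print(numerals)
print(first_words)
# Check that each first word is written entirely in upper case.
print(all(word.isupper() for word in first_words))
```

<p>Applied to the full list <code>huck_finn_chapters</code>, the same two functions would give one numeral and one first word per chapter, assuming every chapter follows the pattern seen in the table.</p>
</div>
</div>
</div>
</div>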