---
interact_link: content/chapters/01/1/intro.md
kernel_name:
has_widgets: false
title: |-
  Introduction
prev_page:
  url: /chapters/01/what-is-data-science.html
  title: |-
    Data Science
next_page:
  url: /chapters/01/1/1/computational-tools.html
  title: |-
    Computational Tools
comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /content***"
---
<div class="jb_cell">
<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h1 id="Chapter-1:-Introduction">Chapter 1: Introduction<a class="anchor-link" href="#Chapter-1:-Introduction"> </a></h1><p>Data are descriptions of the world around us, collected through observation and
stored on computers. Computers enable us to infer properties of the world from
these descriptions. Data science is the discipline of drawing conclusions from
data using computation. There are three core aspects of effective data
analysis: exploration, prediction, and inference. This text develops a
consistent approach to all three, introducing statistical ideas and fundamental
ideas in computer science concurrently. We focus on a minimal set of core
techniques that can be applied to a vast range of real-world
applications. A foundation in data science requires not only understanding
statistical and computational techniques, but also recognizing how they apply
to real scenarios.</p>
<p>For whatever aspect of the world we wish to study, whether it's the Earth's
weather, the world's markets, political polls, or the human mind, the data we
collect typically offer an incomplete description of the subject at hand. A
central challenge of data science is to draw reliable conclusions from this
partial information.</p>
<p>In this endeavor, we will combine two essential tools: computation and
randomization. For example, we may want to understand climate change trends
using temperature observations. Computers will allow us to use all available
information to draw conclusions. Rather than focusing only on the average
temperature of a region, we will consider the whole range of temperatures
together to construct a more nuanced analysis. Randomness will allow us to
consider the many different ways in which incomplete information might be
completed. Rather than assuming that temperatures vary in a particular way, we
will learn to use randomness as a way to imagine many possible scenarios that
are all consistent with the data we observe.</p>
<p>Applying this approach requires learning to program a computer, and so this
text interleaves a complete introduction to programming that assumes no prior
knowledge. Readers with programming experience will find that we cover several
topics in computation that do not appear in a typical introductory computer
science curriculum. Data science also requires careful reasoning about numerical
quantities, but this text does not assume any background in mathematics or
statistics beyond basic algebra. You will find very few equations in this text.
Instead, techniques are described to readers in the same language in which they
are described to the computers that execute them: a programming language.</p>
</div>
</div>
</div>
</div>