Arrays.py 6.0 KB

#!/usr/bin/env python
# coding: utf-8
# In[1]:
from datascience import *
path_data = '../../../../data/'
# # Arrays
#
# While there are many kinds of collections in Python, we will work primarily with arrays in this class. We've already seen that the `make_array` function can be used to create arrays of numbers.
#
# Arrays can also contain strings or other types of values, but a single array can only contain a single kind of data. (It usually doesn't make sense to group together unlike data anyway.) For example:
# In[2]:
english_parts_of_speech = make_array("noun", "pronoun", "verb", "adverb", "adjective", "conjunction", "preposition", "interjection")
english_parts_of_speech
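# The one-kind-of-data rule can be checked directly. The sketch below assumes that arrays made by `make_array` behave like plain NumPy arrays, and it uses `np.array` as a stand-in. The names `numbers` and `mixed` are illustrative and not part of the original notebook.

```python
import numpy as np

# Assumption: np.array stands in for make_array here.
numbers = np.array([1, 2, 3])
mixed = np.array([1, "two", 3])

# Mixing numbers and strings does not fail. NumPy converts every
# element to a string so that the array still holds one kind of data.
print(numbers.dtype.kind)  # 'i' means integer
print(mixed.dtype.kind)    # 'U' means Unicode string
print(mixed)
```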
# Returning to the temperature data, we create arrays of average daily [high temperatures](http://berkeleyearth.lbl.gov/auto/Regional/TMAX/Text/global-land-TMAX-Trend.txt) for the decades surrounding 1850, 1900, 1950, and 2000.
# In[3]:
baseline_high = 14.48
highs = make_array(baseline_high - 0.880,
                   baseline_high - 0.093,
                   baseline_high + 0.105,
                   baseline_high + 0.684)
highs
# Arrays can be used in arithmetic expressions to compute over their contents. When an array is combined with a single number, that number is combined with each element of the array. Therefore, we can convert all of these temperatures to Fahrenheit by writing the familiar conversion formula.
# In[4]:
(9/5) * highs + 32
# <img src="../../../images/array_arithmetic.png" />
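# As a rough check of the element-by-element rule, the sketch below converts the same four temperatures twice: once with array arithmetic and once with an explicit loop. It uses `np.array` directly, assuming that is equivalent to `make_array` for this purpose. The names `fahrenheit` and `fahrenheit_loop` are illustrative.

```python
import numpy as np

baseline_high = 14.48
# Assumption: np.array stands in for make_array.
highs = np.array([baseline_high - 0.880,
                  baseline_high - 0.093,
                  baseline_high + 0.105,
                  baseline_high + 0.684])

# Array arithmetic: 9/5 and 32 are applied to every element.
fahrenheit = (9/5) * highs + 32

# The same computation written out one element at a time.
fahrenheit_loop = np.array([(9/5) * celsius + 32 for celsius in highs])

print(fahrenheit)
print(np.allclose(fahrenheit, fahrenheit_loop))
```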
# Arrays also have *methods*, which are functions that operate on the array values. The `mean` of a collection of numbers is its average value: the sum divided by the length. Each pair of parentheses in the examples below is part of a call expression; it calls a function with no arguments to perform a computation on the array called `highs`. The length itself is available as `highs.size`, which is an attribute rather than a method and so is written without parentheses.
# In[5]:
highs.size
# In[6]:
highs.sum()
# In[7]:
highs.mean()
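# The relationship between these three results can be confirmed with a short sketch. It assumes `np.array` as a stand-in for `make_array` and hard-codes the four values of `highs`. The names `count`, `total`, and `average` are illustrative.

```python
import numpy as np

# Assumption: np.array stands in for make_array; values match `highs`.
highs = np.array([13.6, 14.387, 14.585, 15.164])

count = highs.size       # attribute: no parentheses
total = highs.sum()      # method call: parentheses required
average = highs.mean()   # method call: parentheses required

# The mean is the sum divided by the number of elements.
print(np.isclose(average, total / count))
```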
# #### Functions on Arrays
# The `numpy` package, abbreviated `np` in programs, provides Python programmers with convenient and powerful functions for creating and manipulating arrays.
# In[8]:
import numpy as np
# For example, the `diff` function computes the difference between each adjacent pair of elements in an array. The first element of the `diff` is the second element minus the first.
# In[9]:
np.diff(highs)
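# One way to see what `diff` computes is to build the same result by hand. The sketch below subtracts each element from the one after it using slices; the name `by_slicing` is illustrative, and `highs` is rebuilt with `np.array` as an assumed equivalent of `make_array`.

```python
import numpy as np

highs = np.array([13.6, 14.387, 14.585, 15.164])

# Each later element minus the element just before it.
by_slicing = highs[1:] - highs[:-1]

print(np.diff(highs))
print(np.allclose(np.diff(highs), by_slicing))
```

# Note that the result has one fewer element than the input, because four values have only three adjacent pairs.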
# The [full NumPy reference](http://docs.scipy.org/doc/numpy/reference/) lists these functions exhaustively, but only a small subset is commonly used in data processing applications. These are grouped into different packages within `np`. Learning this vocabulary is an important part of learning the Python language, so refer back to this list often as you work through examples and problems.
#
# However, you **don't need to memorize these**. Use this as a reference.
#
# Each of these functions takes an array as an argument and returns a single value.
#
# | **Function**       | **Description**                                                       |
# |--------------------|-----------------------------------------------------------------------|
# | `np.prod`          | Multiply all elements together                                        |
# | `np.sum`           | Add all elements together                                             |
# | `np.all`           | Test whether all elements are true values (non-zero numbers are true) |
# | `np.any`           | Test whether any elements are true values (non-zero numbers are true) |
# | `np.count_nonzero` | Count the number of non-zero elements                                 |
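#
# A minimal sketch of these single-value functions on a small made-up array. The data in `values` is illustrative and was chosen to include a zero so that `np.all` and `np.any` give different answers.

```python
import numpy as np

values = np.array([2, 0, 3, 5])  # illustrative data

print(np.prod(values))           # 2 * 0 * 3 * 5
print(np.sum(values))            # 2 + 0 + 3 + 5
print(np.all(values))            # False: the 0 counts as a false value
print(np.any(values))            # True: at least one element is non-zero
print(np.count_nonzero(values))  # three elements are non-zero
```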
#
# Each of these functions takes an array as an argument and returns an array of values.
#
# | **Function**       | **Description**                                                      |
# |--------------------|----------------------------------------------------------------------|
# | `np.diff`          | Difference between adjacent elements                                 |
# | `np.round`         | Round each number to the nearest integer (whole number)              |
# | `np.cumprod`       | A cumulative product: for each element, multiply all elements so far |
# | `np.cumsum`        | A cumulative sum: for each element, add all elements so far          |
# | `np.exp`           | Raise *e* to the power of each element                               |
# | `np.log`           | Take the natural logarithm of each element                           |
# | `np.sqrt`          | Take the square root of each element                                 |
# | `np.sort`          | Sort the elements                                                    |
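#
# A short sketch of several array-returning functions applied to one small array. The data in `values` is made up for illustration, with perfect squares chosen so the square roots come out as whole numbers.

```python
import numpy as np

values = np.array([4.0, 1.0, 9.0])  # illustrative data

print(np.cumsum(values))    # running totals: 4, 4+1, 4+1+9
print(np.cumprod(values))   # running products: 4, 4*1, 4*1*9
print(np.sqrt(values))      # square root of each element
print(np.sort(values))      # sorted copy; `values` itself is unchanged
print(np.diff(values))      # one fewer element than `values`
print(np.exp(np.log(values)))  # exp undoes log, up to rounding error
```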
#
# Each of these functions takes an array of strings and returns an array.
#
# | **Function**        | **Description**                                              |
# |---------------------|--------------------------------------------------------------|
# | `np.char.lower`     | Lowercase each element                                       |
# | `np.char.upper`     | Uppercase each element                                       |
# | `np.char.strip`     | Remove spaces at the beginning or end of each element        |
# | `np.char.isalpha`   | Whether each element is only letters (no numbers or symbols) |
# | `np.char.isnumeric` | Whether each element is only numeric (no letters)            |
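#
# A minimal sketch of the string functions, using a made-up array that mixes stray spaces, letters, and digits. The names `words` and `stripped` are illustrative.

```python
import numpy as np

words = np.array(["  Noun ", "verb", "42"])  # illustrative data

# Remove surrounding spaces first so the tests below see clean text.
stripped = np.char.strip(words)

print(stripped)
print(np.char.lower(stripped))
print(np.char.upper(stripped))
print(np.char.isalpha(stripped))    # letters only?
print(np.char.isnumeric(stripped))  # numeric characters only?
```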
#
# Each of these functions takes both an array of strings and a *search string*; each returns an array.
#
# | **Function**         | **Description**                                                       |
# |----------------------|-----------------------------------------------------------------------|
# | `np.char.count`      | Count the number of times a search string appears within each element |
# | `np.char.find`       | The position within each element that a search string is found first  |
# | `np.char.rfind`      | The position within each element that a search string is found last   |
# | `np.char.startswith` | Whether each element starts with the search string                    |
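#
# A small sketch of the search functions, using three of the parts of speech from earlier and the search string `"n"`. The name `parts` is illustrative. Positions are counted from 0.

```python
import numpy as np

parts = np.array(["noun", "pronoun", "adverb"])

print(np.char.count(parts, "n"))       # occurrences within each element
print(np.char.find(parts, "n"))        # first position; -1 means not found
print(np.char.rfind(parts, "n"))       # last position; -1 means not found
print(np.char.startswith(parts, "n"))  # True where the element begins with "n"
```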
#