{ "cells": [ { "cell_type": "code", "execution_count": 22, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "path_data = '../../data/'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Visualizing Numerical Distributions\n", "\n", "Many of the variables that data scientists study are *quantitative* or *numerical*. Their values are numbers on which you can perform arithmetic. Examples that we have seen include the number of periods in chapters of a book, the amount of money made by movies, and the ages of people in the United States.\n", "\n", "The values of a categorical variable can be given numerical codes, but that doesn't make the variable quantitative. In the example in which we studied Census data broken down by age group, the categorical variable `SEX` had the numerical codes `1` for 'Male,' `2` for 'Female,' and `0` for the aggregate of both groups `1` and `2`. While 0, 1, and 2 are numbers, in this context it doesn't make sense to subtract 1 from 2, or take the average of 0, 1, and 2, or perform other arithmetic on the three values. `SEX` is a categorical variable even though the values have been given a numerical code." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For our main example, we will return to a dataset that we studied when we were visualizing categorical data. It is the table `top`, which consists of data on the top-grossing movies of all time in the United States. For convenience, here is the description of the table again.\n", "\n", "The first column contains the title of the movie. The second column contains the name of the studio that produced the movie. 
The third contains the domestic box office gross in dollars, and the fourth contains the gross amount that would have been earned from ticket sales at 2016 prices. The fifth contains the release year of the movie. \n", "\n", "There are 200 movies on the list. Here are the first and last five rows of the table, which is sorted by the unadjusted gross receipts in the column `Gross`." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Title</th><th>Studio</th><th>Gross</th><th>Gross (Adjusted)</th><th>Year</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Star Wars: The Force Awakens</td><td>Buena Vista (Disney)</td><td>906723418</td><td>906723400</td><td>2015</td></tr>\n",
"<tr><th>1</th><td>Avatar</td><td>Fox</td><td>760507625</td><td>846120800</td><td>2009</td></tr>\n",
"<tr><th>2</th><td>Titanic</td><td>Paramount</td><td>658672302</td><td>1178627900</td><td>1997</td></tr>\n",
"<tr><th>3</th><td>Jurassic World</td><td>Universal</td><td>652270625</td><td>687728000</td><td>2015</td></tr>\n",
"<tr><th>4</th><td>Marvel's The Avengers</td><td>Buena Vista (Disney)</td><td>623357910</td><td>668866600</td><td>2012</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>195</th><td>The Caine Mutiny</td><td>Columbia</td><td>21750000</td><td>386173500</td><td>1954</td></tr>\n",
"<tr><th>196</th><td>The Bells of St. Mary's</td><td>RKO</td><td>21333333</td><td>545882400</td><td>1945</td></tr>\n",
"<tr><th>197</th><td>Duel in the Sun</td><td>Selz.</td><td>20408163</td><td>443877500</td><td>1946</td></tr>\n",
"<tr><th>198</th><td>Sergeant York</td><td>Warner Bros.</td><td>16361885</td><td>418671800</td><td>1941</td></tr>\n",
"<tr><th>199</th><td>The Four Horsemen of the Apocalypse</td><td>MPC</td><td>9183673</td><td>399489800</td><td>1921</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>200 rows × 5 columns</p>\n",
"</div>
" ], "text/plain": [ " Title Studio Gross \\\n", "0 Star Wars: The Force Awakens Buena Vista (Disney) 906723418 \n", "1 Avatar Fox 760507625 \n", "2 Titanic Paramount 658672302 \n", "3 Jurassic World Universal 652270625 \n", "4 Marvel's The Avengers Buena Vista (Disney) 623357910 \n", ".. ... ... ... \n", "195 The Caine Mutiny Columbia 21750000 \n", "196 The Bells of St. Mary's RKO 21333333 \n", "197 Duel in the Sun Selz. 20408163 \n", "198 Sergeant York Warner Bros. 16361885 \n", "199 The Four Horsemen of the Apocalypse MPC 9183673 \n", "\n", " Gross (Adjusted) Year \n", "0 906723400 2015 \n", "1 846120800 2009 \n", "2 1178627900 1997 \n", "3 687728000 2015 \n", "4 668866600 2012 \n", ".. ... ... \n", "195 386173500 1954 \n", "196 545882400 1945 \n", "197 443877500 1946 \n", "198 418671800 1941 \n", "199 399489800 1921 \n", "\n", "[200 rows x 5 columns]" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "top = pd.read_csv(path_data + 'top_movies.csv')\n", "\n", "top" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Title</th><th>Studio</th><th>Gross</th><th>Gross (Adjusted)</th><th>Year</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Star Wars: The Force Awakens</td><td>Buena Vista (Disney)</td><td>906,723,418</td><td>906,723,400</td><td>2015</td></tr>\n",
"<tr><th>1</th><td>Avatar</td><td>Fox</td><td>760,507,625</td><td>846,120,800</td><td>2009</td></tr>\n",
"<tr><th>2</th><td>Titanic</td><td>Paramount</td><td>658,672,302</td><td>1,178,627,900</td><td>1997</td></tr>\n",
"<tr><th>3</th><td>Jurassic World</td><td>Universal</td><td>652,270,625</td><td>687,728,000</td><td>2015</td></tr>\n",
"<tr><th>4</th><td>Marvel's The Avengers</td><td>Buena Vista (Disney)</td><td>623,357,910</td><td>668,866,600</td><td>2012</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>\n",
"<tr><th>195</th><td>The Caine Mutiny</td><td>Columbia</td><td>21,750,000</td><td>386,173,500</td><td>1954</td></tr>\n",
"<tr><th>196</th><td>The Bells of St. Mary's</td><td>RKO</td><td>21,333,333</td><td>545,882,400</td><td>1945</td></tr>\n",
"<tr><th>197</th><td>Duel in the Sun</td><td>Selz.</td><td>20,408,163</td><td>443,877,500</td><td>1946</td></tr>\n",
"<tr><th>198</th><td>Sergeant York</td><td>Warner Bros.</td><td>16,361,885</td><td>418,671,800</td><td>1941</td></tr>\n",
"<tr><th>199</th><td>The Four Horsemen of the Apocalypse</td><td>MPC</td><td>9,183,673</td><td>399,489,800</td><td>1921</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>200 rows × 5 columns</p>\n",
"</div>
" ], "text/plain": [ " Title Studio Gross \\\n", "0 Star Wars: The Force Awakens Buena Vista (Disney) 906,723,418 \n", "1 Avatar Fox 760,507,625 \n", "2 Titanic Paramount 658,672,302 \n", "3 Jurassic World Universal 652,270,625 \n", "4 Marvel's The Avengers Buena Vista (Disney) 623,357,910 \n", ".. ... ... ... \n", "195 The Caine Mutiny Columbia 21,750,000 \n", "196 The Bells of St. Mary's RKO 21,333,333 \n", "197 Duel in the Sun Selz. 20,408,163 \n", "198 Sergeant York Warner Bros. 16,361,885 \n", "199 The Four Horsemen of the Apocalypse MPC 9,183,673 \n", "\n", " Gross (Adjusted) Year \n", "0 906,723,400 2015 \n", "1 846,120,800 2009 \n", "2 1,178,627,900 1997 \n", "3 687,728,000 2015 \n", "4 668,866,600 2012 \n", ".. ... ... \n", "195 386,173,500 1954 \n", "196 545,882,400 1945 \n", "197 443,877,500 1946 \n", "198 418,671,800 1941 \n", "199 399,489,800 1921 \n", "\n", "[200 rows x 5 columns]" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Make the numbers in the Gross and Gross (Adjusted) columns look nicer:\n", "# Separate '000s with comma\n", "# When using an original data set it is often good practice to work on a copy of the original\n", "\n", "top1 = top.copy()\n", "\n", "top1['Gross'] = top1['Gross'].apply('{:,}'.format)\n", "top1['Gross (Adjusted)'] = top1['Gross (Adjusted)'].apply('{:,}'.format)\n", "\n", "top1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the Distribution of the Adjusted Receipts\n", "\n", "In this section we will draw graphs of the distribution of the numerical variable in the column `Gross (Adjusted)`. For simplicity, let's create a smaller table that has the information that we need. And since three-digit numbers are easier to work with than nine-digit numbers, let's measure the `Adjusted Gross` receipts in millions of dollars. Note how `round` is used to retain only two decimal places." 
] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Title</th><th>Adjusted Gross</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>Star Wars: The Force Awakens</td><td>906.72</td></tr>\n",
"<tr><th>1</th><td>Avatar</td><td>846.12</td></tr>\n",
"<tr><th>2</th><td>Titanic</td><td>1178.63</td></tr>\n",
"<tr><th>3</th><td>Jurassic World</td><td>687.73</td></tr>\n",
"<tr><th>4</th><td>Marvel's The Avengers</td><td>668.87</td></tr>\n",
"<tr><th>...</th><td>...</td><td>...</td></tr>\n",
"<tr><th>195</th><td>The Caine Mutiny</td><td>386.17</td></tr>\n",
"<tr><th>196</th><td>The Bells of St. Mary's</td><td>545.88</td></tr>\n",
"<tr><th>197</th><td>Duel in the Sun</td><td>443.88</td></tr>\n",
"<tr><th>198</th><td>Sergeant York</td><td>418.67</td></tr>\n",
"<tr><th>199</th><td>The Four Horsemen of the Apocalypse</td><td>399.49</td></tr>\n",
"</tbody>\n",
"</table>\n",
"<p>200 rows × 2 columns</p>\n",
"</div>
" ], "text/plain": [ " Title Adjusted Gross\n", "0 Star Wars: The Force Awakens 906.72\n", "1 Avatar 846.12\n", "2 Titanic 1178.63\n", "3 Jurassic World 687.73\n", "4 Marvel's The Avengers 668.87\n", ".. ... ...\n", "195 The Caine Mutiny 386.17\n", "196 The Bells of St. Mary's 545.88\n", "197 Duel in the Sun 443.88\n", "198 Sergeant York 418.67\n", "199 The Four Horsemen of the Apocalypse 399.49\n", "\n", "[200 rows x 2 columns]" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "millions = pd.DataFrame({'Title': top['Title'], 'Adjusted Gross': np.round((top['Gross (Adjusted)']/1e6), 2)})\n", "\n", "millions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### A Histogram\n", "A *histogram* of a numerical dataset looks very much like a bar chart, though it has some important differences that we will examine in this section. First, let's just draw a histogram of the adjusted receipts.\n", "\n", "The `hist` method generates a histogram of the values in a column. The histogram shows the distribution of the adjusted gross amounts, in millions of 2016 dollars. " ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "unit = 'Million Dollars'\n", "\n", "fig, ax1 = plt.subplots()\n", "\n", "ax1.hist(millions['Adjusted Gross'], 10, density=True, ec='white')\n", "\n", "y_vals = ax1.get_yticks()\n", "\n", "y_label = 'Percent per ' + (unit if unit else 'unit')\n", "\n", "x_label = 'Adjusted Gross (' + (unit if unit else 'unit') + ')'\n", "\n", "ax1.set_yticklabels(['{:g}'.format(x * 100) for x in y_vals])\n", "\n", "plt.ylabel(y_label)\n", "\n", "plt.xlabel(x_label)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Horizontal Axis\n", "\n", "The amounts have been grouped into contiguous intervals called *bins*. Although in this dataset no movie grossed an amount that is exactly on the edge between two bins, `hist` does have to account for situations where there might have been values at the edges. So `hist` has an *endpoint convention*: bins include the data at their left endpoint, but not the data at their right endpoint. \n", "\n", "We will use the notation **[*a*, *b*)** for the bin that starts at *a* and ends at *b* but doesn't include *b*.\n", "\n", "Sometimes, adjustments have to be made in the first or last bin, to ensure that the smallest and largest values of the variable are included. You saw an example of such an adjustment in the Census data studied earlier, where an age of \"100\" years actually meant \"100 years old or older.\"\n", "\n", "We can see that there are 10 bins (some bars are so low that they are hard to see), and that they all have the same width. We can also see that none of the movies grossed fewer than 300 million dollars; that is because we are considering only the top grossing movies of all time. \n", "\n", "It is a little harder to see exactly where the ends of the bins are situated. For example, it is not easy to pinpoint exactly where the value 500 lies on the horizontal axis. 
So it is hard to judge exactly where one bar ends and the next begins.\n", "\n", "The optional argument `bins` can be used with `hist` to specify the endpoints of the bins. It must consist of a sequence of numbers that starts with the left end of the first bin and ends with the right end of the last bin. We will start by setting the numbers in `bins` to be 300, 400, 500, and so on, ending with 2000. \n", "\n", "**Note:** With the bin edges specified explicitly, the y-axis scale has changed to 0.00 - 0.04." ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "unit = 'Million Dollars'\n", "\n", "fig, ax1 = plt.subplots()\n", "\n", "ax1.hist(millions['Adjusted Gross'], bins=np.arange(300,2001,100), density=True, ec='white')\n", "\n", "y_vals = ax1.get_yticks()\n", "\n", "y_label = 'Percent per ' + (unit if unit else 'unit')\n", "\n", "x_label = 'Adjusted Gross (' + (unit if unit else 'unit') + ')'\n", "\n", "ax1.set_yticklabels(['{:g}'.format(x * 100) for x in y_vals])\n", "\n", "plt.ylabel(y_label)\n", "\n", "plt.xlabel(x_label)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The horizontal axis of this figure is easier to read. The labels 200, 400, 600, and so on are centered at the corresponding values. The tallest bar is for movies that grossed between 300 million and 400 million dollars. \n", "\n", "A very small number of movies grossed 800 million dollars or more. This results in the figure being \"skewed to the right,\" or, less formally, having \"a long right-hand tail.\" Distributions of variables like income or rent in large populations also often have this kind of shape." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Counts in the Bins\n", "\n", "The counts of values in the bins can be computed by calling the `.value_counts()` method on a column, with an optional `bins` argument that is either a sequence of bin edges or a number of bins. The result is a tabular form of a histogram. The index lists the bin ranges (but see the note about the final value, below), and the single column contains the number of `Adjusted Gross` values that fall in the corresponding bin. Note that the endpoint convention shown here differs from that of `hist`: each bin is displayed as **(*a*, *b*]**, which includes its right endpoint but not its left endpoint, and the left edge of the first bin is nudged down to 299.999 so that a value of exactly 300 would still be counted. Since no movie in this dataset grossed an amount exactly on a bin edge, both conventions give the same counts. The rows are ordered by count, not by bin." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Adjusted Gross</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>(299.999, 400.0]</th><td>81</td></tr>\n",
"<tr><th>(400.0, 500.0]</th><td>52</td></tr>\n",
"<tr><th>(500.0, 600.0]</th><td>28</td></tr>\n",
"<tr><th>(600.0, 700.0]</th><td>16</td></tr>\n",
"<tr><th>(700.0, 800.0]</th><td>7</td></tr>\n",
"<tr><th>(800.0, 900.0]</th><td>5</td></tr>\n",
"<tr><th>(900.0, 1000.0]</th><td>3</td></tr>\n",
"<tr><th>(1100.0, 1200.0]</th><td>3</td></tr>\n",
"<tr><th>(1200.0, 1300.0]</th><td>2</td></tr>\n",
"<tr><th>(1000.0, 1100.0]</th><td>1</td></tr>\n",
"<tr><th>(1500.0, 1600.0]</th><td>1</td></tr>\n",
"<tr><th>(1700.0, 1800.0]</th><td>1</td></tr>\n",
"<tr><th>(1800.0, 1900.0]</th><td>0</td></tr>\n",
"<tr><th>(1300.0, 1400.0]</th><td>0</td></tr>\n",
"<tr><th>(1400.0, 1500.0]</th><td>0</td></tr>\n",
"<tr><th>(1600.0, 1700.0]</th><td>0</td></tr>\n",
"<tr><th>(1900.0, 2000.0]</th><td>0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Adjusted Gross\n", "(299.999, 400.0] 81\n", "(400.0, 500.0] 52\n", "(500.0, 600.0] 28\n", "(600.0, 700.0] 16\n", "(700.0, 800.0] 7\n", "(800.0, 900.0] 5\n", "(900.0, 1000.0] 3\n", "(1100.0, 1200.0] 3\n", "(1200.0, 1300.0] 2\n", "(1000.0, 1100.0] 1\n", "(1500.0, 1600.0] 1\n", "(1700.0, 1800.0] 1\n", "(1800.0, 1900.0] 0\n", "(1300.0, 1400.0] 0\n", "(1400.0, 1500.0] 0\n", "(1600.0, 1700.0] 0\n", "(1900.0, 2000.0] 0" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bin_counts = millions['Adjusted Gross']\n", "\n", "bin_counts = pd.DataFrame(bin_counts.value_counts(bins=(np.arange(300,2001,100))))\n", "\n", "bin_counts" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>Adjusted Gross</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>(299.999, 400.0]</th><td>0.405</td></tr>\n",
"<tr><th>(400.0, 500.0]</th><td>0.260</td></tr>\n",
"<tr><th>(500.0, 600.0]</th><td>0.140</td></tr>\n",
"<tr><th>(600.0, 700.0]</th><td>0.080</td></tr>\n",
"<tr><th>(700.0, 800.0]</th><td>0.035</td></tr>\n",
"<tr><th>(800.0, 900.0]</th><td>0.025</td></tr>\n",
"<tr><th>(900.0, 1000.0]</th><td>0.015</td></tr>\n",
"<tr><th>(1100.0, 1200.0]</th><td>0.015</td></tr>\n",
"<tr><th>(1200.0, 1300.0]</th><td>0.010</td></tr>\n",
"<tr><th>(1000.0, 1100.0]</th><td>0.005</td></tr>\n",
"<tr><th>(1500.0, 1600.0]</th><td>0.005</td></tr>\n",
"<tr><th>(1700.0, 1800.0]</th><td>0.005</td></tr>\n",
"<tr><th>(1800.0, 1900.0]</th><td>0.000</td></tr>\n",
"<tr><th>(1300.0, 1400.0]</th><td>0.000</td></tr>\n",
"<tr><th>(1400.0, 1500.0]</th><td>0.000</td></tr>\n",
"<tr><th>(1600.0, 1700.0]</th><td>0.000</td></tr>\n",
"<tr><th>(1900.0, 2000.0]</th><td>0.000</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " Adjusted Gross\n", "(299.999, 400.0] 0.405\n", "(400.0, 500.0] 0.260\n", "(500.0, 600.0] 0.140\n", "(600.0, 700.0] 0.080\n", "(700.0, 800.0] 0.035\n", "(800.0, 900.0] 0.025\n", "(900.0, 1000.0] 0.015\n", "(1100.0, 1200.0] 0.015\n", "(1200.0, 1300.0] 0.010\n", "(1000.0, 1100.0] 0.005\n", "(1500.0, 1600.0] 0.005\n", "(1700.0, 1800.0] 0.005\n", "(1800.0, 1900.0] 0.000\n", "(1300.0, 1400.0] 0.000\n", "(1400.0, 1500.0] 0.000\n", "(1600.0, 1700.0] 0.000\n", "(1900.0, 2000.0] 0.000" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bin_counts_norm = millions['Adjusted Gross']\n", "\n", "bin_counts_norm = pd.DataFrame(bin_counts_norm.value_counts(normalize=True,bins=(np.arange(300,2001,100))))\n", "\n", "bin_counts_norm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Splitting data\n", "In the next step we take the index of `bin_counts_norm` and use it to build a new table with an `intervals` column, which can then be split into its `left` and `right` endpoints." ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
"<table>\n",
"<thead>\n",
"<tr><th></th><th>intervals</th><th>left</th><th>right</th></tr>\n",
"</thead>\n",
"<tbody>\n",
"<tr><th>0</th><td>(299.999, 400.0]</td><td>299.999</td><td>400.0</td></tr>\n",
"<tr><th>1</th><td>(400.0, 500.0]</td><td>400.000</td><td>500.0</td></tr>\n",
"<tr><th>2</th><td>(500.0, 600.0]</td><td>500.000</td><td>600.0</td></tr>\n",
"<tr><th>3</th><td>(600.0, 700.0]</td><td>600.000</td><td>700.0</td></tr>\n",
"<tr><th>4</th><td>(700.0, 800.0]</td><td>700.000</td><td>800.0</td></tr>\n",
"<tr><th>5</th><td>(800.0, 900.0]</td><td>800.000</td><td>900.0</td></tr>\n",
"<tr><th>6</th><td>(900.0, 1000.0]</td><td>900.000</td><td>1000.0</td></tr>\n",
"<tr><th>7</th><td>(1100.0, 1200.0]</td><td>1100.000</td><td>1200.0</td></tr>\n",
"<tr><th>8</th><td>(1200.0, 1300.0]</td><td>1200.000</td><td>1300.0</td></tr>\n",
"<tr><th>9</th><td>(1000.0, 1100.0]</td><td>1000.000</td><td>1100.0</td></tr>\n",
"<tr><th>10</th><td>(1500.0, 1600.0]</td><td>1500.000</td><td>1600.0</td></tr>\n",
"<tr><th>11</th><td>(1700.0, 1800.0]</td><td>1700.000</td><td>1800.0</td></tr>\n",
"<tr><th>12</th><td>(1800.0, 1900.0]</td><td>1800.000</td><td>1900.0</td></tr>\n",
"<tr><th>13</th><td>(1300.0, 1400.0]</td><td>1300.000</td><td>1400.0</td></tr>\n",
"<tr><th>14</th><td>(1400.0, 1500.0]</td><td>1400.000</td><td>1500.0</td></tr>\n",
"<tr><th>15</th><td>(1600.0, 1700.0]</td><td>1600.000</td><td>1700.0</td></tr>\n",
"<tr><th>16</th><td>(1900.0, 2000.0]</td><td>1900.000</td><td>2000.0</td></tr>\n",
"</tbody>\n",
"</table>\n",
"</div>
" ], "text/plain": [ " intervals left right\n", "0 (299.999, 400.0] 299.999 400.0\n", "1 (400.0, 500.0] 400.000 500.0\n", "2 (500.0, 600.0] 500.000 600.0\n", "3 (600.0, 700.0] 600.000 700.0\n", "4 (700.0, 800.0] 700.000 800.0\n", "5 (800.0, 900.0] 800.000 900.0\n", "6 (900.0, 1000.0] 900.000 1000.0\n", "7 (1100.0, 1200.0] 1100.000 1200.0\n", "8 (1200.0, 1300.0] 1200.000 1300.0\n", "9 (1000.0, 1100.0] 1000.000 1100.0\n", "10 (1500.0, 1600.0] 1500.000 1600.0\n", "11 (1700.0, 1800.0] 1700.000 1800.0\n", "12 (1800.0, 1900.0] 1800.000 1900.0\n", "13 (1300.0, 1400.0] 1300.000 1400.0\n", "14 (1400.0, 1500.0] 1400.000 1500.0\n", "15 (1600.0, 1700.0] 1600.000 1700.0\n", "16 (1900.0, 2000.0] 1900.000 2000.0" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interval_split = pd.DataFrame({'intervals': bin_counts_norm.index})\n", "\n", "interval_split['left'] = interval_split['intervals'].array.left\n", "\n", "interval_split['right'] = interval_split['intervals'].array.right\n", "\n", "interval_split" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[300,\n", " 400,\n", " 500,\n", " 600,\n", " 700,\n", " 800,\n", " 900,\n", " 1100,\n", " 1200,\n", " 1000,\n", " 1500,\n", " 1700,\n", " 1800,\n", " 1300,\n", " 1400,\n", " 1600,\n", " 1900]" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lower_limit = np.round(interval_split['left'],0)\n", "\n", "lower_limit = lower_limit.astype(int)\n", "\n", "lower_limit = list(lower_limit)\n", "\n", "lower_limit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice the value 2000 in the `right` column of the last row of `interval_split`. That's not the left endpoint of any bar; it's the right endpoint of the last bar, which is why it does not appear in `lower_limit`. No movie in the table has an adjusted gross in that last bin, between 1,900 and 2,000 million dollars. 
So the corresponding `count` is recorded as 0, and would have been recorded as 0 even if there had been movies that made more than \\$2,000$ million dollars. When either `bin` or `hist` is called with a `bins` argument, the graph only considers values that are in the specified bins.\n", "\n", "Once values have been binned, the resulting counts can be used to generate a **bar chart** using df `lower_limit` *typed* as a list. " ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcAAAAEfCAYAAADWTRaJAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAA+3ElEQVR4nO3deVxN+f8H8Nd1SRuyJNos6asimrH1tQwyyhoiVEwiJevwtWQwxjYxMQ0mzWIMzTCjCGXfGlsyBlnG1qikKFMUpbTd3x8e3Z/r3nLi3NbX8/HweLjnfM4570/n3vu+53M+n/ORZGRkyEBERFTD1KroAIiIiCoCEyAREdVITIBERFQjMQESEVGNxARIREQ1EhMgERHVSEyARERUIzEBEhFRjcQE+BaxsbEVHYIoqks9gOpTF9ajcqku9QCqT13UXQ8mQCIiqpGYAImIqEZiAiQiohqJCZCIiGokJkAiIqqRagstmJOTg+zsbDRp0kS+LC0tDcHBwcjIyMCwYcPQqVMntQRJREQkNsEJcPbs2bh16xZOnToFAMjOzka/fv2QmJgIAAgKCkJERARsbW3VEykREZGIBDeBRkdHY+DAgfLXu3btQmJiInbt2oU7d+6gbdu2WLt2bZkD2Lx5Mzp06AADAwP07t0bUVFRgra7d+8ejI2NYWRkVOZjEhERCb4CTE1NVUg2hw4dQteuXdGvXz8AgJubGwICAsp08LCwMPj6+mLdunWwtbXF5s2b4ezsjOjoaJiYmJS4XV5eHiZOnIju3bvj3LlzZTpmWRU2aI6LT8Xbn5G2FIZ1C8XbIRERvRPBCVBHRwcZGRkAgIKCAkRFRcHHx0e+XktLC8+fPy/TwQMDA+Hq6gp3d3cAgL+/P06cOIEtW7Zg6dKlJW63dOlStGvXDj169FB7AnycL4XjgWTR9ndsuBEM64q2OyIiekeCm0A/+OAD/PLLL7h69SrWrl2LrKwsDBgwQL4+Pj4eTZs2FXzgvLw8xMTEwM7OTmG5nZ0dLly4UOJ2R44cwZEjR7BmzRrBxyIiInqT4CvARYsWYcSIEejbty9kMhkcHR3xwQcfyNfv378f3bp1E3zg9PR0FBYWQl9fX2G5vr4+Hj9+rHKblJQUzJo1C7/88gvq1asn+Fjv9Tw5beN331aFnBc5iE1LEnWfQlWX5wMC1acurEflUl3qAVSfuqiqh7m5uSj7FpwAbWxs8Ndff+HChQuoV68eevXqJV+XkZEBT09P9OjRo8wBSCQShdcymUxpWTEvLy9MnDgRXbp0KdMx3ueP9Sg55523VUVLWwvmRuKcvLKIjY0V7U1T0apLXViPyqW61AOoPnVRdz0EJcDc3FysX78eXbp0waBBg5TW6+npKdwPFKJx48aQSqVKV3tpaWlKV4XFTp8+jXPnzsmbP2UyGYqKitC4cWOsW7cOEyZMKFMMRERUcwm6B6ipqYmAgAAk
JYnXdKehoQEbGxtERkYqLI+MjCyxKTUqKgpnzpyR//vss8+gpaWFM2fOYPjw4aLFRkRE1Z/gJlBra2vExcWJevBp06bB29sbnTp1Qrdu3bBlyxakpKTAw8MDALBs2TJcunQJ4eHhAAArKyuF7a9cuYJatWopLSciInobwQnw888/h7u7O/773//CwcFBlIM7OTnhyZMn8Pf3R2pqKiwtLRESEgJTU1MArzq9xMfHi3IsIiKi1wlOgBs2bICenh5cXFxgaGiIli1bQktLS6GMRCJBSEhImQLw9PSEp6enynVBQUGlbuvm5gY3N7cyHY+IiAgoQwK8ffs2JBIJjI1fDQsofgbo60rqvUlERFTZCE6A169fV2ccRERE5YrzARIRUY0k+Arwdc+fP8ezZ89QVFSktK60h1gTERFVFmVKgMHBwdiwYUOpwyGePHny3kERERGpm+Am0F9++QWzZs2CiYkJFi9eDJlMBh8fH8yePRtNmzaFtbU1Nm7cqM5YiYiIRCM4AQYFBaFXr17Ys2eP/JFj9vb2WLJkCaKjo5GRkYFnz56pK04iIiJRCU6AcXFxGDJkyKuNar3aLD8/H8CrZ4F+8skn2Lx5sxpCJCIiEp/gBKijowOZTAYA0NXVhVQqRUpKinx9o0aN8PDhQ/EjJCIiUgPBCdDc3Bw3b94EANSuXRvW1tb4/fffkZ+fj9zcXOzcuRMtWrRQW6BERERiEtwLdPDgwQgKCkJubi40NTUxd+5cjB8/Hi1btoREIkF2dja+++47dcZKREQkGsEJcMaMGZgxY4b89eDBg3Hw4EHs27cPUqkUAwYMQM+ePdUSJBERkdjeaSB8MVtbW9ja2ooVCxERUbnho9CIiKhGKvEKcOjQoWXemUQikU9eS0REVJmVmACLiorKPL1R8TAJIiKiyq7EBHjgwIHyjIOIiKhc8R4gERHVSEyARERUI5XYBNqwYcMy3wOUSCRIT09/76CIiIjUrcQEOH/+/DInQCIioqqixAS4cOHC8oyDiIioXPEeIBER1UhlehTas2fPsHHjRhw9ehSJiYkAAFNTUzg4OGD69OmoX7++WoIkIiISm+ArwJSUFHz00UdYu3YtcnJy0KNHD3Tv3h05OTnw9/dH7969FeYHJCIiqswEXwF+8cUXSE1Nxfbt2zFo0CCFdYcOHcLEiROxfPlybNq0SfQgiYiIxCb4CvDEiRPw8vJSSn4AMHDgQEyePBlHjx4VNTgiIiJ1EZwAnz9/DmNj4xLXGxsbIysrS5SgiIiI1E1wAjQzM0N4eDiKioqU1hUVFSEiIgJmZmaiBkdERKQughOgt7c3zp49ixEjRuDIkSOIi4tDXFwcDh8+DCcnJ5w7dw5TpkxRZ6xERESiEdwJ5pNPPkF6ejrWrFmDM2fOyJfLZDLUrVsXn3/+OcaPH6+WIImIiMRWpnGAs2fPhru7OyIjI/HgwQMAr8YB9unTB40aNVJLgEREROpQpgQIAI0aNcLIkSPVEQsREVG5EZwA7927hz/++APx8fHIysqCrq4uWrdujb59+6JVq1bqjJGIiEh0b02Az58/x6xZs7Bv3z6VPUBr1aqFkSNHIiAgADo6OmoJkoiISGylJkCZTAZXV1ecPXsWdnZ2GDNmDCwtLaGrq4usrCzcunULv//+O0JDQ/H48WPs3bu3nMImIiJ6P6UmwIiICJw9exZffPEFZs2apbTe2toao0ePRkBAAFasWIH9+/djyJAhaguWiIhILKWOA9y9ezfat2+vMvm9bvbs2bCyssKuXbtEDY6IiEhdSk2AV69exYABAwTtaODAgYiJiREjJiIiIrUrNQGmpaXBxMRE0I5MTEyQlpYmSlBERETqVmoCzM7OhpaWlqAdaWpq4sWLF6IERUREpG5vfRaoRCIpjziIiIjK1VvHAU6bNg0zZsx4645UjREkIiKqrEpNgC4uLmoPYPPmzdiwYQNSU1NhYWEBPz8/dO/eXWXZ27dvY+7cubhz5w6e
PXuGZs2aYeTIkfD19YWGhobaYyUiouqj1AS4adMmtR48LCwMvr6+WLduHWxtbbF582Y4OzsjOjpaZecbDQ0NuLi4oEOHDmjQoAFu3LiBWbNmoaCgAMuXL1drrEREVL2U+WHYYgoMDISrqyvc3d0BAP7+/jhx4gS2bNmCpUuXKpVv3bo1WrduLX9tamqKs2fP4vz58+UWMxERVQ+CJ8QVW15eHmJiYmBnZ6ew3M7ODhcuXBC0j7i4OJw4cQI9evRQR4hERFSNVVgCTE9PR2FhIfT19RWW6+vr4/Hjx6Vua29vDwMDA3z44YewtbXF559/rs5QiYioGqrQJlBAeZiFTCZ769CLLVu2ICsrCzdu3MDnn3+Ob775BnPmzCmxfGxs7LsHqG387tuqkPMiB7FpSaLuU6j3+jtUMtWlLqxH5VJd6gFUn7qoqoe5ubko+66wBNi4cWNIpVKlq720tDSlq8I3GRu/SkoWFhYoLCzEzJkzMXPmTNSurbo67/PHepSc887bqqKlrQVzI3FOXlnExsaK9qapaNWlLqxH5VJd6gFUn7qoux4V1gSqoaEBGxsbREZGKiyPjIxEt27dBO+nqKgIBQUFKCwsFDtEIiKqxiq0CXTatGnw9vZGp06d0K1bN2zZsgUpKSnw8PAAACxbtgyXLl1CeHg4AOD333+HpqYmrKysoKGhgStXrmD58uUYNmwY6tatW5FVISKiKqZMCTA4OBjbtm1DQkICnj59qrReIpEgPT1d8P6cnJzw5MkT+Pv7IzU1FZaWlggJCYGpqSkAICUlBfHx8f8fbO3a+PrrrxEXFweZTAYTExN4enpi6tSpZakGERGR8AS4fPlyfPPNN2jXrh2cnZ2hp6cnSgCenp7w9PRUuS4oKEjh9ahRozBq1ChRjktERDWb4AT466+/YtCgQfj111/VGQ8REVG5ENwJJjs7Gx9//LE6YyEiIio3ghOgra0t/v77b3XGQkREVG4EJ0B/f38cOXIEv/76K2QymTpjIiIiUjvB9wBdXFyQl5eHmTNnYv78+TA0NIRUKlUoI5FIEB0dLXqQREREYhOcAJs0aQJ9fX20adNGnfEQERGVC8EJ8MCBA+qMg4iIqFxV2KPQiIiIKlKZngRTWFiIHTt24OjRo0hMTATwalJaBwcHuLi4KN0TJCIiqqwEJ8Bnz57ByckJly9fhq6uLlq2bAmZTIZTp07hwIED2LZtG8LCwlCvXj11xktERCQKwU2gK1euxJUrV/Dll1/in3/+wenTp3HmzBncu3cPfn5+uHz5MlauXKnOWImIiEQjOAHu378fHh4emDJlCjQ0NOTL69SpA29vb0yYMAERERFqCZKIiEhsghNgeno6LC0tS1xvZWVVppkgiIiIKpLgBGhiYqI0ee3rIiMjYWJiIkpQRERE6iY4AY4bNw4HDhyAj48Pbt26hfz8fOTn5+PmzZuYNm0aDh48iE8++USdsRIREYlGcC/QWbNm4f79+9i6dSt27twJiUQCAJDJZJDJZPDw8MDMmTPVFigREZGYBCdAiUSCgIAAeHl54ciRIwrjAO3t7WFlZaW2IKuTulIpLj4tFG1/RtpSGNYVb39ERDVFmQbCA4ClpWWpnWGodGm5hXA6kCza/o4NN4JhXdF2R0RUY/BRaEREVCOVeAXYoUMH1KpVCxcvXkSdOnXQoUMH+X2/kkgkEsTExIgdIxERkehKTIA9evSARCJBrVq1FF4TERFVByUmwKCgoFJfExERVWW8B0hERDVSiVeADx48eKcd8mkwRERUFZTaCeZd7vk9efLkvQIiIiIqDyUmwG+//ZadXoiIqNoqMQG6ubmVZxxERETlip1giIioRirxCvC33357px26uLi8czBERETlpcQEOHXq1DLvTCKRMAESEVGVUGICvHr1annGQUREVK5KTICmpqblGQcREVG5YicYIiKqkUq8Apw2bRokEgnWr18PqVSKadOmvXVnEokE3377ragBEhERqUOJ
CfD06dOoVasWioqKIJVKcfr0aUHTIREREVUFJSbA69evl/qaiIioKuM9QCIiqpGYAImIqEYqsQkUALp161amnUkkEkRHR79XQEREROWh1AR49+5daGlpwcbGBrVq8WKRiIiqj1IT4AcffIArV64gLi4OTk5OGD16NGxsbMopNCIiIvUp9bLu5MmTuHTpEtzd3XHkyBHY2dmhS5cu8Pf3R0JCQjmFSEREJL63tmu2bt0aCxcuxKVLl3D06FH06dMHP/74Iz788EP0798fP/74I2eBJyKiKqdMN/Y6d+4Mf39/3Lp1CyEhIahbty4WLFiAH3/8UV3xERERqUWZe7ZkZmZi+/btWL9+PaKiolC/fn2Ym5u/cwCbN29Ghw4dYGBggN69eyMqKqrEsmfOnIGLiwvatm2L5s2bo3v37vjll1/e+dhERFRzldoJplh+fj4OHz6M0NBQHD16FABgb2+Pbdu2wcHBARoaGu908LCwMPj6+mLdunWwtbXF5s2b4ezsjOjoaJiYmCiV//PPP9GuXTvMmjULzZo1w4kTJ/Dpp59CU1MTzs7O7xQDERHVTKUmwLNnzyI0NBT79u3D8+fP0aNHD/j7+2PYsGGoX7/+ex88MDAQrq6ucHd3BwD4+/vjxIkT2LJlC5YuXapU/n//+5/C60mTJuHMmTMIDw9nAiQiojIpNQEOHToUWlpasLe3x8iRI2FoaAgAiI2NLXGbTp06CTpwXl4eYmJiMGPGDIXldnZ2uHDhgqB9AMDz58/lcREREQn11ibQnJwc7Nu3D+Hh4aWWk8lkkEgkgnuEpqeno7CwEPr6+grL9fX18fjxY0H7OHz4ME6dOoUjR44IKk9ERFSs1AQYGBio9gDenEKpOJG+TXR0NCZPnow1a9a89aqztCvWt9I2fvdtVSgsKhJ1fzkvchCbliSo7Hv9HSqZ6lIX1qNyqS71AKpPXVTV4306Xr6u1ATo6uoqykFUady4MaRSqdLVXlpamtJV4ZvOnz+P0aNHY+HChZg0adJbj/U+f6xHyTnvvK0qUpEfKaelrQVzo7fXLzY2VrQ3TUWrLnVhPSqX6lIPoPrURd31qLAHfGpoaMDGxgaRkZEKyyMjI0t9CPe5c+fg7OyM+fPnY+rUqeoOk4iIqqkKfcL1tGnTsGPHDgQHB+POnTtYsGABUlJS4OHhAQBYtmwZHB0d5eXPnDkDZ2dneHh4YPTo0UhNTUVqairS0tIqqgpERFRFCRoHqC5OTk548uQJ/P39kZqaCktLS4SEhMDU1BQAkJKSgvj4eHn5HTt24MWLF9i4cSM2btwoX25iYsIZ64mIqEwqNAECgKenJzw9PVWuCwoKUnr95jIiIqJ3wUn+iIioRhKUAHNzc7FmzRqcPHlS3fEQERGVC0EJUFNTEwEBAUhKEjbejIiIqLIT3ARqbW2NuLg4dcZCRERUbgQnwM8//xzBwcF87BgREVULgnuBbtiwAXp6enBxcYGhoSFatmwJLS0thTISiQQhISGiB0lERCQ2wQnw9u3bkEgkMDZ+9WzMxMREpTJCnuFJRERUGQhOgBxoTkRE1QnHARIRUY1UpgRYWFiIkJAQTJ8+HWPGjMGNGzcAABkZGdizZw9SUlLUEiQREZHYBCfAzMxM2Nvbw9vbG/v27cOxY8eQnp4OAKhXrx4WLVqEH374QW2BEhERiUlwAly2bBlu376N0NBQxMTEQCaTyddJpVIMHToUx44dU0uQREREYhOcAA8cOAAvLy98/PHHKnt7mpmZ4cGDB6IGR0REpC6CE2BGRgZatWpV4nqZTIa8vDxRgiIiIlI3wQnQ1NQUN2/eLHH9uXPn0KZNG1GCIiIiUjfBCdDZ2RnBwcE4d+6cfFlxU+j333+P/fv3w9XVVfwIiYiI1EDwQPjZs2fjr7/+gqOjI9q0aQOJRAJfX188efIEqampGDx4MLy9vdUZKxERkWgEJ8A6deogJCQEoaGh2Lt3LyQSCQoKCtCxY0c4OTlh
9OjRfBQaERFVGYITYDFnZ2c4OzurIxYiIqJyU+YECAA3btyQD3kwMTFBu3btePVHRERVSpkS4O7du7F06VI8fPhQPhBeIpHA0NAQS5cu5ZUhERFVGYIT4Pbt2zF9+nSYm5tj2bJlaNOmDWQyGe7du4fg4GB4e3sjLy8Pbm5u6oyXiIhIFIIT4Ndff41OnTph//790NTUVFg3efJkDBo0CF9//TUTIBERVQmCxwEmJyfD2dlZKfkBgKamJsaMGYOHDx+KGhwREZG6CE6AFhYWePToUYnrHz58iLZt24oSFBERkboJToDLly/Htm3bsGfPHqV1u3fvRnBwMFasWCFqcEREROoi+B7gxo0b0bhxY0yaNAm+vr5o1aoVJBIJ4uLi8O+//8LMzAwbNmzAhg0b5NtIJBKEhISoJXAiIqL3ITgB3r59GxKJBMbGxgAgv99Xt25dGBsb4+XLl7hz547CNhwbqH51pVJcfFr41nI52sbIePr2/RlpS2FY9+37IyKq6gQnwOvXr6szDnpHabmFcDqQLNr+jg03gmFd0XZHRFRpCb4HSEREVJ0wARIRUY3EBEhERDUSEyAREdVITIBERFQjMQESEVGNJDgBduzYEQcPHixx/eHDh9GxY0dRgiIiIlI3wQkwMTER2dnZJa7Pzs6WT5JLRERU2ZWpCbS0J7v8888/qFev3nsHREREVB5KfRLMjh078Ntvv8lfr127Ftu2bVMql5GRgZs3b8LBwUH8CImIiNSg1ASYnZ2N1NRU+evMzEwUFRUplJFIJNDW1oa7uzt8fX3VEyUREZHISk2AkydPxuTJkwEAHTp0wOrVqzFo0KByCYyIiEidBD8M+9q1a+qMg4iIqFwJToDFnj9/jqSkJDx9+hQymUxpfY8ePUQJjIiISJ0EJ8CnT59iwYIF2LNnDwoLleeLk8lkkEgkePLkiagBEhERqYPgBDh79mzs378fkydPRo8ePaCnpydKAJs3b8aGDRuQmpoKCwsL+Pn5oXv37irL5ubmYvbs2bh69Sru3r2Lbt264cCBA6LEQURENYvgBHj8+HF4e3tj1apVoh08LCwMvr6+WLduHWxtbbF582Y4OzsjOjoaJiYmSuULCwuhqakJLy8vHD16FJmZmaLFQkRENYvggfAaGhowMzMT9eCBgYFwdXWFu7s72rZtC39/fxgYGGDLli0qy+vo6CAgIAATJkyAkZGRqLEQEVHNIjgBDhs2DMeOHRPtwHl5eYiJiYGdnZ3Ccjs7O1y4cEG04xAREakiuAl0xowZmDRpEqZMmYJJkybBxMQEUqlUqZy+vr6g/aWnp6OwsFCpvL6+Ph4/fiw0LEFiY2PffWNtY/ECAVD4xoMEKtv+cl7kIDYtSdR9qsN7ndNKhPWoXKpLPYDqUxdV9TA3Nxdl34ITYKdOnSCRSBATE4OQkJASy5W1F+ibzxct7k0qpvf5Yz1KzhExEkBaS9wZqMTen5a2FsyNxHlzqUtsbKxoH4CKxHpULtWlHkD1qYu66yE4Ac6fP1/UxNS4cWNIpVKlq720tDTBV5FERETvSnACXLhwoagH1tDQgI2NDSIjIzF8+HD58sjISDg6Oop6LCIiojeV+UkwwKvhCJmZmahfvz5q136nXQAApk2bBm9vb3Tq1AndunXDli1bkJKSAg8PDwDAsmXLcOnSJYSHh8u3uX37NvLy8pCeno7s7Gz5I9o6dOjwznEQEVHNU6bsdfnyZSxfvhznz59Hfn4+9uzZg969eyM9PR0+Pj6YNm0aevfuLXh/Tk5OePLkCfz9/ZGamgpLS0uEhITA1NQUAJCSkoL4+HiFbZydnRUm3v3oo48AvJqSiYiISCjBCfDPP/+Eo6MjDAwMMHbsWAQHB8vXNW7cGFlZWfjll1/KlAABwNPTE56enirXBQUFKS27fv16mfZPRESkiuAuhCtWrICZmRkuXLiAzz//XGl9r1698Ndff4kaHBERkboIToCXL1/GuHHjoKmpqbI3
qJGRkcLkuURERJWZ4ARYq1Yt1CplzFlqaiq0tLRECYqIiEjdBCdAGxsbHD58WOW6vLw8hIaGomvXrqIFRkREpE6CO8HMmTMHo0aNwvTp0+Hs7AzgVS/N48ePY+3atYiPj0dgYKDaAqXyUVcqxcWnyvM9visjbSkM64q3PyIisQhOgH379sX333+PefPmYceOHQAAHx8fyGQyNGjQAJs3b0aXLl3UFiiVj7TcQjgdSBZtf8eGG8Gwrmi7IyISTZnGAY4aNQqDBg1CZGQk7t27h6KiIrRq1Qr9+vWDrq6uumIkIiISXZkf46KtrY3BgwerIxYiIqJyI7gTzMGDBzFv3rwS18+bN6/ETjJERESVjeAEuHHjRrx48aLE9bm5uVi/fr0oQREREamb4AR48+ZN2NjYlLi+Y8eOuH37thgxERERqZ3gBFhQUICcnJInh83JycHLly9FCYqIiEjdBCdAKysrhIeHo6ioSGldUVERwsPDYWFhIWpwRERE6iI4AU6ZMgWXLl2Ci4sLYmJi8PLlS7x8+RIxMTFwdXXFpUuX4O3trc5YiYiIRCN4GMTIkSMRHx8PPz8/HDt2DAAgkUggk8kgkUiwYMECjBkzRm2BEhERialM4wDnzp2LUaNGISIiAgkJCZDJZGjVqhWGDh2Kli1bqilEIiIi8QlKgDk5ORg9ejTGjBmDcePGYcaMGeqOi4iISK0E3QPU0tLC1atXUVjIhxoTEVH1ILgTTM+ePREVFaXOWIiIiMqN4AS4Zs0aXL58GUuWLEFCQoLK4RBERERVheBOMF26dIFMJkNgYCACAwNRq1Yt1KlTR6GMRCLBw4cPRQ+SiIhIbIIT4IgRIyCRSNQZCxERUbkRnACDgoLUGQcREVG5EnwPkIiIqDopUwJMTEzEzJkzYWNjAxMTE5w9exYAkJ6ejv/973+IiYlRR4xERESiE9wEeufOHQwYMABFRUXo3LkzEhMT5eMCGzdujIsXL+Lly5f49ttv1RYsERGRWAQnwKVLl6JevXo4fvw4pFIp2rRpo7De3t4ee/fuFTs+IiIitRDcBBoVFQVPT080bdpUZW9QExMTPHr0SNTgiIiI1KVME+Lq6OiUuP7p06eQSqWiBEVERKRugptArayscObMGUyaNElpnUwmQ0REBGxsbMSMjaqBulIpLj4V9xmy9Ro0F3V/RFQzCU6APj4+8PT0xFdffQUnJycAr2aCv3v3Lvz8/HDlyhXs3LlTbYFS1ZSWWwinA8mi7jPcvhEsRN0jEdVEZZoQ98GDB1i1ahVWr14tXwYAUqkUK1euRP/+/dUTJRERkcjKNCHup59+ilGjRiE8PBxxcXEoKipCq1at4OjoiBYtWqgrRiIiItG9NQG+fPkSBw8eREJCAho1agQHBwdMnTq1PGIjIiJSm1ITYGpqKgYNGoT4+HjIZDIAgI6ODnbu3IkePXqUS4BERETqUGoCXLlyJRISEjB16lR89NFHiIuLg7+/P+bPn49z586VV4xECnS1NXHxqXj7M9KWwrCuuD1ViajyKzUBnjx5Ei4uLli5cqV8WdOmTeHp6Ynk5GQYGRmpPUCiN6W/lMH5kHjzTh4bbgTDuqLtjoiqiFIHwqempqJbt24Ky2xtbSGTyZCUlKTWwIiIiNSp1ARYWFgITU1NhWXFr3Nzc9UXFRERkZq9tRdoQkICLl26JH/97NkzAEBsbCx0dXWVynfq1EnE8IiIiNTjrQnQz88Pfn5+Ssvnz5+v8Fomk0EikeDJkyfiRUdUDsR+XJvYnWoevpQi+UXljY+oqio1AQYGBpZXHEQVRuzHtYndqSb5RSH676288RFVVaUmQFdX1/KKg4iIqFyV6VFoRFT1id3ky9k5qKqq8AS4efNmbNiwAampqbCwsICfnx+6d+9eYvm///4b8+bNw+XLl9GwYUNMmDAB8+fPVzlJLxEpE7vJl7NzUFUleEJcdQgLC4Ovry/+97//4fTp0+jatSucnZ3x4MEDleWf
PXuGESNGoGnTpjh58iRWr16NjRs34ttvvy3nyImIqKqr0CvAwMBAuLq6wt3dHQDg7++PEydOYMuWLVi6dKlS+dDQUOTk5CAoKAhaWlqwsrLC3bt3sWnTJkyfPp1XgVQpCG1izNE2RoaAR7rVlUpFiKrmEtqLVuj5qOy9fAE2SwslycjIkFXEgfPy8tC8eXP89NNPGD58uHz53LlzcfPmTRw8eFBpG29vbzx9+hQhISHyZZcvX4adnR1iYmLQsmXLcoiciIiqgwprAk1PT0dhYSH09fUVluvr6+Px48cqt3n8+LHK8sXriIiIhKrQe4AAlJotiwfUl6W8quVERESlqbAE2LhxY0ilUqUrt7S0NKWrvGJNmzZVWR5AidsQERGpUmEJUENDAzY2NoiMjFRYHhkZqTQDRbGuXbvi/PnzCg/ijoyMRPPmzdGiRQu1xktERNVLhTaBTps2DTt27EBwcDDu3LmDBQsWICUlBR4eHgCAZcuWwdHRUV5+1KhR0NLSwtSpU3Hz5k2Eh4fjm2++wdSpU9kESkREZVKhCdDJyQl+fn7w9/dHr169EB0djZCQEJiamgIAUlJSEB8fLy/foEED7NmzB48ePULfvn0xb948TJs2DdOnTxd8zK+//hp9+/aFiYkJzMzMMGbMGNy8eVOhjI+PD/T09BT+ffzxxwplXr58iXnz5qF169YwNDTE2LFjkZws3uDit/Hz81OK8T//+Y98vUwmg5+fHywsLNCsWTMMHjwYt27dqlR1AABra2uleujp6WH06NEAKve5OHfuHMaOHQtLS0vo6elh+/btCuvFOgcZGRnw8vKCqakpTE1N4eXlhYyMjHKpR35+PpYuXYru3bvD0NAQbdu2haenp9JY3cGDByudp4kTJ1aaegDivZcquh6qPi96enqYO3euvExlOB9Cvmsr+jNS4Z1gPD09cf36dTx+/BinTp1Cjx495OuCgoJw/fp1hfLt2rXDoUOHkJqaijt37sDX17dMV39nz57FpEmTcOTIEYSHh6N27doYPnw4nj5VHADUp08f3LlzR/4vNDRUYf3ChQsRERGBn376CQcPHsTz588xZswYFBaW31P2zc3NFWKMioqSr1u/fj0CAwOxZs0anDx5Evr6+hgxYgSeP39eqeoQGRmpUIdTp05BIpEoDI2prOciOzsbVlZWWL16NbS0tJTWi3UOPD09ce3aNYSGhmLXrl24du0avL29y6UeL168wNWrVzF37lycOnUKO3bsQHJyMkaNGoWCggKFsm5ubgrnKSAgQGF9RdajmBjvpYqux+vx37lzB7///jsAKHxmgIo/H0K+ayv6M1Jh4wAri6ysLJiammL79u0YOHAggFe/FJ88eYKdO3eq3CYzMxNt2rRBYGCg/EolKSkJ1tbW2LVrF/r166f2uP38/BAeHo7z588rrZPJZLCwsMDkyZPlvwpzcnJgbm6OFStWwMPDo1LUQZW1a9diw4YNuH37NrS1tavEuQAAIyMjfPXVV3BzcwMg3jm4c+cOunXrhsOHD8PW1hYAcP78eQwcOBAXL16Eubm5Wuuhyu3bt2Fra4tz586hXbt2AF5dcVhZWcHf31/lNpWhHmK8lypDPd40c+ZMREVF4a+//pIvq2znA1D+rq0Mn5EKvwKsaFlZWSgqKoKenp7C8vPnz6NNmzbo1KkTZs6ciX///Ve+LiYmBvn5+bCzs5MvMzY2Rtu2bXHhwoXyCh0JCQmwtLREhw4dMHHiRCQkJAAA7t+/j9TUVIX4tLS00L17d3l8laUOr5PJZPjll18wZswYaGtry5dXhXPxJrHOwZ9//gldXV2FjmG2trbQ0dGpsPoV/zp/8zOze/dutG7dGra2tli8eLHCr/jKUo/3fS9VlnoUy8rKQlhYmPxpWq+rbOfjze/ayvAZqfCHYVc0X19fWFtbo2vXrvJlH3/8MYYOHYoWLVogMTERK1euhKOjI/744w/UrVsXjx8/hlQqRePGjRX2Vdog
frF17twZmzZtgrm5OdLS0uDv7w97e3tER0cjNTVVHs+b8T169AgAKkUd3hQZGYn79+9j/Pjx8mVV4VyoItY5ePz4MRo3bqzQzC+RSNCkSZMKqV9eXh4WL16MAQMGwMjISL7c2dkZJiYmaNasGW7fvo1ly5bhxo0b2Lt3L4DKUQ8x3kuVoR6v27VrF16+fAkXFxeF5ZXxfLz5XVsZPiM1OgF+9tlniI6OxuHDhyF97XmLI0eOlP+/Xbt2sLGxgbW1NY4cOaLQK/VNbxvEL6b+/fsrvO7cuTNsbGywY8cOdOnSBUDZHzIgtIy6bNu2DR9++CE6dOggX1YVzkVpxDgHqspXRP0KCgrg5eWFzMxM/PbbbwrrJkyYIP9/u3bt0LJlS/Tr1w8xMTGwsbEBUPH1EOu9VNH1eN22bdswePBgNGnSRGF5ZTsfJX3XqoqjPD8jNbYJdOHChdi9ezfCw8Pf+gzR5s2bw9DQEHFxcQBeDcgvLCxEenq6QrnSBvGrm66uLiwsLBAXFwcDAwMAyo+Hez2+ylaHf//9FwcPHlTZlPO6qnAuAIh2Dpo2bYq0tDT5E4+AVx/s9PT0cq1fQUEBJk2ahL///hv79u1Do0aNSi3/wQcfQCqVKpynylCP173Le6ky1ePatWu4cuXKWz8zQMWej5K+ayvDZ6RGJsAFCxZg165dCA8PVxg6UJL09HQ8evRIfsJsbGxQp04dhUH8ycnJ8puxFSE3NxexsbEwMDBAixYtYGBgoBBfbm4uzp8/L4+vstVhx44dqFu3LpycnEotVxXOBQDRzkHXrl2RlZWFP//8U17mzz//RHZ2drnVLz8/Hx4eHvj7778REREh/9uX5u+//0ZhYaG8bGWox5ve5b1Umeqxbds2mJqaok+fPm8tW1Hno7Tv2srwGalxTaBz587Fzp078euvv0JPT0/eDq2jowNdXV1kZWVh9erVcHR0hIGBARITE7F8+XLo6+tjyJAhAF6NRxw/fjw+//xz6Ovro2HDhli0aBHatWsn6M0ohuL7MMbGxvJ7gC9evICLiwskEgl8fHywbt06mJubo02bNli7di10dHQwatSoSlOHYjKZDMHBwXByckK9evXkyyv7ucjKypL/oi4qKkJSUhKuXbuGhg0bwsTERJRz0LZtW3z88ceYPXs21q9fD5lMhtmzZ8PBwUG0nnql1aN58+Zwd3fHlStX8Ntvv0Eikcg/M/Xr14eWlhbi4+MREhICe3t7NGrUCHfu3MHixYvRoUMHea+8iq5Hw4YNRXkvVXQ9TExMALwanhIaGoqZM2cqNfNVlvPxtu9asb6n3qcuNW4YxJs914otWLAACxcuRE5ODtzc3HDt2jVkZmbCwMAAvXr1wqJFi2BsbCwvn5ubiyVLlmDXrl3Izc3FRx99hHXr1imUUaeJEyciKioK6enpaNKkCTp37oxFixbBwuLV3NwymQyrV6/G1q1bkZGRgU6dOmHt2rWwsrKqNHUodvr0aTg6OuLEiRPo1KmTfHllPxdnzpzB0KFDlZa7uLggKChItHPw9OlTLFiwAIcOHQIADBw4EF999VWJ72Ux6+Hr64uOHTuq3C4wMBBubm5ISkqCl5cXbt26hezsbBgZGcHe3h6+vr5o2LBhpajH119/Ldp7qSLrERQUBAD49ddfMWvWLNy4cQPNmyvO/VdZzsfbvmsB8b6n3rUuNS4BEhERATX0HiARERETIBER1UhMgEREVCMxARIRUY3EBEhERDUSEyAREdVITICE+/fvK028WTzhblW0fft26Onp4f79+xUdSoXZtGkTrK2tkZ+f/977UvX39PHxgbW1tUI5a2tr+Pj4yF+rel9VF3p6evDz85O/PnPmDPT09HDmzJlyj2Xx4sUVNn1ZVccEWI3s2LEDenp6+OCDDyo6lLc6f/48/Pz8RJ2B+l09ffoUq1atQq9evWBiYoKmTZuiffv2cHd3R0REhMIzBquC7OxsBAQEYNasWahTp458ubW1
NfT09DBo0CCV2x09elQ+e/ju3bvLK1y1KE6+xf+aNGmC1q1bw97eHsuXL1ea0b4qmzZtGq5fv46DBw9WdChVDhNgNRISEgJTU1PEx8crPBfvXcybNw8pKSkiRaYsOjoaa9asQWZmptqOIcTVq1dha2uL9evXw9LSEkuWLMG6deswbtw4JCUlYfz48fjpp58qNMay2rFjB7Kzs5WmyAEATU1NnD9/XmUCCA0NhaamptLysWPHIiUlBaampmWKw9TUFCkpKRg7dmyZthOTk5MTvv/+e2zcuBHz5s1Dy5YtERQUhG7dumHXrl0VFpeYmjdvjgEDBmDjxo0VHUqVU+OeBVpdpaSk4PTp0/juu++wYsUKhISEKMxxWFa1a9dG7drV++2RmZkJV1dXyGQy/PHHHwqPXwJezV92+vTptybpFy9eKEzgW9F+/fVXODg4QEdHR2ldly5dcP36dezatQuzZ8+WL8/OzsbBgwfh4OCAffv2KWwjlUqVprARQiKRqEyo5cna2hpjxoxRWJaYmAgnJyf4+Pigbdu2Sk25lUFZ31NOTk6YMGEC7t27BzMzMzVGVr3wCrCaKP71PmjQIIwcORJhYWEq7/+kpKTA3d0dxsbGaNmyJaZPn64wU3QxVfcA37zvUWzw4MEYPHiwwrLNmzeje/fuMDQ0RMuWLdG7d29s2bJFvu9ly5YBADp27Chvpnr9/klkZCSGDBkCY2NjGBoaYsiQISpnd7548SLs7e1hYGCA9u3bIyAgQHCT5datW5GcnIxVq1YpJb9iH330kcJzGYvvh50+fRq+vr74z3/+A0NDQ/n6gwcPol+/fmjevDlatGgBNzc33L17V2GfWVlZ8ocTGxgYwNzcHEOHDlWof1xcHCZMmIC2bdvCwMAA7dq1g7u7Ox4+fFhqnR48eICrV6+W+CBwDQ0NDB8+HKGhoQrL9+/fj5cvX2LEiBFK27zrPdWS7gHevHkTY8eOhampKZo3b47+/fvj2LFjCmWK76nt2rUL3377LaytrWFgYID+/fvj6tWrZYrjTaampti0aRPy8/OxYcMGhXUPHjzA5MmT0bp1axgYGKBnz55Kcx8KdePGDfj4+MDGxgYGBgYwMzPDpEmTkJSUpFCutPdUQUEB/P390alTJzRr1kzejPvmj5Ti833gwIF3irWmqt4/8WuQnTt3wsHBAbq6uhg1ahS++eYbHD9+HAMHDpSXyc3NxbBhwxAXF4fJkyejRYsWiIiIwJQpU0SNJTg4GHPnzoWjoyMmT56M/Px83L59G9HR0Zg4cSKGDh2K2NhYhIWF4csvv5TP9ty2bVsAr2a59vLykj+suKioCNu3b4ejoyMOHDiAzp07AwBu376N4cOHo169epg7dy40NDSwdetWlVc+qhw6dAhaWloYNmxYmeu4YMECNGjQAHPmzMGzZ8/kcU+ePBnt27fHokWL8OzZM/zwww+wt7fHH3/8IZ8Lbc6cOdi7dy88PT1hYWGBzMxM/PXXX7h+/Tp69eqF/Px8ODk5ITc3F56enjAwMEBqaipOnjyJhw8fKiTcN0VHRwOAfNJTVZydnbF161bcuHED7du3B/DqB5SdnZ3SzNti++effzBgwABoaGhg6tSp0NHRwY4dOzBmzBhs27ZN6SHQ3377LfLz8+Hl5YWCggJs2LABbm5uuHLlisL9zbLq2rUrWrVqpTDNTnp6OgYMGICnT5/Cy8sLzZo1Q1hYGHx8fJCRkaHQwUeIyMhIxMbGYvTo0TAyMkJcXBx+/vlnXL58GVFRUdDS0lIor+o9tXr1aqxbtw7jx49Hp06dkJ2djWvXruGvv/5SeN/q6emhVatWOH/+PGbOnPnOf5eahgmwGrh16xZu3LiBBQsWAADat28PS0tLhISEKCTAbdu24c6dO/juu+/k92U8PT3l08GI5ciRI7C0tERwcLDK9e3bt4e1tTXCwsIwePBgtGjRQr4uOzsbc+fOxZgxY+RPvgcADw8P2NraYvny5QgP
DwcArFq1Cnl5eTh06BBatWoFAHBzc8OHH34oKM7bt2/DzMwMGhoaCsuzs7ORm5srf127dm00aNBAoYy2tjb2798vbybOz8/HokWL0KZNGxw+fFiehAcPHoy+ffviyy+/xA8//CD/+7i7u+PLL78sMa6EhARs27ZN4Utu3rx5b61T8dXm63/TN3Xv3h3GxsYICQlB+/bt8e+//+KPP/5Q+Hury/Lly/HixQscP35cPj+cu7s7unfvjoULF2Lw4MGoVev/G6aePXuGqKgoeVOqubk5xo0bh5MnT8LBweG9YrG0tMTBgwfx7Nkz1K9fHwEBAUhOTsa+ffvQu3dvAK9mXRk4cCBWrlwJV1dXpfdBaSZNmoQZM2YoLBswYAAGDhyIiIgIjB49WmHdm+8p4NV7xd7eXulKVZWWLVsqtTZQ6dgEWg3s3LkT9evXh729vXzZyJEjcejQIfkvSeDVh0lfXx/Ozs7yZVKpFN7e3qLGU69ePSQnJ+PSpUtl3jYyMhIZGRkYPXo00tPT5f9ycnLQp08fnD9/Hvn5+SgsLMSJEycwYMAAefIDgCZNmih9sZTk+fPnCvMPFvviiy9gZmYm/+fq6qpUxt3dXeGLKiYmBqmpqZg0aZLCFWjHjh3Rp08fHD16VN40W69ePVy6dKnE5szimE6cOIHs7GxBdSn25MkT1KpVC/Xr1y+xjEQiwahRo7B7924UFRVh9+7dqFu3bom9Q8Xy+jl7fXLU+vXrY+LEiUhKSsLff/+tsI2bm5vCfcSePXsCABISEt47Hl1dXQCvmqSBV5+PDh06yJMf8KrJ2MfHB9nZ2Th79myZ9v/6PbysrCw8efIE//nPf9CgQQPExMQolX/zPQW8ei/cunUL//zzz1uP17BhQ6WZ06l0TIBVnEwmw65du9CjRw+kpKTg/v37uH//Prp06YLc3Fz51RLw6v5Gq1atlDo0tGnTRtSYPv30U+jq6qJfv36wsbHB7NmzcerUKUHb3rt3DwAwYsQIhSRkZmaG4OBg5OfnIzMzE2lpaXjx4oXKCS+F1qdevXoq7396eXlh79692Lt3b4lzChY3ZxZLTEwEAKVZr4FXTbsZGRnyHyPLli3DzZs30b59e/Tp0wcrV67EnTt3FPY9ZcoUBAcHw8zMDMOGDcOmTZsEf7kJuQfq7OyM5ORknDt3DqGhoRg8eLDgpuN3lZaWhuzs7BL/RsD//x2LFU8AW6z4vvTTp0/fO57ixFecCBMTE8sU29tkZGTg008/RatWrWBsbIzWrVvDzMwMmZmZKjtWvfmeAoCFCxciMzMTnTt3hq2tLT777DNcvnxZ5fFkMpnS5LhUOjaBVnFnzpxBUlISkpKS5JNBvi4kJATjxo0DUPIH5H3HuRUVFSk0W1lYWODixYs4fvw4Tpw4gSNHjuDnn3+Gh4cHAgIC3rov4NVA7pLuddWvXx9PnjwBgPeqj4WFBWJiYpCXl6fQDGpubi5PrG/epylW0nJV3oxn5MiR6NGjBw4dOoSTJ0/i+++/xzfffIPAwEB5j8XVq1fD3d1dXmbJkiVYu3YtDhw4AEtLyxKP1ahRI8hkMjx79qzUBxm0a9cOVlZW+Oqrr3Dp0iX4+voKro86lHTOSup9KsbYzFu3bqFJkyalXi2/z7GKJ62ePn06OnTogHr16kEikWDixIny9/nrVL2nevXqhatXr+LQoUOIjIzE77//jqCgICxZsgRz5sxRKJuRkaH2e7jVDRNgFRcSEoKGDRuqHAN06tQp/PTTT/KOE6amprh+/ToKCwsVvliKr7reRk9PT+Uv18TERKVfrzo6Ohg2bBiGDRuGgoIC+Pj44Oeff8a8efNgaGhY4i/V4ubMJk2alNiTEQD09fWhra2t8p6H0PoMHDgQ0dHR2Lt3r+Bm05IUj5G7e/cu7OzsFNbFxsZCT09P4Yu2WbNm8PDwgIeHBzIyMtC/f3+sWbNGocu+paUlLC0tMWfOHNy4cQN9+vRBUFBQqfeDiq9W
EhISSu0IAwCjR4/GF198gSZNmqBv375lrXKZNWnSBDo6OirPWWxsLACUeazhu/rzzz8RHx+vcN5NTU1Fiy0jIwMnT56Er6+vwo+L3NzcMj/8QU9PDy4uLnBxcUFOTg5GjRqFNWvWYNasWQqf4/j4+BJ7M5NqbAKtwoqbOPv3748hQ4Yo/Zs2bRqKiorkA37t7e3x77//KnSBLywsxPfffy/oeK1bt1a6D3LgwAEkJycrLCu+OitWu3ZttGvXDgDkH/7i+yNvfhn069cPDRo0wNq1a/Hy5UulGNLS0gC8ujKws7PD4cOHER8fr7D+zS7+JZkwYQKMjIywaNEi3Lp1S2UZob/+i7u6b9myBTk5OfLl169fR2RkJOzt7SGRSFBYWKj0I0JPTw8tWrSQ/y2ePXuGgoIChTJt27aFlpbWW788u3XrBgAq7zG9aezYsViwYAHWrl1bLmM+pVIp+vXrhyNHjijc03r+/Dl+/vlnGBsby98n6pSYmIipU6dCQ0NDocekg4MDrl27htOnT8uX5efn47vvvoO2trb8/qMQxS0ib75/Nm3apPLqryRvfpa0tLTQtm1bvHz5Ei9evJAvf/r0KRISEuTnn4ThFWAVVtzJpaTOCy1btoSlpSV27tyJmTNnwt3dHT/99BNmzJiBa9euoWXLlggPD1d5H0yVCRMmYObMmXB1dUX//v1x9+5d7Nq1S6ETCvDq/p2+vj5sbW3RtGlTxMfH44cffoCVlRUsLCwAQP64thUrVmDkyJHQ0NDARx99BH19faxfvx6TJk1Cz5494ezsDAMDAyQnJ+PMmTPQ0dGRJ/TPPvsMJ0+exMCBA+Hp6Yk6depg69atMDExEfSEmQYNGmD79u0YM2YMevfujeHDh6Nz587Q0tJCSkqK/Iu6eNhFaerUqYNVq1Zh8uTJcHBwwJgxY+TDIOrXr4/PPvsMwKsveysrKwwdOhTt27dH/fr1ER0djePHj2Py5MkAgNOnT2PevHlwdHSEubk5ZDIZwsLC8Pz5c4wcObLUOExMTGBtbY3IyEhMmDCh1LLNmjXDwoUL31o3MS1ZsgR//PGH/JwVD4NISkrC1q1bFZrSxXD9+nXs3LkTRUVFyMzMxOXLlxEREQGJRILvv/9ePgwEAGbPno2wsDC4uLjA29sbBgYG2LNnDy5evIgvv/yyTD1A69evj549e2LDhg3Iz8+HiYkJzp8/j6ioKDRq1Ejwfrp27Yru3bvjww8/RKNGjXDjxg0EBwfDwcFBoQNX8XAOdXdkqm6YAKuwnTt3QkNDo9QH4Q4YMAABAQHyMV/79u2Dr68vtm3bhjp16mDIkCGYMmWKoF+348aNQ2JiIoKDg3Hy5El88MEHCA0NxaJFixTKeXh4IDQ0FEFBQXj+/DmaNWsGNzc3zJs3T/4F16VLFyxevBhbt26VX6lGRERAX18fw4cPR/PmzfH1119j06ZNyMnJgYGBATp37oxPPvlEfhwrKyvs2bMHixcvhr+/P/T19TFp0iTo6+tj+vTpgv6GNjY2iIqKQlBQEA4dOoQDBw4gPz8fTZs2RefOnTFnzhyFoSSlGTVqFLS0tLBu3TqsWLECGhoa6NmzJ7744gt5E7G2tjY8PT0RGRmJQ4cOoaCgAC1atMCKFSvk48zat2+Pjz/+GMeOHUNwcDDq1q0LS0tLbN++XemBA6qMHz8eS5cuRVZWlryDR2Vhbm6Ow4cPY9myZQgMDEReXh6sra3x+++/K/RiFktYWBjCwsJQu3Zt1KtXD2ZmZvDx8YGHh4dSB5vGjRvjyJEjWLZsGX7++We8ePECbdq0QVBQkMrHyr3N5s2b4evri59//hkFBQXo3r07wsPDyzTu1MfHB4cOHcLp06eRm5sLIyMjfPrpp/j0008Vyu3duxfdunVT2SmMSibJyMioWk/6pXKxcuVKBAQEsFt1FZSVlQUbGxvMnz8fXl5eFR0OqdmjR4/QsWNHbNmyRfQxvdUd7wGSSqmpqexRVkXp
6upizpw58uY3qt4CAwPRvn17Jr93wCtAUpCQkICIiAj4+/vDwcEBP/74Y0WHRESkFrwCJAXnzp3DV199hf/+979YtWpVRYdDRKQ2vAIkIqIaiVeARERUIzEBEhFRjcQESERENRITIBER1UhMgEREVCMxARIRUY30fykigeI7YVudAAAAAElFTkSuQmCC\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# using matplotlib\n", "\n", "width = lower_limit[1] - lower_limit[0]\n", "\n", "plt.bar(lower_limit, bin_counts_norm['Adjusted Gross'], align='center', width=width, ec='white')\n", "\n", "plt.xlabel('Adjusted Gross (Million Dollars)')\n", "\n", "plt.ylabel('Percent per Million Dollars')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Vertical Axis: Density Scale\n", "\n", "The horizontal axis of a histogram is straightforward to read, once we have taken care of details like the ends of the bins. The features of the vertical axis require a little more attention. We will go over them one by one.\n", "\n", "Let's start by examining how to calculate the numbers on the vertical axis. If the calculation seems a little strange, have patience – the rest of the section will explain the reasoning.\n", "\n", "**Calculation.** The height of each bar is the **percent of elements that fall into the corresponding bin, relative to the width of the bin**. " ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>Count</th><th>Percent</th><th>Height</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>(299.999, 400.0]</th><td>81</td><td>40.5</td><td>0.405</td></tr>\n",
 "    <tr><th>(400.0, 500.0]</th><td>52</td><td>26.0</td><td>0.260</td></tr>\n",
 "    <tr><th>(500.0, 600.0]</th><td>28</td><td>14.0</td><td>0.140</td></tr>\n",
 "    <tr><th>(600.0, 700.0]</th><td>16</td><td>8.0</td><td>0.080</td></tr>\n",
 "    <tr><th>(700.0, 800.0]</th><td>7</td><td>3.5</td><td>0.035</td></tr>\n",
 "    <tr><th>(800.0, 900.0]</th><td>5</td><td>2.5</td><td>0.025</td></tr>\n",
 "    <tr><th>(900.0, 1000.0]</th><td>3</td><td>1.5</td><td>0.015</td></tr>\n",
 "    <tr><th>(1100.0, 1200.0]</th><td>3</td><td>1.5</td><td>0.015</td></tr>\n",
 "    <tr><th>(1200.0, 1300.0]</th><td>2</td><td>1.0</td><td>0.010</td></tr>\n",
 "    <tr><th>(1000.0, 1100.0]</th><td>1</td><td>0.5</td><td>0.005</td></tr>\n",
 "    <tr><th>(1500.0, 1600.0]</th><td>1</td><td>0.5</td><td>0.005</td></tr>\n",
 "    <tr><th>(1700.0, 1800.0]</th><td>1</td><td>0.5</td><td>0.005</td></tr>\n",
 "    <tr><th>(1800.0, 1900.0]</th><td>0</td><td>0.0</td><td>0.000</td></tr>\n",
 "    <tr><th>(1300.0, 1400.0]</th><td>0</td><td>0.0</td><td>0.000</td></tr>\n",
 "    <tr><th>(1400.0, 1500.0]</th><td>0</td><td>0.0</td><td>0.000</td></tr>\n",
 "    <tr><th>(1600.0, 1700.0]</th><td>0</td><td>0.0</td><td>0.000</td></tr>\n",
 "    <tr><th>(1900.0, 2000.0]</th><td>0</td><td>0.0</td><td>0.000</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " Count Percent Height\n", "(299.999, 400.0] 81 40.5 0.405\n", "(400.0, 500.0] 52 26.0 0.260\n", "(500.0, 600.0] 28 14.0 0.140\n", "(600.0, 700.0] 16 8.0 0.080\n", "(700.0, 800.0] 7 3.5 0.035\n", "(800.0, 900.0] 5 2.5 0.025\n", "(900.0, 1000.0] 3 1.5 0.015\n", "(1100.0, 1200.0] 3 1.5 0.015\n", "(1200.0, 1300.0] 2 1.0 0.010\n", "(1000.0, 1100.0] 1 0.5 0.005\n", "(1500.0, 1600.0] 1 0.5 0.005\n", "(1700.0, 1800.0] 1 0.5 0.005\n", "(1800.0, 1900.0] 0 0.0 0.000\n", "(1300.0, 1400.0] 0 0.0 0.000\n", "(1400.0, 1500.0] 0 0.0 0.000\n", "(1600.0, 1700.0] 0 0.0 0.000\n", "(1900.0, 2000.0] 0 0.0 0.000" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "counts = bin_counts\n", "\n", "counts = counts.rename(columns={'Adjusted Gross': 'Count'})\n", "\n", "percents = counts\n", "\n", "percents['Percent'] = (counts['Count']/200)*100\n", "\n", "percents\n", "\n", "heights = percents\n", "\n", "heights['Height'] = percents['Percent']/100\n", "\n", "heights" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Go over the numbers on the vertical axis of the histogram above to check that the column `Heights` looks correct." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The calculations will become clear if we just examine the first row of the table. \n", "\n", "Remember that there are 200 movies in the dataset. The [300, 400) bin contains 81 movies. That's 40.5% of all the movies:\n", "\n", "$$\n", "\\mbox{Percent} = \\frac{81}{200} \\cdot 100 = 40.5\n", "$$\n", "\n", "The width of the [300, 400) bin is $ 400 - 300 = 100$. So\n", "\n", "$$\n", "\\mbox{Height} = \\frac{40.5}{100} = 0.405\n", "$$\n", "\n", "The code for calculating the heights used the facts that there are 200 movies in all and that the width of each bin is 100." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Units.** The height of the bar is 40.5% divided by 100 million dollars, and so the height is 0.405% per million dollars. \n", "\n", "This method of drawing histograms creates a vertical axis that is said to be ***on the density scale***. The height of a bar is **not** the percent of entries in the bin; it is the percent of entries in the bin relative to the amount of space in the bin. That is why the height measures crowdedness or *density*.\n", "\n", "Let's see why this matters." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Unequal Bins\n", "An advantage of the histogram over a bar chart is that a histogram can contain bins of unequal width. Below, the values in the `Adjusted Gross` column of the `millions` table are binned into three uneven categories." ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "uneven = np.array([300, 400, 600, 1500])\n", "\n", "fig, ax1 = plt.subplots()\n", "\n", "ax1.hist(millions['Adjusted Gross'], bins=uneven, density=True, ec='white')\n", "\n", "y_vals = ax1.get_yticks()\n", "\n", "y_label = 'Percent per ' + (unit if unit else 'unit')\n", "\n", "x_label = 'Adjusted Gross (' + (unit if unit else 'unit') + ')'\n", "\n", "ax1.set_yticklabels(['{:g}'.format(x * 100) for x in y_vals])\n", "\n", "plt.ylabel(y_label)\n", "\n", "plt.xlabel(x_label)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here are the counts in the three bins." ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
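<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>Adjusted Gross</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>(299.999, 400.0]</th><td>81</td></tr>\n",
 "    <tr><th>(400.0, 600.0]</th><td>80</td></tr>\n",
 "    <tr><th>(600.0, 1500.0]</th><td>37</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>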
" ], "text/plain": [ " Adjusted Gross\n", "(299.999, 400.0] 81\n", "(400.0, 600.0] 80\n", "(600.0, 1500.0] 37" ] }, "execution_count": 35, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bin_counts_uneven = millions['Adjusted Gross']\n", "\n", "bin_counts_uneven = pd.DataFrame(bin_counts_uneven.value_counts(bins=uneven))\n", "\n", "bin_counts_uneven" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Although the ranges [300, 400) and [400, 600) have nearly identical counts, the bar over the former is twice as tall as the latter because it is only half as wide. The density of values in the [300, 400) is twice as much as the density in [400, 600). \n", "\n", "Histograms help us visualize where on the number line the data are most concentrated, especially when the bins are uneven." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Problem with Simply Plotting Counts\n", "It is possible to display counts directly in a chart, using the `normed=False` option of the `hist` method. The resulting chart has the same shape as a histogram when the bins all have equal widths, though the numbers on the vertical axis are different." ] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "millions.hist('Adjusted Gross', density=False, bins=np.arange(300,2001,100), ec='white')\n", "\n", "plt.xlabel('Adjusted Gross')\n", "\n", "plt.ylabel('Count')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While the count scale is perhaps more natural to interpret than the density scale, the chart becomes highly misleading when bins have different widths. Below, it appears (due to the count scale) that high-grossing movies are quite common, when in fact we have seen that they are relatively rare." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "millions.hist('Adjusted Gross', density=False, bins=uneven, ec='white')\n", "\n", "plt.xlabel('Adjusted Gross')\n", "\n", "plt.ylabel('Count')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Even though the method used is called `hist`, **the figure above is NOT A HISTOGRAM.** It misleadingly exaggerates the proportion of movies grossing at least 600 million dollars. The height of each bar is simply plotted at the number of movies in the bin, *without accounting for the difference in the widths of the bins*. \n", "\n", "The picture becomes even more absurd if the last two bins are combined." ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "very_uneven = np.array([300, 400, 1500])\n", "\n", "millions.hist('Adjusted Gross', density=False, bins=very_uneven, ec='white')\n", "\n", "plt.xlabel('Adjusted Gross')\n", "\n", "plt.ylabel('Count')\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this count-based figure, the shape of the distribution of movies is lost entirely." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The Histogram: General Principles and Calculation\n", "\n", "The figure above shows that what the eye perceives as \"big\" is area, not just height. This observation becomes particularly important when the bins have different widths.\n", "\n", "That is why a histogram has two defining properties:\n", "\n", "1. The bins are drawn to scale and are contiguous (though some might be empty), because the values on the horizontal axis are numerical.\n", "2. The **area** of each bar is proportional to the number of entries in the bin. \n", "\n", "Property 2 is the key to drawing a histogram, and is usually achieved as follows:\n", "\n", "$$\n", "\\mbox{area of bar} ~=~ \\mbox{percent of entries in bin}\n", "$$\n", "\n", "The calculation of the heights just uses the fact that the bar is a rectangle:\n", "\n", "$$\n", "\\mbox{area of bar} = \\mbox{height of bar} \\times \\mbox{width of bin}\n", "$$\n", "\n", "and so\n", "\n", "$$\n", "\\mbox{height of bar} ~=~ \n", "\\frac{\\mbox{area of bar}}{\\mbox{width of bin}} ~=~\n", "\\frac{\\mbox{percent of entries in bin}}{\\mbox{width of bin}}\n", "$$\n", "\n", "The units of height are \"percent per unit on the horizontal axis.\"\n", "\n", "When drawn using this method, the histogram is said to be drawn on the density scale. On this scale:\n", "- The area of each bar is equal to the percent of data values that are in the corresponding bin.\n", "- The total area of all the bars in the histogram is 100%. 
Speaking in terms of proportions, we say that the areas of all the bars in a histogram \"sum to 1\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Flat Tops and the Level of Detail\n", "\n", "Even though the density scale correctly represents percents using area, some detail is lost by grouping values into bins.\n", "\n", "Take another look at the [300, 400) bin in the figure below. The flat top of the bar, at the level 0.405% per million dollars, hides the fact that the movies are somewhat unevenly distributed across that bin. " ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "fig, ax1 = plt.subplots()\n", "\n", "ax1.hist(millions['Adjusted Gross'], bins=uneven, density=True, ec='white')\n", "\n", "y_vals = ax1.get_yticks()\n", "\n", "y_label = 'Percent per ' + (unit if unit else 'unit')\n", "\n", "x_label = 'Adjusted Gross (' + (unit if unit else 'unit') + ')'\n", "\n", "ax1.set_yticklabels(['{:g}'.format(x * 100) for x in y_vals])\n", "\n", "plt.ylabel(y_label)\n", "\n", "plt.xlabel(x_label)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To see this, let us split the [300, 400) bin into 10 narrower bins, each of width 10 million dollars." ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "some_tiny_bins = np.array([300, 310, 320, 330, 340, 350, 360, 370, 380, 390, 400, 600, 1500])\n", "\n", "fig, ax1 = plt.subplots()\n", "\n", "ax1.hist(millions['Adjusted Gross'], bins=some_tiny_bins, density=True, ec='white')\n", "\n", "y_vals = ax1.get_yticks()\n", "\n", "y_label = 'Percent per ' + (unit if unit else 'unit')\n", "\n", "x_label = 'Adjusted Gross (' + (unit if unit else 'unit') + ')'\n", "\n", "ax1.set_yticklabels(['{:g}'.format(x * 100) for x in y_vals])\n", "\n", "plt.ylabel(y_label)\n", "\n", "plt.xlabel(x_label)\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Some of the skinny bars are taller than 0.405 and others are shorter; the first two have heights of 0 because there are no data between 300 and 320. By putting a flat top at the level 0.405 across the whole bin, we are deciding to ignore the finer detail and are using the flat level as a rough approximation. Often, though not always, this is sufficient for understanding the general shape of the distribution." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**The height as a rough approximation.**\n", "This observation gives us a different way of thinking about the height.\n", "Look again at the [300, 400) bin in the earlier histograms. As we have seen, the bin is 100 million dollars wide and contains 40.5% of the data. Therefore the height of the corresponding bar is 0.405% per million dollars.\n", "\n", "Now think of the bin as consisting of 100 narrow bins that are each 1 million dollars wide. The bar's height of \"0.405% per million dollars\" means that as a rough approximation, 0.405% of the movies are in each of those 100 skinny bins of width 1 million dollars." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Notice that because we have the entire dataset that is being used to draw the histograms, we can draw the histograms to as fine a level of detail as the data and our patience will allow. However, if you are looking at a histogram in a book or on a website, and you don't have access to the underlying dataset, then it becomes important to have a clear understanding of the \"rough approximation\" created by the flat tops." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Histograms Q&A\n", "Let's draw the histogram again, this time with four bins, and check our understanding of the concepts." ] }, { "cell_type": "code", "execution_count": 41, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "uneven_again = np.array([300, 350, 400, 450, 1500])\n", "\n", "fig, ax1 = plt.subplots()\n", "\n", "ax1.hist(millions['Adjusted Gross'], bins=uneven_again, density=True, ec='white')\n", "\n", "y_vals = ax1.get_yticks()\n", "\n", "y_label = 'Percent per ' + (unit if unit else 'unit')\n", "\n", "x_label = 'Adjusted Gross (' + (unit if unit else 'unit') + ')'\n", "\n", "ax1.set_yticklabels(['{:g}'.format(x * 100) for x in y_vals])\n", "\n", "plt.ylabel(y_label)\n", "\n", "plt.xlabel(x_label)\n", "\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n",
 "<table border=\"1\" class=\"dataframe\">\n",
 "  <thead>\n",
 "    <tr style=\"text-align: right;\"><th></th><th>Adjusted Gross</th></tr>\n",
 "  </thead>\n",
 "  <tbody>\n",
 "    <tr><th>(450.0, 1500.0]</th><td>92</td></tr>\n",
 "    <tr><th>(350.0, 400.0]</th><td>49</td></tr>\n",
 "    <tr><th>(299.999, 350.0]</th><td>32</td></tr>\n",
 "    <tr><th>(400.0, 450.0]</th><td>25</td></tr>\n",
 "  </tbody>\n",
 "</table>\n",
 "</div>
" ], "text/plain": [ " Adjusted Gross\n", "(450.0, 1500.0] 92\n", "(350.0, 400.0] 49\n", "(299.999, 350.0] 32\n", "(400.0, 450.0] 25" ] }, "execution_count": 42, "metadata": {}, "output_type": "execute_result" } ], "source": [ "bin_counts_uneven_again = millions['Adjusted Gross']\n", "\n", "bin_counts_uneven_again = pd.DataFrame(bin_counts_uneven_again.value_counts(bins=uneven_again))\n", "\n", "bin_counts_uneven_again" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Look again at the histogram, and compare the [400, 450) bin with the [450, 1500) bin.\n", "\n", "**Q**: Which has more movies in it? \n", "\n", "**A**: The [450, 1500) bin. It has 92 movies, compared with 25 movies in the [400, 450) bin.\n", "\n", "**Q**: Then why is the [450, 1500) bar so much shorter than the [400, 450) bar?\n", "\n", "**A**: Because height represents density per unit of space in the bin, not the number of movies in the bin. The [450, 1500) bin does have more movies than the [400, 450) bin, but it is also a whole lot wider. So it is less crowded. The density of movies in it is much lower." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Differences Between Bar Charts and Histograms\n", "\n", "- Bar charts display one quantity per category. They are often used to display the distributions of categorical variables. Histograms display the distributions of quantitative variables. \n", "- All the bars in a bar chart have the same width, and there is an equal amount of space between consecutive bars. The bars of a histogram can have different widths, and they are contiguous.\n", "- The lengths (or heights, if the bars are drawn vertically) of the bars in a bar chart are proportional to the value for each category. The heights of bars in a histogram measure densities; the *areas* of bars in a histogram are proportional to the numbers of entries in the bins." 
] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 1 }