{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove_input"
    ]
   },
   "outputs": [],
   "source": [
    "path_data = '../../../data/'\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "plt.style.use('fivethirtyeight')\n",
    "%matplotlib inline\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampling and Empirical Distributions ###\n",
    "An important part of data science consists of drawing conclusions from the data in random samples. To interpret their results correctly, data scientists first have to understand exactly what random samples are.\n",
    "\n",
    "In this chapter we will take a more careful look at sampling, with special attention to the properties of large random samples. \n",
    "\n",
    "Let's start by drawing some samples. Our examples are based on the <code><a href=\"imdb.csv\">top_movies.csv</a></code> data set."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Title</th>\n",
       "      <th>Studio</th>\n",
       "      <th>Gross</th>\n",
       "      <th>Gross (Adjusted)</th>\n",
       "      <th>Year</th>\n",
       "      <th>Row Index</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Star Wars: The Force Awakens</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>906723418</td>\n",
       "      <td>906723400</td>\n",
       "      <td>2015</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Avatar</td>\n",
       "      <td>Fox</td>\n",
       "      <td>760507625</td>\n",
       "      <td>846120800</td>\n",
       "      <td>2009</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Titanic</td>\n",
       "      <td>Paramount</td>\n",
       "      <td>658672302</td>\n",
       "      <td>1178627900</td>\n",
       "      <td>1997</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Universal</td>\n",
       "      <td>652270625</td>\n",
       "      <td>687728000</td>\n",
       "      <td>2015</td>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Marvel's The Avengers</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>623357910</td>\n",
       "      <td>668866600</td>\n",
       "      <td>2012</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                          Title                Studio      Gross  \\\n",
       "0  Star Wars: The Force Awakens  Buena Vista (Disney)  906723418   \n",
       "1                        Avatar                   Fox  760507625   \n",
       "2                       Titanic             Paramount  658672302   \n",
       "3                Jurassic World             Universal  652270625   \n",
       "4         Marvel's The Avengers  Buena Vista (Disney)  623357910   \n",
       "\n",
       "   Gross (Adjusted)  Year  Row Index  \n",
       "0         906723400  2015          0  \n",
       "1         846120800  2009          1  \n",
       "2        1178627900  1997          2  \n",
       "3         687728000  2015          3  \n",
       "4         668866600  2012          4  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top_raw = pd.read_csv(path_data + 'top_movies.csv')\n",
    "\n",
    "top1 = top_raw.copy()\n",
    "\n",
    "top1['Row Index'] = np.arange(len(top1))\n",
    "\n",
    "top1.head(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Column Position ###\n",
    "Notice that the column we created, 'Row Index', is positioned last in the DataFrame (df). To make life easier, we would like it to be the first column. There are several ways to move it: we could [`pop`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pop.html) the column out of the df and re-insert it at the desired position, or we could [`drop`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html) the column and then re-insert it. Either way, the column is put back with [`insert`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html), which places 'Row Index' at the desired position in the df.\n",
    "\n",
    "[Pandas 'pop'](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pop.html)\n",
    "\n",
    "[Pandas 'drop'](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html)\n",
    "\n",
    "[Pandas 'insert'](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.insert.html)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Insert ###"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Row Index</th>\n",
       "      <th>Title</th>\n",
       "      <th>Studio</th>\n",
       "      <th>Gross</th>\n",
       "      <th>Gross (Adjusted)</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>Star Wars: The Force Awakens</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>906723418</td>\n",
       "      <td>906723400</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1</td>\n",
       "      <td>Avatar</td>\n",
       "      <td>Fox</td>\n",
       "      <td>760507625</td>\n",
       "      <td>846120800</td>\n",
       "      <td>2009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>2</td>\n",
       "      <td>Titanic</td>\n",
       "      <td>Paramount</td>\n",
       "      <td>658672302</td>\n",
       "      <td>1178627900</td>\n",
       "      <td>1997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>3</td>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Universal</td>\n",
       "      <td>652270625</td>\n",
       "      <td>687728000</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4</td>\n",
       "      <td>Marvel's The Avengers</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>623357910</td>\n",
       "      <td>668866600</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>195</th>\n",
       "      <td>195</td>\n",
       "      <td>The Caine Mutiny</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>21750000</td>\n",
       "      <td>386173500</td>\n",
       "      <td>1954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196</th>\n",
       "      <td>196</td>\n",
       "      <td>The Bells of St. Mary's</td>\n",
       "      <td>RKO</td>\n",
       "      <td>21333333</td>\n",
       "      <td>545882400</td>\n",
       "      <td>1945</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>197</th>\n",
       "      <td>197</td>\n",
       "      <td>Duel in the Sun</td>\n",
       "      <td>Selz.</td>\n",
       "      <td>20408163</td>\n",
       "      <td>443877500</td>\n",
       "      <td>1946</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>198</th>\n",
       "      <td>198</td>\n",
       "      <td>Sergeant York</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>16361885</td>\n",
       "      <td>418671800</td>\n",
       "      <td>1941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>199</th>\n",
       "      <td>199</td>\n",
       "      <td>The Four Horsemen of the Apocalypse</td>\n",
       "      <td>MPC</td>\n",
       "      <td>9183673</td>\n",
       "      <td>399489800</td>\n",
       "      <td>1921</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>200 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     Row Index                                Title                Studio  \\\n",
       "0            0         Star Wars: The Force Awakens  Buena Vista (Disney)   \n",
       "1            1                               Avatar                   Fox   \n",
       "2            2                              Titanic             Paramount   \n",
       "3            3                       Jurassic World             Universal   \n",
       "4            4                Marvel's The Avengers  Buena Vista (Disney)   \n",
       "..         ...                                  ...                   ...   \n",
       "195        195                     The Caine Mutiny              Columbia   \n",
       "196        196              The Bells of St. Mary's                   RKO   \n",
       "197        197                      Duel in the Sun                 Selz.   \n",
       "198        198                        Sergeant York          Warner Bros.   \n",
       "199        199  The Four Horsemen of the Apocalypse                   MPC   \n",
       "\n",
       "         Gross  Gross (Adjusted)  Year  \n",
       "0    906723418         906723400  2015  \n",
       "1    760507625         846120800  2009  \n",
       "2    658672302        1178627900  1997  \n",
       "3    652270625         687728000  2015  \n",
       "4    623357910         668866600  2012  \n",
       "..         ...               ...   ...  \n",
       "195   21750000         386173500  1954  \n",
       "196   21333333         545882400  1945  \n",
       "197   20408163         443877500  1946  \n",
       "198   16361885         418671800  1941  \n",
       "199    9183673         399489800  1921  \n",
       "\n",
       "[200 rows x 6 columns]"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top2 = top1.drop(columns=['Row Index'])\n",
    "\n",
    "top2.insert(0, 'Row Index', np.arange(len(top2)))\n",
    "\n",
    "top2"
   ]
  },
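  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `pop` route mentioned above can be sketched in the same way. This is a minimal illustration, not part of the chapter's own workflow: the table `movies` and the variable `row_index` are names made up for the example, and the two rows are copied from the output above as stand-in data.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Stand-in data: two rows copied from the table above (hypothetical name `movies`)\n",
    "movies = pd.DataFrame({\n",
    "    'Title': ['Avatar', 'Titanic'],\n",
    "    'Year': [2009, 1997],\n",
    "})\n",
    "movies['Row Index'] = np.arange(len(movies))   # a newly assigned column lands last\n",
    "\n",
    "row_index = movies.pop('Row Index')            # remove the column and keep it as a Series\n",
    "movies.insert(0, 'Row Index', row_index)       # re-insert it as the first column\n",
    "\n",
    "list(movies.columns)                           # ['Row Index', 'Title', 'Year']\n",
    "```\n",
    "\n",
    "Both `pop` and `insert` modify the df in place, so it is safer to work on a copy if the original column order is still needed."
   ]
  },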
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Rename Index ###\n",
    "Rather than creating a new column, we can name the existing axis with `rename_axis`. If we do this, we must remember that the row numbers shown under 'Row Index' are the actual df index and not an ordinary column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Row Index</th>\n",
       "      <th>Title</th>\n",
       "      <th>Studio</th>\n",
       "      <th>Gross</th>\n",
       "      <th>Gross (Adjusted)</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Star Wars: The Force Awakens</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>906723418</td>\n",
       "      <td>906723400</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Avatar</td>\n",
       "      <td>Fox</td>\n",
       "      <td>760507625</td>\n",
       "      <td>846120800</td>\n",
       "      <td>2009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Titanic</td>\n",
       "      <td>Paramount</td>\n",
       "      <td>658672302</td>\n",
       "      <td>1178627900</td>\n",
       "      <td>1997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Universal</td>\n",
       "      <td>652270625</td>\n",
       "      <td>687728000</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Marvel's The Avengers</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>623357910</td>\n",
       "      <td>668866600</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>195</th>\n",
       "      <td>The Caine Mutiny</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>21750000</td>\n",
       "      <td>386173500</td>\n",
       "      <td>1954</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196</th>\n",
       "      <td>The Bells of St. Mary's</td>\n",
       "      <td>RKO</td>\n",
       "      <td>21333333</td>\n",
       "      <td>545882400</td>\n",
       "      <td>1945</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>197</th>\n",
       "      <td>Duel in the Sun</td>\n",
       "      <td>Selz.</td>\n",
       "      <td>20408163</td>\n",
       "      <td>443877500</td>\n",
       "      <td>1946</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>198</th>\n",
       "      <td>Sergeant York</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>16361885</td>\n",
       "      <td>418671800</td>\n",
       "      <td>1941</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>199</th>\n",
       "      <td>The Four Horsemen of the Apocalypse</td>\n",
       "      <td>MPC</td>\n",
       "      <td>9183673</td>\n",
       "      <td>399489800</td>\n",
       "      <td>1921</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>200 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "Row Index                                Title                Studio  \\\n",
       "0                 Star Wars: The Force Awakens  Buena Vista (Disney)   \n",
       "1                                       Avatar                   Fox   \n",
       "2                                      Titanic             Paramount   \n",
       "3                               Jurassic World             Universal   \n",
       "4                        Marvel's The Avengers  Buena Vista (Disney)   \n",
       "..                                         ...                   ...   \n",
       "195                           The Caine Mutiny              Columbia   \n",
       "196                    The Bells of St. Mary's                   RKO   \n",
       "197                            Duel in the Sun                 Selz.   \n",
       "198                              Sergeant York          Warner Bros.   \n",
       "199        The Four Horsemen of the Apocalypse                   MPC   \n",
       "\n",
       "Row Index      Gross  Gross (Adjusted)  Year  \n",
       "0          906723418         906723400  2015  \n",
       "1          760507625         846120800  2009  \n",
       "2          658672302        1178627900  1997  \n",
       "3          652270625         687728000  2015  \n",
       "4          623357910         668866600  2012  \n",
       "..               ...               ...   ...  \n",
       "195         21750000         386173500  1954  \n",
       "196         21333333         545882400  1945  \n",
       "197         20408163         443877500  1946  \n",
       "198         16361885         418671800  1941  \n",
       "199          9183673         399489800  1921  \n",
       "\n",
       "[200 rows x 5 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top = top1.drop(columns=['Row Index'])\n",
    "\n",
    "top = top.rename_axis('Row Index', axis='columns')\n",
    "\n",
    "top"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Number Formatting ###\n",
    "Before going on to process the data, we may wish to adjust how the numbers are displayed (as we have done previously). To achieve this we can use the 'Display Values' feature of the Pandas Styler, which lets us `format` an entire df or only specific columns.\n",
    "\n",
    "[Pandas format](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Finer-Control:-Display-Values)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<style  type=\"text/css\" >\n",
       "</style><table id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23\" ><thead>    <tr>        <th class=\"index_name level0\" >Row Index</th>        <th class=\"col_heading level0 col0\" >Title</th>        <th class=\"col_heading level0 col1\" >Studio</th>        <th class=\"col_heading level0 col2\" >Gross</th>        <th class=\"col_heading level0 col3\" >Gross (Adjusted)</th>        <th class=\"col_heading level0 col4\" >Year</th>    </tr></thead><tbody>\n",
       "                <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row0\" class=\"row_heading level0 row0\" >0</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row0_col0\" class=\"data row0 col0\" >Star Wars: The Force Awakens</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row0_col1\" class=\"data row0 col1\" >Buena Vista (Disney)</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row0_col2\" class=\"data row0 col2\" >906,723,418</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row0_col3\" class=\"data row0 col3\" >906,723,400</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row0_col4\" class=\"data row0 col4\" >2015</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row1\" class=\"row_heading level0 row1\" >1</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row1_col0\" class=\"data row1 col0\" >Avatar</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row1_col1\" class=\"data row1 col1\" >Fox</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row1_col2\" class=\"data row1 col2\" >760,507,625</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row1_col3\" class=\"data row1 col3\" >846,120,800</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row1_col4\" class=\"data row1 col4\" >2009</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row2\" class=\"row_heading level0 row2\" >2</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row2_col0\" class=\"data row2 col0\" >Titanic</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row2_col1\" class=\"data row2 col1\" >Paramount</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row2_col2\" class=\"data row2 col2\" >658,672,302</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row2_col3\" class=\"data row2 col3\" >1,178,627,900</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row2_col4\" class=\"data row2 col4\" >1997</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row3\" class=\"row_heading level0 row3\" >3</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row3_col0\" class=\"data row3 col0\" >Jurassic World</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row3_col1\" class=\"data row3 col1\" >Universal</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row3_col2\" class=\"data row3 col2\" >652,270,625</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row3_col3\" class=\"data row3 col3\" >687,728,000</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row3_col4\" class=\"data row3 col4\" >2015</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row4\" class=\"row_heading level0 row4\" >4</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row4_col0\" class=\"data row4 col0\" >Marvel's The Avengers</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row4_col1\" class=\"data row4 col1\" >Buena Vista (Disney)</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row4_col2\" class=\"data row4 col2\" >623,357,910</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row4_col3\" class=\"data row4 col3\" >668,866,600</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row4_col4\" class=\"data row4 col4\" >2012</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row5\" class=\"row_heading level0 row5\" >5</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row5_col0\" class=\"data row5 col0\" >The Dark Knight</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row5_col1\" class=\"data row5 col1\" >Warner Bros.</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row5_col2\" class=\"data row5 col2\" >534,858,444</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row5_col3\" class=\"data row5 col3\" >647,761,600</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row5_col4\" class=\"data row5 col4\" >2008</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row6\" class=\"row_heading level0 row6\" >6</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row6_col0\" class=\"data row6 col0\" >Star Wars: Episode I - The Phantom Menace</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row6_col1\" class=\"data row6 col1\" >Fox</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row6_col2\" class=\"data row6 col2\" >474,544,677</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row6_col3\" class=\"data row6 col3\" >785,715,000</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row6_col4\" class=\"data row6 col4\" >1999</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row7\" class=\"row_heading level0 row7\" >7</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row7_col0\" class=\"data row7 col0\" >Star Wars</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row7_col1\" class=\"data row7 col1\" >Fox</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row7_col2\" class=\"data row7 col2\" >460,998,007</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row7_col3\" class=\"data row7 col3\" >1,549,640,500</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row7_col4\" class=\"data row7 col4\" >1977</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row8\" class=\"row_heading level0 row8\" >8</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row8_col0\" class=\"data row8 col0\" >Avengers: Age of Ultron</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row8_col1\" class=\"data row8 col1\" >Buena Vista (Disney)</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row8_col2\" class=\"data row8 col2\" >459,005,868</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row8_col3\" class=\"data row8 col3\" >465,684,200</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row8_col4\" class=\"data row8 col4\" >2015</td>\n",
       "            </tr>\n",
       "            <tr>\n",
       "                        <th id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23level0_row9\" class=\"row_heading level0 row9\" >9</th>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row9_col0\" class=\"data row9 col0\" >The Dark Knight Rises</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row9_col1\" class=\"data row9 col1\" >Warner Bros.</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row9_col2\" class=\"data row9 col2\" >448,139,099</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row9_col3\" class=\"data row9 col3\" >500,961,700</td>\n",
       "                        <td id=\"T_046dbd24_510a_11eb_91c1_685b35b96a23row9_col4\" class=\"data row9 col4\" >2012</td>\n",
       "            </tr>\n",
       "    </tbody></table>"
      ],
      "text/plain": [
       "<pandas.io.formats.style.Styler at 0x7fce4e243b38>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top.head(10).style.format({'Gross': '{:,}', 'Gross (Adjusted)': '{:,}'})"
   ]
  },
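  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `format` call above targets two named columns. As a rough sketch of the 'entire df' case: a single format string is applied to every displayed column, so a thousands separator only works if the selection is numeric. The names `movies` and `numeric_cols` are assumptions made for this example, and the two rows are copied from the output above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Stand-in data: two rows copied from the table above (hypothetical name `movies`)\n",
    "movies = pd.DataFrame({\n",
    "    'Title': ['Avatar', 'Titanic'],\n",
    "    'Gross': [760507625, 658672302],\n",
    "    'Gross (Adjusted)': [846120800, 1178627900],\n",
    "})\n",
    "\n",
    "# Keep only the numeric columns, then apply one format string to all of them\n",
    "numeric_cols = ['Gross', 'Gross (Adjusted)']\n",
    "movies[numeric_cols].style.format('{:,}')\n",
    "```\n",
    "\n",
    "Styling changes only how the values are displayed; the underlying numbers in the df are left untouched."
   ]
  },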
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sampling Rows of a Table ###\n",
    "Each row of a data table represents an individual; in `top`, each individual is a movie. Sampling individuals can thus be achieved by sampling the rows of a table.\n",
    "\n",
    "The contents of a row are the values of different variables measured on the same individual. So the contents of the sampled rows form samples of values of each of the variables."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deterministic Samples ###\n",
    "\n",
    "When you simply specify which elements of a set you want to choose, without any chances involved, you create a ***deterministic*** *sample*.\n",
    "\n",
    "You have done this many times, for example by using [`df.iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) (**i**nteger **loc**ation) with the positions of the rows you want inside `[ ]`:\n",
    "\n",
    "[Pandas iloc](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Row Index</th>\n",
       "      <th>Title</th>\n",
       "      <th>Studio</th>\n",
       "      <th>Gross</th>\n",
       "      <th>Gross (Adjusted)</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Jurassic World</td>\n",
       "      <td>Universal</td>\n",
       "      <td>652270625</td>\n",
       "      <td>687728000</td>\n",
       "      <td>2015</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Spider-Man</td>\n",
       "      <td>Sony</td>\n",
       "      <td>403706375</td>\n",
       "      <td>604517300</td>\n",
       "      <td>2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>100</th>\n",
       "      <td>Gone with the Wind</td>\n",
       "      <td>MGM</td>\n",
       "      <td>198676459</td>\n",
       "      <td>1757788200</td>\n",
       "      <td>1939</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Row Index               Title     Studio      Gross  Gross (Adjusted)  Year\n",
       "3              Jurassic World  Universal  652270625         687728000  2015\n",
       "18                 Spider-Man       Sony  403706375         604517300  2002\n",
       "100        Gone with the Wind        MGM  198676459        1757788200  1939"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top.iloc[np.array([3, 18, 100])]"
   ]
  },
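  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also select rows that satisfy a condition. Here the Pandas `str.contains` method builds a Boolean mask that is `True` for every row whose title contains the given text:\n",
    "\n",
    "[Pandas str.contains](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.contains.html)"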
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Row Index</th>\n",
       "      <th>Title</th>\n",
       "      <th>Studio</th>\n",
       "      <th>Gross</th>\n",
       "      <th>Gross (Adjusted)</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Harry Potter and the Deathly Hallows Part 2</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>381011219</td>\n",
       "      <td>417512200</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>Harry Potter and the Sorcerer's Stone</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>317575550</td>\n",
       "      <td>486442900</td>\n",
       "      <td>2001</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>Harry Potter and the Half-Blood Prince</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>301959197</td>\n",
       "      <td>352098800</td>\n",
       "      <td>2009</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>59</th>\n",
       "      <td>Harry Potter and the Order of the Phoenix</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>292004738</td>\n",
       "      <td>369250200</td>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62</th>\n",
       "      <td>Harry Potter and the Goblet of Fire</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>290013036</td>\n",
       "      <td>393024800</td>\n",
       "      <td>2005</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>69</th>\n",
       "      <td>Harry Potter and the Chamber of Secrets</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>261988482</td>\n",
       "      <td>390768100</td>\n",
       "      <td>2002</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>Harry Potter and the Prisoner of Azkaban</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>249541069</td>\n",
       "      <td>349598600</td>\n",
       "      <td>2004</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Row Index                                        Title        Studio  \\\n",
       "22         Harry Potter and the Deathly Hallows Part 2  Warner Bros.   \n",
       "43               Harry Potter and the Sorcerer's Stone  Warner Bros.   \n",
       "54              Harry Potter and the Half-Blood Prince  Warner Bros.   \n",
       "59           Harry Potter and the Order of the Phoenix  Warner Bros.   \n",
       "62                 Harry Potter and the Goblet of Fire  Warner Bros.   \n",
       "69             Harry Potter and the Chamber of Secrets  Warner Bros.   \n",
       "76            Harry Potter and the Prisoner of Azkaban  Warner Bros.   \n",
       "\n",
       "Row Index      Gross  Gross (Adjusted)  Year  \n",
       "22         381011219         417512200  2011  \n",
       "43         317575550         486442900  2001  \n",
       "54         301959197         352098800  2009  \n",
       "59         292004738         369250200  2007  \n",
       "62         290013036         393024800  2005  \n",
       "69         261988482         390768100  2002  \n",
       "76         249541069         349598600  2004  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "top[top['Title'].str.contains('Harry Potter')]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "While these are samples, they are not random samples. They don't involve chance."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Probability Samples\n",
    "------------------"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For describing random samples, some terminology will be helpful.\n",
    "\n",
    "A *population* is the set of all elements from which a sample will be drawn.\n",
    "\n",
    "A *probability sample* is one for which it is possible to calculate, before the sample is drawn, the chance with which any subset of elements will enter the sample.\n",
    "\n",
    "In a probability sample, all elements need not have the same chance of being chosen. \n",
    "\n",
    "### A Random Sampling Scheme ###\n",
    "\n",
    "For example, suppose you choose two people from a population that consists of three people A, B, and C, according to the following scheme:\n",
    "\n",
    "- Person A is chosen with probability 1.\n",
    "- One of Persons B or C is chosen according to the toss of a coin: if the coin lands heads, you choose B, and if it lands tails you choose C.\n",
    "\n",
    "This is a probability sample of size 2. Here are the chances of entry for all non-empty subsets:\n",
    "\n",
    "    A: 1 \n",
    "    B: 1/2\n",
    "    C: 1/2\n",
    "    AB: 1/2\n",
    "    AC: 1/2\n",
    "    BC: 0\n",
    "    ABC: 0\n",
    "\n",
    "Person A has a higher chance of being selected than Persons B or C; indeed, Person A is certain to be selected. Since these differences are known and quantified, they can be taken into account when working with the sample. "
   ]
  },
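  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The chances listed above can be checked by simulation. The sketch below is one way to do that, not part of the original scheme: `draw_sample` and `chance` are hypothetical helper names, and the number of repetitions is an arbitrary choice. The estimated proportions should land close to 1, 1/2, and 0.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def draw_sample():\n",
    "    # Hypothetical helper: A is always chosen; a fair coin picks B or C.\n",
    "    second = np.random.choice(np.array(['B', 'C']))\n",
    "    return {'A', str(second)}\n",
    "\n",
    "repetitions = 10000  # arbitrary number of simulated samples\n",
    "samples = [draw_sample() for _ in range(repetitions)]\n",
    "\n",
    "def chance(subset):\n",
    "    # Proportion of simulated samples that contain every element of subset.\n",
    "    return sum(set(subset) <= s for s in samples) / repetitions\n",
    "\n",
    "for subset in ['A', 'B', 'C', 'AB', 'AC', 'BC', 'ABC']:\n",
    "    print(subset, chance(subset))\n",
    "```\n",
    "\n",
    "The proportions for B, C, AB, and AC vary a little from run to run, while those for A, BC, and ABC do not, because they are determined by the scheme itself rather than by the coin."
   ]
  },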
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A Systematic Sample ###\n",
    "\n",
    "Imagine all the elements of the population listed in a sequence. One method of sampling starts by choosing a random position early in the list, and then evenly spaced positions after that. The sample consists of the elements in those positions. Such a sample is called a *systematic sample*. \n",
    "\n",
    "Here we will choose a systematic sample of the rows of `top`. We will start by picking one of the first 10 rows at random, and then we will use the `take` method to pick every 10th row after that.\n",
    "\n",
    "[Pandas take](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.take.html)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Row Index</th>\n",
       "      <th>Title</th>\n",
       "      <th>Studio</th>\n",
       "      <th>Gross</th>\n",
       "      <th>Gross (Adjusted)</th>\n",
       "      <th>Year</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Titanic</td>\n",
       "      <td>Paramount</td>\n",
       "      <td>658672302</td>\n",
       "      <td>1178627900</td>\n",
       "      <td>1997</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>The Hunger Games: Catching Fire</td>\n",
       "      <td>Lionsgate</td>\n",
       "      <td>424668047</td>\n",
       "      <td>444697400</td>\n",
       "      <td>2013</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Harry Potter and the Deathly Hallows Part 2</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>381011219</td>\n",
       "      <td>417512200</td>\n",
       "      <td>2011</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>American Sniper</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>350126372</td>\n",
       "      <td>374796000</td>\n",
       "      <td>2014</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>Iron Man</td>\n",
       "      <td>Paramount</td>\n",
       "      <td>318412101</td>\n",
       "      <td>385808100</td>\n",
       "      <td>2008</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>52</th>\n",
       "      <td>Skyfall</td>\n",
       "      <td>Sony</td>\n",
       "      <td>304360277</td>\n",
       "      <td>329225400</td>\n",
       "      <td>2012</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62</th>\n",
       "      <td>Harry Potter and the Goblet of Fire</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>290013036</td>\n",
       "      <td>393024800</td>\n",
       "      <td>2005</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>72</th>\n",
       "      <td>Jaws</td>\n",
       "      <td>Universal</td>\n",
       "      <td>260000000</td>\n",
       "      <td>1114285700</td>\n",
       "      <td>1975</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>82</th>\n",
       "      <td>Twister</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>241721524</td>\n",
       "      <td>475786700</td>\n",
       "      <td>1996</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>92</th>\n",
       "      <td>Ghost</td>\n",
       "      <td>Paramount</td>\n",
       "      <td>217631306</td>\n",
       "      <td>447747400</td>\n",
       "      <td>1990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>102</th>\n",
       "      <td>Toy Story</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>191796233</td>\n",
       "      <td>381654400</td>\n",
       "      <td>1995</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>112</th>\n",
       "      <td>Pretty Woman</td>\n",
       "      <td>Buena Vista (Disney)</td>\n",
       "      <td>178406268</td>\n",
       "      <td>366934900</td>\n",
       "      <td>1990</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>122</th>\n",
       "      <td>Batman Returns</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>162831698</td>\n",
       "      <td>341358000</td>\n",
       "      <td>1992</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>132</th>\n",
       "      <td>101 Dalmatians</td>\n",
       "      <td>Disney</td>\n",
       "      <td>144880014</td>\n",
       "      <td>869280100</td>\n",
       "      <td>1961</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>142</th>\n",
       "      <td>On Golden Pond</td>\n",
       "      <td>Universal</td>\n",
       "      <td>119285432</td>\n",
       "      <td>353083700</td>\n",
       "      <td>1981</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>152</th>\n",
       "      <td>Kramer Vs. Kramer</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>106260000</td>\n",
       "      <td>374276100</td>\n",
       "      <td>1979</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>162</th>\n",
       "      <td>Cinderella (1950)</td>\n",
       "      <td>Disney</td>\n",
       "      <td>93141149</td>\n",
       "      <td>547050200</td>\n",
       "      <td>1950</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>172</th>\n",
       "      <td>My Fair Lady</td>\n",
       "      <td>Warner Bros.</td>\n",
       "      <td>72000000</td>\n",
       "      <td>522000000</td>\n",
       "      <td>1964</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>182</th>\n",
       "      <td>Goldfinger</td>\n",
       "      <td>UA</td>\n",
       "      <td>51081062</td>\n",
       "      <td>576810000</td>\n",
       "      <td>1964</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192</th>\n",
       "      <td>The Bridge on the River Kwai</td>\n",
       "      <td>Columbia</td>\n",
       "      <td>27200000</td>\n",
       "      <td>473280000</td>\n",
       "      <td>1957</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Row Index                                        Title                Studio  \\\n",
       "2                                              Titanic             Paramount   \n",
       "12                     The Hunger Games: Catching Fire             Lionsgate   \n",
       "22         Harry Potter and the Deathly Hallows Part 2          Warner Bros.   \n",
       "32                                     American Sniper          Warner Bros.   \n",
       "42                                            Iron Man             Paramount   \n",
       "52                                             Skyfall                  Sony   \n",
       "62                 Harry Potter and the Goblet of Fire          Warner Bros.   \n",
       "72                                                Jaws             Universal   \n",
       "82                                             Twister          Warner Bros.   \n",
       "92                                               Ghost             Paramount   \n",
       "102                                          Toy Story  Buena Vista (Disney)   \n",
       "112                                       Pretty Woman  Buena Vista (Disney)   \n",
       "122                                     Batman Returns          Warner Bros.   \n",
       "132                                     101 Dalmatians                Disney   \n",
       "142                                     On Golden Pond             Universal   \n",
       "152                                  Kramer Vs. Kramer              Columbia   \n",
       "162                                  Cinderella (1950)                Disney   \n",
       "172                                       My Fair Lady          Warner Bros.   \n",
       "182                                         Goldfinger                    UA   \n",
       "192                       The Bridge on the River Kwai              Columbia   \n",
       "\n",
       "Row Index      Gross  Gross (Adjusted)  Year  \n",
       "2          658672302        1178627900  1997  \n",
       "12         424668047         444697400  2013  \n",
       "22         381011219         417512200  2011  \n",
       "32         350126372         374796000  2014  \n",
       "42         318412101         385808100  2008  \n",
       "52         304360277         329225400  2012  \n",
       "62         290013036         393024800  2005  \n",
       "72         260000000        1114285700  1975  \n",
       "82         241721524         475786700  1996  \n",
       "92         217631306         447747400  1990  \n",
       "102        191796233         381654400  1995  \n",
       "112        178406268         366934900  1990  \n",
       "122        162831698         341358000  1992  \n",
       "132        144880014         869280100  1961  \n",
       "142        119285432         353083700  1981  \n",
       "152        106260000         374276100  1979  \n",
       "162         93141149         547050200  1950  \n",
       "172         72000000         522000000  1964  \n",
       "182         51081062         576810000  1964  \n",
       "192         27200000         473280000  1957  "
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"Choose a random start among rows 0 through 9;\n",
    "then take every 10th row.\"\"\"\n",
    "\n",
    "start = np.random.choice(np.arange(10))\n",
    "top.take(np.arange(start, len(top), 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the cell a few times to see how the output varies. \n",
    "\n",
    "This systematic sample is a probability sample. In this scheme, all rows have chance $1/10$ of being chosen. For example, Row 23 is chosen if and only if Row 3 is chosen, and the chance of that is $1/10$. \n",
    "\n",
    "But not all subsets have the same chance of being chosen. Because the selected rows are evenly spaced, most subsets of rows have no chance of being chosen. The only subsets that are possible are those that consist of rows all separated by multiples of 10. Any of those subsets is selected with chance $1/10$. Other subsets, like the subset containing the first 11 rows of the table, are selected with chance $0$."
   ]
  },
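  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because only 10 starting rows are possible, the chances described above can be worked out by listing every possible sample. The sketch below does this under the assumption that the table has 200 rows; `num_rows` is a stand-in for `len(top)`, and the other names are chosen here for illustration.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "num_rows = 200  # assumed table length, standing in for len(top)\n",
    "step = 10\n",
    "\n",
    "# The 10 possible systematic samples, one for each random start.\n",
    "possible_samples = [np.arange(start, num_rows, step) for start in np.arange(step)]\n",
    "\n",
    "# Each start is equally likely, so each possible sample has chance 1/10.\n",
    "# A row's chance is the total chance of the samples that contain it.\n",
    "row_chances = np.zeros(num_rows)\n",
    "for sample in possible_samples:\n",
    "    row_chances[sample] += 1 / step\n",
    "\n",
    "print(row_chances.min(), row_chances.max())\n",
    "\n",
    "# Rows 3 and 23 enter the sample together; rows 0 and 1 never do.\n",
    "together_3_23 = sum((3 in s) and (23 in s) for s in possible_samples) / step\n",
    "together_0_1 = sum((0 in s) and (1 in s) for s in possible_samples) / step\n",
    "print(together_3_23, together_0_1)\n",
    "```\n",
    "\n",
    "Every row gets the same chance, yet the pair of Rows 0 and 1 has chance 0, which is what distinguishes this scheme from one in which all subsets of the same size are equally likely."
   ]
  },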
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Random Samples Drawn With or Without Replacement ###\n",
    "In this course, we will mostly deal with the two most straightforward methods of sampling. \n",
    "\n",
    "The first is random sampling with replacement, which (as we have seen earlier) is the default behavior of `np.random.choice` when it samples from an array. \n",
    "\n",
    "The other, called a \"simple random sample\", is a sample drawn at random *without* replacement. Sampled individuals are not replaced in the population before the next individual is drawn. This is the kind of sampling that happens when you deal a hand from a deck of cards, for example. \n",
    "\n",
    "In this chapter, we will use simulation to study the behavior of large samples drawn at random with or without replacement.  \n",
    "\n",
    "[Numpy random.choice](https://het.as.utexas.edu/HET/Software/Numpy/reference/generated/numpy.random.choice.html)"
   ]
  },
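  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of the difference between the two methods, the sketch below draws both kinds of sample from a hypothetical population of 10 labeled individuals. It relies on the `replace` argument of `np.random.choice`; the population and the sample size of 5 are assumptions made for this example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "population = np.arange(10)  # hypothetical population of 10 labeled individuals\n",
    "\n",
    "# With replacement (the default): the same individual can appear more than once.\n",
    "with_replacement = np.random.choice(population, 5)\n",
    "\n",
    "# Without replacement (a simple random sample): no individual appears twice.\n",
    "without_replacement = np.random.choice(population, 5, replace=False)\n",
    "\n",
    "print(with_replacement)\n",
    "print(without_replacement)\n",
    "print(len(np.unique(without_replacement)))\n",
    "```\n",
    "\n",
    "The last line always prints 5, since a sample drawn without replacement cannot contain repeats. The sample drawn with replacement may or may not contain repeats, depending on chance."
   ]
  },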
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "Drawing a random sample requires care and precision. It is not haphazard, even though that is a colloquial meaning of the word \"random\". If you stand at a street corner and take as your sample the first ten people who pass by, you might think you're sampling at random because you didn't choose who walked by. But it's not a random sample – it's a *sample of convenience*. You didn't know ahead of time the probability of each person entering the sample; perhaps you hadn't even specified exactly who was in the population."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}