{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false,
    "tags": [
     "remove_input"
    ]
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "np.set_printoptions(threshold=50)\n",
    "path_data = '../../../data/'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tables\n",
    "\n",
    "Tables are a fundamental object type for representing data sets. A table can be viewed in two ways:\n",
    "* a sequence of named columns that each describe a single aspect of all entries in a data set, or\n",
    "* a sequence of rows that each contain all information about a single entry in a data set.\n",
    "\n",
    "In order to use tables, import all of the module called `datascience`, a module created for this text."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [
    {
     "ename": "ModuleNotFoundError",
     "evalue": "No module named 'datascience'",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-2-6f3a305c96af>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32mfrom\u001b[0m \u001b[0mdatascience\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0;34m*\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mModuleNotFoundError\u001b[0m: No module named 'datascience'"
     ]
    }
   ],
   "source": [
    "from datascience import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Empty tables can be created using the `Table` function. An empty table is usefuly because it can be extended to contain new rows and columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": []
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Table()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `with_columns` method on a table constructs a new table with additional labeled columns. Each column of a table is an array. To add one new column to a table, call `with_columns` with a label and an array. (The `with_column` method can be used with the same effect.)\n",
    "\n",
    "Below, we begin each example with an empty table that has no columns. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Number of petals</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>8               </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34              </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>5               </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Number of petals\n",
       "8\n",
       "34\n",
       "5"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Table().with_columns('Number of petals', make_array(8, 34, 5))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To add two (or more) new columns, provide the label and array for each column. All columns must have the same length, or an error will occur."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Number of petals</th> <th>Name</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>8               </td> <td>lotus    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34              </td> <td>sunflower</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>5               </td> <td>rose     </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Number of petals | Name\n",
       "8                | lotus\n",
       "34               | sunflower\n",
       "5                | rose"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "Table().with_columns(\n",
    "    'Number of petals', make_array(8, 34, 5),\n",
    "    'Name', make_array('lotus', 'sunflower', 'rose')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can give this table a name, and then extend the table with another column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Number of petals</th> <th>Name</th> <th>Color</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>8               </td> <td>lotus    </td> <td>pink  </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34              </td> <td>sunflower</td> <td>yellow</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>5               </td> <td>rose     </td> <td>red   </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Number of petals | Name      | Color\n",
       "8                | lotus     | pink\n",
       "34               | sunflower | yellow\n",
       "5                | rose      | red"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "flowers = Table().with_columns(\n",
    "    'Number of petals', make_array(8, 34, 5),\n",
    "    'Name', make_array('lotus', 'sunflower', 'rose')\n",
    ")\n",
    "\n",
    "flowers.with_columns(\n",
    "    'Color', make_array('pink', 'yellow', 'red')\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `with_columns` method creates a new table each time it is called, so the original table is not affected. For example, the table `flowers` still has only the two columns that it had when it was created."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Number of petals</th> <th>Name</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>8               </td> <td>lotus    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34              </td> <td>sunflower</td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>5               </td> <td>rose     </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Number of petals | Name\n",
       "8                | lotus\n",
       "34               | sunflower\n",
       "5                | rose"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "flowers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Creating tables in this way involves a lot of typing. If the data have already been entered somewhere, it is usually possible to use Python to read it into a table, instead of typing it all in cell by cell.\n",
    "\n",
    "Often, tables are created from files that contain comma-separated values. Such files are called CSV files.\n",
    "\n",
    "Below, we use the Table method `read_table` to read a CSV file that contains some of the data used by Minard in his graphic about Napoleon's Russian campaign. The data are placed in a table named `minard`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City</th> <th>Direction</th> <th>Survivors</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City        | Direction | Survivors\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard = Table.read_table(path_data + 'minard.csv')\n",
    "minard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will use this small table to demonstrate some useful Table methods. We will then use those same methods, and develop other methods, on much larger tables of data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### The Size of the Table ###\n",
    "\n",
    "The method `num_columns` gives the number of columns in the table, and `num_rows` the number of rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.num_columns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "8"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.num_rows"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Column Labels \n",
    "The method `labels` can be used to list the labels of all the columns. With `minard` we don't gain much by this, but it can be very useful for tables that are so large that not all columns are visible on the screen."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "('Longitude', 'Latitude', 'City', 'Direction', 'Survivors')"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can change column labels using the `relabeled` method. This creates a new table and leaves `minard` unchanged."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City Name</th> <th>Direction</th> <th>Survivors</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City Name   | Direction | Survivors\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.relabeled('City', 'City Name')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "However, this method does not change the original table. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City</th> <th>Direction</th> <th>Survivors</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City        | Direction | Survivors\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A common pattern is to assign the original name `minard` to the new table, so that all future uses of `minard` will refer to the relabeled table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City Name</th> <th>Direction</th> <th>Survivors</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City Name   | Direction | Survivors\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard = minard.relabeled('City', 'City Name')\n",
    "minard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Accessing the Data in a Column ###\n",
    "We can use a column's label to access the array of data in the column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.column('Survivors')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 5 columns are indexed 0, 1, 2, 3, and 4. The column `Survivors` can also be accessed by using its column index."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.column(4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 8 items in the array are indexed 0, 1, 2, and so on, up to 7. The items in the column can be accessed using `item`, as with any array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "145000"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.column(4).item(0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "24000"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.column(4).item(5)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Working with the Data in a Column ###\n",
    "Because columns are arrays, we can use array operations on them to discover new information. For example, we can create a new column that contains the percent of all survivors at each city after Smolensk."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City Name</th> <th>Direction</th> <th>Survivors</th> <th>Percent Surviving</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td> <td>1                </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td> <td>0.965517         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td> <td>0.876552         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td> <td>0.689655         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td> <td>0.37931          </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td> <td>0.165517         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td> <td>0.137931         </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td> <td>0.0827586        </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City Name   | Direction | Survivors | Percent Surviving\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000    | 1\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000    | 0.965517\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100    | 0.876552\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000    | 0.689655\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000     | 0.37931\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000     | 0.165517\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000     | 0.137931\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000     | 0.0827586"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "initial = minard.column('Survivors').item(0)\n",
    "minard = minard.with_columns(\n",
    "    'Percent Surviving', minard.column('Survivors')/initial\n",
    ")\n",
    "minard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the proportions in the new columns appear as percents, we can use the method `set_format` with the option `PercentFormatter`. The `set_format` method takes `Formatter` objects, which exist for dates (`DateFormatter`), currencies (`CurrencyFormatter`), numbers, and percentages."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City Name</th> <th>Direction</th> <th>Survivors</th> <th>Percent Surviving</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td> <td>100.00%          </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td> <td>96.55%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td> <td>87.66%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td> <td>68.97%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td> <td>37.93%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td> <td>16.55%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td> <td>13.79%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td> <td>8.28%            </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City Name   | Direction | Survivors | Percent Surviving\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000    | 100.00%\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000    | 96.55%\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100    | 87.66%\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000    | 68.97%\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000     | 37.93%\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000     | 16.55%\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000     | 13.79%\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000     | 8.28%"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.set_format('Percent Surviving', PercentFormatter)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Choosing Sets of Columns ###\n",
    "The method `select` creates a new table that contains only the specified columns."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude\n",
       "32        | 54.8\n",
       "33.2      | 54.9\n",
       "34.4      | 55.5\n",
       "37.6      | 55.8\n",
       "34.3      | 55.2\n",
       "32        | 54.6\n",
       "30.4      | 54.4\n",
       "26.8      | 54.3"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.select('Longitude', 'Latitude')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same selection can be made using column indices instead of labels."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude\n",
       "32        | 54.8\n",
       "33.2      | 54.9\n",
       "34.4      | 55.5\n",
       "37.6      | 55.8\n",
       "34.3      | 55.2\n",
       "32        | 54.6\n",
       "30.4      | 54.4\n",
       "26.8      | 54.3"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.select(0, 1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result of using `select` is a new table, even when you select just one column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Survivors</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>145000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>140000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>127100   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>100000   </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>55000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>24000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>20000    </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>12000    </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Survivors\n",
       "145000\n",
       "140000\n",
       "127100\n",
       "100000\n",
       "55000\n",
       "24000\n",
       "20000\n",
       "12000"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.select('Survivors')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the result is a table, unlike the result of `column`, which is an array."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([145000, 140000, 127100, 100000,  55000,  24000,  20000,  12000])"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.column('Survivors')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to create a new table consisting of a set of columns is to `drop` the columns you don't want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>City Name</th> <th>Survivors</th> <th>Percent Surviving</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>Smolensk   </td> <td>145000   </td> <td>100.00%          </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Dorogobouge</td> <td>140000   </td> <td>96.55%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Chjat      </td> <td>127100   </td> <td>87.66%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Moscou     </td> <td>100000   </td> <td>68.97%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Wixma      </td> <td>55000    </td> <td>37.93%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Smolensk   </td> <td>24000    </td> <td>16.55%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Orscha     </td> <td>20000    </td> <td>13.79%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>Moiodexno  </td> <td>12000    </td> <td>8.28%            </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "City Name   | Survivors | Percent Surviving\n",
       "Smolensk    | 145000    | 100.00%\n",
       "Dorogobouge | 140000    | 96.55%\n",
       "Chjat       | 127100    | 87.66%\n",
       "Moscou      | 100000    | 68.97%\n",
       "Wixma       | 55000     | 37.93%\n",
       "Smolensk    | 24000     | 16.55%\n",
       "Orscha      | 20000     | 13.79%\n",
       "Moiodexno   | 12000     | 8.28%"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard.drop('Longitude', 'Latitude', 'Direction')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Neither `select` nor `drop` change the original table. Instead, they create new smaller tables that share the same data. The fact that the original table is preserved is useful! You can generate multiple different tables that only consider certain columns without worrying that one analysis will affect the other."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table border=\"1\" class=\"dataframe\">\n",
       "    <thead>\n",
       "        <tr>\n",
       "            <th>Longitude</th> <th>Latitude</th> <th>City Name</th> <th>Direction</th> <th>Survivors</th> <th>Percent Surviving</th>\n",
       "        </tr>\n",
       "    </thead>\n",
       "    <tbody>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.8    </td> <td>Smolensk   </td> <td>Advance  </td> <td>145000   </td> <td>100.00%          </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>33.2     </td> <td>54.9    </td> <td>Dorogobouge</td> <td>Advance  </td> <td>140000   </td> <td>96.55%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.4     </td> <td>55.5    </td> <td>Chjat      </td> <td>Advance  </td> <td>127100   </td> <td>87.66%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>37.6     </td> <td>55.8    </td> <td>Moscou     </td> <td>Advance  </td> <td>100000   </td> <td>68.97%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>34.3     </td> <td>55.2    </td> <td>Wixma      </td> <td>Retreat  </td> <td>55000    </td> <td>37.93%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>32       </td> <td>54.6    </td> <td>Smolensk   </td> <td>Retreat  </td> <td>24000    </td> <td>16.55%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>30.4     </td> <td>54.4    </td> <td>Orscha     </td> <td>Retreat  </td> <td>20000    </td> <td>13.79%           </td>\n",
       "        </tr>\n",
       "        <tr>\n",
       "            <td>26.8     </td> <td>54.3    </td> <td>Moiodexno  </td> <td>Retreat  </td> <td>12000    </td> <td>8.28%            </td>\n",
       "        </tr>\n",
       "    </tbody>\n",
       "</table>"
      ],
      "text/plain": [
       "Longitude | Latitude | City Name   | Direction | Survivors | Percent Surviving\n",
       "32        | 54.8     | Smolensk    | Advance   | 145000    | 100.00%\n",
       "33.2      | 54.9     | Dorogobouge | Advance   | 140000    | 96.55%\n",
       "34.4      | 55.5     | Chjat       | Advance   | 127100    | 87.66%\n",
       "37.6      | 55.8     | Moscou      | Advance   | 100000    | 68.97%\n",
       "34.3      | 55.2     | Wixma       | Retreat   | 55000     | 37.93%\n",
       "32        | 54.6     | Smolensk    | Retreat   | 24000     | 16.55%\n",
       "30.4      | 54.4     | Orscha      | Retreat   | 20000     | 13.79%\n",
       "26.8      | 54.3     | Moiodexno   | Retreat   | 12000     | 8.28%"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "minard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "All of the methods that we have used above can be applied to any table."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}