{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "path_data = '../../data/'\n", "\n", "import pandas as pd\n", "import numpy as np\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "\n", "cones = pd.read_csv(path_data + 'cones.csv')\n", "nba = pd.read_csv(path_data + 'nba_salaries.csv')\n", "nba.columns=['PLAYER','POSITION','TEAM','SALARY']\n", "movies = pd.read_csv(path_data + 'movies_by_year.csv')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction to DataFrames\n", "\n", "We can now apply Python to analyze data. We will work with data stored in DataFrame structures.\n", "\n", "A DataFrames (df) is a fundamental way of representing data sets. A df can be viewed in two ways:\n", "* a sequence of named columns that each describe a single attribute of all entries in a data set, or\n", "* a sequence of rows that each contain all information about a single individual in a data set.\n", "\n", "We will study dfs in great detail in the next several chapters. For now, we will just introduce a few methods without going into technical details. \n", "\n", "The df `cones` has been imported for us; later we will see how, but here we will just work with it. First, let's take a look at it." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
0strawberrypink3.55
1chocolatelight brown4.75
2chocolatedark brown5.25
3strawberrypink5.25
4chocolatedark brown5.25
\n", "
" ], "text/plain": [ " Flavor Color Price\n", "0 strawberry pink 3.55\n", "1 chocolate light brown 4.75\n", "2 chocolate dark brown 5.25\n", "3 strawberry pink 5.25\n", "4 chocolate dark brown 5.25" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones.head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The DataFrame has six rows. Each row corresponds to one ice cream cone. The ice cream cones are the *individuals*.\n", "\n", "Each cone has three attributes: flavor, color, and price. Each column contains the data on one of these attributes, and so all the entries of any single column are of the same kind. Each column has a label. We will refer to columns by their labels.\n", "\n", "A df method is just like a function, but it must operate on a df. So the call looks like\n", "\n", "`name_of_DataFrame.method(arguments)`\n", "\n", "For example, if you want to see just the first two rows of a df, you can use the df method `head`." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
0strawberrypink3.55
1chocolatelight brown4.75
\n", "
" ], "text/plain": [ " Flavor Color Price\n", "0 strawberry pink 3.55\n", "1 chocolate light brown 4.75" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones.head(2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can replace 2 by any number of rows. If you ask for more than six, you will only get six, because `cones` only has six rows." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choosing Sets of Columns\n", "The method `select` creates a new table consisting of only the specified columns.\n", "We can state which columns we want to view by using dot '.' notation (not he same as in maths) or hard brackets with quotes. Note that an index is automatically generated, this is a fundamental aspect of the DataFrame as the index allows us to 'locate' members of the DataFrame." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 strawberry\n", "1 chocolate\n", "2 chocolate\n", "3 strawberry\n", "4 chocolate\n", "5 bubblegum\n", "Name: Flavor, dtype: object" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# single square brackets\n", "\n", "cones['Flavor']\n", "\n", "# uncomment (remove the hash mark) the line below to view the 'type()' of the output\n", "\n", "#type(cones['Flavor'])" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Flavor
0strawberry
1chocolate
2chocolate
3strawberry
4chocolate
5bubblegum
\n", "
" ], "text/plain": [ " Flavor\n", "0 strawberry\n", "1 chocolate\n", "2 chocolate\n", "3 strawberry\n", "4 chocolate\n", "5 bubblegum" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# double square brackets\n", "\n", "cones[['Flavor']]\n", "\n", "# uncomment the line below to view the 'type()' of the output\n", "\n", "# type(cones[['Flavor']])" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 strawberry\n", "1 chocolate\n", "2 chocolate\n", "3 strawberry\n", "4 chocolate\n", "5 bubblegum\n", "Name: Flavor, dtype: object" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones.Flavor" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This leaves the original table unchanged." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
0strawberrypink3.55
1chocolatelight brown4.75
2chocolatedark brown5.25
3strawberrypink5.25
4chocolatedark brown5.25
5bubblegumpink4.75
\n", "
" ], "text/plain": [ " Flavor Color Price\n", "0 strawberry pink 3.55\n", "1 chocolate light brown 4.75\n", "2 chocolate dark brown 5.25\n", "3 strawberry pink 5.25\n", "4 chocolate dark brown 5.25\n", "5 bubblegum pink 4.75" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can select more than one column, by separating the column labels by commas. When you wish to view more than one column the 'hard brackets' must be used twice." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorPrice
0strawberry3.55
1chocolate4.75
2chocolate5.25
3strawberry5.25
4chocolate5.25
5bubblegum4.75
\n", "
" ], "text/plain": [ " Flavor Price\n", "0 strawberry 3.55\n", "1 chocolate 4.75\n", "2 chocolate 5.25\n", "3 strawberry 5.25\n", "4 chocolate 5.25\n", "5 bubblegum 4.75" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones[['Flavor', 'Price']]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also *drop* columns you don't want. The table above can be created by dropping the `Color` column." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorPrice
0strawberry3.55
1chocolate4.75
2chocolate5.25
3strawberry5.25
4chocolate5.25
5bubblegum4.75
\n", "
" ], "text/plain": [ " Flavor Price\n", "0 strawberry 3.55\n", "1 chocolate 4.75\n", "2 chocolate 5.25\n", "3 strawberry 5.25\n", "4 chocolate 5.25\n", "5 bubblegum 4.75" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones.drop(columns=['Color'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can name this new table and look at it again by just typing its name." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorPrice
0strawberry3.55
1chocolate4.75
2chocolate5.25
3strawberry5.25
4chocolate5.25
5bubblegum4.75
\n", "
" ], "text/plain": [ " Flavor Price\n", "0 strawberry 3.55\n", "1 chocolate 4.75\n", "2 chocolate 5.25\n", "3 strawberry 5.25\n", "4 chocolate 5.25\n", "5 bubblegum 4.75" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "no_colors = cones.drop(columns=['Color'])\n", "\n", "no_colors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like selecting columns using hard brackets or dot notation, the `drop` method creates a smaller table and leaves the original table unchanged. In order to explore your data, you can create any number of smaller tables by using choosing or dropping columns. It will do no harm to your original data table." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sorting Rows" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `sort_values` method creates a new table by arranging the rows of the original table in ascending order of the values in the specified column. Here the `cones` table has been sorted in ascending order of the price of the cones.\n", "\n", "[pandas sort_values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html#pandas-dataframe-sort-values)" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
0strawberrypink3.55
1chocolatelight brown4.75
5bubblegumpink4.75
2chocolatedark brown5.25
3strawberrypink5.25
4chocolatedark brown5.25
\n", "
" ], "text/plain": [ " Flavor Color Price\n", "0 strawberry pink 3.55\n", "1 chocolate light brown 4.75\n", "5 bubblegum pink 4.75\n", "2 chocolate dark brown 5.25\n", "3 strawberry pink 5.25\n", "4 chocolate dark brown 5.25" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones.sort_values('Price')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To sort in descending order, you can use an *optional* argument to `sort`. As the name implies, optional arguments don't have to be used, but they can be used if you want to change the default behavior of a method. \n", "\n", "By default, `sort` sorts in increasing order of the values in the specified column. To sort in decreasing order, use the optional argument `ascending=False`, the default value for `ascending` is `True`." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
2chocolatedark brown5.25
3strawberrypink5.25
4chocolatedark brown5.25
1chocolatelight brown4.75
5bubblegumpink4.75
0strawberrypink3.55
\n", "
" ], "text/plain": [ " Flavor Color Price\n", "2 chocolate dark brown 5.25\n", "3 strawberry pink 5.25\n", "4 chocolate dark brown 5.25\n", "1 chocolate light brown 4.75\n", "5 bubblegum pink 4.75\n", "0 strawberry pink 3.55" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones.sort_values('Price', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As when selecting and `drop`ing the `sort` method leaves the original table unchanged." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Selecting Rows that Satisfy a Condition\n", "Creating a new DataFrame (in database world this wold be a 'view'), consisting only of the rows that satisfy a given condition we use the 'exactly equal to' `==`. In this section we will work with a very simple condition, which is that the value in a specified column must be exactly equal to a value that we also specify. Thus the `==` method has two arguments.\n", "\n", "The code in the cell below creates a df consisting only of the rows corresponding to chocolate cones." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
1chocolatelight brown4.75
2chocolatedark brown5.25
4chocolatedark brown5.25
\n", "
" ], "text/plain": [ " Flavor Color Price\n", "1 chocolate light brown 4.75\n", "2 chocolate dark brown 5.25\n", "4 chocolate dark brown 5.25" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones[cones['Flavor']=='chocolate']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The arguments are the label of the column and the value we are looking for in that column. The `==` method can also be used when the condition that the rows must satisfy is more complicated. In those situations the call will be a little more complicated as well.\n", "\n", "It is important to provide the value exactly. For example, if we specify `Chocolate` instead of `chocolate`, then `where` correctly finds no rows where the flavor is `Chocolate`." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
FlavorColorPrice
\n", "
" ], "text/plain": [ "Empty DataFrame\n", "Columns: [Flavor, Color, Price]\n", "Index: []" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "cones[cones['Flavor'] == 'Chocolate']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Like all the other table methods in this section, `==` leaves the original table unchanged." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: Salaries in the NBA" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"The NBA is the highest paying professional sports league in the world,\" [reported CNN](http://edition.cnn.com/2015/12/04/sport/gallery/highest-paid-nba-players/) in March 2016. The table `nba` contains the [salaries of all National Basketball Association players](https://www.statcrunch.com/app/index.php?dataid=1843341) in 2015-2016.\n", "\n", "Each row represents one player. The columns are:\n", "\n", "| **Column Label** | Description |\n", "|--------------------|-----------------------------------------------------|\n", "| `PLAYER` | Player's name |\n", "| `POSITION` | Player's position on team |\n", "| `TEAM` | Team name |\n", "|`SALARY` | Player's salary in 2015-2016, in millions of dollars|\n", " \n", "The code for the positions is PG (Point Guard), SG (Shooting Guard), PF (Power Forward), SF (Small Forward), and C (Center). But what follows doesn't involve details about how basketball is played.\n", "\n", "The first row shows that Paul Millsap, Power Forward for the Atlanta Hawks, had a salary of almost $\\$18.7$ million in 2015-2016." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
0Paul MillsapPFAtlanta Hawks18.671659
1Al HorfordCAtlanta Hawks12.000000
2Tiago SplitterCAtlanta Hawks9.756250
3Jeff TeaguePGAtlanta Hawks8.000000
4Kyle KorverSGAtlanta Hawks5.746479
...............
412Gary NealPGWashington Wizards2.139000
413DeJuan BlairCWashington Wizards2.000000
414Kelly Oubre Jr.SFWashington Wizards1.920240
415Garrett TempleSGWashington Wizards1.100602
416Jarell EddieSGWashington Wizards0.561716
\n", "

417 rows × 4 columns

\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "0 Paul Millsap PF Atlanta Hawks 18.671659\n", "1 Al Horford C Atlanta Hawks 12.000000\n", "2 Tiago Splitter C Atlanta Hawks 9.756250\n", "3 Jeff Teague PG Atlanta Hawks 8.000000\n", "4 Kyle Korver SG Atlanta Hawks 5.746479\n", ".. ... ... ... ...\n", "412 Gary Neal PG Washington Wizards 2.139000\n", "413 DeJuan Blair C Washington Wizards 2.000000\n", "414 Kelly Oubre Jr. SF Washington Wizards 1.920240\n", "415 Garrett Temple SG Washington Wizards 1.100602\n", "416 Jarell Eddie SG Washington Wizards 0.561716\n", "\n", "[417 rows x 4 columns]" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Fans of Stephen Curry can find his row by using `where`." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
121Stephen CurryPGGolden State Warriors11.370786
\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "121 Stephen Curry PG Golden State Warriors 11.370786" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba[nba['PLAYER'] == 'Stephen Curry']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also create a new table called `warriors` consisting of just the data for the Golden State Warriors." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
117Klay ThompsonSGGolden State Warriors15.501000
118Draymond GreenPFGolden State Warriors14.260870
119Andrew BogutCGolden State Warriors13.800000
120Andre IguodalaSFGolden State Warriors11.710456
121Stephen CurryPGGolden State Warriors11.370786
122Jason ThompsonPFGolden State Warriors7.008475
123Shaun LivingstonPGGolden State Warriors5.543725
124Harrison BarnesSFGolden State Warriors3.873398
125Marreese SpeightsCGolden State Warriors3.815000
126Leandro BarbosaSGGolden State Warriors2.500000
127Festus EzeliCGolden State Warriors2.008748
128Brandon RushSFGolden State Warriors1.270964
129Kevon LooneySFGolden State Warriors1.131960
130Anderson VarejaoPFGolden State Warriors0.289755
\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "117 Klay Thompson SG Golden State Warriors 15.501000\n", "118 Draymond Green PF Golden State Warriors 14.260870\n", "119 Andrew Bogut C Golden State Warriors 13.800000\n", "120 Andre Iguodala SF Golden State Warriors 11.710456\n", "121 Stephen Curry PG Golden State Warriors 11.370786\n", "122 Jason Thompson PF Golden State Warriors 7.008475\n", "123 Shaun Livingston PG Golden State Warriors 5.543725\n", "124 Harrison Barnes SF Golden State Warriors 3.873398\n", "125 Marreese Speights C Golden State Warriors 3.815000\n", "126 Leandro Barbosa SG Golden State Warriors 2.500000\n", "127 Festus Ezeli C Golden State Warriors 2.008748\n", "128 Brandon Rush SF Golden State Warriors 1.270964\n", "129 Kevon Looney SF Golden State Warriors 1.131960\n", "130 Anderson Varejao PF Golden State Warriors 0.289755" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "warriors = nba[nba['TEAM'] =='Golden State Warriors']\n", "warriors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the first 10 lines of a table are displayed. You can use `head()` to display more or fewer. To display the entire table type the name of the DataFrame." ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
117Klay ThompsonSGGolden State Warriors15.501000
118Draymond GreenPFGolden State Warriors14.260870
119Andrew BogutCGolden State Warriors13.800000
120Andre IguodalaSFGolden State Warriors11.710456
121Stephen CurryPGGolden State Warriors11.370786
122Jason ThompsonPFGolden State Warriors7.008475
123Shaun LivingstonPGGolden State Warriors5.543725
124Harrison BarnesSFGolden State Warriors3.873398
125Marreese SpeightsCGolden State Warriors3.815000
126Leandro BarbosaSGGolden State Warriors2.500000
127Festus EzeliCGolden State Warriors2.008748
128Brandon RushSFGolden State Warriors1.270964
129Kevon LooneySFGolden State Warriors1.131960
130Anderson VarejaoPFGolden State Warriors0.289755
\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "117 Klay Thompson SG Golden State Warriors 15.501000\n", "118 Draymond Green PF Golden State Warriors 14.260870\n", "119 Andrew Bogut C Golden State Warriors 13.800000\n", "120 Andre Iguodala SF Golden State Warriors 11.710456\n", "121 Stephen Curry PG Golden State Warriors 11.370786\n", "122 Jason Thompson PF Golden State Warriors 7.008475\n", "123 Shaun Livingston PG Golden State Warriors 5.543725\n", "124 Harrison Barnes SF Golden State Warriors 3.873398\n", "125 Marreese Speights C Golden State Warriors 3.815000\n", "126 Leandro Barbosa SG Golden State Warriors 2.500000\n", "127 Festus Ezeli C Golden State Warriors 2.008748\n", "128 Brandon Rush SF Golden State Warriors 1.270964\n", "129 Kevon Looney SF Golden State Warriors 1.131960\n", "130 Anderson Varejao PF Golden State Warriors 0.289755" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "warriors" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `nba` table is sorted in alphabetical order of the team names. To see how the players were paid in 2015-2016, it is useful to sort the data by salary. Remember that by default, the sorting is in increasing order." ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
267Thanasis AntetokounmpoSFNew York Knicks0.030888
327Cory JeffersonPFPhoenix Suns0.049709
326Jordan McRaeSGPhoenix Suns0.049709
324Orlando JohnsonSGPhoenix Suns0.055722
325Phil PresseyPGPhoenix Suns0.055722
...............
131Dwight HowardCHouston Rockets22.359364
255Carmelo AnthonySFNew York Knicks22.875000
72LeBron JamesSFCleveland Cavaliers22.970500
29Joe JohnsonSFBrooklyn Nets24.894863
169Kobe BryantSFLos Angeles Lakers25.000000
\n", "

417 rows × 4 columns

\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "267 Thanasis Antetokounmpo SF New York Knicks 0.030888\n", "327 Cory Jefferson PF Phoenix Suns 0.049709\n", "326 Jordan McRae SG Phoenix Suns 0.049709\n", "324 Orlando Johnson SG Phoenix Suns 0.055722\n", "325 Phil Pressey PG Phoenix Suns 0.055722\n", ".. ... ... ... ...\n", "131 Dwight Howard C Houston Rockets 22.359364\n", "255 Carmelo Anthony SF New York Knicks 22.875000\n", "72 LeBron James SF Cleveland Cavaliers 22.970500\n", "29 Joe Johnson SF Brooklyn Nets 24.894863\n", "169 Kobe Bryant SF Los Angeles Lakers 25.000000\n", "\n", "[417 rows x 4 columns]" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba.sort_values('SALARY')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These figures are somewhat difficult to compare as some of these players changed teams during the season and received salaries from more than one team; only the salary from the last team appears in the table. \n", "\n", "The CNN report is about the other end of the salary scale – the players who are among the highest paid in the world. To identify these players we can sort in descending order of salary and look at the top few rows." ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
169Kobe BryantSFLos Angeles Lakers25.000000
29Joe JohnsonSFBrooklyn Nets24.894863
72LeBron JamesSFCleveland Cavaliers22.970500
255Carmelo AnthonySFNew York Knicks22.875000
131Dwight HowardCHouston Rockets22.359364
...............
200Elliot WilliamsSGMemphis Grizzlies0.055722
324Orlando JohnsonSGPhoenix Suns0.055722
327Cory JeffersonPFPhoenix Suns0.049709
326Jordan McRaeSGPhoenix Suns0.049709
267Thanasis AntetokounmpoSFNew York Knicks0.030888
\n", "

417 rows × 4 columns

\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "169 Kobe Bryant SF Los Angeles Lakers 25.000000\n", "29 Joe Johnson SF Brooklyn Nets 24.894863\n", "72 LeBron James SF Cleveland Cavaliers 22.970500\n", "255 Carmelo Anthony SF New York Knicks 22.875000\n", "131 Dwight Howard C Houston Rockets 22.359364\n", ".. ... ... ... ...\n", "200 Elliot Williams SG Memphis Grizzlies 0.055722\n", "324 Orlando Johnson SG Phoenix Suns 0.055722\n", "327 Cory Jefferson PF Phoenix Suns 0.049709\n", "326 Jordan McRae SG Phoenix Suns 0.049709\n", "267 Thanasis Antetokounmpo SF New York Knicks 0.030888\n", "\n", "[417 rows x 4 columns]" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba.sort_values('SALARY', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kobe Bryant, since retired, was the highest earning NBA player in 2015-2016." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }