{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"tags": [
"remove_input"
]
},
"outputs": [],
"source": [
"path_data = '../../data/'\n",
"\n",
"import pandas as pd\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Sorting Rows\n",
"\n",
"\"The NBA is the highest paying professional sports league in the world,\" [reported CNN](http://edition.cnn.com/2015/12/04/sport/gallery/highest-paid-nba-players/) in March 2016. The table `nba_salaries` contains the salaries of all National Basketball Association players in 2015-2016.\n",
"\n",
"Each row represents one player. The columns are:\n",
"\n",
"| **Column Label** | Description |\n",
"|--------------------|-----------------------------------------------------|\n",
"| `PLAYER` | Player's name |\n",
"| `POSITION` | Player's position on team |\n",
"| `TEAM` | Team name |\n",
"|`'15-'16 SALARY` | Player's salary in 2015-2016, in millions of dollars|\n",
" \n",
"The code for the positions is PG (Point Guard), SG (Shooting Guard), PF (Power Forward), SF (Small Forward), and C (Center). But what follows doesn't involve details about how basketball is played.\n",
"\n",
"The first row shows that Paul Millsap, Power Forward for the Atlanta Hawks, had a salary of almost $\\$18.7$ million in 2015-2016."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"
\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PLAYER | \n",
" POSITION | \n",
" TEAM | \n",
" '15-'16 SALARY | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Paul Millsap | \n",
" PF | \n",
" Atlanta Hawks | \n",
" 18.671659 | \n",
"
\n",
" \n",
" 1 | \n",
" Al Horford | \n",
" C | \n",
" Atlanta Hawks | \n",
" 12.000000 | \n",
"
\n",
" \n",
" 2 | \n",
" Tiago Splitter | \n",
" C | \n",
" Atlanta Hawks | \n",
" 9.756250 | \n",
"
\n",
" \n",
" 3 | \n",
" Jeff Teague | \n",
" PG | \n",
" Atlanta Hawks | \n",
" 8.000000 | \n",
"
\n",
" \n",
" 4 | \n",
" Kyle Korver | \n",
" SG | \n",
" Atlanta Hawks | \n",
" 5.746479 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 412 | \n",
" Gary Neal | \n",
" PG | \n",
" Washington Wizards | \n",
" 2.139000 | \n",
"
\n",
" \n",
" 413 | \n",
" DeJuan Blair | \n",
" C | \n",
" Washington Wizards | \n",
" 2.000000 | \n",
"
\n",
" \n",
" 414 | \n",
" Kelly Oubre Jr. | \n",
" SF | \n",
" Washington Wizards | \n",
" 1.920240 | \n",
"
\n",
" \n",
" 415 | \n",
" Garrett Temple | \n",
" SG | \n",
" Washington Wizards | \n",
" 1.100602 | \n",
"
\n",
" \n",
" 416 | \n",
" Jarell Eddie | \n",
" SG | \n",
" Washington Wizards | \n",
" 0.561716 | \n",
"
\n",
" \n",
"
\n",
"
417 rows × 4 columns
\n",
"
"
],
"text/plain": [
" PLAYER POSITION TEAM '15-'16 SALARY\n",
"0 Paul Millsap PF Atlanta Hawks 18.671659\n",
"1 Al Horford C Atlanta Hawks 12.000000\n",
"2 Tiago Splitter C Atlanta Hawks 9.756250\n",
"3 Jeff Teague PG Atlanta Hawks 8.000000\n",
"4 Kyle Korver SG Atlanta Hawks 5.746479\n",
".. ... ... ... ...\n",
"412 Gary Neal PG Washington Wizards 2.139000\n",
"413 DeJuan Blair C Washington Wizards 2.000000\n",
"414 Kelly Oubre Jr. SF Washington Wizards 1.920240\n",
"415 Garrett Temple SG Washington Wizards 1.100602\n",
"416 Jarell Eddie SG Washington Wizards 0.561716\n",
"\n",
"[417 rows x 4 columns]"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# This table can be found online: https://www.statcrunch.com/app/index.php?dataid=1843341\n",
"nba_salaries = pd.read_csv(path_data + 'nba_salaries.csv')\n",
"nba_salaries"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The table contains 417 rows, one for each player. Only 10 of the rows are displayed. The `show` method allows us to specify the number of rows, with the default (no specification) being all the rows of the table."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PLAYER | \n",
" POSITION | \n",
" TEAM | \n",
" '15-'16 SALARY | \n",
"
\n",
" \n",
" \n",
" \n",
" 0 | \n",
" Paul Millsap | \n",
" PF | \n",
" Atlanta Hawks | \n",
" 18.671659 | \n",
"
\n",
" \n",
" 1 | \n",
" Al Horford | \n",
" C | \n",
" Atlanta Hawks | \n",
" 12.000000 | \n",
"
\n",
" \n",
" 2 | \n",
" Tiago Splitter | \n",
" C | \n",
" Atlanta Hawks | \n",
" 9.756250 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PLAYER POSITION TEAM '15-'16 SALARY\n",
"0 Paul Millsap PF Atlanta Hawks 18.671659\n",
"1 Al Horford C Atlanta Hawks 12.000000\n",
"2 Tiago Splitter C Atlanta Hawks 9.756250"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_salaries.head(3)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Glance through about 20 rows or so, and you will see that the rows are in alphabetical order by team name. It's also possible to list the same rows in alphabetical order by player name using the `sort` method. The argument to `sort` is a column label or index."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PLAYER | \n",
" POSITION | \n",
" TEAM | \n",
" '15-'16 SALARY | \n",
"
\n",
" \n",
" \n",
" \n",
" 68 | \n",
" Aaron Brooks | \n",
" PG | \n",
" Chicago Bulls | \n",
" 2.250000 | \n",
"
\n",
" \n",
" 291 | \n",
" Aaron Gordon | \n",
" PF | \n",
" Orlando Magic | \n",
" 4.171680 | \n",
"
\n",
" \n",
" 59 | \n",
" Aaron Harrison | \n",
" SG | \n",
" Charlotte Hornets | \n",
" 0.525093 | \n",
"
\n",
" \n",
" 235 | \n",
" Adreian Payne | \n",
" PF | \n",
" Minnesota Timberwolves | \n",
" 1.938840 | \n",
"
\n",
" \n",
" 1 | \n",
" Al Horford | \n",
" C | \n",
" Atlanta Hawks | \n",
" 12.000000 | \n",
"
\n",
" \n",
"
\n",
"
"
],
"text/plain": [
" PLAYER POSITION TEAM '15-'16 SALARY\n",
"68 Aaron Brooks PG Chicago Bulls 2.250000\n",
"291 Aaron Gordon PF Orlando Magic 4.171680\n",
"59 Aaron Harrison SG Charlotte Hornets 0.525093\n",
"235 Adreian Payne PF Minnesota Timberwolves 1.938840\n",
"1 Al Horford C Atlanta Hawks 12.000000"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba_salaries.sort_values('PLAYER').head(5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To examine the players' salaries, it would be much more helpful if the data were ordered by salary.\n",
"\n",
"To do this, we will first simplify the label of the column of salaries (just for convenience), and then sort by the new label `SALARY`. \n",
"\n",
"This arranges all the rows of the table in *increasing* order of salary, with the lowest salary appearing first. The output is a new table with the same columns as the original but with the rows rearranged."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PLAYER | \n",
" POSITION | \n",
" TEAM | \n",
" SALARY | \n",
"
\n",
" \n",
" \n",
" \n",
" 267 | \n",
" Thanasis Antetokounmpo | \n",
" SF | \n",
" New York Knicks | \n",
" 0.030888 | \n",
"
\n",
" \n",
" 327 | \n",
" Cory Jefferson | \n",
" PF | \n",
" Phoenix Suns | \n",
" 0.049709 | \n",
"
\n",
" \n",
" 326 | \n",
" Jordan McRae | \n",
" SG | \n",
" Phoenix Suns | \n",
" 0.049709 | \n",
"
\n",
" \n",
" 324 | \n",
" Orlando Johnson | \n",
" SG | \n",
" Phoenix Suns | \n",
" 0.055722 | \n",
"
\n",
" \n",
" 325 | \n",
" Phil Pressey | \n",
" PG | \n",
" Phoenix Suns | \n",
" 0.055722 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 131 | \n",
" Dwight Howard | \n",
" C | \n",
" Houston Rockets | \n",
" 22.359364 | \n",
"
\n",
" \n",
" 255 | \n",
" Carmelo Anthony | \n",
" SF | \n",
" New York Knicks | \n",
" 22.875000 | \n",
"
\n",
" \n",
" 72 | \n",
" LeBron James | \n",
" SF | \n",
" Cleveland Cavaliers | \n",
" 22.970500 | \n",
"
\n",
" \n",
" 29 | \n",
" Joe Johnson | \n",
" SF | \n",
" Brooklyn Nets | \n",
" 24.894863 | \n",
"
\n",
" \n",
" 169 | \n",
" Kobe Bryant | \n",
" SF | \n",
" Los Angeles Lakers | \n",
" 25.000000 | \n",
"
\n",
" \n",
"
\n",
"
417 rows × 4 columns
\n",
"
"
],
"text/plain": [
" PLAYER POSITION TEAM SALARY\n",
"267 Thanasis Antetokounmpo SF New York Knicks 0.030888\n",
"327 Cory Jefferson PF Phoenix Suns 0.049709\n",
"326 Jordan McRae SG Phoenix Suns 0.049709\n",
"324 Orlando Johnson SG Phoenix Suns 0.055722\n",
"325 Phil Pressey PG Phoenix Suns 0.055722\n",
".. ... ... ... ...\n",
"131 Dwight Howard C Houston Rockets 22.359364\n",
"255 Carmelo Anthony SF New York Knicks 22.875000\n",
"72 LeBron James SF Cleveland Cavaliers 22.970500\n",
"29 Joe Johnson SF Brooklyn Nets 24.894863\n",
"169 Kobe Bryant SF Los Angeles Lakers 25.000000\n",
"\n",
"[417 rows x 4 columns]"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba = nba_salaries.rename(columns={\"'15-'16 SALARY\": 'SALARY'})\n",
"nba.sort_values('SALARY')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"These figures are somewhat difficult to compare as some of these players changed teams during the season and received salaries from more than one team; only the salary from the last team appears in the table. Point Guard Phil Pressey, for example, moved from Philadelphia to Phoenix during the year, and might be moving yet again to the Golden State Warriors. \n",
"\n",
"The CNN report is about the other end of the salary scale – the players who are among the highest paid in the world. \n",
"\n",
"To order the rows of the table in *decreasing* order of salary, we must use `sort` with the option `ascending=False`."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"\n",
"\n",
"
\n",
" \n",
" \n",
" | \n",
" PLAYER | \n",
" POSITION | \n",
" TEAM | \n",
" SALARY | \n",
"
\n",
" \n",
" \n",
" \n",
" 169 | \n",
" Kobe Bryant | \n",
" SF | \n",
" Los Angeles Lakers | \n",
" 25.000000 | \n",
"
\n",
" \n",
" 29 | \n",
" Joe Johnson | \n",
" SF | \n",
" Brooklyn Nets | \n",
" 24.894863 | \n",
"
\n",
" \n",
" 72 | \n",
" LeBron James | \n",
" SF | \n",
" Cleveland Cavaliers | \n",
" 22.970500 | \n",
"
\n",
" \n",
" 255 | \n",
" Carmelo Anthony | \n",
" SF | \n",
" New York Knicks | \n",
" 22.875000 | \n",
"
\n",
" \n",
" 131 | \n",
" Dwight Howard | \n",
" C | \n",
" Houston Rockets | \n",
" 22.359364 | \n",
"
\n",
" \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
" ... | \n",
"
\n",
" \n",
" 200 | \n",
" Elliot Williams | \n",
" SG | \n",
" Memphis Grizzlies | \n",
" 0.055722 | \n",
"
\n",
" \n",
" 324 | \n",
" Orlando Johnson | \n",
" SG | \n",
" Phoenix Suns | \n",
" 0.055722 | \n",
"
\n",
" \n",
" 327 | \n",
" Cory Jefferson | \n",
" PF | \n",
" Phoenix Suns | \n",
" 0.049709 | \n",
"
\n",
" \n",
" 326 | \n",
" Jordan McRae | \n",
" SG | \n",
" Phoenix Suns | \n",
" 0.049709 | \n",
"
\n",
" \n",
" 267 | \n",
" Thanasis Antetokounmpo | \n",
" SF | \n",
" New York Knicks | \n",
" 0.030888 | \n",
"
\n",
" \n",
"
\n",
"
417 rows × 4 columns
\n",
"
"
],
"text/plain": [
" PLAYER POSITION TEAM SALARY\n",
"169 Kobe Bryant SF Los Angeles Lakers 25.000000\n",
"29 Joe Johnson SF Brooklyn Nets 24.894863\n",
"72 LeBron James SF Cleveland Cavaliers 22.970500\n",
"255 Carmelo Anthony SF New York Knicks 22.875000\n",
"131 Dwight Howard C Houston Rockets 22.359364\n",
".. ... ... ... ...\n",
"200 Elliot Williams SG Memphis Grizzlies 0.055722\n",
"324 Orlando Johnson SG Phoenix Suns 0.055722\n",
"327 Cory Jefferson PF Phoenix Suns 0.049709\n",
"326 Jordan McRae SG Phoenix Suns 0.049709\n",
"267 Thanasis Antetokounmpo SF New York Knicks 0.030888\n",
"\n",
"[417 rows x 4 columns]"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"nba.sort_values('SALARY', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Kobe Bryant, in his final season with the Lakers, was the highest paid at a salary of $\\$25$ million. Notice that the MVP Stephen Curry doesn't appear among the top 10. He is quite a bit further down the list, as we will see later."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Named Arguments\n",
"\n",
"The `descending=True` portion of this call expression is called a *named argument*. When a function or method is called, each argument has both a position and a name. Both are evident from the help text of a function or method."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Help on method sort_values in module pandas.core.frame:\n",
"\n",
"sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key: Union[Callable[[ForwardRef('Series')], Union[ForwardRef('Series'), ~AnyArrayLike]], NoneType] = None) method of pandas.core.frame.DataFrame instance\n",
" Sort by the values along either axis.\n",
" \n",
" Parameters\n",
" ----------\n",
" by : str or list of str\n",
" Name or list of names to sort by.\n",
" \n",
" - if `axis` is 0 or `'index'` then `by` may contain index\n",
" levels and/or column labels.\n",
" - if `axis` is 1 or `'columns'` then `by` may contain column\n",
" levels and/or index labels.\n",
" \n",
" .. versionchanged:: 0.23.0\n",
" \n",
" Allow specifying index or column level names.\n",
" axis : {0 or 'index', 1 or 'columns'}, default 0\n",
" Axis to be sorted.\n",
" ascending : bool or list of bool, default True\n",
" Sort ascending vs. descending. Specify list for multiple sort\n",
" orders. If this is a list of bools, must match the length of\n",
" the by.\n",
" inplace : bool, default False\n",
" If True, perform operation in-place.\n",
" kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'\n",
" Choice of sorting algorithm. See also ndarray.np.sort for more\n",
" information. `mergesort` is the only stable algorithm. For\n",
" DataFrames, this option is only applied when sorting on a single\n",
" column or label.\n",
" na_position : {'first', 'last'}, default 'last'\n",
" Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n",
" end.\n",
" ignore_index : bool, default False\n",
" If True, the resulting axis will be labeled 0, 1, …, n - 1.\n",
" \n",
" .. versionadded:: 1.0.0\n",
" \n",
" key : callable, optional\n",
" Apply the key function to the values\n",
" before sorting. This is similar to the `key` argument in the\n",
" builtin :meth:`sorted` function, with the notable difference that\n",
" this `key` function should be *vectorized*. It should expect a\n",
" ``Series`` and return a Series with the same shape as the input.\n",
" It will be applied to each column in `by` independently.\n",
" \n",
" .. versionadded:: 1.1.0\n",
" \n",
" Returns\n",
" -------\n",
" DataFrame or None\n",
" DataFrame with sorted values if inplace=False, None otherwise.\n",
" \n",
" See Also\n",
" --------\n",
" DataFrame.sort_index : Sort a DataFrame by the index.\n",
" Series.sort_values : Similar method for a Series.\n",
" \n",
" Examples\n",
" --------\n",
" >>> df = pd.DataFrame({\n",
" ... 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],\n",
" ... 'col2': [2, 1, 9, 8, 7, 4],\n",
" ... 'col3': [0, 1, 9, 4, 2, 3],\n",
" ... 'col4': ['a', 'B', 'c', 'D', 'e', 'F']\n",
" ... })\n",
" >>> df\n",
" col1 col2 col3 col4\n",
" 0 A 2 0 a\n",
" 1 A 1 1 B\n",
" 2 B 9 9 c\n",
" 3 NaN 8 4 D\n",
" 4 D 7 2 e\n",
" 5 C 4 3 F\n",
" \n",
" Sort by col1\n",
" \n",
" >>> df.sort_values(by=['col1'])\n",
" col1 col2 col3 col4\n",
" 0 A 2 0 a\n",
" 1 A 1 1 B\n",
" 2 B 9 9 c\n",
" 5 C 4 3 F\n",
" 4 D 7 2 e\n",
" 3 NaN 8 4 D\n",
" \n",
" Sort by multiple columns\n",
" \n",
" >>> df.sort_values(by=['col1', 'col2'])\n",
" col1 col2 col3 col4\n",
" 1 A 1 1 B\n",
" 0 A 2 0 a\n",
" 2 B 9 9 c\n",
" 5 C 4 3 F\n",
" 4 D 7 2 e\n",
" 3 NaN 8 4 D\n",
" \n",
" Sort Descending\n",
" \n",
" >>> df.sort_values(by='col1', ascending=False)\n",
" col1 col2 col3 col4\n",
" 4 D 7 2 e\n",
" 5 C 4 3 F\n",
" 2 B 9 9 c\n",
" 0 A 2 0 a\n",
" 1 A 1 1 B\n",
" 3 NaN 8 4 D\n",
" \n",
" Putting NAs first\n",
" \n",
" >>> df.sort_values(by='col1', ascending=False, na_position='first')\n",
" col1 col2 col3 col4\n",
" 3 NaN 8 4 D\n",
" 4 D 7 2 e\n",
" 5 C 4 3 F\n",
" 2 B 9 9 c\n",
" 0 A 2 0 a\n",
" 1 A 1 1 B\n",
" \n",
" Sorting with a key function\n",
" \n",
" >>> df.sort_values(by='col4', key=lambda col: col.str.lower())\n",
" col1 col2 col3 col4\n",
" 0 A 2 0 a\n",
" 1 A 1 1 B\n",
" 2 B 9 9 c\n",
" 3 NaN 8 4 D\n",
" 4 D 7 2 e\n",
" 5 C 4 3 F\n",
"\n"
]
}
],
"source": [
"help(nba.sort_values)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"At the very top of this `help` text, the *signature* of the `sort_value` method appears:\n",
"\n",
" sort_value(column_or_label, descending=False, distinct=False)\n",
" \n",
"This describes the positions, names, and default values of the three arguments to `sort_value`. When calling this method, you can use either positional arguments or named arguments, so the following three calls do exactly the same thing.\n",
"\n",
" sort_value('SALARY', True)\n",
" sort_value('SALARY', ascending=False)\n",
" sort_value(column_or_label='SALARY', ascending=False)\n",
" \n",
"When an argument is simply `True` or `False`, it's a useful convention to include the argument name so that it's more obvious what the argument value means."
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}