{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "path_data = '../../data/'\n", "\n", "import pandas as pd\n", "import numpy as np" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Sorting Rows\n", "\n", "\"The NBA is the highest paying professional sports league in the world,\" [reported CNN](http://edition.cnn.com/2015/12/04/sport/gallery/highest-paid-nba-players/) in March 2016. The table `nba_salaries` contains the salaries of all National Basketball Association players in 2015-2016.\n", "\n", "Each row represents one player. The columns are:\n", "\n", "| **Column Label** | Description |\n", "|--------------------|-----------------------------------------------------|\n", "| `PLAYER` | Player's name |\n", "| `POSITION` | Player's position on team |\n", "| `TEAM` | Team name |\n", "|`'15-'16 SALARY` | Player's salary in 2015-2016, in millions of dollars|\n", " \n", "The code for the positions is PG (Point Guard), SG (Shooting Guard), PF (Power Forward), SF (Small Forward), and C (Center). But what follows doesn't involve details about how basketball is played.\n", "\n", "The first row shows that Paul Millsap, Power Forward for the Atlanta Hawks, had a salary of almost $\\$18.7$ million in 2015-2016." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAM'15-'16 SALARY
0Paul MillsapPFAtlanta Hawks18.671659
1Al HorfordCAtlanta Hawks12.000000
2Tiago SplitterCAtlanta Hawks9.756250
3Jeff TeaguePGAtlanta Hawks8.000000
4Kyle KorverSGAtlanta Hawks5.746479
...............
412Gary NealPGWashington Wizards2.139000
413DeJuan BlairCWashington Wizards2.000000
414Kelly Oubre Jr.SFWashington Wizards1.920240
415Garrett TempleSGWashington Wizards1.100602
416Jarell EddieSGWashington Wizards0.561716
\n", "

417 rows × 4 columns

\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM '15-'16 SALARY\n", "0 Paul Millsap PF Atlanta Hawks 18.671659\n", "1 Al Horford C Atlanta Hawks 12.000000\n", "2 Tiago Splitter C Atlanta Hawks 9.756250\n", "3 Jeff Teague PG Atlanta Hawks 8.000000\n", "4 Kyle Korver SG Atlanta Hawks 5.746479\n", ".. ... ... ... ...\n", "412 Gary Neal PG Washington Wizards 2.139000\n", "413 DeJuan Blair C Washington Wizards 2.000000\n", "414 Kelly Oubre Jr. SF Washington Wizards 1.920240\n", "415 Garrett Temple SG Washington Wizards 1.100602\n", "416 Jarell Eddie SG Washington Wizards 0.561716\n", "\n", "[417 rows x 4 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# This table can be found online: https://www.statcrunch.com/app/index.php?dataid=1843341\n", "nba_salaries = pd.read_csv(path_data + 'nba_salaries.csv')\n", "nba_salaries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table contains 417 rows, one for each player. Only 10 of the rows are displayed. The `show` method allows us to specify the number of rows, with the default (no specification) being all the rows of the table." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAM'15-'16 SALARY
0Paul MillsapPFAtlanta Hawks18.671659
1Al HorfordCAtlanta Hawks12.000000
2Tiago SplitterCAtlanta Hawks9.756250
\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM '15-'16 SALARY\n", "0 Paul Millsap PF Atlanta Hawks 18.671659\n", "1 Al Horford C Atlanta Hawks 12.000000\n", "2 Tiago Splitter C Atlanta Hawks 9.756250" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba_salaries.head(3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Glance through about 20 rows or so, and you will see that the rows are in alphabetical order by team name. It's also possible to list the same rows in alphabetical order by player name using the `sort` method. The argument to `sort` is a column label or index." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAM'15-'16 SALARY
68Aaron BrooksPGChicago Bulls2.250000
291Aaron GordonPFOrlando Magic4.171680
59Aaron HarrisonSGCharlotte Hornets0.525093
235Adreian PaynePFMinnesota Timberwolves1.938840
1Al HorfordCAtlanta Hawks12.000000
\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM '15-'16 SALARY\n", "68 Aaron Brooks PG Chicago Bulls 2.250000\n", "291 Aaron Gordon PF Orlando Magic 4.171680\n", "59 Aaron Harrison SG Charlotte Hornets 0.525093\n", "235 Adreian Payne PF Minnesota Timberwolves 1.938840\n", "1 Al Horford C Atlanta Hawks 12.000000" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba_salaries.sort_values('PLAYER').head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To examine the players' salaries, it would be much more helpful if the data were ordered by salary.\n", "\n", "To do this, we will first simplify the label of the column of salaries (just for convenience), and then sort by the new label `SALARY`. \n", "\n", "This arranges all the rows of the table in *increasing* order of salary, with the lowest salary appearing first. The output is a new table with the same columns as the original but with the rows rearranged." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
267Thanasis AntetokounmpoSFNew York Knicks0.030888
327Cory JeffersonPFPhoenix Suns0.049709
326Jordan McRaeSGPhoenix Suns0.049709
324Orlando JohnsonSGPhoenix Suns0.055722
325Phil PresseyPGPhoenix Suns0.055722
...............
131Dwight HowardCHouston Rockets22.359364
255Carmelo AnthonySFNew York Knicks22.875000
72LeBron JamesSFCleveland Cavaliers22.970500
29Joe JohnsonSFBrooklyn Nets24.894863
169Kobe BryantSFLos Angeles Lakers25.000000
\n", "

417 rows × 4 columns

\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "267 Thanasis Antetokounmpo SF New York Knicks 0.030888\n", "327 Cory Jefferson PF Phoenix Suns 0.049709\n", "326 Jordan McRae SG Phoenix Suns 0.049709\n", "324 Orlando Johnson SG Phoenix Suns 0.055722\n", "325 Phil Pressey PG Phoenix Suns 0.055722\n", ".. ... ... ... ...\n", "131 Dwight Howard C Houston Rockets 22.359364\n", "255 Carmelo Anthony SF New York Knicks 22.875000\n", "72 LeBron James SF Cleveland Cavaliers 22.970500\n", "29 Joe Johnson SF Brooklyn Nets 24.894863\n", "169 Kobe Bryant SF Los Angeles Lakers 25.000000\n", "\n", "[417 rows x 4 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba = nba_salaries.rename(columns={\"'15-'16 SALARY\": 'SALARY'})\n", "nba.sort_values('SALARY')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These figures are somewhat difficult to compare as some of these players changed teams during the season and received salaries from more than one team; only the salary from the last team appears in the table. Point Guard Phil Pressey, for example, moved from Philadelphia to Phoenix during the year, and might be moving yet again to the Golden State Warriors. \n", "\n", "The CNN report is about the other end of the salary scale – the players who are among the highest paid in the world. \n", "\n", "To order the rows of the table in *decreasing* order of salary, we must use `sort` with the option `ascending=False`." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PLAYERPOSITIONTEAMSALARY
169Kobe BryantSFLos Angeles Lakers25.000000
29Joe JohnsonSFBrooklyn Nets24.894863
72LeBron JamesSFCleveland Cavaliers22.970500
255Carmelo AnthonySFNew York Knicks22.875000
131Dwight HowardCHouston Rockets22.359364
...............
200Elliot WilliamsSGMemphis Grizzlies0.055722
324Orlando JohnsonSGPhoenix Suns0.055722
327Cory JeffersonPFPhoenix Suns0.049709
326Jordan McRaeSGPhoenix Suns0.049709
267Thanasis AntetokounmpoSFNew York Knicks0.030888
\n", "

417 rows × 4 columns

\n", "
" ], "text/plain": [ " PLAYER POSITION TEAM SALARY\n", "169 Kobe Bryant SF Los Angeles Lakers 25.000000\n", "29 Joe Johnson SF Brooklyn Nets 24.894863\n", "72 LeBron James SF Cleveland Cavaliers 22.970500\n", "255 Carmelo Anthony SF New York Knicks 22.875000\n", "131 Dwight Howard C Houston Rockets 22.359364\n", ".. ... ... ... ...\n", "200 Elliot Williams SG Memphis Grizzlies 0.055722\n", "324 Orlando Johnson SG Phoenix Suns 0.055722\n", "327 Cory Jefferson PF Phoenix Suns 0.049709\n", "326 Jordan McRae SG Phoenix Suns 0.049709\n", "267 Thanasis Antetokounmpo SF New York Knicks 0.030888\n", "\n", "[417 rows x 4 columns]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "nba.sort_values('SALARY', ascending=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Kobe Bryant, in his final season with the Lakers, was the highest paid at a salary of $\\$25$ million. Notice that the MVP Stephen Curry doesn't appear among the top 10. He is quite a bit further down the list, as we will see later." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Named Arguments\n", "\n", "The `descending=True` portion of this call expression is called a *named argument*. When a function or method is called, each argument has both a position and a name. Both are evident from the help text of a function or method." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Help on method sort_values in module pandas.core.frame:\n", "\n", "sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key: Union[Callable[[ForwardRef('Series')], Union[ForwardRef('Series'), ~AnyArrayLike]], NoneType] = None) method of pandas.core.frame.DataFrame instance\n", " Sort by the values along either axis.\n", " \n", " Parameters\n", " ----------\n", " by : str or list of str\n", " Name or list of names to sort by.\n", " \n", " - if `axis` is 0 or `'index'` then `by` may contain index\n", " levels and/or column labels.\n", " - if `axis` is 1 or `'columns'` then `by` may contain column\n", " levels and/or index labels.\n", " \n", " .. versionchanged:: 0.23.0\n", " \n", " Allow specifying index or column level names.\n", " axis : {0 or 'index', 1 or 'columns'}, default 0\n", " Axis to be sorted.\n", " ascending : bool or list of bool, default True\n", " Sort ascending vs. descending. Specify list for multiple sort\n", " orders. If this is a list of bools, must match the length of\n", " the by.\n", " inplace : bool, default False\n", " If True, perform operation in-place.\n", " kind : {'quicksort', 'mergesort', 'heapsort'}, default 'quicksort'\n", " Choice of sorting algorithm. See also ndarray.np.sort for more\n", " information. `mergesort` is the only stable algorithm. For\n", " DataFrames, this option is only applied when sorting on a single\n", " column or label.\n", " na_position : {'first', 'last'}, default 'last'\n", " Puts NaNs at the beginning if `first`; `last` puts NaNs at the\n", " end.\n", " ignore_index : bool, default False\n", " If True, the resulting axis will be labeled 0, 1, …, n - 1.\n", " \n", " .. versionadded:: 1.0.0\n", " \n", " key : callable, optional\n", " Apply the key function to the values\n", " before sorting. This is similar to the `key` argument in the\n", " builtin :meth:`sorted` function, with the notable difference that\n", " this `key` function should be *vectorized*. It should expect a\n", " ``Series`` and return a Series with the same shape as the input.\n", " It will be applied to each column in `by` independently.\n", " \n", " .. versionadded:: 1.1.0\n", " \n", " Returns\n", " -------\n", " DataFrame or None\n", " DataFrame with sorted values if inplace=False, None otherwise.\n", " \n", " See Also\n", " --------\n", " DataFrame.sort_index : Sort a DataFrame by the index.\n", " Series.sort_values : Similar method for a Series.\n", " \n", " Examples\n", " --------\n", " >>> df = pd.DataFrame({\n", " ... 'col1': ['A', 'A', 'B', np.nan, 'D', 'C'],\n", " ... 'col2': [2, 1, 9, 8, 7, 4],\n", " ... 'col3': [0, 1, 9, 4, 2, 3],\n", " ... 'col4': ['a', 'B', 'c', 'D', 'e', 'F']\n", " ... })\n", " >>> df\n", " col1 col2 col3 col4\n", " 0 A 2 0 a\n", " 1 A 1 1 B\n", " 2 B 9 9 c\n", " 3 NaN 8 4 D\n", " 4 D 7 2 e\n", " 5 C 4 3 F\n", " \n", " Sort by col1\n", " \n", " >>> df.sort_values(by=['col1'])\n", " col1 col2 col3 col4\n", " 0 A 2 0 a\n", " 1 A 1 1 B\n", " 2 B 9 9 c\n", " 5 C 4 3 F\n", " 4 D 7 2 e\n", " 3 NaN 8 4 D\n", " \n", " Sort by multiple columns\n", " \n", " >>> df.sort_values(by=['col1', 'col2'])\n", " col1 col2 col3 col4\n", " 1 A 1 1 B\n", " 0 A 2 0 a\n", " 2 B 9 9 c\n", " 5 C 4 3 F\n", " 4 D 7 2 e\n", " 3 NaN 8 4 D\n", " \n", " Sort Descending\n", " \n", " >>> df.sort_values(by='col1', ascending=False)\n", " col1 col2 col3 col4\n", " 4 D 7 2 e\n", " 5 C 4 3 F\n", " 2 B 9 9 c\n", " 0 A 2 0 a\n", " 1 A 1 1 B\n", " 3 NaN 8 4 D\n", " \n", " Putting NAs first\n", " \n", " >>> df.sort_values(by='col1', ascending=False, na_position='first')\n", " col1 col2 col3 col4\n", " 3 NaN 8 4 D\n", " 4 D 7 2 e\n", " 5 C 4 3 F\n", " 2 B 9 9 c\n", " 0 A 2 0 a\n", " 1 A 1 1 B\n", " \n", " Sorting with a key function\n", " \n", " >>> df.sort_values(by='col4', key=lambda col: col.str.lower())\n", " col1 col2 col3 col4\n", " 0 A 2 0 a\n", " 1 A 1 1 B\n", " 2 B 9 9 c\n", " 3 NaN 8 4 D\n", " 4 D 7 2 e\n", " 5 C 4 3 F\n", "\n" ] } ], "source": [ "help(nba.sort_values)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "At the very top of this `help` text, the *signature* of the `sort_value` method appears:\n", "\n", " sort_value(column_or_label, descending=False, distinct=False)\n", " \n", "This describes the positions, names, and default values of the three arguments to `sort_value`. When calling this method, you can use either positional arguments or named arguments, so the following three calls do exactly the same thing.\n", "\n", " sort_value('SALARY', True)\n", " sort_value('SALARY', ascending=False)\n", " sort_value(column_or_label='SALARY', ascending=False)\n", " \n", "When an argument is simply `True` or `False`, it's a useful convention to include the argument name so that it's more obvious what the argument value means." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 2 }