In [1]:
path_data = '../../data/'

import pandas as pd
import numpy as np

# Sorting Rows

"The NBA is the highest paying professional sports league in the world," [reported CNN](http://edition.cnn.com/2015/12/04/sport/gallery/highest-paid-nba-players/) in March 2016. The table `nba_salaries` contains the salaries of all National Basketball Association players in 2015-2016.

Each row represents one player. The columns are:

| **Column Label**   | Description                                         |
|--------------------|-----------------------------------------------------|
| `PLAYER`           | Player's name                                       |
| `POSITION`         | Player's position on team                           |
| `TEAM`             | Team name                                           |
| `'15-'16 SALARY`   | Player's salary in 2015-2016, in millions of dollars |
 
The code for the positions is PG (Point Guard), SG (Shooting Guard), PF (Power Forward), SF (Small Forward), and C (Center). But what follows doesn't involve details about how basketball is played.

The first row shows that Paul Millsap, Power Forward for the Atlanta Hawks, had a salary of almost $\$18.7$ million in 2015-2016.

In [2]:
# This table can be found online: https://www.statcrunch.com/app/index.php?dataid=1843341
nba_salaries = pd.read_csv(path_data + 'nba_salaries.csv')
nba_salaries

Unnamed: 0,PLAYER,POSITION,TEAM,'15-'16 SALARY
0,Paul Millsap,PF,Atlanta Hawks,18.671659
1,Al Horford,C,Atlanta Hawks,12.000000
2,Tiago Splitter,C,Atlanta Hawks,9.756250
3,Jeff Teague,PG,Atlanta Hawks,8.000000
4,Kyle Korver,SG,Atlanta Hawks,5.746479
...,...,...,...,...
412,Gary Neal,PG,Washington Wizards,2.139000
413,DeJuan Blair,C,Washington Wizards,2.000000
414,Kelly Oubre Jr.,SF,Washington Wizards,1.920240
415,Garrett Temple,SG,Washington Wizards,1.100602


The table contains 417 rows, one for each player. Only 10 of the rows are displayed. The `head` method allows us to specify the number of rows to show from the top of the table, with the default (no specification) being the first 5 rows.

In [3]:
nba_salaries.head(3)

Unnamed: 0,PLAYER,POSITION,TEAM,'15-'16 SALARY
0,Paul Millsap,PF,Atlanta Hawks,18.671659
1,Al Horford,C,Atlanta Hawks,12.0
2,Tiago Splitter,C,Atlanta Hawks,9.75625
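
As a small sketch of how `head` treats its argument, the cell below builds a six-row table and calls `head` with and without a row count. The table `toy` and its values are invented for illustration and are not part of `nba_salaries`.

```python
import pandas as pd

# Hypothetical six-row table, made up for illustration only.
toy = pd.DataFrame({
    'PLAYER': ['A', 'B', 'C', 'D', 'E', 'F'],
    'SALARY': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

default_rows = toy.head()   # no argument: the first 5 rows
first_two = toy.head(2)     # explicit count: the first 2 rows

print(len(default_rows), len(first_two))
```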


Glance through about 20 rows or so, and you will see that the rows are in alphabetical order by team name. It's also possible to list the same rows in alphabetical order by player name using the `sort_values` method. The argument to `sort_values` is a column label or a list of column labels.

In [4]:
nba_salaries.sort_values('PLAYER').head(5)

Unnamed: 0,PLAYER,POSITION,TEAM,'15-'16 SALARY
68,Aaron Brooks,PG,Chicago Bulls,2.25
291,Aaron Gordon,PF,Orlando Magic,4.17168
59,Aaron Harrison,SG,Charlotte Hornets,0.525093
235,Adreian Payne,PF,Minnesota Timberwolves,1.93884
1,Al Horford,C,Atlanta Hawks,12.0
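
When a list of labels is passed to `sort_values`, rows are ordered by the first label and ties are broken by the next. The sketch below uses a made-up four-row table; the names in `toy` are hypothetical and do not come from the salary data.

```python
import pandas as pd

# Hypothetical rows, made up for illustration; not taken from nba_salaries.csv.
toy = pd.DataFrame({
    'PLAYER': ['Dana', 'Casey', 'Alex', 'Blake'],
    'TEAM': ['Hawks', 'Owls', 'Hawks', 'Owls'],
    'SALARY': [3.0, 7.25, 1.5, 4.0],
})

# Sort by team name first; within each team, sort by player name.
by_team_then_player = toy.sort_values(['TEAM', 'PLAYER'])
print(by_team_then_player)
```

Here the two Hawks rows come first (Alex, then Dana), followed by the two Owls rows (Blake, then Casey).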


To examine the players' salaries, it would be much more helpful if the data were ordered by salary.

To do this, we will first simplify the label of the column of salaries (just for convenience), and then sort by the new label `SALARY`. 

This arranges all the rows of the table in *increasing* order of salary, with the lowest salary appearing first. The output is a new table with the same columns as the original but with the rows rearranged.

In [5]:
nba = nba_salaries.rename(columns={"'15-'16 SALARY": 'SALARY'})
nba.sort_values('SALARY')

Unnamed: 0,PLAYER,POSITION,TEAM,SALARY
267,Thanasis Antetokounmpo,SF,New York Knicks,0.030888
327,Cory Jefferson,PF,Phoenix Suns,0.049709
326,Jordan McRae,SG,Phoenix Suns,0.049709
324,Orlando Johnson,SG,Phoenix Suns,0.055722
325,Phil Pressey,PG,Phoenix Suns,0.055722
...,...,...,...,...
131,Dwight Howard,C,Houston Rockets,22.359364
255,Carmelo Anthony,SF,New York Knicks,22.875000
72,LeBron James,SF,Cleveland Cavaliers,22.970500
29,Joe Johnson,SF,Brooklyn Nets,24.894863
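
The point that sorting produces a new table can be checked on a small example. The sketch below uses an invented three-row table `toy`; it also shows why the leftmost numbers in the output above look shuffled, since each row keeps its original index label.

```python
import pandas as pd

# Hypothetical table, made up for illustration only.
toy = pd.DataFrame({
    'PLAYER': ['P1', 'P2', 'P3'],
    'SALARY': [5.0, 0.5, 2.0],
})

sorted_toy = toy.sort_values('SALARY')

print(list(sorted_toy['SALARY']))  # rows in increasing order of salary
print(list(toy['SALARY']))         # the original table is unchanged
print(list(sorted_toy.index))      # index labels travel with their rows
```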


These figures are somewhat difficult to compare as some of these players changed teams during the season and received salaries from more than one team; only the salary from the last team appears in the table. Point Guard Phil Pressey, for example, moved from Philadelphia to Phoenix during the year, and might be moving yet again to the Golden State Warriors. 

The CNN report is about the other end of the salary scale – the players who are among the highest paid in the world. 

To order the rows of the table in *decreasing* order of salary, we must use `sort_values` with the option `ascending=False`.

In [6]:
nba.sort_values('SALARY', ascending=False)

Unnamed: 0,PLAYER,POSITION,TEAM,SALARY
169,Kobe Bryant,SF,Los Angeles Lakers,25.000000
29,Joe Johnson,SF,Brooklyn Nets,24.894863
72,LeBron James,SF,Cleveland Cavaliers,22.970500
255,Carmelo Anthony,SF,New York Knicks,22.875000
131,Dwight Howard,C,Houston Rockets,22.359364
...,...,...,...,...
200,Elliot Williams,SG,Memphis Grizzlies,0.055722
324,Orlando Johnson,SG,Phoenix Suns,0.055722
327,Cory Jefferson,PF,Phoenix Suns,0.049709
326,Jordan McRae,SG,Phoenix Suns,0.049709


Kobe Bryant, in his final season with the Lakers, was the highest paid at a salary of $\$25$ million. Notice that the MVP Stephen Curry doesn't appear among the top 10. He is quite a bit further down the list, as we will see later.
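
One way to look at only the top few rows is to chain `head` onto a descending sort. The sketch below does this on an invented five-row table; `toy`, its player labels, and its salaries are hypothetical.

```python
import pandas as pd

# Hypothetical table, made up for illustration only.
toy = pd.DataFrame({
    'PLAYER': ['P1', 'P2', 'P3', 'P4', 'P5'],
    'SALARY': [2.0, 9.5, 0.4, 6.1, 3.3],
})

# Highest salaries first, then keep the top three rows.
top_three = toy.sort_values('SALARY', ascending=False).head(3)
print(top_three)

# Check whether a given player appears among those rows.
p1_in_top_three = 'P1' in top_three['PLAYER'].tolist()
print(p1_in_top_three)
```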

## Named Arguments

The `ascending=False` portion of this call expression is called a *named argument*. When a function or method is called, each argument has both a position and a name. Both are evident from the help text of a function or method.

In [7]:
help(nba.sort_values)

Help on method sort_values in module pandas.core.frame:

sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key: Union[Callable[[ForwardRef('Series')], Union[ForwardRef('Series'), ~AnyArrayLike]], NoneType] = None) method of pandas.core.frame.DataFrame instance
    Sort by the values along either axis.
    
    Parameters
    ----------
            by : str or list of str
                Name or list of names to sort by.
    
                - if `axis` is 0 or `'index'` then `by` may contain index
                  levels and/or column labels.
                - if `axis` is 1 or `'columns'` then `by` may contain column
                  levels and/or index labels.
    
                .. versionchanged:: 0.23.0
    
                   Allow specifying index or column level names.
    axis : {0 or 'index', 1 or 'columns'}, default 0
         Axis to be sorted.
    ascending : bool or list of bool, default True
         Sort 

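At the very top of this `help` text, the *signature* of the `sort_values` method appears (shown here without the type annotation on `key`):

    sort_values(by, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False, key=None)

This describes the positions, names, and default values of the arguments to `sort_values`. Only `by` has no default, so it is the only argument that must be supplied. When calling this method, you can use either positional arguments or named arguments, so with the signature shown above the following three calls do exactly the same thing.

    nba.sort_values('SALARY', 0, False)
    nba.sort_values('SALARY', ascending=False)
    nba.sort_values(by='SALARY', ascending=False)

The first call has to supply `axis` (here its default, `0`) in order to reach `ascending` by position. Later pandas releases accept only `by` positionally, so the first form may raise an error there.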
    
When an argument is simply `True` or `False`, it's a useful convention to include the argument name so that it's more obvious what the argument value means.
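
The difference between positional and named arguments does not depend on pandas. As a minimal sketch, the hypothetical function `describe_sort` below (not part of any library) only reports how it was called, which makes it easy to see that the different call styles are equivalent.

```python
def describe_sort(by, ascending=True, ignore_index=False):
    """Hypothetical helper that only reports how it was called."""
    direction = 'increasing' if ascending else 'decreasing'
    return f'sort by {by} in {direction} order (ignore_index={ignore_index})'

positional = describe_sort('SALARY', False)
named = describe_sort('SALARY', ascending=False)
all_named = describe_sort(by='SALARY', ascending=False)
# Named arguments may be given in any order.
reordered = describe_sort(ignore_index=False, ascending=False, by='SALARY')

print(named)
print(positional == named == all_named == reordered)
```

Of the four calls, `describe_sort('SALARY', False)` is the hardest to read, which is why naming a bare `True` or `False` is the recommended habit.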