{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "path_data = '../../../../data/'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import math\n", "from scipy import stats\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "plt.style.use('fivethirtyeight')\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "def r_scatter(r):\n", "    \"\"\"Generate a scatter plot with a correlation approximately r\"\"\"\n", "    plt.figure(figsize=(5,5))\n", "    # x and z are independent standard normal samples\n", "    x = np.random.normal(0, 1, 1000)\n", "    z = np.random.normal(0, 1, 1000)\n", "    # Mixing x and z this way gives y unit variance and correlation r with x\n", "    y = r*x + (np.sqrt(1-r**2))*z\n", "    plt.scatter(x, y)\n", "    plt.xlim(-4, 4)\n", "    plt.ylim(-4, 4)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Correlation ###\n", "\n", "In this section we will develop a measure of how tightly clustered a scatter diagram is about a straight line. Formally, this is called measuring *linear association*." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The table `hybrid` contains data on hybrid passenger cars sold in the United States from 1997 to 2013. The data were adapted from the online data archive of [Prof. Larry Winner](http://www.stat.ufl.edu/%7Ewinner/) of the University of Florida. The columns:\n", "\n", "- `vehicle`: model of the car\n", "- `year`: year of manufacture\n", "- `msrp`: manufacturer's suggested retail price in 2013 dollars\n", "- `acceleration`: acceleration rate in km per hour per second\n", "- `mpg`: fuel economy in miles per gallon\n", "- `class`: the model's class." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "hybrid = pd.read_csv(path_data + 'hybrid.csv')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "|     | vehicle | year | msrp | acceleration | mpg | class |\n",
"| --- | --- | --- | --- | --- | --- | --- |\n",
"| 0 | Prius (1st Gen) | 1997 | 24509.74 | 7.46 | 41.26 | Compact |\n",
"| 1 | Tino | 2000 | 35354.97 | 8.20 | 54.10 | Compact |\n",
"| 2 | Prius (2nd Gen) | 2000 | 26832.25 | 7.97 | 45.23 | Compact |\n",
"| 3 | Insight | 2000 | 18936.41 | 9.52 | 53.00 | Two Seater |\n",
"| 4 | Civic (1st Gen) | 2001 | 25833.38 | 7.04 | 47.04 | Compact |\n",
"| ... | ... | ... | ... | ... | ... | ... |\n",
"| 148 | S400 | 2013 | 92350.00 | 13.89 | 21.00 | Large |\n",
"| 149 | Prius Plug-in | 2013 | 32000.00 | 9.17 | 50.00 | Midsize |\n",
"| 150 | C-Max Energi Plug-in | 2013 | 32950.00 | 11.76 | 43.00 | Midsize |\n",
"| 151 | Fusion Energi Plug-in | 2013 | 38700.00 | 11.76 | 43.00 | Midsize |\n",
"| 152 | Chevrolet Volt | 2013 | 39145.00 | 11.11 | 37.00 | Compact |\n",
"\n",
"153 rows × 6 columns\n", "