{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": { "tags": [ "remove_input" ] }, "outputs": [], "source": [ "path_data = '../../data/'\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import math\n", "import scipy.stats as stats\n", "\n", "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "plt.style.use('fivethirtyeight')\n", "\n", "import warnings\n", "warnings.filterwarnings('ignore')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Implementing the Classifier\n", "We are now ready to implement a $k$-nearest neighbor classifier based on multiple attributes. We have used only two attributes so far, for ease of visualization. But usually predictions will be based on many attributes. Here is an example that shows how multiple attributes can be better than pairs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Banknote authentication\n", "\n", "This time we'll look at predicting whether a banknote (e.g., a \\$20 bill) is counterfeit or legitimate. Researchers have put together a data set for us, based on photographs of many individual banknotes: some counterfeit, some legitimate. They computed a few numbers from each image, using techniques that we won't worry about for this course. So, for each banknote, we know a few numbers that were computed from a photograph of it as well as its class (whether it is counterfeit or not). Let's load it into a table and take a look." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "

| | WaveletVar | WaveletSkew | WaveletCurt | Entropy | Class |
|---|---|---|---|---|---|
| 0 | 3.62160 | 8.66610 | -2.8073 | -0.44699 | 0 |
| 1 | 4.54590 | 8.16740 | -2.4586 | -1.46210 | 0 |
| 2 | 3.86600 | -2.63830 | 1.9242 | 0.10645 | 0 |
| 3 | 3.45660 | 9.52280 | -4.0112 | -3.59440 | 0 |
| 4 | 0.32924 | -4.45520 | 4.5718 | -0.98880 | 0 |
| ... | ... | ... | ... | ... | ... |
| 1367 | 0.40614 | 1.34920 | -1.4501 | -0.55949 | 1 |
| 1368 | -1.38870 | -4.87730 | 6.4774 | 0.34179 | 1 |
| 1369 | -3.75030 | -13.45860 | 17.5932 | -2.77710 | 1 |
| 1370 | -3.56370 | -8.38270 | 12.3930 | -1.28230 | 1 |
| 1371 | -2.54190 | -0.65804 | 2.6842 | 1.19520 | 1 |

1372 rows × 5 columns

| | WaveletVar | WaveletSkew | WaveletCurt | Entropy | Class | Color |
|---|---|---|---|---|---|---|
| 0 | 3.62160 | 8.66610 | -2.8073 | -0.44699 | 0 | gold |
| 1 | 4.54590 | 8.16740 | -2.4586 | -1.46210 | 0 | gold |
| 2 | 3.86600 | -2.63830 | 1.9242 | 0.10645 | 0 | gold |
| 3 | 3.45660 | 9.52280 | -4.0112 | -3.59440 | 0 | gold |
| 4 | 0.32924 | -4.45520 | 4.5718 | -0.98880 | 0 | gold |
| ... | ... | ... | ... | ... | ... | ... |
| 1367 | 0.40614 | 1.34920 | -1.4501 | -0.55949 | 1 | darkblue |
| 1368 | -1.38870 | -4.87730 | 6.4774 | 0.34179 | 1 | darkblue |
| 1369 | -3.75030 | -13.45860 | 17.5932 | -2.77710 | 1 | darkblue |
| 1370 | -3.56370 | -8.38270 | 12.3930 | -1.28230 | 1 | darkblue |
| 1371 | -2.54190 | -0.65804 | 2.6842 | 1.19520 | 1 | darkblue |

1372 rows × 6 columns
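
In the table above, every row with `Class` 0 is labeled `gold` and every row with `Class` 1 is labeled `darkblue`. One way such a `Color` column could be built is sketched below. This is a minimal illustration and not necessarily how the notebook does it: the three-row `banknotes` frame is a hypothetical miniature using values copied from the table, and the `class_colors` dictionary is an assumed name.

```python
import pandas as pd

# Hypothetical miniature of the banknote table: three rows copied from the output above.
banknotes = pd.DataFrame({
    'WaveletVar': [3.62160, 0.32924, -2.54190],
    'Entropy': [-0.44699, -0.98880, 1.19520],
    'Class': [0, 0, 1],
})

# Assumed mapping, read off the table: class 0 is drawn in gold, class 1 in dark blue.
class_colors = {0: 'gold', 1: 'darkblue'}

# Look up each row's class in the dictionary to get its plotting color.
banknotes['Color'] = banknotes['Class'].map(class_colors)
print(banknotes)
```

A column like this can then be passed as the color argument of a scatter plot so that the two classes are visually distinct.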

| | Class | Alcohol | Malic Acid | Ash | Alcalinity of Ash | Magnesium | Total Phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color Intensity | Hue | OD280/OD315 of diulted wines | Proline |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 |
| 1 | 1 | 13.20 | 1.78 | 2.14 | 11.2 | 100 | 2.65 | 2.76 | 0.26 | 1.28 | 4.38 | 1.05 | 3.40 | 1050 |
| 2 | 1 | 13.16 | 2.36 | 2.67 | 18.6 | 101 | 2.80 | 3.24 | 0.30 | 2.81 | 5.68 | 1.03 | 3.17 | 1185 |
| 3 | 1 | 14.37 | 1.95 | 2.50 | 16.8 | 113 | 3.85 | 3.49 | 0.24 | 2.18 | 7.80 | 0.86 | 3.45 | 1480 |
| 4 | 1 | 13.24 | 2.59 | 2.87 | 21.0 | 118 | 2.80 | 2.69 | 0.39 | 1.82 | 4.32 | 1.04 | 2.93 | 735 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 173 | 0 | 13.71 | 5.65 | 2.45 | 20.5 | 95 | 1.68 | 0.61 | 0.52 | 1.06 | 7.70 | 0.64 | 1.74 | 740 |
| 174 | 0 | 13.40 | 3.91 | 2.48 | 23.0 | 102 | 1.80 | 0.75 | 0.43 | 1.41 | 7.30 | 0.70 | 1.56 | 750 |
| 175 | 0 | 13.27 | 4.28 | 2.26 | 20.0 | 120 | 1.59 | 0.69 | 0.43 | 1.35 | 10.20 | 0.59 | 1.56 | 835 |
| 176 | 0 | 13.17 | 2.59 | 2.37 | 20.0 | 120 | 1.65 | 0.68 | 0.53 | 1.46 | 9.30 | 0.60 | 1.62 | 840 |
| 177 | 0 | 14.13 | 4.10 | 2.74 | 24.5 | 96 | 2.05 | 0.76 | 0.56 | 1.35 | 9.20 | 0.61 | 1.60 | 560 |

178 rows × 14 columns

| | Class | Alcohol | Malic Acid | Ash | Alcalinity of Ash | Magnesium | Total Phenols | Flavanoids | Nonflavanoid phenols | Proanthocyanins | Color Intensity | Hue | OD280/OD315 of diulted wines | Proline | Distance |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 14.23 | 1.71 | 2.43 | 15.6 | 127 | 2.80 | 3.06 | 0.28 | 2.29 | 5.64 | 1.04 | 3.92 | 1065 | 0.000000 |
| 54 | 1 | 13.74 | 1.67 | 2.25 | 16.4 | 118 | 2.60 | 2.90 | 0.21 | 1.62 | 5.85 | 0.92 | 3.20 | 1060 | 10.392805 |
| 45 | 1 | 14.21 | 4.04 | 2.44 | 18.9 | 111 | 2.85 | 2.65 | 0.30 | 1.25 | 5.24 | 0.87 | 3.33 | 1080 | 22.340748 |
| 48 | 1 | 14.10 | 2.02 | 2.40 | 18.8 | 103 | 2.75 | 2.92 | 0.32 | 2.38 | 6.20 | 1.07 | 2.75 | 1060 | 24.760232 |
| 46 | 1 | 14.38 | 3.59 | 2.28 | 16.0 | 102 | 3.25 | 3.17 | 0.27 | 2.19 | 4.90 | 1.04 | 3.44 | 1065 | 25.094663 |
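
The `Distance` values in the table above are consistent with the straight-line (Euclidean) distance from row 0, taken over the 13 attribute columns and leaving out `Class`. The sketch below recomputes the entry for row 54 from the values shown. It is only an illustration: `euclidean_distance` is a hypothetical helper name, and the notebook's own distance function may be written differently.

```python
import math

# Attribute values (every column except Class) for rows 0 and 54, copied from the table above.
row_0 = [14.23, 1.71, 2.43, 15.6, 127, 2.80, 3.06, 0.28, 2.29, 5.64, 1.04, 3.92, 1065]
row_54 = [13.74, 1.67, 2.25, 16.4, 118, 2.60, 2.90, 0.21, 1.62, 5.85, 0.92, 3.20, 1060]

def euclidean_distance(a, b):
    """Square root of the sum of squared attribute differences (hypothetical helper)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

distance = euclidean_distance(row_0, row_54)
print(round(distance, 4))  # 10.3928, matching the Distance entry for row 54
```

The distance from row 0 to itself is 0, which is why that row appears first when the table is sorted by `Distance`.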