{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "tags": [
     "remove_input"
    ]
   },
   "outputs": [],
   "source": [
    "path_data = '../../data/'\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "plt.style.use('fivethirtyeight')\n",
    "\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "tags": [
     "remove_input"
    ]
   },
   "outputs": [],
   "source": [
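    "def population(prior_prob_disease):\n",
    "    \"\"\"Return a table of 100,000 patients with their true condition and test result.\"\"\"\n",
    "    # Split the population by the prior probability of disease\n",
    "    n_d = int(prior_prob_disease*100000)\n",
    "    n_nd = 100000 - n_d\n",
    "    # Among those with the disease, 99% test Positive (1% false negatives)\n",
    "    n_pos_d = int(0.99*n_d)\n",
    "    n_neg_d = n_d - n_pos_d\n",
    "    # Among those without the disease, 0.5% test Positive (false positives)\n",
    "    n_pos_nd = int(0.005*n_nd)\n",
    "    n_neg_nd = n_nd - n_pos_nd\n",
    "    # One row per patient: Disease rows first, then No Disease rows\n",
    "    condition = np.array(['Disease']*n_d + ['No Disease']*n_nd)\n",
    "    d_test = np.array(['Positive']*n_pos_d + ['Negative']*n_neg_d)\n",
    "    nd_test = np.array(['Positive']*n_pos_nd + ['Negative']*n_neg_nd)\n",
    "    test = np.append(d_test, nd_test)\n",
    "    t = pd.DataFrame(\n",
    "        {'True Condition':condition,\n",
    "        'Test Result':test}\n",
    "    )\n",
    "    return t"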
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Making Decisions\n",
    "A primary use of Bayes' Rule is to make decisions based on incomplete information, incorporating new information as it comes in. This section points out the importance of keeping your assumptions in mind as you make decisions."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Many medical tests for diseases return Positive or Negative results. A Positive result means that according to the test, the patient has the disease. A Negative result means the test concludes that the patient doesn't have the disease. \n",
    "\n",
    "Medical tests are carefully designed to be very accurate. But few tests are accurate 100% of the time. Almost all tests make errors of two kinds:\n",
    "\n",
    "- A **false positive** is an error in which the test concludes Positive but the patient doesn't have the disease.\n",
    "\n",
    "- A **false negative** is an error in which the test concludes Negative but the patient does have the disease.\n",
    "\n",
    "These errors can affect people's decisions. False positives can cause anxiety and unnecessary treatment (which in some cases is expensive or dangerous). False negatives can have even more serious consequences if the patient doesn't receive treatment because of their Negative test result."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A Test for a Rare Disease\n",
    "Suppose there is a large population and a disease that strikes a tiny proportion of the population. The tree diagram below summarizes information about such a disease and about a medical test for it.\n",
    "\n",
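    "*[Tree diagram: Disease (0.004) branches to Test Positive (0.99) and Test Negative (0.01); No Disease (0.996) branches to Test Positive (0.005) and Test Negative (0.995).]*"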
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Overall, only 4 in 1000 people in the population have the disease. The test is quite accurate: it has a very small false positive rate of 5 in 1000, and a somewhat larger (though still small) false negative rate of 1 in 100.\n",
    "\n",
    "Individuals might or might not know whether they have the disease; typically, people get tested to find out whether they have it.\n",
    "\n",
    "So **suppose a person is picked at random from the population** and tested. If the test result is Positive, how would you classify them: Disease, or No disease?\n",
    "\n",
    "We can answer this by applying Bayes' Rule and using our \"more likely than not\" classifier. Given that the person has tested Positive, the chance that they have the disease is the proportion in the top branch (Disease and Test Positive), relative to the total proportion in the two Test Positive branches."
   ]
  },
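  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a sketch of that calculation in symbols, write $D$ for the event that the person has the disease and $+$ for a Positive test result. This notation is introduced here only for illustration; the rates are the ones stated above (4 in 1000 have the disease, the false negative rate is 1 in 100, and the false positive rate is 5 in 1000).\n",
    "\n",
    "$$\n",
    "P(D \\mid +) ~=~ \\frac{P(D)\\,P(+ \\mid D)}{P(D)\\,P(+ \\mid D) ~+~ P(\\text{not } D)\\,P(+ \\mid \\text{not } D)} ~=~ \\frac{0.004 \\times 0.99}{0.004 \\times 0.99 ~+~ 0.996 \\times 0.005}\n",
    "$$\n",
    "\n",
    "The numerator is the proportion of the population that has the disease and tests Positive. The denominator adds to it the proportion that does not have the disease but still tests Positive."
   ]
  },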
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.44295302013422816"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "(0.004 * 0.99) / (0.004 * 0.99 + 0.996 * 0.005)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given that the person has tested Positive, the chance that they have the disease is about 44%. That is less than 50%, so the \"more likely than not\" classifier labels them: No disease.\n",
    "\n",
    "This is a strange conclusion. We have a pretty accurate test, and a person who has tested Positive, and our classification is ... that they **don't** have the disease? That doesn't seem to make any sense.\n",
    "\n",
    "When faced with a disturbing answer, the first thing to do is to check the calculations. The arithmetic above is correct. Let's see if we can get the same answer in a different way.\n",
    "\n",
    "The function `population` returns a table of outcomes for 100,000 patients, with columns that show the `True Condition` and `Test Result`. The test is the same as the one described in the tree, but the proportion of the population that has the disease is an argument to the function.\n",
    "\n",
    "We will call `population` with 0.004 as the argument, and then pivot to cross-classify each of the 100,000 people."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
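       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th>Test_Result</th>\n",
       "      <th>Negative</th>\n",
       "      <th>Positive</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>True Condition</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Disease</th>\n",
       "      <td>4</td>\n",
       "      <td>396</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>No Disease</th>\n",
       "      <td>99102</td>\n",
       "      <td>498</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>\n",
       "\n",
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th>Test_Result</th>\n",
       "      <th>Negative</th>\n",
       "      <th>Positive</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>True Condition</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>Disease</th>\n",
       "      <td>50</td>\n",
       "      <td>4950</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>No Disease</th>\n",
       "      <td>94525</td>\n",
       "      <td>475</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",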
       "