---
redirect_from:
  - "/chapters/16/1/regression-model"
interact_link: content/chapters/16/1/Regression_Model.ipynb
kernel_name: python3
has_widgets: false
title: |-
  A Regression Model
prev_page:
  url: /chapters/16/Inference_for_Regression.html
  title: |-
    Inference for Regression
next_page:
  url: /chapters/16/2/Inference_for_the_True_Slope.html
  title: |-
    Inference for the True Slope
comment: "***PROGRAMMATICALLY GENERATED, DO NOT EDIT. SEE ORIGINAL FILES IN /content***"
---
<div class="jb_cell tag_remove_input">

<div class="cell border-box-sizing code_cell rendered">

</div>
</div>

<div class="jb_cell tag_remove_input">

<div class="cell border-box-sizing code_cell rendered">

</div>
</div>

<div class="jb_cell">

<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<h3 id="A-Regression-Model">A Regression Model<a class="anchor-link" href="#A-Regression-Model"> </a></h3><p>In brief, a regression model says that the underlying relation between the two variables is perfectly linear; this straight line is the <em>signal</em> that we would like to identify. However, we are not able to see the line clearly. What we see are points that are scattered around the line. In each point, the signal has been contaminated by <em>random noise</em>. Our inferential goal, therefore, is to separate the signal from the noise.</p>
<p>In greater detail, the regression model specifies that the points in the scatter plot are generated at random as follows.</p>
<ul>
<li>The relation between $x$ and $y$ is perfectly linear. We cannot see this "true line" but it exists.</li>
<li>The scatter plot is created by taking points on the line and pushing them off the line vertically, either above or below, as follows:<ul>
<li>For each $x$, find the corresponding point on the true line (that's the signal), and then generate the noise or error.</li>
<li>The errors are drawn at random with replacement from a population of errors that has a normal distribution with mean 0.</li>
<li>Create a point whose horizontal coordinate is $x$ and whose vertical coordinate is "the height of the true line at $x$, plus the error".</li>
</ul>
</li>
<li>Finally, erase the true line from the scatter, and display just the points created.</li>
</ul>
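<p>The generating steps above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the code behind the figures in this section: the function name <code>simulate_points</code>, the way the $x$ values are chosen, and the noise SD of 6 are all hypothetical choices made for the example.</p>

```python
import numpy as np

def simulate_points(true_slope, true_intercept, n, noise_sd=6.0, seed=None):
    """Generate n points according to the regression model (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # Assumption: the model does not say how the x values arise; any set will do.
    x = rng.normal(50, 5, n)
    # The signal: the height of the true line at each x.
    signal = true_slope * x + true_intercept
    # The noise: errors drawn at random from a normal distribution with mean 0.
    # noise_sd is a hypothetical value chosen for this example.
    noise = rng.normal(0, noise_sd, n)
    # Each point is pushed vertically off the true line by its error.
    y = signal + noise
    return x, y

# Ten points scattered around the true line y = 4x - 5.
x, y = simulate_points(4, -5, 10, seed=0)
```

<p>Setting <code>noise_sd</code> to 0 puts every point exactly on the true line, which is a quick way to check that the signal is computed as intended.</p>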

</div>
</div>
</div>
</div>

<div class="jb_cell">

<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>Based on this scatter plot, how should we estimate the true line? Among all straight lines, the regression line is the one with the smallest mean squared error of estimation, so it is the best line that we can put through a scatter plot. That makes the regression line a natural estimate of the true line.</p>
<p>The simulation below shows how close the regression line is to the true line. The first panel shows how the scatter plot is generated from the true line. The second shows the scatter plot that we see. The third shows the regression line through the plot. The fourth shows both the regression line and the true line.</p>
<p>To run the simulation, call the function <code>draw_and_compare</code> with three arguments: the slope of the true line, the intercept of the true line, and the sample size.</p>
<p>Run the simulation a few times, with different values for the slope and intercept of the true line, and varying sample sizes. Because all the points are generated according to the model, you will see that the regression line is a good estimate of the true line if the sample size is moderately large.</p>
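<p>For readers who want to see the comparison as numbers rather than plots, here is a minimal sketch. It is not the implementation of <code>draw_and_compare</code>, whose source is not shown on this page. The helper <code>regression_line</code>, the distribution of the $x$ values, and the noise SD of 6 are assumptions made for the example. The slope is computed as the correlation times the ratio of the SD of $y$ to the SD of $x$, and the intercept is chosen so that the line passes through the point of averages.</p>

```python
import numpy as np

def regression_line(x, y):
    """Return the slope and intercept of the regression line through (x, y)."""
    r = np.corrcoef(x, y)[0, 1]                   # correlation between x and y
    slope = r * np.std(y) / np.std(x)             # r * SD of y / SD of x
    intercept = np.mean(y) - slope * np.mean(x)   # line passes through the averages
    return slope, intercept

# Simulate one sample from the model, then estimate the true line from it.
rng = np.random.default_rng(0)
true_slope, true_intercept, n = 4, -5, 10
x = rng.normal(50, 5, n)                                   # assumed x values
y = true_slope * x + true_intercept + rng.normal(0, 6, n)  # signal plus noise

est_slope, est_intercept = regression_line(x, y)
print("true line:      ", true_slope, true_intercept)
print("regression line:", est_slope, est_intercept)
```

<p>Changing <code>n</code>, <code>true_slope</code>, and <code>true_intercept</code> mirrors varying the three arguments of the simulation.</p>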

</div>
</div>
</div>
</div>

<div class="jb_cell tag_remove_input">

<div class="cell border-box-sizing code_cell rendered">

</div>
</div>

<div class="jb_cell">

<div class="cell border-box-sizing code_cell rendered">
<div class="input">

<div class="inner_cell">
    <div class="input_area">
<div class=" highlight hl-ipython3"><pre><span></span><span class="c1"># The true line,</span>
<span class="c1"># the points created,</span>
<span class="c1"># and our estimate of the true line.</span>
<span class="c1"># Arguments: true slope, true intercept, number of points</span>

<span class="n">draw_and_compare</span><span class="p">(</span><span class="mi">4</span><span class="p">,</span> <span class="o">-</span><span class="mi">5</span><span class="p">,</span> <span class="mi">10</span><span class="p">)</span>
</pre></div>

    </div>
</div>
</div>

<div class="output_wrapper">
<div class="output">

<div class="jb_output_wrapper">
<div class="output_area">


<div class="output_png output_subarea ">
<img src="../../../images/chapters/16/1/Regression_Model_5_0.png" alt="Output of draw_and_compare(4, -5, 10)">
</div>

</div>
</div>
</div>
</div>

</div>
</div>

<div class="jb_cell">

<div class="cell border-box-sizing text_cell rendered"><div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>In reality, of course, we will never see the true line. What the simulation shows is that if the regression model looks plausible, and if we have a large sample, then the regression line is a good approximation to the true line.</p>
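<p>The role of the sample size can be checked with a rough experiment: simulate many samples of a given size from the model, fit a line to each, and average the gap between the fitted slope and the true slope. The sketch below assumes the same hypothetical setup as a true line with slope 4 and intercept -5, a noise SD of 6, and normally distributed $x$ values. It uses <code>np.polyfit</code> for the least-squares fit, and the function name <code>typical_slope_error</code> is made up for this example.</p>

```python
import numpy as np

def typical_slope_error(n, true_slope=4, true_intercept=-5,
                        noise_sd=6.0, repetitions=500, seed=0):
    """Average absolute gap between the fitted and true slope over many samples of size n."""
    rng = np.random.default_rng(seed)
    errors = np.empty(repetitions)
    for i in range(repetitions):
        # Generate one sample according to the regression model.
        x = rng.normal(50, 5, n)   # assumed x values
        y = true_slope * x + true_intercept + rng.normal(0, noise_sd, n)
        # Degree-1 least-squares fit: coefficients come back as (slope, intercept).
        fitted_slope, _ = np.polyfit(x, y, 1)
        errors[i] = abs(fitted_slope - true_slope)
    return errors.mean()

# Compare the typical error for small, moderate, and large samples.
for n in (10, 100, 1000):
    print(n, typical_slope_error(n))
```

<p>Under these assumptions the typical error should shrink as <code>n</code> grows, which is the numerical counterpart of the regression line settling onto the true line in the plots.</p>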

</div>
</div>
</div>
</div>