# Establishing Causality
In the language developed earlier in the section, you can think of the people in
the S&V houses as the treatment group, and those in the Lambeth houses as the
control group. A crucial element in Snow’s analysis was that the people in the
two groups were comparable to each other, apart from the treatment.

In order to establish whether it was the water supply that was causing cholera,
Snow had to compare two groups that were similar to each other in all but one
aspect—their water supply. Only then would he be able to ascribe the differences
in their outcomes to the water supply. If the two groups had been different in
some other way as well, it would have been difficult to point the finger at the
water supply as the source of the disease.  For example, if the treatment group
consisted of factory workers and the control group did not, then differences
between the outcomes in the two groups could have been due to the water supply,
or to factory work, or both. The final picture would have been much fuzzier.

Snow’s brilliance lay in identifying two groups that would make his comparison
clear. He had set out to establish a causal relation between contaminated water
and cholera infection, and to a great extent he succeeded, even though the
miasmatists ignored and even ridiculed him. Of course, Snow did not understand
the detailed mechanism by which humans contract cholera. That discovery was made
in 1883, when the German scientist Robert Koch isolated *Vibrio cholerae*,
the bacterium that enters the human small intestine and causes cholera.

In fact, *Vibrio cholerae* had been identified in 1854 by Filippo Pacini in
Italy, just about when Snow was analyzing his data in London. Because of the
dominance of the miasmatists in Italy, Pacini’s discovery languished unknown.
But by the end of the 1800s, the miasma brigade was in retreat. Subsequent
history has vindicated Pacini and John Snow. Snow’s methods led to the
development of the field of *epidemiology*, which is the study of the spread of
diseases.

**Confounding**

Let us now return to more modern times, armed with an important lesson that we
have learned along the way:

**In an observational study, if the treatment and control groups differ in ways
other than the treatment, it is difficult to draw conclusions about causality.**

An underlying difference between the two groups (other than the treatment) is
called a *confounding factor*, because it might confound you (that is, mess you
up) when you try to reach a conclusion.

**Example: Coffee and lung cancer.** Studies in the 1960s showed that coffee
drinkers had higher rates of lung cancer than those who did not drink coffee.
Because of this, some people identified coffee as a cause of lung cancer. But
coffee does not cause lung cancer. The analysis contained a confounding factor:
smoking. In those days, coffee drinkers were also likely to have been smokers,
and smoking does cause lung cancer. Coffee drinking was associated with lung
cancer, but it did not cause the disease.
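
To see how a confounding factor can create an association on its own, here is a minimal sketch in Python. The counts are invented purely for illustration and do not come from any study. They are chosen so that smokers are more likely to drink coffee, and so that lung cancer rates depend only on smoking. The names `counts` and `cancer_rate` are hypothetical helpers, not part of any library.

```python
# Hypothetical counts, made up for illustration only (not real study data).
# Key: (is_smoker, drinks_coffee) -> (number of people, lung cancer cases)
counts = {
    (True, True): (800, 80),    # smokers who drink coffee
    (True, False): (200, 20),   # smokers who do not drink coffee
    (False, True): (200, 2),    # nonsmokers who drink coffee
    (False, False): (800, 8),   # nonsmokers who do not drink coffee
}

def cancer_rate(groups):
    """Proportion of people with lung cancer across the given groups."""
    people = sum(counts[g][0] for g in groups)
    cases = sum(counts[g][1] for g in groups)
    return cases / people

# Naive comparison: coffee drinkers versus everyone else, ignoring smoking.
coffee_rate = cancer_rate([(True, True), (False, True)])
no_coffee_rate = cancer_rate([(True, False), (False, False)])

# Comparison within each smoking group, which holds smoking fixed.
smoker_coffee = cancer_rate([(True, True)])
smoker_no_coffee = cancer_rate([(True, False)])
nonsmoker_coffee = cancer_rate([(False, True)])
nonsmoker_no_coffee = cancer_rate([(False, False)])

print(f"All coffee drinkers:     {coffee_rate:.1%}")
print(f"All non-drinkers:        {no_coffee_rate:.1%}")
print(f"Smokers, coffee:         {smoker_coffee:.1%}")
print(f"Smokers, no coffee:      {smoker_no_coffee:.1%}")
print(f"Nonsmokers, coffee:      {nonsmoker_coffee:.1%}")
print(f"Nonsmokers, no coffee:   {nonsmoker_no_coffee:.1%}")
```

With these made-up counts, coffee drinkers as a whole have a higher lung cancer rate than non-drinkers, yet among smokers alone, and among nonsmokers alone, the rate is the same whether or not a person drinks coffee. The overall gap comes entirely from the fact that the coffee group contains more smokers.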

Confounding factors are common in observational studies. Good studies take great
care to reduce confounding and to account for its effects.