Discovering Causes from Correlations

In this lecture (8B), we study some real-world cases which show how causes were discovered. One of the key ideas is that Correlation is NOT Causation: we cannot learn about causes just by looking at the patterns of association within the observed data. Discovering causes requires going beyond the observations to the underlying real-world structures and mechanisms which generate these observations.

In fact, one of the Main Problems we face in statistics and econometrics is “How to Discover Causes when all we observe is Correlations?” It will surprise students to know that conventional statistics has NO ANSWER to this problem. The standard techniques currently in use in statistics and econometrics follow this Naïve Approach. We start with a GUESS that X causes Y. We calculate the correlation between the two, or else run a regression of Y on X and other suitable variables. If the correlation between Y and X comes out significant, we conclude that X is a cause of Y. This methodology is seriously flawed. Some of its problems are the following:

  1. The correlation may be due to Reverse Causality: Y is a cause of X, and not the other way around. Correlations do not allow us to distinguish the two cases.
  2. Common Cause: if there is a variable Z which causes both X and Y, this will lead to an association between X and Y. However, in this case there is no direct causal relationship between X and Y.
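Both pitfalls in the list can be seen in a small simulation. The sketch below is a hypothetical illustration, not taken from any study: it assumes a variable Z that causes both X and Y (with standard normal noise and 10,000 draws) and no causal link at all between X and Y. The correlation between X and Y comes out clearly positive (0.5 in theory); it is exactly the same whether computed as corr(X, Y) or corr(Y, X), so it says nothing about direction; and it falls to roughly zero once the influence of Z is removed by regressing each variable on Z.

```python
import random


def corr(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(vs, zs):
    """Residuals of an ordinary least-squares regression of vs on zs (with intercept)."""
    n = len(vs)
    mv, mz = sum(vs) / n, sum(zs) / n
    slope = (sum((v - mv) * (z - mz) for v, z in zip(vs, zs))
             / sum((z - mz) ** 2 for z in zs))
    return [v - mv - slope * (z - mz) for v, z in zip(vs, zs)]


rng = random.Random(0)      # fixed seed so the run is reproducible
n = 10_000

# Hypothetical common-cause structure: Z -> X and Z -> Y, but no arrow between X and Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [zi + rng.gauss(0, 1) for zi in z]

r_xy = corr(x, y)                                     # clearly positive (0.5 in theory)
r_yx = corr(y, x)                                     # identical: correlation has no direction
r_given_z = corr(residuals(x, z), residuals(y, z))    # close to zero once Z is removed

print(f"corr(X, Y)     = {r_xy:.3f}")
print(f"corr(Y, X)     = {r_yx:.3f}")
print(f"corr(X, Y | Z) = {r_given_z:.3f}")
```

Removing Z works here only because the sketch knows that Z is the common cause and observes it directly; with real data, deciding which variables to adjust for is itself a question about the underlying causal structure.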

There are many other causal structures which could produce an apparent correlation between X and Y without any causal connection between the two. Since the discovery of causation is of central importance for policy, it is a puzzle why statistics and econometrics have such seriously deficient methodologies for discovering causes.

The reasons why Statistics is Blind to Causality emerge from a Long and Complex History. Chapter 2 of “The Book of Why” describes the bizarre story of how statistics inflicted causal blindness upon itself. This has to do with early mistakes and confusions by Galton, Pearson, and Fisher, the founding fathers of modern statistics. In fact, these mistakes stemmed from the influence of the philosophy of Logical Positivism. According to this (mistaken) philosophy, scientific theories should not go beyond the observables. Since causes are never observable, the prohibition on discussing unobservables made it impossible to talk about causality. The loss of language and terminology for causation led to hopelessly bad methodologies, which continue to be used in mainstream textbooks, even though substantial progress has been made in the past few decades.

In this lecture, we will look at three real-world case studies relating to the discovery of causes. The first is the LEAPS Survey, the second concerns the discovery of the cause of Puerperal Fever, and the third shows the difference between randomized controlled trials and observational studies.

Case 1: Why are enrolments higher for male children compared to females, in rural Punjab?

World Bank study: Learning and Educational Achievements in Punjab Schools (LEAPS): Insights to inform the education policy debate, by Tahir Andrabi, Jishnu Das, Asim Ijaz Khwaja, Tara Vishwanath, and Tristan Zajonc, published February 20, 2007.

We will focus on just ONE insight into causes which was discovered by the LEAPS Survey. The Survey confirmed the commonplace observation that enrollments of female children in schools are LOWER than those of males. The question is: what is the CAUSE of this differential? Given the culture and traditions of Pakistan, there is an obvious hypothesis: “H1: Social Norms favor educating boys, and are against educating girls.” If this is true, then campaigns to change mindsets are needed. It is very important to note that enrollment data alone cannot tell us whether H1 is true. Traditional methodology in both econometrics and statistics would proceed by putting forth hypothesis H1, noting the lower enrollment of girls, and taking this as PROOF of H1. As the survey shows, this is the wrong conclusion.

One of the goals of the survey was to discover whether parents invest equally in the education of all their children, or whether they favor the brighter ones. Responses to the survey questions showed that parents invest the most money in the education of the brightest child, regardless of gender! Thus the initial hypothesis H1 is false. The question remains: how do we explain the greater enrollment of male children? We can think up a new hypothesis: “H2: Parents THINK male children are brighter than females.” If parents are biased in this way, then male enrollments would be higher, even though parents consider themselves impartial between genders. However, the survey data rejects hypothesis H2 as well. The survey gathered three different sources of evaluation for children: test scores at school, teachers' evaluations, and parents' evaluations. All three were in good agreement with each other. This was a bit SURPRISING, since parents are often IGNORANT about the subjects being taught. NONETHELESS, they can accurately identify schools and teachers who do well at educating their children, and they can accurately judge the relative capabilities of their different children.

To make progress on discovering the causes, we need to distinguish between intentions and actions. Note that Positivists REJECT this distinction: for them, the hidden intention is simply what is expressed by the observed action. However, conversations between the survey team and parents led to the realization that parents often say, “We would like to educate our girl, who is the brightest, but there is no school nearby.” This kind of informal, qualitative evidence revealed that distance to the nearest school is a key variable. The study shows that parents are willing to send male children to distant schools, but are much less willing to send daughters. This insight emerged from studying the real world, and was clearly not part of the original data.
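To see why enrollment totals alone cannot separate the social-norm hypothesis H1 from the distance explanation, here is a toy model built entirely on hypothetical numbers (it is not LEAPS data): 1,000 households spread evenly from 0 to 10 km from the nearest school, each with one boy and one girl. Under the hypothetical H1 rule, boys attend if the school is within 5 km, and girls attend only in the 40% of households that value girls' education. Under the hypothetical distance rule, parents value both children equally but let a girl travel at most 2 km. The function and variable names are illustrative only.

```python
# Hypothetical households: distance (km) to the nearest school, spread evenly
# from 0 to 10 km. Each household has one boy and one girl. Not LEAPS data.
distances = [(i + 0.5) * 0.01 for i in range(1000)]


def enroll_norm(i, d):
    """Hypothetical H1 rule: boys attend if the school is within 5 km; girls attend
    only in the 40% of households (i % 5 < 2) that value girls' education."""
    return d <= 5.0, d <= 5.0 and i % 5 < 2


def enroll_distance(i, d):
    """Hypothetical distance rule: parents value both children equally, but let
    a girl travel at most 2 km and a boy up to 5 km."""
    return d <= 5.0, d <= 2.0


def enrollment_rates(rule, max_km=None):
    """(boy rate, girl rate), optionally only for households within max_km of a school."""
    rows = [rule(i, d) for i, d in enumerate(distances)
            if max_km is None or d <= max_km]
    boys = sum(boy for boy, _ in rows) / len(rows)
    girls = sum(girl for _, girl in rows) / len(rows)
    return boys, girls


for name, rule in [("social norm", enroll_norm), ("distance", enroll_distance)]:
    print(f"{name:12} overall: {enrollment_rates(rule)}  "
          f"within 2 km: {enrollment_rates(rule, max_km=2.0)}")
```

Both rules produce identical overall enrollment (50% of boys, 20% of girls), so the aggregate gap cannot tell them apart. Only when households living within 2 km of a school are examined separately do the rules diverge: the distance rule gives equal enrollment for boys and girls there, while the norm rule still leaves a large gap.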

Note how important finding the cause is for appropriate Policy. Without any deeper study, many have concluded that wrong social norms are the cause of the problem, and are accordingly running campaigns to educate parents about the value of educating girls. According to the survey, parents already know this, so this effort is wasted. If hypothesis H2 were correct, we would need to inform parents about the educational abilities of their children. However, this is not needed either. The third hypothesis, that distance to school is the key constraint for girls, tells us that to increase female enrollment, we must reduce the distance girls have to travel to school. This may involve building many more schools for females, which may not be financially feasible. Alternatively, we could invest in providing secure transport to school for girls, and work on alleviating parents' anxieties in this regard.

Case 2: Ignaz Semmelweis discovers the cause of Puerperal Fever

When Ignaz Semmelweis (IS) arrived at the Vienna General Hospital in 1846, he learned that there were two apparently identical clinics, A and B, located next to each other, which had dramatically different rates of mortality. Both were maternity clinics for the delivery of children. Women arriving were admitted to A or B on alternate days, so the two sets of patients should have been alike. The staff, methods of treatment, and doctors appeared to be the same at both clinics. Why, then, was the mortality of women almost double in Clinic A? It is important to note that in the mid-19th century, there was no concept of germs. The prevailing theories of disease had to do with Miasma: Bad Air, combined with weakness in patients, leads to disease. But IS wondered, “How can there be a DIFFERENTIAL between Clinics A and B, with alternating admissions?” Both clinics shared a similar environment, so Miasma plus Patient Weakness should lead to the same outcomes in both. Many other hypotheses about the possible cause of the difference were studied and rejected: Sunshine, Diet, Use of Hospital Linen, and others. None of these could explain the dramatic differential in mortality from Puerperal Fever among women after delivery.

An accidental event led IS to formulate a new hypothesis about the cause of this differential. A colleague, Kolletschka, was accidentally cut with a knife used in an autopsy. He then developed symptoms like those of puerperal fever and died. This led IS to conclude that “cadaveric materials”, that is, particles from the dead body (cadaver), had entered Kolletschka's bloodstream. Guessing that this was the cause led IS to examine how and why women in clinic A were exposed to infection from dead bodies, while those in clinic B were not. Once the hypothesis about the cause was formulated, it was easy to find a lot of confirming evidence for it. Clinic A was dedicated to training male medical students, while clinic B trained women to be midwives. Medical students in clinic A routinely performed autopsies on cadavers and then went on to examine pregnant women. Without germ theory, there was no emphasis on hygiene, and students would often examine women immediately after coming from an autopsy, without washing their hands. In contrast, students training to be midwives had no contact with cadavers and did not do autopsies.

What seems obvious to us today was revolutionary in 1850. The idea that infectious material could be transferred from cadavers was contrary to accepted medical theories. Nonetheless, IS was able to find many Confirming Observations for his cadaveric theory of puerperal fever. He learned that prior to 1823, the hospital director had been against the use of cadavers for medical training, and hence no autopsies were performed. After a change of director in 1823, cadavers came into use for autopsies and the training of students. At the same time, mortality rose from 125 to 500 per 100,000 in both clinics. The two clinics were separated into clinic A for male students and clinic B for midwives in 1840. It was only after 1840 that the differential appeared, with mortality rates in clinic A twice as high as in clinic B. Another piece of evidence confirming the cadaveric hypothesis was that puerperal fever occurred in rows in the students' clinic: the male students examined the patients in sequence, infecting them row by row. In contrast, in clinic B, cases occurred at random, here and there, without any apparent sequence.

Based on this diagnosis, Semmelweis came up with a simple solution: students should wash their hands with Chloride of Lime after autopsies. This simple remedy dramatically reduced cases of puerperal fever in clinic A, apparently confirming the validity of IS's hypothesis about the cause. However, later developments showed that the cadaveric hypothesis was not quite right. In one case, even though all students had washed their hands, all the women in a ward containing one woman with cancer became infected with puerperal fever; the woman with cancer was the first in line to be examined. This and many other cases showed IS that the disease did not originate only from cadavers. Accordingly, he came up with a Modified Hypothesis: “Decaying organic matter causes infections.” This is essentially correct, and as close as one could get to the modern germ theory in the 19th century. If the cadaveric hypothesis were true, it would be enough for students to wash their hands after autopsies or other contact with cadavers. However, if the modified hypothesis is true, then students should wash their hands before every examination. The second is the correct policy, in light of our modern knowledge of germ theory. Note how the correct policy changes drastically as we learn more about the true causes. Note also that even incorrect causal theories can improve policies. Finally, note the dramatic difference that learning the true causes of disease makes in terms of saving lives.

Case 3: Difference between Randomized & Observational Studies

In the 1960s, heart attacks emerged as a major cause of mortality in the USA. They appeared to be linked to cholesterol levels in the blood, and many experimental drugs to lower cholesterol and reduce heart attacks were developed. One of these drugs was CLOFIBRATE. The Coronary Drug Project was a randomized, controlled, double-blind experiment whose objective was to evaluate five drugs for the prevention of heart attacks. The subjects were middle-aged men with heart trouble. Of the 8,341 subjects, 5,552 were assigned at random to the drug groups and 2,789 to the control group. The drugs and the placebo (lactose) were administered in identical capsules. The patients were followed for 5 years. The results of this trial are summarized below:

Does Clofibrate Save Lives? The table says NO. The death rate was 21% in the control group and 20% in the treatment group. There were 220 deaths in the treatment group, and only 11 more would have brought its rate up to 21%. Such a small difference could easily occur by chance. However, the doctors running the trial were puzzled by this result, because Clofibrate did lower cholesterol: why did it not lower mortality from heart attacks? The experimenters thought that perhaps the mortality rates were high because some of the patients were not taking the drug. They were able to divide the patients into two groups. Group A, the Adherers, took more than 80% of the prescribed drug. Group NA, the Non-Adherers, took less than 80% of it. The table above shows that the adherers have only a 15% mortality rate, while the non-adherers have a 25% mortality rate. This is a significant difference. Can we conclude that clofibrate DOES make a difference, provided that you take the drug regularly? To answer this question, we must look at the treatment and control groups in this NEW comparison.
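The claim that a 20% versus 21% difference could easily arise by chance can be checked with a standard two-proportion z-test. The text does not give the size of the clofibrate group, so this sketch assumes 1,100 patients, the number implied by 220 deaths at a 20% rate (and consistent with 11 more deaths giving 21%, since 231/1,100 = 21%). Control deaths are taken as 21% of the 2,789 control patients, rounded to 586. The function name `two_proportion_z` is illustrative, not from any library.

```python
from math import erf, sqrt


def two_proportion_z(deaths_1, n_1, deaths_2, n_2):
    """Two-sided z-test for a difference in two death rates, using the pooled
    rate under the null hypothesis that both groups share one true rate."""
    p_1, p_2 = deaths_1 / n_1, deaths_2 / n_2
    pooled = (deaths_1 + deaths_2) / (n_1 + n_2)
    se = sqrt(pooled * (1 - pooled) * (1 / n_1 + 1 / n_2))
    z = (p_1 - p_2) / se
    p_value = 1 - erf(abs(z) / sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return z, p_value


N_CLOFIBRATE = 1_100                       # ASSUMPTION: implied by 220 deaths at a 20% rate
N_CONTROL = 2_789                          # from the text
deaths_control = round(0.21 * N_CONTROL)   # 21% of 2,789, i.e. 586 deaths

z, p = two_proportion_z(220, N_CLOFIBRATE, deaths_control, N_CONTROL)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

With these inputs the sketch gives z of roughly -0.7 and a two-sided p-value near 0.5, far from conventional significance. The pooled rate is used in the standard error because the test asks how surprising the observed gap would be if both groups shared a single true death rate.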

Here Group A is the treatment group: the patients who took Clofibrate regularly, more than 80% of the time. Group NA is the control group, which took the drug irregularly, less than 80% of the time. These groups have NOT been randomly chosen, so there is no guarantee that the two groups are matched. One obvious difference between groups A and NA is that taking drugs regularly indicates health-consciousness. Perhaps the members of group A are all careful about their diet, exercise, and health in general. Since group NA did not take the drug regularly, perhaps they are not so health-conscious; perhaps they are lax about exercise, diet and health in general. If so, Health-Consciousness is a Confounding Variable: the difference in mortality rates between Adherers and Non-Adherers could be due to Health-Consciousness rather than to Clofibrate. How can we tell whether or not this is the case? One way is to look at the control group. The control group did not get Clofibrate. Instead, they got dummy pills which looked like Clofibrate but contained lactose, which has no effect on heart attacks. The control group also contained adherers and non-adherers. If the reduced mortality of Adherers is due to Clofibrate, then the Adherers who were taking dummy pills should NOT have lower mortality, since they did not get clofibrate. However, the table shows that within the control group, Adherers and Non-Adherers also have a huge difference in mortality rates: 15% versus 28%. This difference cannot be due to clofibrate, because no one in the control group got the drug. Therefore the difference between groups A and NA is caused by health-consciousness (or something else linked to adherence), and NOT by clofibrate.
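The control-group check above is simple arithmetic on the death rates quoted in the text; no other data is used. If clofibrate caused the adherers' advantage, the adherer/non-adherer gap should shrink or vanish among patients on placebo. Instead it is at least as large, and adherers on placebo die at the same rate as adherers on clofibrate. The dictionary layout and names are just one convenient way to hold the numbers.

```python
# Death rates (%) quoted in the text for the Coronary Drug Project.
death_rates = {
    "clofibrate": {"adherers": 15, "non_adherers": 25},
    "placebo": {"adherers": 15, "non_adherers": 28},
}


def adherence_gap(arm):
    """Percentage-point excess mortality of non-adherers over adherers in one arm."""
    rates = death_rates[arm]
    return rates["non_adherers"] - rates["adherers"]


gap_clofibrate = adherence_gap("clofibrate")   # 10 points
gap_placebo = adherence_gap("placebo")         # 13 points

# If the drug explained the adherers' advantage, the placebo gap would be near zero.
# It is not, so the gap tracks adherence itself, not clofibrate.
drug_effect_among_adherers = (death_rates["clofibrate"]["adherers"]
                              - death_rates["placebo"]["adherers"])   # 0 points

print(gap_clofibrate, gap_placebo, drug_effect_among_adherers)
```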

The point of this discussion is that whereas we can trust randomized experiments to match the control and treatment groups, this cannot be assumed when the two groups are not randomly chosen. In the second case, which is called an observational study, there may be other differences, called confounding factors, which affect the outcomes. In the clofibrate trials, Health-Consciousness is a confounding variable. This shows that we must probe deeper than surface observations in order to learn about the hidden causes. We also learn that numbers can lie. If there were no control group, it would be very tempting to conclude that clofibrate really does work, provided that it is taken regularly. It is only the presence of the control group which shows us that it is the health-consciousness of the Adherers, and not the clofibrate drug, which leads to reduced mortality. Commitment to false theories can lead us astray.
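A small simulation makes the contrast between randomized and self-selected comparisons concrete. Everything in it is hypothetical: half the patients are health-conscious, health-conscious patients adhere with probability 0.9 and others 0.3, death probabilities are 15% and 28% (chosen only to echo the rates in the text), and the drug is assumed to have no effect at all. The randomized treatment-versus-control comparison correctly shows essentially no difference, while the adherers-versus-non-adherers comparison inside the treated arm shows a large gap (about 18% versus 26% in expectation) created entirely by the hidden confounder.

```python
import random
from typing import NamedTuple


class Patient(NamedTuple):
    treated: bool
    adherer: bool
    died: bool


def simulate_patient(rng):
    """One hypothetical patient. The drug has NO effect on death; a hidden trait,
    health-consciousness, drives both adherence and survival."""
    health_conscious = rng.random() < 0.5
    treated = rng.random() < 0.5                          # randomized assignment
    adherer = rng.random() < (0.9 if health_conscious else 0.3)
    died = rng.random() < (0.15 if health_conscious else 0.28)
    return Patient(treated, adherer, died)


def death_rate(group):
    """Fraction of patients in the group who died."""
    return sum(p.died for p in group) / len(group)


rng = random.Random(42)     # fixed seed so the run is reproducible
patients = [simulate_patient(rng) for _ in range(100_000)]
treated_arm = [p for p in patients if p.treated]
control_arm = [p for p in patients if not p.treated]

# Randomized comparison: the arms are approximately balanced on the hidden trait.
print(f"treated {death_rate(treated_arm):.3f} vs control {death_rate(control_arm):.3f}")

# Self-selected comparison inside the treated arm: confounded by health-consciousness.
adherers = [p for p in treated_arm if p.adherer]
non_adherers = [p for p in treated_arm if not p.adherer]
print(f"adherers {death_rate(adherers):.3f} vs non-adherers {death_rate(non_adherers):.3f}")
```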

Concluding Remarks

So what do we learn from examining these real-world case studies about the discovery of causes? We learn that causes do not lie on the surface. Discovering them requires a deeper examination of the real world. No amount of fancy data analysis can substitute for thinking about the hidden underlying mechanisms which generate the data. Observations tell us about associations, but not about causation. Associations provide us with CLUES to the hidden causal mechanisms, but one needs to exercise the deductive powers of Sherlock Holmes to deduce the causes from these clues. In particular, we see that discovering Causation requires knowledge of, or theories about, the UNDERLYING and UNOBSERVABLE mechanisms by which the world operates.

This entry was posted in Uncategorized by Asad Zaman.

About Asad Zaman

Asad Zaman [BS Math MIT (1974), Ph.D. Econ Stanford (1978)] has taught at leading universities like Columbia, U. Penn., Johns Hopkins and Cal. Tech. Currently he is Vice Chancellor of Pakistan Institute of Development Economics. His textbook Statistical Foundations of Econometric Techniques (Academic Press, NY, 1996) is widely used in advanced graduate courses. His research on Islamic economics is widely cited, and has been highly influential in shaping the field. His publications in top ranked journals like Annals of Statistics, Journal of Econometrics, Econometric Theory, Journal of Labor Economics, etc. have more than a thousand citations as per Google Scholar.
