DSIA02D: Goodhart’s Law states that measures used for policy become corrupted by this use. We discuss how this applies to college rankings. This is Part D of the 2nd Lecture on Descriptive Statistics: An Islamic Approach. The 15-minute video is followed by a 1300-word writeup.
To briefly review, we note that the essence of an Islamic Approach is to ask FOUR Questions:
- Why are we doing this analysis?
- What do the numbers mean?
- How were the numbers computed?
- What is the potential IMPACT of this analysis?
Our primary focus in this lecture will be on the fourth question. It is worth noting that these questions break the barrier between theory and application. They bring in INTENTIONS, which are central to an Islamic approach. They analyze relationships between external observations and internal reality. They force statisticians to enter the real world of applications, instead of staying in the sterile world of theory and numbers.
As background information, it is worth noting that this idea of looking at the numbers and NOT looking at the reality behind the numbers comes from Logical Positivism, an immensely popular theory of knowledge which emerged in the 20th Century. The central idea of this philosophy was that we can only have knowledge about observable facts – the hidden and unobserved reality is not part of scientific knowledge. A key strategy of Logical Positivism was to replace Unobservables by Observable Manifestations. For more information about this, see The Emergence of Logical Positivism.
As an example of this strategy, consider Economic Theory. There are three concepts: Welfare, Preference, Choice. Welfare refers to what is good for me (spinach), Preference refers to what I like (Ice Cream), and Choice refers to what I choose (Hamburger). Even though all three are distinct, only choice is observable. The effect of Logical Positivism on Economics was to equate all three:
WELFARE = PREFERENCE = CHOICE
This has BLINDED economists to the real sources of welfare, and created the confusion that everyone automatically knows what is best for her/him and DOES it. This equation leads to treating FREEDOM as the IDEAL: everyone knows what is best for him and chooses it when the option is available. Behavioral economists, however, have found that people frequently make bad choices which are harmful for them. In response, NUDGE theory is based on the idea that we can guide people to make better choices in ways which leave them free to choose, but make the best choice easier to find and make.
The numbers we measure, or Statistics, have a great IMPACT on the real world. When we replace the UNOBSERVABLE by OBSERVABLE & MEASURABLE Key Performance Indicators (KPIs), people also replace efforts on unobservables by efforts on observables. For example, the genuine quality of research is unobservable. But we can get some idea about it by looking at the quantity of publications (COUNT) and the reputation of the journals (Impact Factor). When this was done worldwide, it led to a massive increase in Fraud Journals, which publish articles for money and use many gimmicks to get impact factors. Instead of focusing on the unobservable reality, the focus shifted to the measurable numbers. Similar examples of how KPIs have led to attention to meaningless numbers, instead of the reality behind the numbers, are available in all fields. For more illustrations, see Beyond Numbers and Material Rewards.
In every area of knowledge, the positivist attempt to replace unobservables by observable and measurable quantities has led to serious problems. For example, in The MISMEASURE of MAN, Stephen Jay Gould talks about the problem of “Reification” – replacing the abstract by the concrete. In particular, the IQ measure reduces the abstract, qualitative, multidimensional characteristic of intelligence to one number. This and other absurd ways of measuring intelligence (like the shape of the skull and brain size) have been taken seriously because of this tendency to replace hidden unobservables by measurable manifestations.
With this as background, we come to the topic of our lecture. Goodhart’s Law states that “When a measure is used as a policy target, it ceases to be a good measure.” We OBSERVE something which is correlated with High Quality, and use it as a MEASURE of Quality. Awareness that it is being used as a measure leads to a change in behavior. For example, instead of trying to create quality of education, colleges focus on the indicators used by the reports and try to increase the numbers, leading to harmful results. A college could rise in the rankings by graduating everyone, regardless of academic performance. Trying to raise the ranking numbers leads to bad policies, and also DISTORTS the Numbers. When we replace a Qualitative Unobserved Target by Quantitative measures of the Target, this creates a SHIFT in the GOALS.
Next, Malcolm Gladwell (MG) examines a specific measure: the REPUTATION Score, which has a 22% weight in college rankings. He asks HOW it is computed. This is done by a Survey of High School and College Officials. But do the people surveyed HAVE the information required to rank colleges? This is the Critical Question. Most people know very little about the 200+ colleges in the survey and cannot provide any useful information about this matter. Research Studies of Reputation Surveys show that they produce good results IF experts are asked about their area of expertise. Otherwise, they just replicate UNINFORMED public consensus. To prove this point, MG discusses two examples.
One is the analysis of “Best Hospitals” Rankings produced by asking doctors to rank hospitals in their area. A researcher took measures of hospital quality based on objective factors like mortality rates, type of equipment, staffing, etc. and found that objective measures of quality had zero correlation with the reputation ranking. This is simply because typical doctors do not know much about hospitals other than the one they work in. Similarly, Lawyers were asked to rank the Best Law Schools. It was found that they ranked Penn State very highly. BUT Penn State did not have a law school at the time! This illustrates the level of ignorance about law schools, together with the effect of a school's general reputation on public perception.
So how do people rank colleges when they know nothing about how to compare different colleges? They turn to public sources of information about this matter – that is, the US News and World Report (USNWR) College Rankings. So it is the Rankings which Drive the Reputation Score! This is an Illustration of Goodhart’s Law. Reputation is based on the ranking, but the ranking gives its highest weight, 22%, to reputation. To illustrate the difference between informed and uninformed rankings, MG mentions rankings of colleges done by corporate recruiters. Because these people take graduates, place them in jobs, and follow their progress, they have knowledge about the quality of graduates being produced by different universities. Their rankings are very different from the USNWR rankings. For instance, they rank Penn State at the top, even though it does not come within the top 20 in the USNWR rankings.
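The circularity between ranking and reputation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 22% weight comes from the discussion above, but the starting numbers are made up, and the "echo" rule (respondents simply report last year's published score as the college's reputation) is a deliberately crude stand-in for uninformed respondents, not a description of how USNWR actually computes anything.

```python
REPUTATION_WEIGHT = 0.22  # weight of the reputation score cited in the text


def published_score(reputation, other_indicators, weight=REPUTATION_WEIGHT):
    """Weighted score: reputation gets `weight`, everything else the rest."""
    return weight * reputation + (1 - weight) * other_indicators


def simulate_echo(initial_reputation, other_indicators, years):
    """ASSUMPTION: uninformed respondents echo last year's published score.

    Returns the list of published scores, one per year.
    """
    reputation = initial_reputation
    history = []
    for _ in range(years):
        score = published_score(reputation, other_indicators)
        history.append(score)
        reputation = score  # next year's "reputation" is just this year's score
    return history


# Hypothetical college: famous name (reputation 95), other indicators at 70.
history = simulate_echo(initial_reputation=95, other_indicators=70, years=30)
print([round(s, 2) for s in history[:5]])
print(round(history[-1], 2))
```

Under this echo assumption the score satisfies s = 0.22 s + 0.78 o in the long run, so it settles at the value of the other indicators o. The reputation survey then carries no independent information about quality; it only feeds the ranking back into itself.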
Goodhart’s Law illustrates how our observations change the world. When we measure things, our measures acquire importance. Hidden Quality is signaled by some markers, such as Ph.D. faculty, small classes, selectivity in admissions, and high graduation rates. However, if we focus policy on improving Markers of quality, instead of on quality, this leads to major mistakes. A college could rise in rankings by hiring more Ph.D.’s, increasing selectivity in admissions, and graduating all students it admits. But none of these policies will actually have any direct impact on quality. Indeed some policies which target the indicators could actually be harmful for quality. Attempts to target the indicators will DISTORT the indicators as markers of quality. AFTER publication count became a factor in evaluating faculty research, many faculty acquired hundreds of publications in just a few years by various shady techniques, so that publication count was no longer a good marker of quality.
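A small simulation can make this distortion concrete. The sketch below is an illustration only: the number of colleges, the quality scale, the noise level, and the size of the "gaming" effect are all hypothetical choices, not estimates from any real ranking. Each college has a hidden quality; the marker starts out as quality plus a little noise; once the marker becomes a target, each college inflates it by an amount unrelated to its quality.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_colleges=500, seed=0):
    """Return (correlation before targeting, correlation after targeting)."""
    rng = random.Random(seed)
    # Hidden quality: unobservable in practice, known here only because we simulate.
    quality = [rng.gauss(50, 10) for _ in range(n_colleges)]
    # Before targeting: the marker tracks quality closely (small noise).
    marker_before = [q + rng.gauss(0, 3) for q in quality]
    # After targeting (ASSUMPTION): colleges inflate the marker by an
    # amount that has nothing to do with their quality.
    marker_after = [m + rng.uniform(0, 60) for m in marker_before]
    return pearson(quality, marker_before), pearson(quality, marker_after)


corr_before, corr_after = simulate()
print(f"marker vs quality before targeting: {corr_before:.2f}")
print(f"marker vs quality after targeting:  {corr_after:.2f}")
```

In this toy setup the correlation between the marker and hidden quality drops sharply once the marker is gamed, even though no college's quality has changed. That is the sense in which targeting an indicator destroys its value as an indicator.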
What this reveals is that Ranks reflect Implicit Ideological Judgments. The factors chosen and the factors excluded, as well as the weights attached, represent values. However, logical positivism teaches that values are not scientific knowledge, so values are never explicitly included in the analysis. Instead, they are concealed in the choice of factors and weights, which creates an impression of objectivity. This is what makes statistics so dangerous – it covers ideological value judgements with a pretence of objectivity created by numbers.
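To see how weights encode values, consider a minimal sketch with two hypothetical colleges scored on two of the markers mentioned earlier, selectivity and graduation rate. All the numbers and both weighting schemes are invented for illustration; the only point is that the same data produce opposite rankings under different weights.

```python
# Hypothetical indicator scores (0-100) for two made-up colleges.
colleges = {
    "College X": {"selectivity": 90, "graduation_rate": 60},
    "College Y": {"selectivity": 60, "graduation_rate": 85},
}

# Two hypothetical weighting schemes, each reflecting a different value judgment.
weights_favoring_selectivity = {"selectivity": 0.8, "graduation_rate": 0.2}
weights_favoring_graduation = {"selectivity": 0.2, "graduation_rate": 0.8}


def rank(colleges, weights):
    """Return college names ordered from highest to lowest weighted score."""
    def score(indicators):
        return sum(weights[name] * value for name, value in indicators.items())
    return sorted(colleges, key=lambda name: score(colleges[name]), reverse=True)


ranking_a = rank(colleges, weights_favoring_selectivity)
ranking_b = rank(colleges, weights_favoring_graduation)
print(ranking_a)
print(ranking_b)
```

Neither ranking is more "objective" than the other. The data are identical; only the weights, that is, the values of whoever built the ranking, have changed.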