Histograms with Varying Bin Sizes

bit.ly/DSIA03E – Part E of Lecture 3 in Descriptive Statistics: An Islamic Approach. This lecture examines the effects of varying bin sizes on histograms.

Preliminary Remarks: Mistaking the Map for the Territory

In order to understand (simplify and reduce) data, it is useful to construct a statistical model for it. If the data follow a theoretical distribution, then they can be described by a formula. The DISTRIBUTION of the data may be identified with a theoretical distribution (like the Normal). If this is true, it allows us to substantially reduce the data set, since Normal distributions are completely characterized by only two numbers: the mean and the standard deviation.

A good way to identify the data distribution is to look at the HISTOGRAM – a picture of the data. But, as we will see in this lecture, there are many possible histograms for the data, depending on the bin size. A traditional question is: What is the BEST model for the data? In the current context: what is the best bin size for making a histogram? This is the WRONG question. Data is primary, models are secondary. Different types of models describe different aspects of the data. As we decrease the bin size, we get a more refined picture of the data. At each level of refinement, histograms illuminate different aspects of the data. There is no one BEST bin size. We will illustrate this general concept by examining the histogram of Life Expectancies for 190 countries in the WDI data set for the year 2018.

We start by looking at the Default Histogram for 2018 Life Expectancy for 190 countries in WDI. The Histogram goes from MIN=52.8 to MAX=85.0 and makes 7 bins of equal size, where Bin Size = 4.6 years.

LEB7

Starting with this as a baseline, we will examine the effects of making the bin size smaller or larger. In general, if the bin size is too large, all the data goes into one bin and details are lost. On the other hand, if the bin size is too small, every bin contains only one or zero data points and the groupings in the data are not VISIBLE from the graph. The 7 bins above are a compromise between these two opposing effects, as we will soon see.
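
To make the experiments below easy to reproduce, here is a minimal sketch in Python that draws the same data at several bin counts. The file name is a placeholder for wherever the 190 life-expectancy values for 2018 have been saved (one value per line); any tool, including EXCEL, will do equally well.

```python
import numpy as np
import matplotlib.pyplot as plt

le = np.loadtxt("life_expectancy_2018.csv")   # hypothetical file: one LE value per line

fig, axes = plt.subplots(2, 3, figsize=(12, 6), sharex=True)
for ax, n_bins in zip(axes.flat, [1, 2, 3, 4, 7, 20]):
    ax.hist(le, bins=n_bins, edgecolor="black")   # same data, different bin counts
    ax.set_title(f"{n_bins} bin(s)")
    ax.set_xlabel("Life expectancy (years)")
    ax.set_ylabel("Number of countries")
plt.tight_layout()
plt.show()
```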

We start with the Coarsest Histogram with One Bin Only:

LEB1

From this, we learn the RANGE of the data: it varies from MIN=52.805 to MAX=84.934. This is a COUNT histogram: we learn from the vertical axis that there are 190 countries in the data set. Later we will study a PERCENTAGE or PROBABILITY histogram, which gives us the proportion of the population in a given bin. From a probability histogram we would not learn the count, since only 100% would appear on the vertical axis.
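
As a minimal sketch (reusing the hypothetical file from the previous snippet), the difference between a count histogram and a probability histogram is just a division by the total count:

```python
import numpy as np

le = np.loadtxt("life_expectancy_2018.csv")   # hypothetical file, as before

counts, edges = np.histogram(le, bins=1)      # count histogram: one bin holding all 190 countries
proportions = counts / counts.sum()           # probability histogram: the same bin holds 100%

print("Range:", le.min(), "to", le.max())
print("Counts:", counts, "Proportions:", proportions)
```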

Next let us look at a histogram with only two bins:

LEB2

Two bins divide the range from 52.8 to 84.9 into two equal parts. The midpoint of the range is 68.8. 55 countries are in the first bin, with Life Expectancy below the midpoint, while 135 countries are in the second bin. Clearly, the distribution is NOT symmetric. From this graph it is obvious that the Normal distribution would NOT be the right model for this data set.

The 3 Bin Histogram divides countries into three categories – high, middle and low Life Expectancy. The Low LE Bin goes from 52.8 to 63.5, and has only 27 countries. The middle LE bin goes from 63.5 to 74.2, and has 71 countries. The high LE bin goes from 74.2 to 84.9, and has 92 countries:

LEB3

What is very surprising is that the largest number of countries is in the highest category. The MODE is the bin (or category) which contains the largest number of observations. The graph shows that the Mode is the last bin. WHY is this surprising? That will become clear if we look at the histogram of these same 190 countries classified by GNP per capita in the same year, 2018. This is graphed below:

GNP3Bins

This 3 bin Histogram of GNP per capita (in PPP terms, constant USD) shows that only a few countries belong to the high GNP category, while the vast majority belong to the low GNP category. This shows that EVEN countries in the bottom third of the income distribution can achieve high life expectancies for their populations. This means that cheap and simple measures are sufficient for substantially and significantly lowering mortality rates. A country does not need to wait until it grows rich in order to take effective measures to improve the health of its population.

The 4 Bin Histogram divides the range into four categories, with Bin Width = 8 years:

LEB4

In this histogram, the Modal Bin is [68.8, 76.9] with 75 countries. There are only 59 countries in the highest bin, going from 76.9 to 84.9. The graph suggests that it is relatively easy to get LE up to 70, but much harder to get it up to 80. To learn more about this, we need to look at the mortality rates in each age group. By comparing countries with low and high mortality rates, we can learn where the greatest potential for improvement lies. To realize this potential, we need to investigate carefully the causal determinants of mortality.

As we go through graphs of 5, 6, 7, 8, and 10 bins, we get more information about how the data divides into different kinds of groupings. At each level of refinement we get more information about the data, and we also pick up some visual patterns not visible at other levels of refinement. However, as we increase the level of refinement, we start losing the ability to look at the graph and interpret it directly and visually. Here is the histogram with 20 bins:

LEB20

There are FOUR modes in this histogram. When the number of countries in a bin is small, countries can fall into a bin or out of it by statistical accident. When you have two bins, High and Low, classifications are robust to small errors – regardless of how you compute it, the classifications remain the same for most countries. However, when you make up a large number of categories, this is no longer true, and the classification can be much affected by small errors in the data. Thus the number of countries displayed in the graph is NOISY – it is much affected by errors. As we make the bins even smaller, the noise increases even more and the patterns in the data are no longer visible.

With 200 bins, it is very hard to see any of the patterns in the data that were easily visible when the bin sizes were smaller:

LEB200

 

There is a paradox here. Technically, all the information in the coarse bins is actually contained in the refined bins. It is just that our eyes do not process this kind of information well; we cannot convert the picture into the patterns visible in the histograms with larger bin sizes. Good bin sizes balance objective information with our subjective capabilities to process information. The default bin size chosen by EXCEL gives a fairly good picture of the data. Note that the information in a histogram becomes visible only AFTER we make the graph, so choosing an “optimal” bin size in advance is impossible. The choice of bin size gives us a distribution which provides a MODEL for the data, but there is no TRUE model. All models are approximations which enable us to summarize the data and understand it.

Deeper understanding requires examination of mortality rates and their causal determinants. This requires going further, beyond the data sets, into examining mortality rates, classifying them by type, and examining the causes of each type. Numbers give us clues about the real world, but are never the goal of the analysis. Statistical analysis must be followed up by examining the real world issues that it highlights.

LINKS TO RELATED MATERIALS

Lecture 1: Distinguishing features of an Islamic Approach to Statistics – In four parts: bit.ly/dsia01a, b, c, d

Lecture 2: Comparing Numbers: Comparing multidimensional qualities necessarily involves values, and hence most rankings are subjective, not objective measures of external reality. In six parts: bit.ly/dsia02a, b, c, d, e, f

Lecture 3 (Current Lecture) on Life Expectancies: Part A explains that Life Expectancy is a one-dimensional numerical measure, and hence objective. Part B describes in detail how LE is computed, and what these numbers mean. Part C makes a start on analyzing World Bank WDI data on Life Expectancies for 190 countries from 1960 to 2017. Part D constructs, analyzes and interprets HISTOGRAMS for this data set. This Part E analyzes the effects of changing bin size on histograms. Shortlinks are bit.ly/dsia03a, b, c, d, e


Histograms of Life Expectancy Data

Bit.ly/DSIA03D: Part D of Lecture 3 on Descriptive Statistics: An Islamic Approach – deals with reading and understanding Histograms of the World Bank Life Expectancies data set.

Some Historical Background

How do we analyze a large data set? The annual LE data for 190 countries from 1960 to 2017 consists of 57 x 190 – more than 10,000 – data points.

Classical Answer: Find an approximate theoretical model for it. Fit data to model, then analyze properties of the model. This “reduces” the data by replacing it with a theoretical model with a few parameters, and makes it easy to understand and analyze. But WHY use this method, which now appears natural to statisticians? BECAUSE Computational Capabilities to do the right kind of analysis DID NOT EXIST!!

Currently, Statistics is caught in a theory trap – Computational Capabilities required for good data analysis have come into existence – though only recently. Pedagogy has NOT caught up to this. We are still teaching statistics as if computers do not exist.

What is the GOAL of statistical data analysis?

  1. FIRST: to UNDERSTAND the data set – what do the 190 x 57 LE numbers tell us?
  2. SECOND: to trace implications of this message for reality

PROBLEM – Our minds are NOT equipped to understand raw data, a table of 190 x 57 numbers. We cannot see patterns in this data. We cannot even find the minimum or maximum, or compare and evaluate countries, using the TABLE directly.

SOLUTION – FIND a way to represent the data which makes SENSE to our MIND. We are looking for a GUI – a Graphical User Interface – to access the data. Prior to computers, theoretical models of the data were the ONLY possible approach to reducing the data. A HUGE amount of work exists on HOW to fit theoretical models to data. Conventional statistics is ALL ABOUT theoretical models for data. What we are calling an Islamic Approach is based on CHANGING the goals – instead of creating a theoretical model to represent the data in a form that is “easier” to understand, we try to DIRECTLY understand the data ITSELF!

  1. One of the best tools is the HISTOGRAM.
  2. Categorize the data, and look at numbers in each category.

For example: 0-10 children, 11-20 teenagers, 20-40 younger workforce, 40-60 older workforce, 60+ retired. Many other kinds of categories are possible. A PICTURE of the data which splits the data into categories, and COUNTS the number of points in each category, is called a HISTOGRAM. In this lecture, we will MAKE, interpret, and analyze histograms of the life expectancy data.
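
A minimal sketch of the "categorize and count" idea, using made-up ages and the approximate category boundaries from the example above:

```python
import numpy as np

# Made-up ages, and the (approximate) category boundaries from the example above.
ages = np.array([3, 7, 15, 18, 25, 33, 41, 52, 58, 63, 70])
edges = [0, 10, 20, 40, 60, 120]
labels = ["children", "teenagers", "younger workforce", "older workforce", "retired"]

counts, _ = np.histogram(ages, bins=edges)   # categorize, then count
for label, count in zip(labels, counts):
    print(f"{label:18s} {count}")
```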

The MAKING part is easy because EXCEL now has a built-in Histogram graph type. This was not available in previous versions, and it was quite clumsy and difficult to make histograms before 2016. Just highlight a column (or row) of numbers, and click on the Histogram graph type to make a histogram. Below is a histogram of Life Expectancy in 1960. This puts the data for 190 countries into 6 ten-year bins, going from 25 up to 85. It shows 12 countries falling into the 25-35 category, and more than 40 in each of the four categories 35+, 45+, 55+, and 65+. There seems to be one country in the 75+ category, but this is an illusion. In order to force EXCEL to keep the range the same for all the graphs, I added 25 and 85 as life expectancies to all the graphs. This makes the range of all the graphs the same – from MIN=25 to MAX=85.

LE1960

The entire data set is available via the link to: WBLifeExpect. Students can examine the spreadsheet and use online EXCEL to replicate the histogram above. The first time you create the histogram, the bin size will be set to a default value which is not 10. You must click on the x-axis, and then click on Format Axis on the menu which appears. Then select Bin Width and set it to 10 to replicate the graph above. A Google Spreadsheet with the data is embedded below, and can be used to sort the data according to the desired column. Click on the sheet, and then on the desired column to sort by.
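
For students who prefer code to the Excel menus, here is a rough equivalent in Python. Because the bin edges are fixed explicitly in code, the dummy 25 and 85 values are not needed; the file name is a placeholder for a single column of the 1960 life expectancies.

```python
import numpy as np
import matplotlib.pyplot as plt

le_1960 = np.loadtxt("life_expectancy_1960.csv")   # hypothetical file: 190 values for 1960
edges = np.arange(25, 95, 10)                      # bin edges 25, 35, ..., 85 (six 10-year bins)

plt.hist(le_1960, bins=edges, edgecolor="black")
plt.xlabel("Life expectancy in 1960 (years)")
plt.ylabel("Number of countries")
plt.show()
```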

The columns are 1960, 1970, 1980, 1990, 2000, and 2015. These are the data columns which are graphed and analyzed in the histograms below. To better understand the LE data, note the top 10 countries and the bottom 10 countries. Also note the rank of your own country and its LE. Also note the rankings and LEs of Pakistan, Iran, Egypt, Turkey, China, India, Bangladesh, and Morocco. Keep track of these as we move forward from 1960 in the graph above.

The second histogram is for 1970, and shows clearly how life expectancies have increased. The bin for the highest occupied category, [65, 75], now has the most countries in it. This is the MODAL bin – the one which has the largest number of countries. This increase corresponds to a decrease in the smallest two bins. Countries are all going up in Life Expectancy.

LE1970

Again you can examine the data in the Google Spreadsheet attached below, which has been sorted by the second column, consisting of Life Expectancy in 1970. What are the new entrants in the top 10, and which countries did they displace? What are the changes in the bottom 10? How did the rankings and life expectancies change in the particular countries we are following across time?

The next graph provides the histogram for 1980. We now see countries with life expectancies in the 75-85 range. Furthermore, the category of 65-75 is by far the biggest. This is the modal category, with more than 80 countries in it. For the first time, two countries have gone above 75 and climbed into the 75-85 category, which was previously empty.

LE1980

The Google Spreadsheet attached below sorts the countries according to Life Expectancy in 1980. Note the changes in the top 10 and the bottom 10. Also find the rankings of the particular countries we are following across time, and note any reversals in rankings – countries which were low in ranking moving up, and changing their positions. Think about WHY this could have happened, and note that the answer to this question will NOT be found in the data set. This is an essential aspect of “Real Statistics” – an alternative name for the Islamic Approach. The numbers are pointers to reality, and not the object of study. The observations provide us with clues about important aspects of reality, and further in-depth study of the reality itself is required to follow up on these clues, and learn from the numbers.

Our next histogram is for 1990, and is displayed below:

 

LE1990

A Google Spreadsheet with the data sorted by Life Expectancy in 1990 is attached below. Use this sheet to study the top 10 and bottom 10, and also note the relative rankings of India and Pakistan, as well as the other countries we are following across time. What is happening to the Life Expectancy in the bottom countries? Why do we observe this phenomenon?

Next look at the histogram for the year 2000 plotted below:

LE2000

Again the Google Spreadsheet linked below provides the rankings of the countries in 2000. This can be used to study the top 10, the bottom 10, and the rankings of the particular countries we are following. There are many surprises in this data set. In particular, the strong performance of Greece is surprising. What accounts for the success of Greece in increasing life expectancies, despite the serious economic difficulties faced by the country? To answer this, we would need much more detailed information about Greece and the social policies which affected mortality in that country. Note that an increase in Life Expectancy must be due to a decrease in current mortality rates in various age categories. How did the number of deaths go down, as a percentage of the population? Similarly, note the behavior of the top 10 and bottom 10, as well as the other countries, and note any surprising reversals, as phenomena to explore further by studying the real world in greater depth.

Our final histogram is for Life Expectancies in 2015. For the first time, the category [75, 85] is the MODAL category, with the largest number of countries belonging to it. Note that this category started out with ZERO countries in it in 1960. This shows that overall Life Expectancy has increased substantially over the period from 1960 to 2015, across the globe. The bottom two categories are now empty, and the lowest occupied category, [45, 55], has only a tiny number of countries in it.

LE2015

The Google Spreadsheet embedded below provides the list of countries ranked in order from highest LE to lowest LE for the year 2015. This can be used to study the changes in rankings at the top and bottom, as well as for any of the 190 countries in the list.

Concluding Remarks: The father of Western Statistics, Sir Ronald Fisher, defined the subject as the reduction of data. The key is to find “sufficient statistics” – a small number of ‘statistics’ which summarize the data. This is done by imposing theoretical assumptions on the data. For example, if we assume that the data is Normal, then the work of Fisher shows that two numbers – the mean and the standard deviation – carry the information contained in the entire data set. The point of summarizing is that we do not have the mental capacity to look at 10,000 numbers in the data set and understand them directly. However, today, with computers, an alternative approach is possible. Instead of imposing theoretical assumptions on the data, we can GRAPH the data in various ways to get a picture which we can understand. The object of Descriptive Statistics is to DIRECTLY visualize and understand the data. These pictures are to be used to get clues about the nature of the deeper reality which is manifested in the numbers. Life Expectancy is based on current mortality rates, and increases in LE correspond to reductions in mortality rates. To study WHY Life Expectancies have increased dramatically, we must study the causes of the decline in mortality rates across the globe.
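
To make the contrast concrete, here is a minimal sketch; the file name is a placeholder for any column of the life-expectancy values. The Fisher-style reduction keeps two numbers, while the direct approach simply graphs the data:

```python
import numpy as np
import matplotlib.pyplot as plt

le = np.loadtxt("life_expectancy_2015.csv")   # hypothetical file: one LE value per line

# Fisher-style reduction: under a Normality assumption, two numbers summarize the data set.
print("mean =", le.mean(), "standard deviation =", le.std(ddof=1))

# Direct approach: graph the data itself and look at it.
plt.hist(le, bins=10, edgecolor="black")
plt.xlabel("Life expectancy (years)")
plt.ylabel("Number of countries")
plt.show()
```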

LINKS TO RELATED MATERIALS:

Lecture 1: Distinguishing features of an Islamic Approach to Statistics – In four parts: bit.ly/dsia01a, b, c, d

Lecture 2: Comparing Numbers: Discusses how index numbers calculated for rankings merge facts and values, and therefore DO NOT measure aspects of external reality – the rank assigned to beauty is in the eye of the beholder, subjective and not purely objective. In six parts: bit.ly/dsia02a, b, c, d, e, f

Lecture 3 (Current Lecture) on Life Expectancies: Part A explains that Life Expectancy is a one-dimensional numerical measure, and hence an objective and quantitative feature of external reality, not a mixture of facts and values like many other numbers. Part B describes in detail how LE is computed, and what these numbers mean. Part C makes a start on analyzing World Bank WDI data on Life Expectancies for 190 countries from 1960 to 2017. This Part D constructs, analyzes and interprets HISTOGRAMS for this data set. Shortlinks are bit.ly/dsia03a, b, c, d

Analyzing World Bank Data on Life Expectancy

[bit.ly/dsia03c] Part C of Lecture 3 on Descriptive Statistics: An Islamic Approach starts the analysis of the WDI (World Development Indicators) data set of the World Bank, which covers 190 countries from 1960 to 2018. The goals of our analysis are different from the goals of conventional analysis. Sir Ronald Fisher, also known as the father of statistics, defined statistics to be the reduction of data. In contrast, an Islamic approach seeks to produce useful knowledge, instead of playing games with numbers. Thus, we aim to use the numbers to learn about the real world. For this purpose, it is important to choose numbers which convey information about the real world. Life Expectancy is one such number; it is a piece of objective information about living conditions in any given country. This is different from data like GNP per capita, which mixes genuine data about the real world with subjective opinions about what wealth is, and about which factors are relevant or irrelevant for this purpose. Because of this mixture of facts and values, many numbers reflect the mindset of the creator of the number, rather than a fact about external reality. For conventional statistics, it does not matter what kind of numbers we have, or how they are related to external reality: summarizing the numbers involves the same set of operations. But for an Islamic approach, there is a huge difference, because numbers which reflect subjective mindsets will not help us learn about external reality. The video linked below provides the first steps towards learning about this life expectancy data set, and what it tells us about the real world.

 

One of the important lessons conveyed in this lecture is the importance of benchmarks for comparison. If we want to assess the performance of any country, we have to COMPARE it with something else – that something else is a “benchmark” for performance. We start by creating three benchmarks: the Minimum, the Maximum, and the Median. By using these three numbers we can look at any country and place it in the bottom half or the top half of the countries with respect to Life Expectancy. We can also judge how close it is to the top and bottom rankings, to get an idea of its position among all 190 countries for which the data has been tabulated. Such comparisons allow us to judge performance, as to whether it has been good, bad, or average. Whenever we say that X is good or bad, we are making a COMPARISON of the performance of X with something else. For clarity, it is essential to state what X is being compared with, in order to evaluate its performance. Choosing benchmarks, and justifying them, is one of the essential aspects of the rhetoric of statistics: the methodology by which numbers are used to persuade.

 

Computing Life Expectancy from Mortality Tables

[bit.ly/DSIA03B] In this Part B of the 3rd Lecture on Descriptive Statistics: An Islamic Approach, we explain in detail how life expectancies are computed, using a hypothetical example.

A Mortality Table

To compute life expectancies, we start by constructing a mortality table. This is given below:

Age Group    Number in 2017    Deaths in 2017    Mortality Rate
0-9.999      30M               6M                20%
10-19.999    40M               10M               25%
20-29.999    25M               15M               60%
30-39.999    15M               10M               66.67%
40-49.999    5M                5M                100%

In each of the 5 age-group categories, we ASSUME the population size listed in the table – M stands for Million. So there are 30, 40, 25, 15, and 5 million people in the age categories 0-10, 10-20, 20-30, 30-40, and 40-50, respectively. By collecting data on deaths by age category in 2017, we can find out the number of deaths in each age category. This (hypothetical) number is listed in the third column. The last column gives the mortality rate in each age category, which is just the percentage of deaths within that age category. This is called a mortality table.

From Mortality to Life Expectancy

Age Group    Live at Beginning    Mortality Rate    Died during Period    Total Lifespan
0-9.999      1000                 20%               200                   1000
10-19.999    800                  25%               200                   3000
20-29.999    600                  60%               360                   9000
30-39.999    240                  67%               160                   5600
40-49.999    80                   100%              80                    3600
                                                    SUM =                 22,200

To compute Life Expectancy at birth, we do a thought experiment. Imagine a batch of 1000 people who are all born on Jan 1, 2018 – what will their average lifespan be? This is the life expectancy. We use the mortality table to GUESS that 20% of these people will die within the 0-10 category – that is, 200 people will die before reaching their 10th birthday. Now we ask: how long do these people live, cumulatively? We make the assumption that these 200 deaths are EQUALLY distributed over the 10 years, so that there are 20 deaths each year. With this assumption, the average age of these 200 people will be 5 years, the midpoint of the 0-10 age category. This means that the total lifespan of the 200 people who died will be 200 x 5 = 1000 person-years. This gives the numbers in the first line of the table.

For the second line, we start with the 800 survivors of our batch of 1000 newborns, all of whom have birthday Jan 1, 2018. It is now Jan 1, 2028, and ten years have passed. 200 people in this batch have died, and 800 are still alive on this date. How many will survive from 2028 to 2038, and come into the 20-30 age category? The mortality rate for 10-20 year olds that we see in 2017 is 25%, and this is our best guess for what might happen in the future. Applying this rate, we guess that 25% of these 800 people will die, leading to 200 deaths. How long did these 200 people live, cumulatively? Assuming equal distribution of deaths over the ten years, the average age of this group would be 15 years, the midpoint of the 10-20 category. This gives us 3000 person-years – 15 years times 200 persons – as the total lifespan of all 200. This gives us the numbers in the second line of the table, for the age group 10-20.

We can go on to the third line and complete it in the same way. 600 people will be alive in Jan 1, 2038 and enter into the 20-30 age category. The mortality rate in this category in 2017 is 60%. Applying this rate to the future, we assume that 60% of these 600 people will die before Jan 1, 2048, while in the 20-30 age category. The cumulative age of the 360 people who will die is 360 x 25 = 9000 person-years.

Continuing in this way, 240 people will be alive on Jan 1, 2048, and will make it into the 30-40 age category. A mortality rate of 66.67% for this age category in 2017 leads us to estimate that 160 of these people will die before reaching Jan 1, 2058. The average age of these 160 will be 35, so the cumulative life experience of this batch will be 160 x 35 = 5600 person-years. Only 80 people will make it to Jan 1, 2058, entering the 40-50 age category. We assume 100% mortality in this age group, so all of them will die. The cumulative life experience of this last batch will be 80 x 45 = 3600 person-years.

Now the life expectancy can be computed by adding up the lifespans of all of the 1000 people. This sums to a total of 22,200 person-years. If each of the 1000 people lived exactly 22.2 years, then the sum total would also be 22,200, so 22.2 years is the average lifespan of these 1000 people. This is the life expectancy.
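
The whole thought experiment can be written out as a short script. This is only a sketch of the calculation described above, using the hypothetical mortality rates from the table:

```python
# Hypothetical mortality rates for 0-10, 10-20, 20-30, 30-40, 40-50 (from the table above).
mortality = [0.20, 0.25, 0.60, 2 / 3, 1.00]
midpoints = [5, 15, 25, 35, 45]      # assumed average age at death within each decade

alive = 1000
total_person_years = 0
for rate, midpoint in zip(mortality, midpoints):
    deaths = round(alive * rate)               # deaths expected during this decade
    total_person_years += deaths * midpoint    # person-years lived by those who died
    alive -= deaths

print(total_person_years)           # 22200
print(total_person_years / 1000)    # 22.2 years: life expectancy at birth
```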

The Median or Half-Life

Instead of the average lifespan, we can also use the median lifespan as a measure of the life expectancy. The median lifespan for these 1000 people is the age at which 500 people have died, while 500 people remain alive. This can easily be computed from the previous table. In the first two age categories, from 0-10 and 10-20, we have 400 deaths. In the next category of 20-30, we have 360 deaths, so the cumulative deaths at the end of this period are 760. Obviously, the 500th death will occur within this age category. We know that 600 people entered this age category of 20-30 on Jan 1, 2038. We know that 360 of them will die in the next decade, between Jan 1, 2038 and Jan 1, 2048. The median age is the age at which the 500th person dies. 400 people have already died in the previous age categories, so the 500th death will be the 100th death in the current batch. A total of 360 deaths will take place over the ten years, so the 100th death will take place at a time which is 100/360 of the way through the 10-year period, assuming deaths are evenly spaced over this decade. This leads to 2.777 yrs = (100/360) x 10 yrs. Thus the median lifespan is 22.777 years, which is the predicted age of the 500th person to die in this batch.
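
The same search for the 500th death can be sketched in code, under the same even-spacing assumption:

```python
# Deaths per decade, taken from the table above.
deaths_per_bin = [200, 200, 360, 160, 80]
bin_starts = [0, 10, 20, 30, 40]

cumulative = 0
for start, deaths in zip(bin_starts, deaths_per_bin):
    if cumulative + deaths >= 500:
        # The 500th death falls inside this decade; spread its deaths evenly over 10 years.
        median = start + (500 - cumulative) / deaths * 10
        break
    cumulative += deaths

print(median)   # 20 + (100/360) * 10 = 22.78 years
```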

Refinements: Equal Deaths vs Equal Probabilities

The above calculations are just a rough first-pass attempt at explaining the details of life expectancy calculations, according to the “period life expectancy” method. There are other methods, such as the “cohort” method, which can also be used. A discussion of life expectancy numbers and what they mean is given in “Life Expectancy – What does this actually mean?” by Esteban Ortiz-Ospina, published on the Our World in Data website on August 28, 2017. Regardless of how we make the computations, life expectancy depends on making extrapolations about the future based on what we have observed in the past. The future is inherently uncertain, and hence these projections can never be made with great accuracy. It is best to accept the uncertainty and use simple methods, rather than use fancy methods to create an illusion of sophistication and accuracy which cannot be achieved when guessing about the future.

There is one refinement which can sometimes be important, and is worth explaining and clarifying. Our projections, as calculated above, are based on the assumption that if there are 200 deaths over 10 years, then these are equally distributed over the 10 years, so that there are 20 deaths each year. For the 0-10 age category, we can detail this as follows:

Age     Survivors    Deaths    Rate
0-1     1000         20        2.00%
1-2     980          20        2.04%
2-3     960          20        2.08%
3-4     940          20        2.13%
4-5     920          20        2.17%
5-6     900          20        2.22%
6-7     880          20        2.27%
7-8     860          20        2.33%
8-9     840          20        2.38%
9-10    820          20        2.44%
Total deaths: 200

Equalizing the number of deaths leads to a rising probability of death. Alternatively, it would be possible to equalize the probability of death, which would make the number of deaths decrease over time. It is not possible to prefer one method over the other on theoretical grounds. The best solution is to go to the data and actually get the death rates by one-year categories, avoiding theoretical approximations. The reason for mentioning this is that the highest mortality often occurs in the 0-1 age category. Once people survive to one year, mortality goes down dramatically. In such cases, the assumptions of equal deaths, or of equal probability of death, can both be misleading.
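
A small sketch contrasting the two assumptions for the 0-10 decade (1000 newborns, 200 deaths in total). The first loop reproduces the table above for the equal-deaths case; the equal-probability case is computed by solving (1 - p)^10 = 0.80 for the constant annual rate p:

```python
survivors = 1000

# Equal deaths: 20 deaths per year, so the annual rate rises from 2.00% to 2.44%.
s = survivors
for year in range(10):
    print(f"year {year}: annual rate = {20 / s:.2%}")
    s -= 20

# Equal probability: a constant annual rate p with (1 - p)**10 = 0.80,
# so the number of deaths falls each year (from about 22 down to about 18).
p = 1 - 0.80 ** 0.1
s = survivors
for year in range(10):
    deaths = s * p
    print(f"year {year}: deaths = {deaths:.1f}")
    s -= deaths
```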

Our goal in this lecture is not to make students experts in demography, and in learning the methods of computation of life expectancy. Rather, we are providing enough details to ensure that the students are comfortable with the concept, and understand how it is calculated and what it means. It is important to note that even though the life expectancy is a guess about the future, it is based solidly on current data. This means that it reflects the realities of mortality today, and gives us useful information about population health today, wrapped into one number.

Links to Related Materials: Each lecture is broken into parts named A, B, C, D, etc. Shortlinks like bit.ly/dsia02b point to part b of second lecture. There are four parts of the first lecture, which explains the basis of an Islamic approach, and why it is necessary.

Lecture 1: A summary and description of the four posts is available from: Descriptive Statistics: An Islamic Approach. The four posts have the following titles: Descriptive Statistics: Islamic Approach (bit.ly/dsia01a), Comparing Numbers (bit.ly/dsia01b), Eastern & Western Knowledge (bit.ly/dsia01c), How to Teach & Learn: Islamic Principles (bit.ly/dsia01d)

Lecture 2: Rankings: Mixtures of Facts and Values: Comparing Numbers (bit.ly/dsia02a), Arbitrariness of Rankings (bit.ly/dsia02b), What do College Rankings Measure? (bit.ly/dsia02c), Goodhart’s Law (bit.ly/dsia02d), Values Embodied in Factors & Weights (bit.ly/dsia02e), Corruption Rankings (bit.ly/dsia02f)

Lecture 3: Life Expectancies – Previous Post: Life Expectancies (bit.ly/dsia03a), THIS is the current post: bit.ly/dsia03b

Life Expectancies

[bit.ly/dsia03a] Part A of 3rd lecture on Descriptive Statistics: An Islamic Approach. This is an Introduction to the Basic Concepts regarding Life Expectancy which will be the topic of this lecture. The 15m video is followed by a 1500 word writeup.

 

Many Kinds of Numbers

It is widely believed that “you can’t argue with the numbers”. Contrary to popular belief, there are many kinds of Numbers:

  1. Qualitative and Unmeasurable things: Corruption, Love, Intelligence, Courage, Sympathy, etc. One can measure manifestations, like number of articles for quality of research. Such Numbers can be harmful by directing attention away from reality towards the manifestation.
  2. Quantitative, Multidimensional: Material Wealth, Body Size – these are things which can be measured, but many different measurements are required. When multiple numerical measurements are reduced to a single number, this always involves subjective choices of factors and weights. In such cases, numbers are mixtures of the subjective and the objective, combinations of facts and values. These can be useful if the purpose is clear and the values are explicit.
  3. One Dimensional Quantitative: These numbers are of special importance. They provide objective measures of features of external reality.

Essential Aspect of Islamic Approach

When we ask the four questions about why, how, meaning and impact of numbers, it is impossible to be talking about numbers in the abstract. These questions can ONLY be answered when we are talking about numbers used in a particular real world context. Thus, we reject the separation of theory and practice, and argue that theory can only be understood within the context of real world applications.

In this lecture, we will discuss numbers used to measure Life Expectancy (LE). LE is a useful measure of a real and important aspect of our personal lives. The amount of time that remains for us to live is of vital importance to us in making our plans for the future, and for determining how we live in the now.

Central Questions

Why are we studying Life Expectancy?  Unlike GNP and other measures which are mixtures of subjective and objective, LE provides an objective ranking of countries with respect to one dimension of health.

What do the Life Expectancy numbers mean? Roughly speaking, this is the average amount of time that a person can expect to live.

How are LE numbers computed? This aspect will be discussed in detail later.

What kinds of impact can studying these numbers have on the real world? Increases in LE often correspond to increases in health and nutrition for the general population. To understand how this matters, we refer to a recently published article: “Life Expectancy and Mortality Rates in the United States, 1959-2017” by Woolf & Schoomaker. This article states that, after increasing continuously from 69.9 in 1959 to 80 in 2014, LE has declined for three straight years! WHY has life expectancy started DECREASING in the USA, contrary to global patterns? The study traces the source of the decline to increases in middle-age mortality. Middle-aged people in the USA have increasing mortality rates due to drinking, drugs, and depression leading to suicides. Since Life Expectancy is increasing around the globe, and also increasing in OTHER age groups in the USA, these statistics signal something extremely wrong with LIFE in middle age in the USA. The statistics point to an aspect of the real world which is worth studying further in greater detail.

Using Life Expectancy as proxy for Health

While assessing “health” is difficult because it is qualitative, LE provides a useful quantitative numerical proxy. Using this proxy, we can ask questions like the following:

  1. Which countries have had rapid increases?
  2. Which countries are performing well?
  3. Which are doing poorly, in terms of providing satisfactory health and nutrition to their people?

Life Expectancy statistics are available and provide information on these issues.

What does life expectancy MEAN?

LE is defined as “the average age to which a newborn baby can EXPECT to live”. At first, this definition does not seem to make sense – each individual will live to some particular FIXED age – how can we take an AVERAGE over this?

SOLUTION: Take 1000 people. For each person, calculate the AGE at death, to get A(1), A(2), …, A(1000). Now take the average: the sum of all of these is the TOTAL years lived by the entire population of newborns; divide by the total number of newborns. This is the average lifespan of all of the 1000 people born today. We explain how this is done with a hypothetical example of the calculation of Life Expectancy:

Of 1000 people, 850 survive to 10 years and 150 die in the 0-10 category. For these 150, the combined total age is 150 x 5 = 750 person-years. This is under the rough assumption that all of them lived for exactly 5 years, the midpoint of the 0-10 category.

Of the 850 people alive in the 10-20 category, 500 survive to 20 years. 350 died at an average age of 15. The total age of these 350 is 350 x 15 = 5250 person-years.

Of the 500 people left alive in the 20-30 category, only 100 survive to 30 years. 400 died at average age 25. The total ages of people dying in this category is 400 x 25 = 10,000 person-years.

Finally, of the 100 people left alive in the 30-40 age category, all 100 of them die within this range. This gives us 100 people with an average age of 35, for a total age of 100 x 35 = 3500 person-years.

The TOTAL life in person-years of all 1000 people is 750 + 5250 + 10,000 + 3500 = 19,500. Divide this by 1000 to get a life expectancy of 19.5 years. This is the average lifespan of the entire population of 1000 people. There is another statistic which is also useful for measuring the “average”. This is called the MEDIAN, or the Half-Life in the context of atomic particles. This is explained below.

MEDIAN Life Expectancy,  also called Half-Life

In the above example, 150 died at <10, 350 died at <20, 400 at <30, and 100 at <40. So half of the population, 500 people, died at <20, while the other 500 died at >20. Thus age 20 divides the population in half: exactly half die at an age less than 20, and exactly half are alive beyond the age of 20. Thus the MEDIAN life expectancy, or the half-life, of this population of 1000 people, is 20. Another way to say this is to ask: what is the life expectancy of a person chosen at random from this population of 1000 people? For the randomly chosen person, there is a 50% chance of death before 20 and a 50% chance of death after 20. This is another way to define the median life expectancy, or the half-life. Note that this is close to the 19.5 AVERAGE, but somewhat different in meaning and in method of calculation.
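
The arithmetic of this hypothetical example can be checked with a few lines of code; this is only a verification sketch of the numbers above:

```python
deaths = [150, 350, 400, 100]     # deaths in 0-10, 10-20, 20-30, 30-40
midpoints = [5, 15, 25, 35]       # assumed average age at death in each decade

mean_le = sum(d * m for d, m in zip(deaths, midpoints)) / sum(deaths)
print(mean_le)                    # 19.5 years (the average lifespan)

cumulative = 0
for start, d in zip([0, 10, 20, 30], deaths):
    if cumulative + d >= 500:
        median_le = start + (500 - cumulative) / d * 10
        break
    cumulative += d
print(median_le)                  # 20.0 years (the median, or half-life)
```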

How to CALCULATE Life Expectancy?

Computing the life expectancy requires information that we don't have. For a NEWBORN batch of 1000 people, we need to PREDICT what percentage will die before 10, what percentage before 20, what percentage before 30, and so on. BUT these numbers are unknown. We don't know how many of the newborns will die before their 10th birthday. The standard method for making these predictions is to use CURRENT mortality rates. That is, look at the percentage of the 0-10 population which died in the CURRENT year, and use that to forecast the mortality of the newborns. Similarly, we can find the mortality rate of the current population of 10-20 year olds, by looking at what percentage of people in this age range died in the current year. Then we can use this percentage as a forecast for what will happen to newborns who reach the age category of 10-20 after 10 years. A “mortality table” gives us the percentage of deaths in each age category:

Look at the 2019 data: how many people aged 10-19 died? What is the total population in the age range 10-19? The ratio of deaths to population is the MORTALITY rate for ages 10-19.

Life Expectancy is calculated by using current mortality rates to predict future mortality.

Conclusions

The standard method for ranking countries uses GNP per capita. However, this number is a mixture of facts with the subjective values which lead to the choice of factors and weights that go into its computation. The WDI tables give at least 7 variants of this measure, and many more could be devised. These numbers are arbitrary because they are based on a mixture of facts about the countries with the values which go into selecting the facts to highlight, and the weights used to prioritize them.

In contrast, Life Expectancy comparisons are objective indicators of a fact about external reality. This is because we are computing a ONE-DIMENSIONAL measure of a quantitative and numerical fact about the country. Anyone who tries to compute this number, using any reasonable method, will come up with similar numbers. This number is a feature of the country, and not a projection of my ideas about how to evaluate countries. At the same time, the life expectancy is not a perfect measure of the target we are trying to estimate – the average lifespan of newborns.

This is because Life Expectancy depends on approximating future mortality rates by current mortality rates. This approximation may fail to be valid for many possible reasons – future death rates are unknown. LE is a reasonable GUESS about the average lifespan of newborns, not an exact measurement of it.

Corruption Rankings

bit.ly/DSIA02F – Part F of Lecture 2 on Descriptive Statistics: An Islamic Approach discusses Corruption Rankings made by Transparency International.

In this lecture, we will examine global corruption rankings in light of The Four Questions which are central to the Islamic Approach:

  1. WHY are we doing corruption rankings of countries?
  2. What do the numbers mean?
  3. How are they calculated?
  4. What is the impact of the creation of these corruption rankings?

This lecture is based on: Zaman, Asad and Rahim, Faiz, “Corruption: Measuring the Unmeasurable”, Humanomics, Vol. 25, No. 2, pp. 117-126, June 2009. https://ssrn.com/abstract=1309131

We start by thinking about why we assign NUMBERS to corruption. After all, it is a qualitative condition of the heart, not subject to measurement. There is a long and complicated story which led to the attempts to measure the unmeasurable, which we summarize very briefly:

  1. A battle between Science and Religion fought in Europe led to rejection of Christianity, and acceptance of Science as the new religion of the West.
  2. It became widely believed that Science is the only source of reliable knowledge. This led to rejection of heart, emotions, subjectivity. Logical positivists introduced the Fact/Value distinction, and said science was about facts, while values were not scientific.
  3. Advances in Physics were tied to accuracy of measurement. This led to the misconception known as Lord Kelvin’s Dictum: if you cannot measure it, you don’t know what you are talking about. Numbers = Knowledge. See Lord Kelvin’s Blunder.
  4. In the early 20th Century, Social Sciences were constructed by application of Scientific Method. But the methodology of science was VASTLY misunderstood by Logical Positivists and it was this misunderstanding of science that was used to create methodology for economics, econometrics, and statistics.
  5. These developments, where knowledge required measurement, led to attempts to Measure the UnMeasurable throughout the social sciences.

Can Corruption be Measured? Obviously, the internal, qualitative corruption of hearts cannot be measured in numbers. BUT external manifestations, like bribes, can be measured. It is worthwhile to define a BRIBE as the use of money for persuasion towards a personally profitable agenda at social cost.

Even if we confine attention to bribes, corruption is multidimensional and cannot be reduced to a single number. To see this, compare two countries. A has 100 corrupt transactions of $1M each. B has 1M corrupt transactions of $100 each. WHICH country is MORE corrupt, A or B? There is NO OBJECTIVE answer to this question. To answer, we need to specify the purpose of making the comparison.

There are situations when it becomes necessary to try to measure the unmeasurable. In such situations, the following rules for measurement are worth remembering.

The simplest case occurs when the target is ONE-DIMENSIONAL and quantitative. In this case ONLY, objective measurement is possible. Much more often we have the case of a qualitative and multidimensional phenomenon. In this case, we should explain clearly the subjective choices required to convert qualitative and multidimensional measures into a single number. If we consider a range of options, and also the purposes for which different USERS may find the measure useful, we will find different numbers for different users. This would help dispel the image of objectivity created by statistics.

Now, we come to the topic of the lecture. How is the CPI (the Corruption Perception Index) computed by Transparency International? To the best of our knowledge, they poll a group of wealthy businessmen of unknown identity, and ask them to rank countries from 1 to 10 in terms of their perceptions of corruption in a given country. High numbers correspond to high honesty and integrity, while low numbers correspond to high corruption.

As discussed earlier in “What do College Rankings Measure?”, the crucial question is: “How much KNOWLEDGE do they have of global corruption, and of RELATIVE corruption?” Uninformed rankings just report the prejudices of the people who are doing the ranking. There are many reasons to suspect that these rankings are done by foreigners with little knowledge of local culture. Furthermore, it is likely that these businessmen make brief visits to get big jobs done in the fastest way made possible by wealth – they look for corrupt counterparts to avoid the regular process. In any case, it is likely that the perceptions just reflect the prejudices of those doing the ranking, rather than any characteristic of the country.

What do the CPI numbers mean? Statistical analysis in Zaman and Rahim (2009) shows that the CPI has a 98% correlation with log(GNP per capita). In other words, integrity and honesty is just another name for wealth. This likely reflects the prejudices of the wealthy. In real life, we see that more wealth means more greed and corruption. The Quran also mentions how excess wealth leads to corruption. Remember that corruption is a two-party transaction: the poor accept money to do favors for the rich – the poor get the blame and the coverage, while the wealthy escape attention.

If we use the definition of bribery given above, LOBBYING in the USA is easily seen to be bribery: the use of money to pursue narrow group interests while inflicting huge costs on society. The Global Financial Crisis is one example of how rich financiers got trillions of dollars in bailouts, at the expense of poor mortgage holders made homeless by the millions. Another egregious example is the Medicare Prescription Drug Bill passed in 2003 using dirty tactics by Congressman Tauzin on behalf of Big Pharma. The bill ensures that the pharmaceutical industry can charge whatever price it likes for sales to Medicare. The government cannot negotiate, and it cannot import cheaper alternatives from Canada. The bill has been called an $80 billion per annum give-away to the pharmaceutical industry. (Despite a campaign promise to do so, Obama was unable to get this bill repealed due to the powerful Big Pharma lobby.) Afterwards, Tauzin left Congress to take up a $2 million consultancy, and also received more than $11 million in cash rewards from a grateful Big Pharma. But while all of this is documented, none of this is counted as corruption!

Our Islamic approach requires us to dig deeper into the historical context and background of the numbers we analyze. Why was the CPI developed? The answer is somewhat complex. In the post-WW2 era, there was a competition between Capitalist and Communist models of development. The World Bank offered the Structural Adjustment Program as a roadmap for development. There is not a single instance of success – no country became developed by following World Bank advice, while many countries, like those in East Asia, did industrialize by REJECTING World Bank advice (see Choosing our own pathways to progress). This failure of the capitalist model is widely documented and acknowledged by all parties. In order to maintain credibility, it was necessary for the World Bank to find some scapegoat to blame for the failure of the capitalist model for development. This was done by putting the blame on the poor countries for their own failure – a standard illustration of blaming the victim. It was not bad models created by the World Bank which led to failure, but bad governance and corruption in the poor countries. For more details, see the article on Michel Foucault's Power/Knowledge, which explains how the powerful shape knowledge for their benefit.

The fourth question is: “What is the IMPACT of the CPI?” We could imagine that, theoretically, a country ranked as highly corrupt will make efforts to improve in terms of governance and corruption. Practically, it has the opposite effect. Solid research establishes that my behavior is affected by my PERCEPTION of social norms (and not by the REALITY). So if a PERCEPTION of high corruption is created, people will act in more corrupt ways. If a PERCEPTION of justice and low corruption is created, people act honestly. This means that the strategy for moving towards greater integrity and honesty is the opposite of the one currently being followed all over the developing world. Institutions like NAB and anti-corruption drives highlight corruption and cause it to spread. Instead, an effective strategy would highlight honesty and integrity. If a country has 99 incidents of corruption and one of integrity, publicity for the solitary good incident would create an impression of honesty and help to spread it. Thus, attempts to measure corruption via the CPI are likely to be counter-productive rather than helpful in combating corruption.

Conclusions: The colonization of the globe was justified by racist arguments: the white man was held to be infinitely superior to all other races, and to have the right to rule the world. The colonization process was so extremely brutal and ruthless that the records have been suppressed from history and memory. Today, this process of colonization continues by financial means. Poor countries make billions of dollars of interest payments to the rich. A justification for this exploitation of the poor by the rich and powerful is still needed. This justification is created by the CPI, as well as by many types of economic theories of development.

POSTSCRIPT: This concludes Lecture 2. The previous 5 posts discussed how widely used College Rankings are arbitrary: by choosing factors and weights appropriately, we could make the rankings come out in any desired way. Links to this sequence of posts are: Comparing Numbers, Arbitrariness of Rankings, What do College Rankings Measure?, Goodhart’s Law, and Values Embodied in Factors & Weights.

Values Embodied in Factors & Weights

This lecture (bit.ly/DSIA02E) is the fifth part of the second lecture on Descriptive Statistics: An Islamic Approach. The 14m video is followed by a 1600 word writeup.

How were the seven factors that enter the US News & World Report (USN&WR) College Rankings chosen? We can think of a lot of important factors related to college education which are not part of the ranking factors. To answer this question, we must realize that the choice of factors represents values. According to dominant popular subjective valuations, Harvard, Yale, and other elite universities have high rank and status in public perception. This intuitive assessment of quality comes first. When creating a ranking, we try to choose factors which will MATCH our pre-existing intuition. That is, we already know, in advance of creating the rankings, which colleges should come out on top. We choose factors to support this intuition. When this method of choosing factors according to pre-existing prejudices is concealed, an illusion of objectivity is created. Concealment of the values involved in the choice of weights and factors, and of the role of intuition in statistical models, is one of the important aspects of deception.

To illustrate how the choice of factors is based on values, Malcolm Gladwell notes that the price of education is not included, even though, for most students, this is the most important aspect of education. The authors of the survey offer no reason for this. They say that it is “just our subjective judgment”. But a deeper reason could be that putting in the price of education could go against “intuition”. The “best” universities in the popular public image are also the most expensive ones. MG asks: “What happens if we include price?” He shows that some relatively unknown universities then appear in the top TEN. What does this mean? It means that some universities can provide a great education at a very low price. However, a ranking which puts a lowly university at the top would create distrust in the ranking, which may be a reason for avoiding it.

To show how arbitrary the whole business of ranking is, MG refers to an online “Rankings Game” created by Professor Slater, who has collected data on many different characteristics of Law Schools used in rankings. You can assign weights and watch the rankings change. Once you understand how the game works, you can create almost any ranking you like, and you can make any university come out on top.
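
In the same spirit as the Rankings Game, here is a tiny hypothetical sketch (the schools, factors, scores, and weights are all invented for illustration) showing how the choice of weights alone can reverse a ranking:

```python
# Invented factor scores (all scaled 0-1): selectivity, graduation improvement, affordability.
schools = {
    "Elite U": (0.95, 0.02, 0.10),
    "State U": (0.40, 0.30, 0.80),
}

def rank(weights):
    # Weighted sum of the factors, sorted from highest score to lowest.
    scores = {name: sum(w * f for w, f in zip(weights, factors))
              for name, factors in schools.items()}
    return sorted(scores, key=scores.get, reverse=True)

print(rank((0.8, 0.2, 0.0)))   # weight selectivity heavily   -> ['Elite U', 'State U']
print(rank((0.2, 0.3, 0.5)))   # weight improvement and price -> ['State U', 'Elite U']
```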

Going outside the MG article, we can illustrate how public perception of power shapes the measures used for ranking. The dominant power gets to make the rules about what is measured. Before World War 1, “Britannia ruled the waves”, and the measure of power was defined by sea power, coal mines, and other factors which favored Great Britain. After the war destroyed the European economies, the USA emerged as the dominant economy on the globe. In 1934, Simon Kuznets introduced the concept of GDP, according to which the US was the world leader. Later, when some tiny oil economies achieved higher GDP per capita, the idea was adjusted to include a reasonable income distribution. That is, if a few families in the nation are very rich, that does not make the nation the richest in the world. Even later, some European economies like Switzerland overtook the USA in GNP per capita. The measure of national wealth was re-adjusted to include infrastructure and natural resources, where the USA has a huge lead over Europe. The point is that the dominant power has the ability to dictate which factors should be used to rank nations. Currently, depending on which criteria are chosen, the US or China could come out on top, reflecting the shifting balances of international power. If I choose the criteria, I can make Pakistan come out on top: I would choose suicide rates, crime rates, the number of people who live in stable families with both parents, psychiatric patients, drugs, alcohol, the percentage of the population in jail, and so on. Large numbers of factors which accurately reflect human lives could be used to make Pakistan come out ahead of the USA. But what does this, or any other ranking, MEAN? This question is of central importance, and is not part of conventional statistics.

MG answers that ranks are implicit ideological judgments. The choice of factors represents value judgments. However, positivism teaches us that values are not scientific facts. This is why conventional statistics conceals the values contained in numbers. MG provides another illustration of values by discussing two factors which are opposed to each other. One of these is Selectivity: what percentage of applicants are admitted? High selectivity automatically leads to a low ranking on Graduation Rate. This requires some explanation. The Graduation Rate factor is not simply the percentage who graduate, because that would create a bias against colleges which admit weak students. Rather, it is the IMPROVEMENT achieved by the university over the graduation rate that its students would have in general. To explain this better, FIRST calculate the EXPECTED rate for the ADMITTED STUDENTS. Yale admits superstar students who would have a 98% graduation rate anywhere. How much can Yale IMPROVE this rate? At most, it can achieve 100%, which would give it 2 percentage points on this factor. As opposed to this, a college which admits weak students who have a 50% chance of graduation, and achieves an 80% graduation rate, gets a score of 30 points. It turns out that the lower-ranked Penn State does a great job on this factor.
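
As a small sketch of this scoring rule, using the numbers from the text:

```python
def improvement_points(expected_rate, achieved_rate):
    # Graduation-rate factor: achieved rate minus expected rate, in percentage points.
    return round((achieved_rate - expected_rate) * 100, 1)

print(improvement_points(0.98, 1.00))   # highly selective school: at most 2 points
print(improvement_points(0.50, 0.80))   # open-admission school: 30 points
```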

Given that these two factors work against each other, how should we weight Selectivity against Graduation Rate? There is no right answer. USN&WR makes Selectivity twice as important as the Graduation Rate. This just reflects a personal preference for Yale over Penn State. Students selecting colleges may prefer low selectivity, as it maximizes their chances of getting into the college. Governments wanting to fund education may prefer large public universities which admit everybody and do the best possible job on the weakest students. The choice of weights, and the rankings, depend on the purpose for which they are made.

Rankings are not objective; rather, they are ideologies masquerading as objective numbers. What is the IMPACT of these ideologies? The college rankings are no longer a harmless game that we play with numbers. These rankings have a massive impact on the public mindset, on funding, on choices made by students and professors, and on salaries. It is a matter of great importance that these rankings ignore price. This means that colleges have no incentive to provide a good education at the lowest possible price, because doing so would not affect their ranking. The bias in favor of the wealthy is shown by the fact that the top 20 schools are always the private elite-class schools. Most of the ranking factors relate strongly to WEALTH. The heavy endowments of private universities make them impossible to compete with. Even though Penn State is the most popular university – with 115,000 applications – there is no way it can get into the top 20 without billions of dollars.

Malcolm Gladwell concludes that rankings reflect the mindset of the ranker. He gives the example of a professor (Huntington) who surveyed his colleagues, asking them to rank civilizations around the globe. It is no surprise that the survey ranked the USA and UK at the top, given that all respondents were familiar with these civilizations and had no knowledge of others.

Our own concluding remarks for this second lecture are as follows. There is a famous saying that “Statistics are the eyes of the State”. Factors which are measured get attention, while aspects of society which are not measured tend not to receive attention in public policy. In particular, the most important numbers are those for GDP, and there is a concerted effort by ministries around the globe to improve GDP rankings. But the target of these efforts is the NUMBER, and not the reality behind the number. A Bureau of Statistics can manipulate the figures in many ways to increase GDP and growth rates without having any effect on the lives of the people. Focusing on NUMBERS leads to harmful policies, while focusing on the REALITY that numbers are meant to measure would improve policies. This point is made forcefully in the book by Stiglitz, Sen, and Fitoussi, “Mismeasuring Our Lives: Why GDP Doesn’t Add Up”. It shows how essential aspects of our lives are not measured in GDP, and this leads to very poor economic policies which cause a lot of harm in the dimensions which are not measured, while improving the measured dimensions.

This problem does not relate to GDP or College Rankings alone. Rather, we use numbers to measure performance in many dimensions, and these same problems arise in nearly all the rankings that we use. An Islamic Approach to statistics requires us to do two things in this context. One is to make the value judgments in choice of factors and weights explicit and aligned with Islamic values. The second is to focus on the Reality behind the numbers, and not the numbers by themselves.

 

Links to Related Materials for This Lecture: DSIA02E

This lecture: Values Embodied in Factors & Weights: bit.ly/dsia02e

Beyond Numbers and Material Rewards.

First Four Parts of Lecture 2: bit.ly/dsia02a, bit.ly/dsia02b, bit.ly/dsia02c, bit.ly/dsia02d

  1. Comparing Numbers: We need to look at WHY we are comparing, and WHAT the numbers MEAN.
  2. Arbitrary Rankings: Numbers used for ranking are created by mixing subjective weights with objective characteristics.
  3. What Do College Rankings Measure? What Do the Numbers Mean? How are they computed?
  4. Goodhart’s Law: How USE of rankings to measure quality causes distortions when Universities follow policies to rise in ranking.

Goodhart’s Law

DSIA02D Goodhart’s Law states that measures used for policy become corrupted by this use. We discuss how this applies to college rankings. This is Part D of the 2nd Lecture on Descriptive Statistics: An Islamic Approach. The 15m video is followed by a 1300-word writeup.

To briefly review, we note that the essence of an Islamic Approach is to ask FOUR Questions:

  1. Why are we doing this analysis?
  2. What do the numbers mean?
  3. How were the numbers computed?
  4. What is the potential IMPACT of this analysis?

Our primary focus in this lecture will be on the fourth question. It is worth noting that these questions break the barrier between theory and application. They bring in INTENTIONS, central to an Islamic approach. They analyze relationships between external observations and internal reality. They force statisticians to enter the real world of applications, instead of staying in the sterile world of theory and numbers.

As background information, it is worth noting that this idea of looking at the numbers and NOT looking at the reality behind the numbers comes from Logical Positivism, an immensely popular theory of knowledge which emerged in the 20th Century. The central idea of this philosophy was that we can have knowledge only about observable facts – the hidden and unobserved reality is not part of scientific knowledge. A key strategy of Logical Positivism was to replace Unobservables by Observable Manifestations. For more information about this, see The Emergence of Logical Positivism.

As an example of this strategy, consider Economic Theory. There are three concepts: Welfare, Preference, and Choice. Welfare refers to what is good for me (spinach), Preference refers to what I like (ice cream), and Choice refers to what I choose (hamburger). Even though all three are distinct, only choice is observable. The effect of Logical Positivism on Economics was to equate all three:

WELFARE = PREFERENCE = CHOICE

This has BLINDED economists to the real sources of welfare, and created the confusion that everyone automatically knows what is best for him or her and DOES it. This equation makes FREEDOM appear IDEAL: everyone knows what is best for himself and chooses it when the option is available. Behavioral economists have found that people frequently make bad choices, which are harmful for them. As a result, NUDGE theory is based on the idea that we can guide people to make better choices in ways which leave them free to choose, but make the best choice easier to find and make.

The numbers we measure – our statistics – have a great IMPACT on the real world. When we replace UNOBSERVABLES by OBSERVABLE & MEASURABLE Key Performance Indicators (KPIs), people also replace efforts on the unobservables by efforts on the observables. For example, the genuine quality of research is unobservable. But we can get some idea about it by looking at the quantity (COUNT) of publications and the reputation of the journals (Impact Factor). When this was done worldwide, it led to a massive increase in fraudulent journals, which publish articles for money and use many gimmicks to get impact factors. Instead of focusing on the unobservable reality, the focus shifted to the measurable numbers. Similar examples of how KPIs have led to attention to meaningless numbers, instead of the reality behind the numbers, are available in all fields. For more illustrations, see Beyond Numbers and Material Rewards.

In every area of knowledge, the positivist attempt to replace unobservables by observable and measurable quantities has led to serious problems. For example, in The MISMEASURE of MAN, Stephen Jay Gould discusses the problem of “Reification” – replacing the abstract by the concrete. In particular, the IQ measure reduces the abstract, qualitative, multidimensional characteristic of intelligence to one number. This and other absurd ways of measuring intelligence (like the shape of the skull and brain size) have been taken seriously because of this tendency to replace hidden unobservables by measurable manifestations.

With this as background, we come to the topic of our lecture. Goodhart’s Law states that “When a measure is used as a policy target, it ceases to be a good measure.” We OBSERVE something which is correlated with high quality, and use it as a MEASURE of quality. Awareness that it is being used as a measure leads to a change in behavior. Instead of trying to create quality of education, for example, colleges focus on the indicators used by the reports and try to increase the numbers, leading to harmful results. A college could rise in the rankings, for instance, by graduating everyone, regardless of academic performance. Trying to raise the ranking numbers leads to bad policies, and also DISTORTS the numbers. When we replace a qualitative, unobserved target by quantitative measures of that target, this creates a SHIFT in the GOALS.

Next, Malcolm Gladwell examines a specific measure: the REPUTATION score, which has a 22.5 per cent weight in the college rankings. He asks: HOW is it computed? This is done by a survey of high school and college officials. But do the people surveyed HAVE the information required to rank colleges? This is the critical question. Most people know very little about the 200+ colleges in the survey and cannot provide any useful information about this matter. Research studies of reputation surveys show that they produce good results IF experts are asked about their area of expertise. Otherwise, they just replicate the UNINFORMED public consensus. To prove this point, MG discusses two examples.

One is the analysis of “Best Hospitals” Rankings produced by asking doctors to rank hospitals in their area. A researcher took measures of hospital quality based on objective factors like mortality rates, type of equipment, staffing, etc. and found that objective measures of quality had zero correlation with the reputation ranking.  This is simply because typical doctors do not know much about hospitals other than the one they work in. Similarly, Lawyers were asked to rank Best Law Schools. It was found that they ranked Penn State very highly. BUT Penn State does not have a law school! This illustrates the level of ignorance about law schools together with the effect of general reputation in public perception of the school.

So how do people rank colleges when they know nothing about how to compare different colleges? They turn to public sources of information about this matter – that is, the US News and World Report college rankings. So it is the Rankings which Drive the Reputation Score! This is an illustration of Goodhart’s Law. Reputation is based on the ranking – but the ranking gives the highest weight (22.5 per cent) to reputation. To illustrate the difference between informed and uninformed rankings, MG mentions rankings of colleges done by corporate recruiters. Because these people take graduates, place them in jobs, and follow their progress, they have knowledge about the quality of graduates being produced by different universities. Their rankings are very different from the US News and World Report rankings. For instance, they rank Penn State at the top, even though it does not come within the top 20 in the USN&WR rankings.

Goodhart’s Law illustrates how our observations change the world. When we measure things, our measures acquire importance. Hidden Quality is signaled by some markers, such as Ph.D. faculty, small classes, selectivity in admissions, and high graduation rates. However, if we focus policy on improving Markers of quality, instead of on quality, this leads to major mistakes. A college could rise in rankings by hiring more Ph.D.’s, increasing selectivity in admissions, and graduating all students it admits. But none of these policies will actually have any direct impact on quality. Indeed some policies which target the indicators could actually be harmful for quality. Attempts to target the indicators will DISTORT the indicators as markers of quality. AFTER publication count became a factor in evaluating faculty research, many faculty acquired hundreds of publications in just a few years by various shady techniques, so that publication count was no longer a good marker of quality.

What this reveals is that Ranks reflect Implicit Ideological Judgments. The factors chosen and the factors excluded, as well as the weights attached, represent values. However, logical positivism teaches that values are not scientific knowledge, so values are never explicitly included in the analysis. Instead, they are concealed in the choice of factors and weights, which creates an impression of objectivity. This is what makes statistics so dangerous – it covers ideological value judgements with a pretence of objectivity created by numbers.

What do College Rankings Measure?

[bit.ly/dsia02c] Part C of 2nd Lecture on Descriptive Statistics: An Islamic Approach [DSIA L02C]. We continue our study of Malcolm Gladwell’s (MG) article on ‘College Rankings’. We will consider the questions of “How are the Numbers Computed?” and “What do they Mean?” The 15m video is followed by a 1400-word writeup.

MG starts by noting that the purpose and audience for college rankings have changed over time. The rankings were initially meant as a rough guide for “consumers” (students choosing colleges). It was not imagined that colleges would use these rankings as benchmarks of performance, proof of good management, and status markers in the rivalry among colleges. Nor was it imagined that educational policies would be used to engineer a rise in ranking. As we have discussed, changing purposes require changes in measurements, and the rankings have NOT been changed to suit the changes in purposes, with harmful results.

Going into the methodology of the ranking itself, it is based on seven major factors (a small computational sketch follows the list):

  1. Undergraduate academic reputation, 22.5 per cent
  2. Graduation and freshman retention rates, 20 per cent
  3. Faculty resources, 20 per cent
  4. Student selectivity, 15 per cent
  5. Financial resources, 10 per cent
  6. Graduation rate performance, 7.5 per cent
  7. Alumni giving, 5 per cent
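
To see mechanically how such weights turn many factor scores into a single number, here is a minimal sketch of a weighted composite score. This is only an illustration of the arithmetic, not the actual USN&WR methodology; the factor scores for the two hypothetical colleges are invented for the example.

```python
# Minimal sketch: combining factor scores (0-100, invented for illustration)
# into one composite number using the published weights. The weights are
# exactly where the value judgments enter the calculation.

WEIGHTS = {
    "reputation": 0.225,
    "graduation_and_retention": 0.20,
    "faculty_resources": 0.20,
    "selectivity": 0.15,
    "financial_resources": 0.10,
    "graduation_rate_performance": 0.075,
    "alumni_giving": 0.05,
}

colleges = {  # hypothetical factor scores, purely illustrative
    "Elite Private College": {
        "reputation": 95, "graduation_and_retention": 97, "faculty_resources": 90,
        "selectivity": 98, "financial_resources": 95,
        "graduation_rate_performance": 40, "alumni_giving": 60,
    },
    "Large Public University": {
        "reputation": 70, "graduation_and_retention": 80, "faculty_resources": 65,
        "selectivity": 40, "financial_resources": 50,
        "graduation_rate_performance": 90, "alumni_giving": 30,
    },
}

def composite(scores):
    """Weighted sum of factor scores; changing WEIGHTS changes the ranking."""
    return sum(WEIGHTS[factor] * value for factor, value in scores.items())

for name, scores in colleges.items():
    print(name, round(composite(scores), 1))
```

Because the high-weight factors (reputation, selectivity, resources) all track wealth, the invented public university scores lower overall despite doing far better on graduation rate performance – which is the point the text is making.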

MG registers Two Major Complaints about the ranking process:

  1. Universities being ranked are extremely diverse (heterogenous) – How can you compare apples and oranges?
  2. Each university is extremely complex and multidimensional – dozens of departments, campuses, programs – how can a SINGLE number be assigned to this?

Later on, I will explain that there are many other issues worth considering about this ranking. For the moment, we note that many, many questions arise from this description of the ranking process:

  1. Why these SEVEN factors? Why not others? What is the basis for selection?
  2. Why these weights?
  3. WHAT do these factors MEASURE?
  4. How are the factor scores computed?

We start the discussion by asking a simple question: does a number actually measure what it is supposed to measure – that is, are the numbers accurate? To explain this, MG considers the example of the suicide rate. Here the target IS measurable – there really is a NUMBER which counts the people who committed suicide in the past year. But no one knows what that number is. The statistics which are available are distorted by many biases. It is extremely difficult to guess the intentions of a dead person. Someone must classify a death as a suicide, and who does this varies greatly by country. Depending on culture and customs, the classification could be done by a police officer, the family, or doctors. Whether or not it is reported in official statistics is again a separate matter. Because of this diversity, it would be a hopeless task to compare suicide rates across countries with any degree of confidence. INSTEAD, one should ask the PURPOSE of the comparison. If quality of life is the target, then more direct measures based on surveys of welfare may give better results.

In addition to the criticisms by MG, I would like to focus on the issue that when we look at a number, it is essential to be clear about the TARGET – WHAT is that number trying to measure? So, when I am given a number measuring the Quality of a College, I must ask “What do you MEAN by Quality of College?” One way to specify this quality (and there are many other possible definitions) is to consider student learning: a student ENTERS with knowledge and skills, and EXITS with MORE knowledge and skills. The DIFFERENCE between the two is the Educational Outcome – what the college contributed to the learning process of the student. Of course, this is a multi-dimensional quantity – learning and skills occur on many different dimensions which are not comparable with each other. It is hard to reduce multidimensional performance to a single number unless some clear and specific purpose of education is specified. For example, if we consider how well the education provides medical skills, in terms of ability to treat patients, it might be possible to come up with a single number which aggregates the contribution of all dimensions to that single purpose. This is a complex issue, which will arise in many different contexts.

A SECOND ISSUE of importance, in terms of IMPROVING how we do statistics: from VAGUE & IMPRECISE measures of INPUTS, move to measures of OUTPUT. Stiglitz-Sen-Fitoussi recommended moving to consumption, which directly measures what human beings get, instead of production, which measures goods produced that COULD potentially reach the consumers. To illustrate this idea, consider one of the seven factors used in the rankings: Faculty Resources. According to the reasoning given for this factor, student engagement with faculty is an important part of the educational process. Instead of directly measuring Student Engagement (which is vague, qualitative, and hard to define and measure), we use PROXY measures, which are INPUTS which go into producing Student Engagement. These proxies are:

  1. Class Size
  2. Faculty Salary
  3. Proportion with Ph.D.
  4. Student Faculty Ratio
  5. Proportion of Full-Time Faculty.

It is true that these factors all have the POTENTIAL to create a better student educational experience. These are INPUTS into the educational process. But how effective are they? Do they actually achieve this potential? Do these factors really matter?

Suppose we specify the TARGET of our quality measure as before: how much do students “GROW” in the educational process? MG cites educational research by Terenzini & Pascarella – a meta-study of 2,600 papers – which finds NO RELATIONSHIP between student engagement and the standard list of variables used in nearly all methods for measuring the quality of colleges: educational expenditures per student, student/faculty ratios, faculty salaries, percentage of faculty with the highest degree in their field, faculty research productivity, size of the library, [or] admissions selectivity.

If these INPUTS do not matter, then what DOES matter? It turns out that the key variables are the ones which are qualitative and non-measurable, or SOFT variables: educators engage students when they are purposeful, inclusive, empowering, ethical, and process-oriented. For a summary of take-aways from the Terenzini and Pascarella in-depth studies, see: Pathways to Success: Student Engagement.

Focus on what can be measured takes attention away from the important qualitative factors, which often cannot be reduced to numbers. For an example of this (not discussed in the MG article), consider the question: do SAT scores predict academic success? There is a huge controversy about the issue, but the facts are clear. SAT scores are solidly correlated with first-year performance, and the correlation weakens with time. BUT the effect is very small. For practical purposes, it is reasonable to conclude that we should NOT use the SAT for college admissions. WHY? Again, the key factors which lead to success are not measurable. Research shows that the student characteristics strongly correlated with success are Drive, Motivation, and Perseverance. These are character traits which are not measured by the SAT. Another way to think about this question is to ask: can we take students with low SAT scores and turn them into super performers? The answer is YES, and there is a lot of evidence that teachers who motivate and inspire can take students from any background and turn them into star performers.

Concluding Remarks: It is helpful to look at the bigger picture. The rise of Logical Positivism in 20th Century led to an extreme emphasis on the observable and measurable and a complete neglect of the qualitative and unmeasurable aspects of our lives. This has led to the drive to MEASURE everything. But the Most Important Things in life are not measurable. We live our lives without measuring in numbers those things which matter the most to us – loving and being loved. This ability to deal with qualitative and unmeasurable phenomena needs to be extended to the bigger world of education and management.  Even when complex, multidimensional phenomena ARE measurable, multiple measures CANNOT be reduced to one number. False philosophies lead us to PRETEND that a single number can MEASURE the quality of colleges. This type of confusion arises from failure to think clearly about the  TARGET – What is being measured and Why?  To improve statistical analysis, we must learn to think clearly about the bigger questions, instead of confining attention to the numbers alone.

Arbitrariness of Rankings

[bit.ly/dsia02b] This is Part B of the 2nd lecture in “Descriptive Statistics: An Islamic Approach” (DSIA02b). It considers the issue of comparing two numbers to decide which is higher. Even though this task is trivial from the statistical point of view, it is very complex when we follow through to try to understand the real-world context in which the numbers are being compared. This is illustrated through an example involving the ranking of cars.

One of the best sources of learning is reading articles and books. Good articles and books encapsulate deep wisdom, which authors have gathered from their life experiences. Ultimately, the only source of knowledge is life experience itself. Since we have only one life to live, we can only gather a small amount for ourselves. Reading gives us access to the fruits of the life experiences of millions of scholars, throughout the centuries of written works. It is essential to be selective in this reading, because the amount of false and misleading information is vastly greater than that which is useful and relevant. Furthermore, even the useful and relevant material is so extensive that we will only have a chance to read a very small portion of it in our entire lifetime. One of the tasks of a good teacher is to provide guidance in this regard: having read thousands of articles, the teacher can select the few that stand out for students. If the teacher can point the student to one article that summarizes the wisdom of 1000, he has not only guided the student to a useful article, he has also saved the student the time required to read the other 999 and arrive at judgments of their relative worth.

One of the best articles which explains the meaning of comparing numbers in the context of college rankings is the following: Gladwell, Malcolm. “The Order of Things: What College Rankings Really Tell Us.” The New Yorker 87.1 (2011): 68-75. Downloadable copy: Gladwell Rankings PDF.

In this lecture, we will read the article together. I will provide some simple and clear explanations of what is being said, so as to enable the student to read and understand the original article. Although the article is about college rankings, it starts by illustrating the ranking problem in the context of cars. The goal of the article is to show that all rankings are deceptive – ranking is just one of the ways to “lie with statistics”. Even though the Car and Driver magazine comes up with a clear winner in its rankings, the winning car is NOT the best in any clear sense of the word. In fact, the question itself is meaningless: it is impossible to rank cars without considering the PURPOSE of the ranking.

Suppose that there are three dimensions along which cars are evaluated – Appearance, Engine, Price. Let us put aside the issue of how we come up with numbers for the subjective categories, even though this is also important. Let us suppose a panel of experts can judge, on a scale of 1 to 10, the objective rankings of cars on these three dimensions. The first concerns external appearance, style, attractiveness. The second concerns engine performance judged by many different criteria. We have omitted one of the criteria used in the Gladwell article: “the subjective feel of driving,” which has to do with how the car handles when it is driven in different situations. We have replaced this by the price, which can be evaluated objectively. Here is a set of hypothetical numbers which evaluates three cars along these three dimensions.

Car Name     Appearance   Engine   Price
Porsche           6           9       3
Lotus             8           7       6
Chevrolet         5           5       9

 

Note that high numbers mean a high ranking, so the score of 9 given to Chevrolet on Price means that it is the cheapest car, having the best price among the three cars being evaluated.

Note that each of the three cars is best in one of the three dimensions. Lotus is best in appearance, Porsche has the best engine, while Chevrolet has the best price. How can we find out which is the best car overall? The CORRECT answer to this question is that we CANNOT do this. The ranking between the cars depends on the PURPOSE of the evaluation – WHY are we trying to rank the cars? Without specifying a purpose, we cannot rank the cars. The standard methodology in use is deceptive – another illustration of “How to Lie with Statistics”. It assigns weights to all three factors to come up with a combined score. Let us look at how this is done. I will use C&D to denote a hypothetical version of the Car & Driver magazine which is discussed in the actual article. The statements below about C&D correspond only roughly to Gladwell’s article, and are meant to simplify a more complex discussion. With this warning, we consider how C&D comes up with a ranking of cars, even though this is impossible to do without considering the purpose of the ranking.

C&D editors feel that what is inside the car, the engine, is the most important factor. They assign it a weight of 50%. Because they are car enthusiasts, they feel that a sleek and stylish appearance is very important, and the price is not so important. So, they assign a weight of 40% to appearance, and 10% to price. Once these weights have been assigned, the score for each car can easily be calculated. Multiplying by 10 to avoid decimals, we find that, with these weights, Lotus gets 73, Porsche gets 72, while Chevrolet gets only 54. The message from this ranking is that Lotus and Porsche are close to each other, and both are distinctly superior to Chevrolet. The numbers create an OBJECTIVE feel – as if this were not a matter of the personal tastes of the C&D editors, but an objective evaluation of the characteristics of the cars.

This message is completely wrong. The rankings are created as a MIXTURE of subjective weights and objective characteristics of the cars. To bring this out, Malcolm Gladwell argues as follows. He says that the Car and Driver editors used the SAME weights for this evaluation that they use for SUVs (Sports Utility Vehicles). Now SUVs combine elements of practicality with a sporty feeling, but the cars being evaluated are high-class luxury cars. He says that the typical buyer of sporty luxury cars is a lot more interested in the APPEARANCE of the car than in what is inside the engine. These cars are bought for show. If we change the weights to 50% on appearance and 40% on the engine, with price still at 10%, then Lotus emerges as a clear winner. The scores are now: Lotus 74, Porsche 69, and Chevrolet 54. Putting even more weight on appearance would put Lotus even further in the lead.

Next consider a buyer who has a modest income, but a great love of luxury. He would be very happy to buy a sports luxury car, if only he could afford one. As long as the car is classified as a luxury car, it is all the same to him; he is maximally concerned with the price. If we put a weight of 50% on the price, and 25% each on Appearance and Engine, then Chevrolet will emerge as the winner with 70, while Lotus and Porsche lag behind with 67.5 and 52.5 respectively.
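
The weight-sensitivity described in the last three paragraphs is easy to reproduce. The following minimal sketch uses the hypothetical 1-to-10 ratings from the table above and the three weighting schemes just discussed; the "winner" changes with the weights exactly as the text describes.

```python
# Minimal sketch reproducing the weighted-score calculations above.
# The scores are the hypothetical 1-10 ratings from the table; each weighting
# scheme represents a different purpose or buyer, and the winner changes.

cars = {
    "Porsche":   {"appearance": 6, "engine": 9, "price": 3},
    "Lotus":     {"appearance": 8, "engine": 7, "price": 6},
    "Chevrolet": {"appearance": 5, "engine": 5, "price": 9},
}

weightings = {
    "C&D editors (engine first)":  {"appearance": 0.40, "engine": 0.50, "price": 0.10},
    "Luxury buyer (looks first)":  {"appearance": 0.50, "engine": 0.40, "price": 0.10},
    "Budget buyer (price first)":  {"appearance": 0.25, "engine": 0.25, "price": 0.50},
}

def score(car, weights):
    # Multiply by 10, as in the text, to avoid decimals where possible.
    return 10 * sum(weights[dim] * car[dim] for dim in weights)

for label, weights in weightings.items():
    ranking = sorted(cars, key=lambda name: score(cars[name], weights), reverse=True)
    print(label)
    for name in ranking:
        print(f"  {name}: {score(cars[name], weights):.1f}")
```

Running this prints Lotus 73 / Porsche 72 / Chevrolet 54 for the first scheme, Lotus 74 / Porsche 69 / Chevrolet 54 for the second, and Chevrolet 70 / Lotus 67.5 / Porsche 52.5 for the third – the same numbers given in the text, with a different car on top in each case.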

So depending on the tastes of the buyer, and the purpose for which the car is being bought, the ranking would be different. Malcolm Gladwell explains that there are two situations in which it is possible to come up with an objective ranking. One situation is when we focus on only one factor. If we look only at price, or at power of engine, or at appearance, then we can evaluate two cars X and Y and decide if X is better than Y in appearance or not.

According to Malcolm Gladwell, the second case in which objective rankings can be done is if all of the cars are similar to each other on the dimensions being ranked. He thinks that it is the diversity of the cars being ranked that leads to the sensitivity of the ranking to weights. This is a mistake. Even if the cars are similar to each other – homogenous, in Gladwell’s terminology – the problem of sensitivity to weights will remain exactly the same as in a heterogenous group. According to Gladwell, the problem arises because Car and Driver tries to cover the field and rate a very diverse group of cars. This is not true.

The source of the failure is the failure to specify the PURPOSE for which the ranking is being done. When we explain WHY we want to rank the cars, then we can correctly specify the weights for the different factors. The purpose is subjective – it depends on the person who is buying the car. For example, someone might allocate a budget of $20,000 for the car, and then say that he wants the most sporting car that he can get for this price. He can then apply his personal subjective preferences for external attractiveness and engine quality to come up with a ranking. Or, he need not convert qualitative information to numbers at all. He could just look at cars within his budget and classify them as A, B, C – extremely attractive, attractive, and average-looking – in appearance. Then, depending on his personality, he might check the engine characteristics of the A-rated cars to ensure that they are satisfactory for his purposes, and buy the most attractive one. Or he might go for a compromise between Appearance and Engine. None of these methods of choosing cars corresponds to creating a ranking of the cars by numbers.

This brings us to the META-QUESTION: why are we discussing numerical measures of car quality? This is because there has been a huge emphasis on measuring things and assigning numbers to qualitative concepts. The idea of “measuring” intelligence by a single number – the IQ – was invented in the 20th Century. But this is NOT a good idea. Complex multidimensional characteristics like “intelligence” cannot be reduced to a single number. In order to relate knowledge to our life experiences, we have to break knowledge out of the boxes to which it has been confined in the West. It is exploring these meta-questions that leads us to an understanding of the world we are living in, which has shaped our ways of thinking. It is this understanding that offers us liberation from the boxes to which education confines our thought.