Quartiles As Natural Data Summaries

[bit.ly/dsia06b] – Part B of Lec 6 on Descriptive Statistics: An Islamic Approach. A Fisherian approach to statistics begins by ASSUMING that the data is a random sample from a theoretical population characterized by a small number of parameters. Such an assumption has no basis in reality, but was made to make statistical computations possible in a pre-computer era. Because of inertia, this century-old methodology continues to dominate the field, even though advances in computational capabilities have made it obsolete. Instead of an assumed distribution, REAL statistics takes the ACTUAL data distribution as the central tool for data analysis. This actual distribution never belongs to any of the neat theoretical families of distributions which make for elegant mathematical analysis. It cannot be written down on paper as a formula, but it can easily be computed by the computer. A visual depiction of the actual data distribution is the Histogram, which we have studied earlier. A more rigorous mathematical approach is based on the Empirical Cumulative Distribution Function (ECDF), which we will study and explore later. The ECDF is directly based on the data, depends on all of the data, and does not allow for data reduction, unlike the Fisherian approach. This is no longer a problem: we can now handle computations on all 15,509 data points in the HIES data with a click, so we do not NEED to reduce the data before we start the analysis.

It is nonetheless useful to have a few SUMMARY statistics which describe the data distribution. Our main concern in this lecture is to develop the concept of the QUARTILES, as a natural and intuitive description of the data set. Parenthetically, we note that in the Fisherian approach, these summary statistics are the mean and the standard deviation, which work wonderfully for the hypothetical normal distributions, but are extremely poor for other distributions. Here is a visual description of how we define the Quartiles.

Quartiles

First we sort the data, so that it is arranged in increasing order. Then we divide the data into FOUR equal parts. With 15509 points of data, this comes out to about 3877 data points in each of the four portions. The Summary Statistics are the separating points for these four portions of the data: Q0, Q1, Q2, Q3, Q4. Note that HH Size is always an integer and varies from a minimum of 1 to a maximum of 61 on this data set of 15509 points. The summary statistics are computed as follows:

  • Q0 = 1 = Minimum HHS
  • Q1 = 5 = HHS(3877) = 1st Quartile, 3877.25 = 15509/4
  • Q2 = 6 = Median = HHS(7754), 7754.5 = 15509/2
  • Q3 = 8 = HHS(11631) = 3rd Quartile, 11631.75 = 3*15509/4
  • Q4 = 61 = Maximum HHS
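These separating points are straightforward to compute by sorting. Here is a minimal sketch in Python (the list name hh_sizes and the expected output are illustrative, based on the HIES figures quoted above):

    # A minimal sketch: compute the five summary statistics by sorting
    # the data and picking the points that separate the four portions.
    def five_number_summary(values):
        s = sorted(values)            # step 1: arrange in increasing order
        n = len(s)                    # n = 15509 for the HIES data
        return (s[0],                 # Q0: minimum
                s[n // 4],            # Q1: ~25% of the data lies below
                s[n // 2],            # Q2: median, ~50% below
                s[(3 * n) // 4],      # Q3: ~75% below
                s[-1])                # Q4: maximum

    # hh_sizes is assumed to hold the 15,509 household sizes:
    # print(five_number_summary(hh_sizes))   # expect (1, 5, 6, 8, 61)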

We humans are much better at absorbing information from pictures and graphs than from tables of numbers. That is why a Box-and-Whiskers plot (short form: Boxplot) provides a Graphic View of the Summary Stats:

BoxplotHHS

The LEFT whisker of the box-and-whisker plot goes from Q0 to Q1, the minimum to the first quartile. For HH Size the minimum value is 1, while the first quartile occurs at HH Size = 5. The third quartile is at HH Size = 8. The BOX is drawn between Q1 and Q3 and represents HALF of the data. 25% of the data is in the LEFT whisker, while another 25% is in the RIGHT whisker, which goes from Q3 to the maximum HH Size of 61. A line is drawn in the middle of the box to show where Q2, the median, falls. Thus, all 5 quartiles Q0, Q1, Q2, Q3, and Q4 are pictured in the boxplot.

So what do we learn from this boxplot? Of the greatest importance is the Central Value or the median, which is HH Size = 6. What exactly does this mean? It means that HALF of the households have HH Size ≤ 6, while HALF of the households have HH Size ≥ 6. Thus the HH Size of 6 divides the population into two equal halves, where one half is smaller and the other half is larger. Some technical issues arise because HH Size is integer valued and jumps from 5 to 6 to 7 without taking any values in between. Thus, when we look at HH Sizes of 1, 2, 3, 4, and 5, fewer than 50% of the HHs have these sizes. When we add size 6, more than 50% of the households have sizes 1-6. This is a technical issue which is not of importance for us in the present context.

After the CENTRAL VALUE or the median, the next most important thing is the SPREAD of the data, which is measured by the Interquartile Range. This is defined as the distance between Q3 and Q1. In this data set, the BOX goes from HH Size 5 to HH Size 8. This means that 50% of the households have sizes in the range 5, 6, 7, 8. 25% or less have HH Size below (1,2,3,4), while 25% have HH Size above (9,10,…,61). This tells us that the distribution is Asymmetric; it is Right SKEWed. The left whisker is very short, so the data distribution has a Short Left Tail. On the other hand, it has an Extremely Long Right Tail.

To understand the quartiles better, we show how we can compute them from the following table. For each HH Size, the table COUNTS the number of HouseHolds with SMALLER HH Size. Thus, the first entry shows that there are 3412 HH's which have size {1,2,3,4} (less than 5). We note that 3412/15,509 = 22%, so this is less than a quarter of the population. The next entry shows 5521 HH's of size {1,2,3,4,5}, which is 35.6% of the population. So HH Size = 5 takes us from 22% to 35.6%, which COVERS the 25% mark of the first quartile. Similarly, HH Size = 6 takes us from 35.6% to 50.9%, which COVERS the 50% mark of the second quartile. And HH Size = 8 takes us from 64.7% to 75.6%, which COVERS Q3 = 75%.

HH Size # HHs with smaller size % of 15,509
5 3412 22.0%
6 5521 35.6%
7 7891 50.9%
8 10030 64.7%
9 11722 75.6%
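The counts in this table take only a few lines of code to reproduce. A sketch, under the same assumption that hh_sizes holds the data:

    # For each HH Size k, count households with size strictly below k,
    # and express the count as a percentage of the whole sample.
    def below_counts(values, cutoffs):
        n = len(values)
        for k in cutoffs:
            count = sum(1 for v in values if v < k)
            print(k, count, round(100 * count / n, 1), "%")

    # below_counts(hh_sizes, [5, 6, 7, 8, 9])
    # expected, per the table above: 5 3412 22.0 % ... 9 11722 75.6 %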

These problems, where the cumulative percentages jump from 22% to 35.6% without coming close to 25%, arise because HH Size is an integer and can only take certain fixed values. We next look at the Summary Stats for HH TE/cap (Total Expenditure per capita), which is a continuous variable. As we will see, these problems do not arise for continuous variables. A table similar to the one above lists the five quartiles of TE/cap in the first column. The second column lists the NUMBER of HH's which have smaller TE/cap, while the 3rd column displays this number as a percentage of 15,509.

TE/cap #HH below % HH below
MIN = 1966 0 0%
Q1 = 14275 3877 24.998%
Q2 = 19454 7754 49.997%
Q3 = 28648 11631 74.995%
MAX = 1268708 15508 100%

A visual depiction of these quartiles can be seen in a boxplot:

BoxplotTEcap

The central value is the Median TE/cap = 19,454. This is central because 7754 HHs are below (having less TE/cap) and 7754 are above (having more TE/cap). In traditional statistics, one might use the Average value of PKR 27,119 for the Center of this distribution. This is great if the distribution is normal, but it becomes Very Distorted due to the presence of huge outliers, which are not part of any normal distribution. In general, the widely used summary statistic of the Mean is best for Normal distributions, but VERY BAD for general distributions. In contrast, the Median works well for ALL distributions, and has a natural and intuitive interpretation.
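The distortion of the mean by outliers is easy to demonstrate on a toy example; no data file is needed for this sketch:

    from statistics import mean, median

    # Nine ordinary households plus one super-rich outlier:
    te_cap = [12000, 14000, 15000, 17000, 19000,
              21000, 25000, 29000, 33000, 1268708]

    print(median(te_cap))   # 20000.0 -- barely notices the outlier
    print(mean(te_cap))     # 145370.8 -- dragged above 9 of the 10 values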

The next thing we learn from the data is the Dispersion: How Spread Out is the Data? The boxplot uses the middle 50% of the data to measure this, via the Interquartile Range. Half of the households have TE/cap within the range [14275, 28648]; 25% have LESS and 25% have more. IQR = 28648 - 14275 = 14373. This is a natural measure of dispersion for general distributions. It is the REPLACEMENT for the Standard Deviation, which works well ONLY for normal distributions.
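Like the median, the IQR can be read directly off the sorted data, with no distributional assumption. A sketch:

    # Interquartile range: the width of the BOX in the boxplot.
    def iqr(values):
        s = sorted(values)
        n = len(s)
        return s[(3 * n) // 4] - s[n // 4]    # Q3 - Q1

    # For the HIES TE/cap column this gives 28648 - 14275 = 14373.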

The boxplot also tells us about the Skewness & the Tails. Both HH Size and HH TE/cap are right-skewed; TE/cap is much more skewed. Both have long right tails: HH Size goes up to 61, while HH TE/cap goes all the way to 1,268,708. TE/cap has a much more extreme extension in the right tail. In contrast, the Normal distribution is symmetric and has thin tails.

We would like to study the relationship between HH Size and HH TE/cap (which is a proxy for HH Wealth). In conventional statistics, the methodology for doing this is based on "regressions". As usual, these regressions are based on large numbers of unverifiable and false assumptions. The famous statistician David Freedman said that 'we have been running regressions for a century. This has not led to any useful results. Let us abandon the technique'. In real statistics, we propose to use the Median Line of X given Y as a REPLACEMENT for regression lines. We will illustrate this by drawing the two Median-Lines, one of HH Size against TE/cap and the other of TE/cap against HH Size. Intuitively, the idea is to create small boxes (bins) for one of the variables, say Y. That amounts to making the range of variation small for that variable: within a bin, the variable Y does not vary much. Now compute the MEDIAN value of X in this bin. That will tell us the central value of X for Y's within a particular box or bin. Now, as we change the Y-values, moving up across the Y-bins, we will see how the median of X changes as Y changes across bins. This will give us an idea about how the variable X responds to changes in the variable Y. We now illustrate the concept of Median-Lines for our HIES data set.
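A minimal sketch of the median-line procedure, assuming the paired observations sit in two lists xs and ys (the names and the number of bins are illustrative):

    from statistics import median

    def median_line(xs, ys, n_bins=20):
        # Sort the (Y, X) pairs by Y, cut them into equal-count Y-bins,
        # and report the median X within each bin.
        pairs = sorted(zip(ys, xs))
        n = len(pairs)
        line = []
        for b in range(n_bins):
            lo = b * n // n_bins              # integer boundaries spread the
            hi = (b + 1) * n // n_bins        # leftover points across bins
            bucket = pairs[lo:hi]
            y_med = median(p[0] for p in bucket)   # central Y of this bin
            x_med = median(p[1] for p in bucket)   # central X for those Y's
            line.append((y_med, x_med))
        return line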

Conceptually, it is easier to see how the Median TE/cap varies with household size. We simply subdivide the data into groups according to HH Size. For each HH Size, we look at ALL the HHs with that size, and compute the median TE/cap among them. Here is the Median Graph of TE/cap according to HH Size:

HH ExpPerCap vs HH Size

The first point on the graph shows that when HH Size = 1, the MEDIAN TE/cap is above 100,000. Note carefully what this means. It does not mean that ALL households of size 1 are rich. Rather, there are (only) 159 HH's of size 1 in the entire sample of 15,509. Among these 159 HH's, the median TE/cap is above 100,000 – that is, more than half, or roughly 80 HH's, have TE/cap in excess of 100,000, while the remaining (roughly 79) HH's in this group have TE/cap LESS than 100,000. Similarly, for each category of HH Size, the dot shows the median TE/cap of all HH's having that size.

It is clear from the graph that, as HH Size increases, MEDIAN TE/cap decreases. The most rapid changes occur early, for small HH Size. From HH Size of 1 to 5, there is a rapid reduction of Median TE/cap as HH Size increases. From HH Size 5 to 10, there are small reductions in Median TE/cap. After HH Size = 10, the Median line seems pretty flat.

Next, we consider the other Median-Line, of HH Size for TE/cap groups. In order to create this, the first step is to subdivide TE/cap into small buckets. There are many possibilities, but in the present case, a natural method is as follows. We note that 775 x 20 = 15,500, so if we create 20 buckets, with each bucket having 775 families, we will cover 15,500 families. To cover the remaining 9 families, we can just add one family to every other bucket. We will describe the full technical details of how to do these operations in EXCEL in the next portion of this lecture. For the moment, we just note that the income groups created by this procedure are as follows:

Group No. | TE/cap Lo | TE/cap Hi | Group Size | Group No. | TE/cap Lo | TE/cap Hi | Group Size
1 1966 9609 775 11 19455 20819 775
2 9614 11220 776 12 20821 22308 776
3 11221 12324 775 13 22309 24018 775
4 12324 13359 776 14 24024 26151 776
5 13359 14275 775 15 26154 28648 775
6 14275 15235 776 16 28651 31909 776
7 15236 16199 775 17 31916 37098 775
8 16200 17235 776 18 37100 45565 776
9 17236 18272 775 19 45576 64297 775
10 18273 19454 776 20 64317 1268708 775

Each of these 20 buckets has 775 or 776 families. Now we look at each of these buckets separately, and compute the MEDIAN HH Size for each group of 775/776 families. These medians can be plotted as follows:

Med HH Size VS TEcap

This is a graph of the Median HH Size for each of the 20 income groups as described above. This same information can be given in tabular form as follows:

TE/cap (bucket Lo) Med HH Size | TE/cap (bucket Lo) Med HH Size
1966 9 19455 6
9614 8 20821 6
11221 8 22309 6
12324 8 24024 6
13359 8 26154 5
14275 7 28651 5
15236 7 31916 5
16200 7 37100 5
17236 7 45576 5
18273 6 64317 4
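The 775/776 split in these tables falls out automatically if the bucket boundaries are computed with integer arithmetic. A quick check (the placement of the 776-family buckets differs from the alternating scheme described above, but the mix is the same):

    # 15,509 families in 20 equal-count buckets: integer boundaries
    # b*n//20 yield 11 buckets of 775 families and 9 buckets of 776.
    n, n_bins = 15509, 20
    sizes = [(b + 1) * n // n_bins - b * n // n_bins for b in range(n_bins)]
    print(sizes)        # a mix of 775s and 776s
    print(sum(sizes))   # 15509 -- every family is covered exactly once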


Both the graph and the table provide us with the same information. As we go up the TE/cap groups, the median HH Size decreases. This supports the idea that wealthier families have fewer children. But it also supports the reverse causality: having more members in a household reduces the amount of money available per member. That is, large HH Size leads to poverty. Understanding causality is of essential importance, but it cannot be learnt from the data – the data does NOT provide the information required to learn about the causal directions.

Conclusions

We have done this data analysis without any assumptions about randomness. Even though we are using the word "distribution" to describe the data, this is just an observed pattern that the data follow. We are NOT making any assumption that the data is a random draw from any distribution at all. Fisherian old-school statisticians will find this terminology very confusing, because we are using similar words with different meanings. For example, the Median-Lines are a description of the "conditional distributions" of HH TE/cap given HH Size, and also of HH Size given HH TE/cap. More discussion of this subtle issue will be given later in the course.

Both Median Lines show that wealthier families have fewer children – conversely, smaller HHs correspond to higher TE/cap. Note that the variables have been CAREFULLY chosen – this result holds for TE/cap but not for TE. Real Statistics requires relating data series to real concepts, not just treating them as numbers. Our Median-Lines show ASSOCIATION between the two variables. CAUSALITY cannot be learned directly from the data. The last point of great importance is that the relationship between HH Size and HH TE/cap is not deterministic. At any given HH Size, we have a large range of households with very different TE/caps. Similarly, in every income (TE/cap) group, there is a large range of HH Sizes. How to understand these "flexible" relationships, also called "stochastic" relationships, will be the subject of the next portion of this lecture.

Links to Related Materials:

Do the Rich Have Fewer Children?

[bit.ly/dsia06a] Part A of Lec 6 on Descriptive Statistics: An Islamic Approach (DSIA) deals with the "Demographic Transition": changes in birth rates experienced during the process of economic development. This lecture is devoted to the study of differences in Household (HH) Sizes by wealth, a topic which goes back to the birth of the subject of statistics. We can think of the central questions for study as being: Do rich people have fewer children? Why? What does it mean? How can we find out? What are the policy implications of such findings?

Sir Ronald Fisher, the founder of statistics, was faced with a fundamental problem: How to make superior races grow faster? This would lead to improvement in the genetic pool of humanity, also called the ASCENT of man. Just as man is superior to apes, so the evolved “Super-men” would be superior to the current human beings. A major problem for the Eugenicists was the fact that: the POOR Breed Faster (as emphasized by Malthus & followers). Two solutions were proposed to solve this problem:

  • Negative Eugenics: sterilize or spread disease among poor by keeping them in crowded conditions
  • Positive Eugenics: Encourage rich to have more children. To carry this out on a PERSONAL basis, Fisher had SIX children, before his divorce.

The fundamental thinking of the Eugenicists was that Wealth or Class is a marker of GENETIC superiority. The GOAL of this Lecture is to study this issue, of differences in HH Size according to wealth, statistically. To do this, we make use of the 2006 HIES (Household Income and Expenditure Survey) data set. This is a random sample of 15,509 Households from all over Pakistan. The idea of RANDOM SAMPLING is due to Fisher, and is one of his major contributions to the discipline of statistics. If created correctly, random samples should be REPRESENTATIVE of the population as a whole. Unfortunately, this requires meeting a very difficult condition: EVERY person should have an EQUAL chance of being included in the sample. Genuine Random Sampling is almost impossible to achieve in large populations like Pakistan. One must start by listing all members of the population, to ensure that everyone gets an exactly equal chance. This, and other issues, lead to the switch to an easier goal. Various kinds of devices are used to get PROBABILITY samples – where you can calculate the probability of inclusion for everyone, even though it is not EQUAL. These are advanced topics, which we will study in later courses.

We note that the Computational Capabilities at our disposal to analyze the HIES data set were not available to Fisher. The simplest possible Fisherian Analysis starts by ASSUMING the data is Bivariate Normal. This allows reduction of the TWO variables – HH Size and HH Expenditure (15,509 x 2 ≈ 31K data points) – to FIVE numbers: the sums of the two series (2), the sums of the squares of the two series (2), and the sum of the product of the two series (1). It was part of the mathematical genius of Fisher to see that these 5 numbers would provide a complete summary of 31,000 data points IF the data was a random sample from a bivariate normal distribution. If the Fisherian assumption did not hold, there was simply no computational capability to analyze the entire data set! In fact, in Fisherian times, even computing these FIVE numbers from the HIES data set would require several man-days of labor! The kinds of computations we will recommend were not even THINKABLE in those pre-computer days.
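Today those five numbers are a one-liner. A sketch, assuming the two columns are held in lists s (HH Size) and e (HH Expenditure):

    # The five sufficient statistics for a bivariate normal model:
    def bivariate_sufficient_stats(s, e):
        return (sum(s),                            # sum of HH sizes
                sum(e),                            # sum of HH expenditures
                sum(x * x for x in s),             # sum of squared sizes
                sum(x * x for x in e),             # sum of squared expenditures
                sum(x * y for x, y in zip(s, e)))  # sum of cross-products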

Even though it makes data analysis possible without a computer, there are serious problems with the Fisherian Assumption. On this particular data set, with HH Size being an INTEGER from 1 to 60, Normality CANNOT hold. Similarly, HH Exp is highly SKEWED, clearly not a normal distribution. For such data sets, there were attempts to find "data transformations" which would CREATE normality. For example, LOG(HH Exp) may be closer to normality on this data set. However, ALL types of Fisherian analyses REPLACE the REAL finite population with an IMAGINARY theoretical infinite population. The ONLY VIRTUE of this replacement is that it makes data analysis possible without computers. Unfortunately, the ASSUMPTION is BLATANTLY false – there is an ACTUAL population, with an ACTUAL distribution, which fails to match ANY of the theoretical distributions which are the subject of study in heavy statistical textbooks.

We are now light-years ahead of Fisher in terms of computational abilities. Whereas it would take multiple man-days to compute the sufficient statistics for this HIES data set of 15,509 x 2 data points, we can compute these and MUCH MORE with just a few keystrokes and clicks. Instead of ASSUMING a theoretical distribution for a hypothetical imaginary parent population, we can compute the EXACT distribution of the ACTUAL sample that we have. We will do so shortly. This distribution will have NO NAME – it does not belong to any theoretical family of parametric distributions. We cannot do fancy math with it. BUT WE CAN LOOK AT IT !!!

To begin with, let us illustrate a CRUDE Fisherian Analysis of the HIES Data Set. Assume the data to be bivariate normal – this is the theoretical distribution which allows the simplest possible analysis. Calculate the Means and Variances for both S and E, plus the Covariance(S,E). These numbers are given in the table below, computed in seconds in EXCEL:

Avg HH Size = 6.9        Avg HH Total Exp = 158,454
Std Dev HH Size = 3.39   Std Dev HH Total Exp = 132,682
Correlation = 0.252

A traditional statistics course would make a major effort to teach students the meaning of the assumption of normality, as well as that of the average, the standard deviation, and the correlation. These concepts are essential for the analysis of normal distributions, but do not work very well when the underlying distributions are not normal. Unlike conventional courses, we will not deal with normal distributions in this course, nor with the associated concepts of mean, standard deviation, and correlation. What is of interest from this classical Fisherian analysis is that there is a positive correlation between HH Size and HH TE (Total Expenditure). This means that as HH TE increases, HH Size also increases. But this is contrary to the basic idea we are pursuing, according to which wealthy people have smaller families. The data seems to be telling us that wealth – as measured by HH Total Expenditure – goes together with HH Size. More of one means more of the other. This is puzzling – we will resolve this puzzle later.

We just presented a brief sketch of how a Fisherian analysis would go, without explaining the details, for the sake of illustration. Now we come to the heart of the lecture: the alternative methodology of Real Statistics. This is based on a direct analysis of the data distribution. The latest versions of EXCEL allow us to produce a histogram – a picture of the data distribution – with a click. Here is the HH Size Histogram, as produced by EXCEL:

HHSizeDist1t60

HH Size ranges from 1 to 60! We have a few very large households in Pakistan. However, there are very few households above 20; so few, that the bars are not even visible in the histogram. Nonetheless, showing the 20-60 portion of HH Size COMPRESSES the area available for the 1-20 range, where most of the data is, to only a third of the graph. We can clarify the picture by Splitting the Graph into two different ranges, 0-20 and 20-60. This is one of the main goals of Descriptive Statistics: to learn to LOOK at data. This simple lesson about how to make graphs which focus on the main parts of the data is NOT a part of traditional statistics, since traditional statistics is not concerned with the display of the data. However, it is an important lesson for descriptive statistics. Here is a graph of the first portion, with 0-20 HH Size, omitting the higher HH Sizes:

HHSize1t20
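A sketch of how such a split might be scripted outside EXCEL (matplotlib assumed; hh_sizes as before, and the cut point of 20 is the one chosen above):

    import matplotlib.pyplot as plt

    def split_histogram(values, cut=20):
        # Plot the dense low range and the sparse high tail separately,
        # so the tail does not compress the region holding most of the data.
        fig, (low, high) = plt.subplots(1, 2, figsize=(10, 4))
        low.hist([v for v in values if v <= cut], bins=cut)
        low.set_title(f"HH Size 1-{cut} (bulk of the data)")
        high.hist([v for v in values if v > cut], bins=cut)
        high.set_title(f"HH Size {cut + 1}+ (sparse tail)")
        plt.show()

    # split_histogram(hh_sizes)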

We can also convey this information in the form of a table which tells us, for each HH size, how many households in the sample of 15,509 have that size. Here is the table:

HH Size # HH HH Size # HH
1 158 11 520
2 649 12 356
3 1032 13 235
4 1573 14 166
5 2109 15 103
6 2370 16 96
7 2139 17 64
8 1692 18 48
9 1217 19 34
10 829 20 30

This table gives us information about the HH Size for the HIES sample of 15,509 HH’s. We see that the largest number of HH’s – 2370 – have size 6. Sizes 5 and 7 are the next most popular with roughly 2100 HH’s at each of the two sizes. Once we get to size 20 and above, only a very few HH’s in the sample have such a large size. This is a direct look at the data, without any of the “SUMMARY STATISTICS” which are an essential part of conventional statistics.
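The frequency table itself is a one-line tally. A sketch:

    from collections import Counter

    def size_table(values):
        counts = Counter(values)          # tally households at each size
        for size in sorted(counts):
            print(size, counts[size])     # e.g. size 6 -> 2370 households

    # size_table(hh_sizes)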

Just like HH Size, we can also LOOK at HH TE (Total Expenditure per Household). Note that this is a luxury which was NOT available to Fisher. It would have required an enormous number of man-hours to make the following graph of HH TE, which shows a Typical Income Distribution:

HH TE Dist

This graph is HIGHLY skewed to the right, showing right away that it is not normal (the Normal distribution is symmetric around its center). Skewness means that there are a FEW people who are very, very rich. Their proportion is so small that you cannot even see their bars on the graph. But since EXCEL draws the graph to cover the full range of the data, we can FEEL the presence of these super-rich people in the form of the long X-axis. As before, a better graph would result from SPLITTING into two parts: one part for the masses of people, and a separate one for the super-rich. Since this is not our main concern for the moment, we turn our attention to the relationship between HH Size and HH TE.

This is the main topic of interest – how does HH Size vary with wealth? We can simply ask EXCEL to produce an X-Y plot of these two variables. Each HH is plotted in the X-Y plane as a dot (X,Y), where X is the HH Size and Y is the HH TE. The graph looks as follows:

HH TE vs Size

This is NOT a good way to look at the data. Thousands of points are plotted within a small space, so we have no real idea of what the data looks like. The graph is too crowded. The reasons we made it are, first, that EXCEL can do it with a click, and second, to show the student the need for thinking about how to plot the data in GOOD ways. HOWEVER, even though it is a bad graph, it already suggests that the data do NOT show what we were looking for. We thought that as wealth (represented by HH TE) increases, HH Size would go down – richer families have fewer children. But the graph shows a positive association: more HH TE goes together with bigger HH Size.

Another way of making the graph makes the picture even clearer, and also gives us some practice in making better graphs. Categorize the HH's by size. In each category, compute the AVERAGE total expenditure for ALL of the families in that category – that is, all HHs with the same HH Size. This can be done easily with EXCEL commands, and leads to the following result:

AVG HH TE vs HH Size

As the HH Size increases from 1 to 20, the average HH TE in that size HH ALSO increases, as the graph shows. This is in CONFLICT with the idea that rich families have FEWER children. Higher Average Expenditure corresponds to more children. However, intuition, experience, and general observations around the globe, tell us that wealthy families generally have fewer children. How can we resolve this MYSTERY?

The SOLUTION to this mystery comes when we realize that HH TE (Total Expenditure) is NOT the right PROXY for HH Wealth. Rather, we should use HH TE per CAPITA – how much the household spends on EACH person in the household. To see this, note that if a household of size 1 has TE = 10,000 and a household of size 2 has TE = 15,000, the size-2 household is POORER, even though it has larger TE. This is because it spends 7,500 per person on two people, while the size-1 household spends 10,000 on 1 person. We can get a better proxy for wealth by dividing the Total Expenditure by the HH Size: HH TE divided by HH Size gives us HH TE per capita. We can take this new variable and recreate the same chart as in the previous picture – the AVERAGE wealth (proxied by TE per capita) for each HH Size. This is plotted below:
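A sketch of this correction, and of the per-size averages plotted below (hh_sizes and hh_te are assumed to hold the paired columns; names illustrative):

    from statistics import mean

    def avg_te_per_capita_by_size(hh_sizes, hh_te):
        # Wealth proxy: spending per PERSON, not per household.
        te_cap = [te / size for size, te in zip(hh_sizes, hh_te)]
        groups = {}
        for size, tc in zip(hh_sizes, te_cap):
            groups.setdefault(size, []).append(tc)   # bucket TE/cap by size
        return {size: mean(vals) for size, vals in sorted(groups.items())}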

HH ExpPerCap vs HH Size

FINALLY, this chart SHOWS what we want to see: as HH Size increases, HH TE/capita declines. Larger households have less money available for members on a per capita basis. Note that this decline is very steep in the early portion of the chart, while the curve becomes pretty flat after HH Size of 10 or so. The biggest effects are seen at the earliest parts of the curve, at low HH Size. Perhaps we could say that the wealthy households have 1, 2, or 3 children but no more, while the less wealthy can have any number of children. The graph just gives us the picture of the data, without telling us anything about the causal connections. We do not know if increased wealth leads to fewer children, or whether having more children depletes wealth, or whether there is some third factor we have ignored which creates this apparent relationship between wealth and family size.

CONCLUSIONS

So what do we learn from this preliminary REAL data analysis of the 2006 HIES data set? We learn that direct examination of the data – made possible by EXCEL – gives us a lot of information, WITHOUT any statistical assumptions. In contrast, Fisherian Statistics TYPICALLY involves IMPOSING the assumption of Multivariate Normality on the data. When this assumption is made, the Means, Standard Deviations, and Correlations COMPLETELY summarize the data. This was one of the brilliant mathematical contributions of Fisher – he proved that these five numbers are SUFFICIENT STATISTICS: once you have them, you can throw away the data without loss of information.

Of course, this sufficiency was based on the ASSUMPTION that the assumed theoretical distribution is VALID. Fisher was ASKED how one should choose a statistical model: the theoretical distribution for the data. He DODGED the problem by referring to the practitioners, saying that they would know, from their practical field experience, the right theoretical model. For a century, the entire field of statistics has been built on the idea that the first thing to do is to ASSUME a theoretical statistical model for the data. This is because such an assumption allows the reduction of the data to a few sufficient statistics, and enables analysis. With our current computational capabilities, we NO LONGER need to make arbitrary assumptions to reduce the data. We are now capable of dealing with the FULL DATA set of 15,509 points, without trying to reduce it. However, statisticians have invested a century of effort in developing tools for data reduction, and they are not likely to give up using them anytime soon. Max Planck, one of the founders of Quantum Mechanics, noted in frustration at the reluctance of physicists to accept his theories that Physics progresses one funeral at a time.

The REAL statistics we propose to do is nowhere near as complex as Quantum Mechanics. Quite the opposite: we propose to eliminate complex mathematical assumptions and analysis. Real Statistics starts just by LOOKING at the data in intelligent ways, using graphs, tables, and whatever type of method is suitable to reveal the aspects of reality we are exploring. This can only be learnt in an apprentice-like fashion, by studying case after case of intelligent visualization of the data. This is our hope for this course.

Related Materials: Three Previous Lectures which lead up to this one:

5C: Fisher’s Failures and the Foundations of Statistics – Lecture shows how Fisher’s personal fights for prestige have shaped the discipline of statistics, in the wrong directions – bit.ly/dsia05c

5D: Real Statistics: Explains how advanced computational statistics allows us to dispense with the need to make arbitrary assumptions about the data. Shortlink: bit.ly/dsia05d

5E: Fisher VS Reality: Comparison of Statistical Methods. Illustrates the two different approaches to the study of QTM (quantity theory of money) on Australian Data – Shortlink: bit.ly/dsia05e

Essential Aspects of Islamic Revival

A previous post on this topic (How to Launch an Islamic Revival?) provides an outline of an upcoming online course (Registration links: Islamic Revival (URDU) [bit.ly/gfu2e] or Islamic Revival (English) [bit.ly/ytlie]). This post elaborates further on some important aspects of the topic.

For an Islamic Revival, we as Muslims must learn to value the Quran, Hadeeth, and our own intellectual traditions above the wisdom of the West (which they have acquired over the past few centuries). We must learn to appreciate that (10:58) "Tell them (O Prophet!): 'Let them rejoice in Allah's grace and mercy through which this (Book) has come to you. It is better than all that they accumulate.'" But it is not enough to have faith in the Quran; we must demonstrate the TRUTH of the claim that the knowledge provided by the Quran is better than all that the West has accumulated over the past few centuries.

The superiority of Islam can be seen in the amazing transformation created by Islam 1450 years ago, when it took ignorant and backwards Bedouins from the bottom to the top of world civilizations. These teachings launched a civilization which enlightened the world for a thousand years. Do these teachings have the same power today? Can they transform our lives and change the world? A superficial look at the world around us would tell us: "NO! If these teachings had this power, we would not find the Muslims – the inheritors of this knowledge – in the desperate conditions that we see." Today, Muslims are again ignorant and backwards, living in conditions not too far removed from the Jahilliyya of pre-Islamic times.

The lives and conditions of the Muslims appear to testify to the failure of Islamic teachings. We must challenge this appearance and show that it is wrong. The point to begin with is the prophecy that Islam came as a stranger, and will become a stranger. The Western diagnosis of the ills of Islamic societies is that it is Islam which is holding us back. We can progress only if we abandon Islam and become modern and Westernized like them. This diagnosis must be countered by the Islamic alternative: We are ignorant and backwards today because we have abandoned the teachings of Islam. BUT this is just an empty claim unless we can DEMONSTRATE it. This demonstration must be made on three different fronts.

PERSONAL: Islam teaches us how to live, and how to strive for Ihsan – excellence – in all dimensions of our lives. This is an area where the West is a complete blank. None of the sciences of the West deal with the meaning of our lives, or with the development of our spiritual capabilities. How can we progress from Nafs-e-Ammara to Lawwama to Mutma'innah? How can we find inner peace, contentment, and satisfaction in our lives? How does the remembrance of Allah (alone) bring happiness to our hearts? It should be clear that learning how to live takes priority over learning how to split atoms. Islam provides us with ways to find deep meaning in our lives, and to develop the potential for excellence which all human beings are born with. If we grow this seed of potential into the firmly rooted tree of faith, with branches extending to the skies, then our lives will be equivalent, in the eyes of Allah, to the entire humanity. These extremely precious teachings are part of our intellectual heritage. Unfortunately, this heritage is largely lost. It is hard to find models of "excellence in conduct", and even the idea of striving to emulate the excellence of our Prophet SAW is no longer part of our teaching models. Recovering this is essential for the revival of Islam. Some earlier posts which challenge students to re-think their lives, and overcome the conditioning created by Western education, are linked here: Learn Who You Are! and Ways of the Eagles

COLLECTIVE: Individuals are building blocks of communities. When individuals collectively strive for excellence in conduct, they create neighborhood communities, centered around mosques. The extreme emphasis on rights of neighbors, and on the love between the hearts of Muslims, is designed to create vibrant communities, which take care of all members. These communities, and other institutions based on caring for each other and social responsibility, are at the heart of an Islamic society. Today, we have lost these institutions, which were centered on the WAQF designed to provide financing for them. Reviving these must be at the heart of an effort for revival of Islam. This is the sense in which Islam is a stranger. Our governance structures, legal, economic, political, social, educational, and even religious institutions, are no longer in their original forms, and have been replaced by Western analogs. But institutions are formed from a collection of individuals and so efforts at institutional reform must be done together with efforts at building character. For further discussion of this topic, see Unite and Prosper and The Road to Madina

SCIENTIFIC: Paradoxically, this front is the least important from the Islamic perspective, but the most important from a practical perspective. The Ummah as a whole has been mesmerized by the glory and the power of the West. The West worships science and technology, and attributes its current dominant position to them. As a result, most intellectuals of the Ummah believe that the only path to progress lies through acquisition of this knowledge, in addition to our traditional religious knowledge. This leads to the widespread misconception that acquiring Western intellectual skills is of the highest priority, and even the sole remedy for the current problems of the Ummah. Because the eyes of the Ummah have been dazzled by the brilliance of Western knowledge, they are unable to appreciate the treasures of the Islamic heritage. To undo this bedazzlement and enchantment, it is necessary to point out the deep cracks in the foundations of Western wisdom. This is particularly easy to do for the social sciences, because the social sciences are all based on a deeply flawed model of human beings and human communities. By denying spirituality and the after-life, Western social sciences create a very shallow model of human behavior as being driven solely by selfish desires, without any higher visions or purpose in life. It is not enough to critique the social sciences. The need of the hour is to show how Islamic teachings allow us to create a superior alternative, built on the radically different foundations furnished by Islam.

Islamic Economics: A sketch of how such an effort can be initiated in Economics is available in my paper on Islam’s Gift: An Economy of Spiritual Development . This shows how modern economics is based on the concept of “homo economicus” which corresponds to Nafs-e-Ammara in Quranic terms. Economics is the religion of worship of the Nafs, and designed to move men to become Asfala Safeleen. In contrast, an Islamic Economy strives to develop spirituality in people, and is built to create spiritual progress towards the ideal of Nafse Mutmainnah.

Statistics: An Islamic Approach. This online course explains how modern statistics is designed to deceive. The book “How to Lie with Statistics” has sold more copies than all other textbooks of statistics put together. The maxim “Lies, Damned Lies, and Statistics” shows widespread awareness of the deeply deceptive nature of the discipline. This course explains how the foundations of statistics were laid by Eugenicists who created the discipline to prove the superiority of the white race, and to create arguments for extermination of inferior races. The tools developed are designed to deceive. We show how the entire discipline can be re-built on different foundations, aligned with Islamic ideals. For a discussion of the fundamental differences between conventional statistics as created by Sir Ronald Fisher, and “Real” Statistics, see: Fisher VS Reality: Comparison of Statistical Methods.

In addition to providing sound foundations for Economics and Statistics, the above two approaches are designed to show that the Western Social Sciences are fundamentally flawed. As explained in my paper on the Origins of Western Social Sciences, these sciences are built on the rejection of religion. The goal is to break the spell and enchantment of Western knowledge. Once Muslims understand that the Western Social Sciences are deeply defective, they will be able to see the deep wisdom of Islamic teachings. The West can teach us how to build bombs, but Islam can teach us how to build harmonious communities which provide loving and nurturing environments for our future generations. But this remains an empty claim, useless unless we can create living models as demonstrations. In order to create such communities, built on love of Allah and service to the creation of Allah, we need to change the educational system which teaches us to admire and respect Western intellectual traditions. These traditions place no emphasis on community and on excellence in personal conduct, which are at the heart of Islamic teachings. The above efforts in Economics and Statistics provide an initial sketch of the effort needed throughout the social sciences.

Fisher VS Reality: Comparison of Statistical Methods

[bit.ly/dsia05E] – Part E of Lec 5: Descriptive Statistics: An Islamic Approach. This lecture explains the difference between the classical Fisherian approach and our REAL statistics approach, within the context of a study of the Quantity Theory of Money.

In previous portions of this lecture, we have emphasized the need for a new approach, which we call “Real Statistics”.  In this lecture, we illustrate the differences between the conventional approach and our new approach using the already studied example of Australian Inflation. In this connection, it is of great importance to understand the following:

The DATA is ALL we have – the STATISTICAL ASSUMPTIONS imposed on the data DO NOT PROVIDE US with additional information. HOWEVER, all statistical inferences we make RELY HEAVILY on these UNVERIFIABLE (and typically false) ASSUMPTIONS.

First Step of a REAL analysis: LOOK at the DATA with reference to a REAL world issue under examination. In this case, we are interested in the Quantity Theory of Money in general. In particular, we want to examine Milton Friedman’s idea that “Inflation is always and everywhere a monetary phenomenon, in the sense that it is and can be produced only by a more rapid increase in the quantity of money than in output” The data can tell us whether or not this important hypothesis about the economy, which asserts the neutrality of money, is true.

Graph of Prices (GDP Deflator) and Broad Money. Money has been rescaled to be 100 in 2019, just like the price index series. This data is taken from the WDI data set.

AustraliaPaMLevel

The two graphs clearly show different trends. In the early period, from 1972 to 1990, prices are increasing sharply, while money is increasing slowly. Later, money starts to increase sharply while the price curve is flatter, showing a smaller rate of growth. Looking at the graph leads to an IMMEDIATE conclusion: there is no strong direct relationship between money and prices. Note that this conclusion is based on direct examination of the data, without any of the stochastic assumptions required for the Fisherian approach.

Second Step: Look at the DATA in WAYS which are relevant to the ISSUE of concern! In this case, the QTM tells us that increases in the money stock lead to increases in prices. To examine this, we need to look at the rate of CHANGE in prices, and also the rate of CHANGE in money. In a previous analysis, we came to the conclusion that the best measure of the rate of change is the following:

  • Define %P = log{P(t)/P(t-1)}
  • Define %M = log{M(t)/M(t-1)}
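A sketch of this computation (natural logs; p and m are assumed to hold the raw annual price and money series):

    from math import log

    def growth_rates(series):
        # %X(t) = log(X(t)/X(t-1)); with logs, growth over two years
        # is exactly the sum of the two one-year growth rates.
        return [log(series[t] / series[t - 1]) for t in range(1, len(series))]

    # pct_p = growth_rates(p)   # %P from the GDP deflator series
    # pct_m = growth_rates(m)   # %M from the broad money series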

With this definition, the growth rate of money over two years will be the sum of the separate growth rates for each of the two years. A graph of %P and %M is given below:

AustraliaDPvDM


This graph shows a ROUGH correspondence between the two series, but also shows many anomalies. That is, there appear to be sharp increases and decreases in %P which do not correspond to any similar change in %M. It does not seem that %M can explain all the fluctuations in %P, contrary to Friedman’s dictum. However, this is just a preliminary impression, and more careful analysis is needed to come to firm conclusions.

Third Step: Find ways to analyze the data SUITED to the question you are asking. The jagged graph above contains too much information, and is not directly suited to telling us about: “How STRONG is the ASSOCIATION (not causation) between %M and %P?” Note that association is symmetric, and causation is uni-directional. Even though we are interested in the causality – does %M cause %P or is it the reverse? – the data cannot tell us about this crucial question. Techniques for studying causality are extremely important, but are not part of conventional, Fisherian statistics. In fact, many famous statisticians are on the record as having denied the relevance, importance, or even meaningfulness of the idea of causality. We will not look at this debate in any depth in this current course, which deals with elementary concepts only. However, causality must be an important part of any “REAL” statistics.

As a first step towards the deeper and more complex concept of causation, we can try to measure contemporaneous association. "Contemporaneous" means that we look at the relation between %P(t) and %M(t) for the same year t – we do not look at associations across time. This is always a useful first step. The STANDARD METHOD in use for this purpose is as follows: ASSUME the data is jointly normal, and apply the formula for the correlation coefficient; this is the best measure of association for the Bivariate Normal Distribution. As with all methods of conventional Fisherian statistics, this method suffers from a serious PROBLEM: it works VERY POORLY if the data is not Normal. There is NO REASON to assume the data under examination is like a random sample from a hypothetical infinite bivariate normal population. Instead, we develop a direct and intuitive method for evaluating association below.

As a first step, divide %P and %M into HIGH and LOW. We want to know if %P is High when %M is High, and also whether %P is Low when %M is Low. The question is: how to divide a series into HIGH and LOW parts? There is a NATURAL and INTUITIVE methodology for doing so. We SORT the series in ascending order in EXCEL and find the MIDPOINT. With an annual series of 59 points of data, the 30th data point is in the middle. Series data points BELOW the midpoint are classified as LOW, while data points at or above the midpoint are HIGH. This is natural in the sense that the 29 lowest values among the 59 points are classified as LOW, and the 30 highest data points within the data set are classified as HIGH.
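A sketch of this High/Low classification (values would be one of the 59-point annual series computed earlier):

    def high_low(values):
        # The midpoint of the sorted series separates LOW from HIGH:
        # with 59 points, the 29 smallest are LOW, the 30 largest HIGH.
        cutoff = sorted(values)[len(values) // 2]
        return ["HI" if v >= cutoff else "LO" for v in values]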

Using this method of classifying %M and %P into HIGH and LOW values, we can make a chart of Lows & Highs for %M and %P as follows:

HiLoDpDm


Many interesting patterns can be seen in the above graph, which shows the highs and lows for both %P (changes in prices) and %M (changes in broad money). First we note that from 1961 to 1972, rates of money growth (%M) were LOW, except in the two years 1963 and 1964. Corresponding to this period, we find that %P was Low in 1961-66, High in 1967, Low in 1968, and then consistently High from 1969 to 1990. The two decades of high inflation in the 70's and 80's were followed by a low-inflation period from 1991 to 2004. This picture immediately leads to many questions:

Why was there an episode of high inflation in 1967? If Friedman's hypothesis is true, then it must have been due to a previous high rate of increase in money. Could it be that High %M in 1963 and 1964 led to High %P in 1967? Knowing the mechanisms of money and prices, this seems highly unlikely. Note that this conclusion comes from our general understanding of how the real world works, NOT from the data itself.

A second important question is: why did %P become High over 1969-90? In connection with Friedman's hypothesis, it is interesting to note that %M became high much later than these periods of high inflation. Money growth rates became consistently high only in the period 1979-91. Why did the increase in money growth FOLLOW the increases in inflation? According to one theory, monetary policy should accommodate the needs of business. In periods of high inflation, the need for money is high, and so one should print more money. We would need to look at the minutes of the Monetary Policy Committee to see what they were doing and why. It seems clear that the periods of HIGH inflation from 1969 to 1979 were not preceded by high rates of growth of money (%M). It seems likely that this inflation came from a different source. Again, we need to look more carefully at the real world, and try to find other causes of inflation, to explain the patterns we see in the data.

From this preliminary analysis, it is clear that Real Statistics leads us to ask different KINDS of questions. In Fisherian statistics, we start by ASSUMING that the data are RANDOM draws from a hypothetical population. In this case, the data is ONLY a means to discovering the parameters of this IMAGINARY population. All of the sophisticated mathematical machinery of inference and hypothesis testing deals with how we can use the data to learn about the imaginary population from which the data is ASSUMED to be a random sample. In this scenario, if we know the parameters of the imaginary population, we don't NEED the data!! The parameters provide us with COMPLETE information about the data set!! The individual data points DO NOT MATTER and are MEANINGLESS – they are all random draws and could come out differently next time.

This is in dramatic contrast to real statistics: each data point matters! We do not come to understand data by imagining a hypothetical underlying population. Instead, we understand data by looking at the reality which generates the data. Why did rates of money growth (%M) become high over 1979-91? No amount of playing games with this data will lead us to the answers. Instead, we must examine the Australian economy, and maybe the world economy as well. We must look at the methods of money creation, to find the sources of the extra money created. One source is the government: we could look at the bulletins of the Monetary Policy Committee. How were they making decisions about monetary policy? Were they accommodative or forward looking? What were the variables they considered? Additionally, we may study private money creation by financial institutions. This was a period of Financial De-Regulation. The removal of restrictions led to an increase in loans and the creation of money. This may be the reason why %M was high in this period.

On a more technical note, we can also use this partitioning of the data to create a simple measure of association, which does not rely on unverifiable assumptions about imaginary populations. We simply DIVIDE the data into two halves – those with Hi %M and those with Lo %M. Now look at the behavior of %P on each of these halves separately. If nearly all of the HIGH values of %P occur within the High %M half, then the two series must be highly associated. Doing this simple counting of the data gives us the following table of counts:

%P vs %M %M=LO %M=HI
Lo %P 20 9
Hi %P 9 21
Total 29 30

We see that in the 29 years where money growth was in the bottom half (%M is Lo), %P is also Low in 20 years and High in 9 years. Similarly, in the 30 years where %M is High, %P is High in 21 years and Low in 9. This shows a strong association between the highs and lows of %M and %P. When one is high, the other one is also high in roughly 2/3 of the cases; one high and the other low occurs about 1/3 of the time. If there were no relationship between the highs and lows of %M and %P, we would expect to see about a quarter of the total, or roughly 15 cases out of 59, in each of the four boxes in the table above. The numbers show a strong but loose relationship. It seems clear that Friedman's hypothesis, that %M is the ONLY source of inflation, is not correct. While %M does exert a strong influence on %P, it seems likely that there are other causes of inflation as well – about 1/3 of the cases of Hi and Lo %P cannot be explained by Hi and Lo %M in the same period.
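A sketch of how such a table of counts can be produced from the two classified series (reusing the hypothetical high_low and growth_rates sketches above):

    def crosstab(p_class, m_class):
        # Count the four Hi/Lo combinations of %P and %M, year by year.
        table = {("LO", "LO"): 0, ("LO", "HI"): 0,
                 ("HI", "LO"): 0, ("HI", "HI"): 0}
        for p, m in zip(p_class, m_class):
            table[(p, m)] += 1
        return table

    # crosstab(high_low(pct_p), high_low(pct_m))
    # expected, per the table above: LO/LO 20, LO/HI 9, HI/LO 9, HI/HI 21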

Conclusions

There is a strong relationship between %M and %P. The direction of causation is not clear, and CANNOT be learnt from the data. INSTEAD, we can learn about causes by "expending shoe leather" – a metaphor for exploring the real world and searching for causes. In this particular case, we must analyze bank statements and Monetary Policy statements, and look at the real-world factors behind money creation and inflation.

We learn that direct analysis of the data, WITHOUT ANY assumptions about stochastic structure, gives us a lot of information about the real world. It is important to understand that NO MORE INFORMATION is available. Stochastic assumptions REDUCE the importance of the data, by mis-directing our attention from the data itself to a hypothetical imaginary infinite population, from which the data is assumed to be a random sample.

The lessons of this lecture are summarized in the picture below:

RealVSFisher

Related Materials: Previous Portions of Lecture 5:

  1. Malthus and the Birth of Statistics
  2. Galton: Eugenicist Founder of Statistics
  3. Fisher’s Failures & the Foundations of Statistics
  4. Real Statistics

An organized version of all the lectures, together with supplementary materials and quizzes, is available as an online course. Currently, it is being offered free, in expectation of receipt of useful feedback for finalizing the course. You can work through the lectures at your own pace, and also ask questions on any of the materials. To register for the course, use the following link: Al-Nafi Portal: Descriptive Statistics (Registration)


Real Statistics

[bit.ly/dsia05D] – Part D of Lec 5: Descriptive Statistics: An Islamic Approach. In previous lectures, we have explored some of the reasons why the foundations of modern statistics, constructed by Sir Ronald Fisher, are deeply flawed. In this lecture we explain the basics of our alternative approach to the subject.

This lecture will explain how we can re-build Statistics on new foundations. To do this, we will first explain the foundations of conventional statistics – which may be called “nominalist” or Fisherian statistics. Then we will explain the alternative approach we propose, naming it REAL statistics. Our goal in this lecture is to provide clarity on the differences between the two approaches.

The Fisherian approach is based on fancy mathematical models, which are purely IMAGINARY – That is, the models come from the imagination of the statistician, and have no corresponding object in reality against which they can be verified. A Fisherian MODEL for data ALWAYS involves treating SOMETHING as a perfectly random sample from a hypothetical infinite population. However, there is flexibility in what that “something” may be – it is this flexibility that is deadly, allowing us to prove anything we like. The flexibility was not originally part of Fisher’s approach. He proposed to model the data directly. Later workers “generalized” his approach to make it applicable to a wide variety of data sets and situations.  This generalization was dangerous because it makes unverifiable assumptions about unobservable entities and uses this as the basic engine of inference. In contrast, the Fisherian approach makes assumptions directly about the data, and hence is easier to assess and understand, although equally difficult to prove or disprove.

The typical use of this imaginary methodology involves breaking the data into two components: DATA = LAW + ERROR. The LAW captures a flexible class of models which you believe to be true. This flexibility makes the ERROR unobservable, because it shifts as you try out different potential laws. This gives you a HUGE potential for constructing ANY LAW you like to explain the data – what is unexplained by the law is AUTOMATICALLY part of the ERROR.

We illustrate how this methodology allows us to prove anything at all. Take any data, and decompose it as DATA = Desired Law + Error. This is always valid, by DEFINING Error := DATA – Desired Law. Now make STOCHASTIC assumptions about the Error in rough conformity with the errors obtained at your desired law. Current methodology allows us to make almost any assumptions we like about the error. The beauty of the stochastic assumptions is that a wide range of numbers satisfy them. If we say that the errors follow some common distribution (that is, they are random draws from a hypothetical infinite population), it is very hard to assess whether or not this is true. The difficulty is increased because a flexible range of laws makes it difficult to pinpoint the actual errors, in order to check the stochastic assumptions. Furthermore, conventional methodology generally does not even bother to test the assumptions on the errors, making it even easier to prove that any model conforms to the data.
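The point can be made concrete with a toy demonstration: whatever 'desired law' we pick, defining the error as the residual makes the decomposition hold exactly. A sketch with made-up numbers:

    data = [3.1, 4.7, 2.8, 5.9, 4.2]            # any observed series at all

    def desired_law(t):
        return 100 - 20 * t                      # an absurd 'law', chosen at will

    law = [desired_law(t) for t in range(len(data))]
    error = [d - l for d, l in zip(data, law)]   # Error := DATA - Desired Law

    # The identity DATA = LAW + ERROR now holds by construction:
    assert all(abs(d - (l + e)) < 1e-9 for d, l, e in zip(data, law, error))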

The key illusion created by conventional statistical methods is based on a misunderstanding of the nature of statistical models. ALL statistical inference is based on the IMAGINARY stochastic model regarding errors. HOWEVER, textbooks create the widespread belief that inference comes from the DATA! This is what permits us to “LIE with statistics”.  Making complex assumptions about errors allows us to achieve any kind of inference, and attribute this to the data. Then we can browbeat people by telling them that we have made a deep analysis of the data, and the truths we have uncovered cannot be accessed by ordinary people not trained in the mysteries of sufficient statistics. In fact, the inferences come from unverifiable assumptions about unobservable errors.

In opposition to this, we propose an alternative, which we call REAL Statistics. At the heart of this approach is the idea that the data provides us with CLUES about underlying realities. The goal of inference is NOT the DATA itself. Rather, the GOAL is to use the data to UNDERSTAND the real-world processes which generated the data. This NECESSARILY involves going beyond the data. Conventional statistics treats only the data, and Fisher explained that the goal of statistics is to reduce large and complex data sets to a few numbers which adequately summarize the data and can be understood. Today, because of advanced computational capabilities, we are able to directly handle large data sets, and can move beyond this idea of statistics as the reduction of data sets.

Our approach radically changes the task of the teacher of statistics, requiring the creation of new textbooks as well as more training. We must ALWAYS look at the DATA set together with the REAL WORLD PROBLEM under study with the help of the given data set. We can NEVER study DATA sets in isolation, as collections of numbers. Thus, teachers will have to acquire knowledge and expertise going beyond the numbers, to the real world phenomena which generate the numbers.

Another way to understand conventional statistics is to say that it has the following GOAL: find STOCHASTIC patterns in the data. These patterns allow us to treat the data as a Random Sample from an IMAGINED population. There is NO WAY to assess the validity of this imaginary assumption. The pattern is in the eye of the beholder, and cannot be matched against real structures to see whether it is “true”. The standard methods to assess the validity of patterns are goodness of fit, prediction, and control. These are central to conventional methodology, but of peripheral interest in the REAL methodology. To understand why the search for patterns fails, consider the forecasting competitions run by the International Journal of Forecasting (IJoF). For many years, the IJoF invited researchers to submit algorithms for finding patterns in data, and for using these patterns to predict the next few data points. The IJoF tried these different pattern-finding algorithms on thousands of real world data series to see which one works best. But these competitions did not yield any consistent results. Different types of algorithms would perform differently across series, with unpredictable patterns of performance. This becomes perfectly understandable from the REAL statistics perspective. An algorithm will perform well if and only if the pattern it discovers matches the underlying real world structures which generate the data. These structures differ widely across the data series, and so no one algorithm could find them all. It is only after we know the real world context that we can search for the right kind of pattern. Without checking for a match to reality, we are just ‘shooting in the dark’, and completely random forecasting results are to be expected. For more details, see A Realist Approach to Econometrics.

We come to the question of “How to do REAL statistics?”. The basic goal is to look at the BEHAVIOR of the data to get CLUES about the operation of the real world. Note that this step – looking at the data – was NOT POSSIBLE when Fisher created his methodology, brilliant for its time. Given 1000 points of data, it was a massively laborious task to graph the data, or to create histograms which provide a picture of the data distribution. Now, we can do this with one click. The ultimate GOAL is to discover CAUSAL EFFECTS, or UNOBSERVABLE OBJECTS, which give rise to the patterns we see in the data. But the first step is just to be able to look at the patterns in the data, without imposing preconceived patterns on them, as required by the Fisherian approach. Descriptive Statistics is about LEARNING to look at the data in a way which leads to LEARNING about the real world. The real world is characterized by unobservable objects and unobservable causes. But before we can learn about these deeper realities, we must learn how to read the surface – the appearance of the data. An early approach to “just looking at the data” was pioneered by Tukey, under the name of Exploratory Data Analysis (EDA). EDA was a collection of techniques for looking at the data. However, it was consistent with, and complementary to, the Fisherian approach. The goal was to see if the data patterns would validate a Fisherian model for the data, or whether they would suggest some alternative theoretical models. EDA looks at the data in order to generate a Fisherian hypothesis about the data – NOT a hypothesis about the real world process which generates the data.
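
As an illustration of how cheap “looking at the data” has become, here is a minimal sketch (the file name hies.csv and column name hhsize are hypothetical placeholders for the actual HIES data file):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file and column names -- substitute the actual HIES data.
df = pd.read_csv("hies.csv")
hhs = df["hhsize"]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(hhs, bins=range(1, int(hhs.max()) + 2))  # picture of the actual distribution
ax1.set_title("Histogram of HH Size")
ax2.boxplot(hhs, vert=False)                      # the five quartiles Q0..Q4 at a glance
ax2.set_title("Boxplot of HH Size")
plt.show()
```

A task that once took days of hand tabulation for 15509 observations is now a few lines.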

The TASK of a DS teacher is much more difficult than that of a conventional statistician. Biometrics, for example, is statistics applied to biological problems: the teacher must know some biology in addition to statistics. The real world context has a dramatic effect on HOW to analyze the numbers. We illustrated this by the study of inflation, where the discussion required understanding WHY inflation matters, and WHY we are trying to measure it. Different numbers and different techniques become useful according to the different uses for these inflation numbers.

Since there is no universal collection of methods valid for all contexts, teaching can only be done by apprenticeship, via case studies. Within any real world context, we must learn about the real world to understand the linkages between the real world and the numbers which measure aspects of it. We must know the MEANING of the numbers, not just the numbers. This necessarily requires going beyond conventional statistics, which deals only with the analysis of numbers. No template for analysis can be given to students. Rather, by teaching how to think about numbers in different real world contexts, we hope the student will learn ways of thinking which can be applied more generally. This is like the “case study” method now popular in business schools. In this course, we will illustrate this methodology in different contexts.

Concluding Remarks

In this course, we are trying to learn HOW to LOOK at DATA, because this is a first and introductory course. Learning about and analyzing deeper real world objects and causes is very much a part of REAL statistics, but requires advanced methods, suitable for later courses. We note that the techniques of “Data Visualization” enabled by computers were far beyond the reach of researchers a few decades ago. Making a histogram, or a graph, of 1000 data points was an extremely laborious task. Now it can be done with a click. It is NO LONGER necessary to make convenient simplifying assumptions – as in the Fisherian approach to statistics. This leads to a radical conclusion: a HUGE amount of extremely sophisticated mathematical theory is PURELY IMAGINARY and can be thrown out of the window! We can temper this radical conclusion by noting that there are certain limited contexts where the Fisherian probability models provide an adequate match, or even an excellent match, to the actual data. In such cases, the original methods continue to be valid and useful, as supplements to the more general approaches to be studied in Real Statistics.

Links to Previous Lectures.

Previous Portions of Lecture 5: 5A: Malthus and the Birth of Statistics, 5B: Galton: Eugenicist Founder of Statistics, 5C: Fisher’s Failures & the Foundations of Statistics.

Motivation and Explanation of the Islamic Approach is given in the first lecture. 1A: Descriptive Statistics: Islamic Approach, 1B: Purpose: Heart of An Islamic Approach, 1C: Eastern & Western Knowledge, and 1D: How to Teach & Learn: Islamic Principles

Currently, this course is under development, and is being offered for beta-testing as a free online course, with the expectation of getting useful feedback for the final version. You can register for the course at the Al-Nafi Portal: Descriptive Statistics: An Islamic Approach.

Fisher’s Failures & the Foundations of Statistics

[bit.ly/dsia05c] – Part C of Lec 5: Descriptive Statistics – An Islamic Approach (DSIA). This lecture is about the personality of Sir Ronald Fisher and his foundational ideas about statistics.

Fisher was a prominent Eugenicist, and he had six children, in accordance with his belief that the path to the improvement of the human race involved increasing the propagation of superior specimens of humanity. A central question for us is: “Is modern statistics FREE of its Eugenicist origins?”. The minority position is NO. This position is described and well defended by Donald MacKenzie in his book Statistics in Britain, 1865 to 1930: The Social Construction of Scientific Knowledge. He writes: “Connections between eugenics and statistics can be seen both at the organisational level and at the detailed level of the mathematics of regression and association discussed in chapters 3 and 7. Without eugenics, statistical theory would not have developed in the way it did in Britain – and indeed might not have developed at all, at least till much later.” In brief, Eugenics shaped the tools and techniques developed in statistics. However, the dominant view is that modern statistics is FREE of its racist origins. This view is ably defended by Francisco Louçã in his article “Emancipation Through Interaction – How Eugenics and Statistics Converged and Diverged,” Journal of the History of Biology 42.4 (2009): 649-684. He argues in favor of the consensus view: there is no doubt that the origins of statistics lie in the Eugenics project, but the field has now broken free of these dark origins.

In this part of the lecture, we look at the personality of Fisher, and assess how it shaped the foundations of statistics. It is acknowledged by all that Fisher was cantankerous, proud, and obstinate. He would never admit to a mistake, and was stubborn in defending his positions, even against the facts. He was also vengeful: to oppose Fisher was to turn him into a permanent enemy. In many battles, Fisher took the wrong side. HOWEVER, he won most of his battles because of his brilliance, to the detriment of truth. The impact of Fisher’s victories has permanently scarred statistics, and continues to guide the field in the wrong directions. This lecture is about SOME (not all) of his fundamental mistakes.

Perhaps the most basic, and also the most confusing, was the battle between Fisher and Neyman-Pearson regarding the testing of statistical hypotheses. This is confusing because today both of the conflicting positions are taught to students of statistics simultaneously. Even though the conflict was never resolved, it is now ignored, glossed over, and swept under the carpet. The fundamental question is: “WHAT is a hypothesis about the data?”. According to Fisher, a hypothesis treats the data as a random sample from a hypothetical infinite population which can be described by a FEW parameters. WHERE does this ASSUMPTION come from? It comes from the NEED to reduce a large amount of data to a FEW numbers which can be studied. This reduction is needed because of our LIMITED mental capabilities – we cannot handle or understand large data sets. Fisher wrote: “In order to arrive at a distinct formulation of statistical problems, it is necessary to define the task which the statistician sets himself: briefly, and in its most concrete form, the object of statistical methods is the reduction of data. A quantity of data, which usually by its mere bulk is incapable of entering the mind, is to be replaced by relatively few quantities which shall adequately represent the whole, or which, in other words, shall contain as much as possible, ideally the whole, of the relevant information contained in the original data.” The parametric mathematical model, treating the data as a random sample from a hypothetical infinite population, allows us to reduce the data, making inference possible. The hypothetical infinite population does not have any counterpart in reality.

What is to prevent the statistician from making completely ridiculous assumptions, since the model comes purely from the imagination, and purely for mathematical convenience? For this purpose, Fisher proposed the use of p-values. If the data is extremely unlikely under the null hypothesis, this casts doubt on the validity of the proposed model for the data. The p-value tests for GROSS CONFLICT between the data and the assumed model. One can never learn whether or not the model is true, because there is nothing real which maps into the assumed hypothetical infinite population following the theoretical distribution being assumed. To Fisher, the mathematical model is a device to enable the reduction of the data, not a true description of reality.
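
The idea of “gross conflict” can be illustrated with a simulation-based p-value (our own sketch, with made-up data). We check whether the observed skewness of the data could plausibly have come from the assumed normal model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed data which is actually skewed (gamma), not normal.
observed = rng.gamma(shape=2.0, scale=1.0, size=200)

# Null hypothesis: the data is a random sample from Normal(mu, sd),
# with mu and sd estimated from the data itself.
mu, sd = observed.mean(), observed.std()

def skewness(x):
    """Sample skewness; near 0 for normal samples."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

t_obs = skewness(observed)

# Distribution of the statistic under the imaginary null model.
sims = np.array([skewness(rng.normal(mu, sd, size=200)) for _ in range(2000)])

# p-value: how often the null model produces data as extreme as ours.
p_value = (np.abs(sims) >= abs(t_obs)).mean()
print(f"skewness = {t_obs:.2f}, p-value = {p_value:.4f}")
```

A tiny p-value signals gross conflict between the data and the assumed model; a large one does NOT show the model is true, since countless other models would also survive the check.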

In a classic example of mistaking the map for the territory, the Neyman-Pearson theory of hypothesis testing takes the Fisherian model as the TRUTH. The null hypothesis is ONE of the parametric configurations. The alternative hypothesis is SOME OTHER parametric configuration. The Neyman-Pearson theory then allows us to calculate the exact most powerful test – under the assumption that the parametric models COVER the truth. The possibility of a TYPE III error – that is, that none of the assumed parametric models is valid – is ruled out by assumption, and never taken into consideration. BUT the assumption of a parametric model to describe the data is arbitrary. The imaginary infinite population following a theoretical distribution has been made up just for mathematical convenience!

In the course of the bitter personal conflict which ensued, the real issues, related to the common weaknesses of both approaches, were ignored and suppressed. Instead, Fisher’s promotion of his methods led to dramatic misuse and abuse of the Fisherian p-values. The p-value was MEANT to assess gross conflict, and to serve as a rough check on the modelling process. Instead, it was turned into a REQUIREMENT for valid statistical results. The hugely popular philosophy of science developed by Karl Popper was very useful in elevating the importance of the p-value: we can never PROVE a scientific hypothesis, but we can disprove one. A significant p-value disproves a null hypothesis, creating a scientific fact; insignificant p-values mean nothing. This led to a fundamentally flawed statistical methodology currently being taught and used all over the world. The problem is that there are huge numbers of hypotheses which are NOT in gross conflict with the data. By careful choice of parametric models, we can ensure that our desired null hypothesis does not conflict with the data. The Neyman-Pearson theory can ADD to this illusion of the validity of an imaginary hypothesis, if we find alternatives which are even more implausible than our favored null hypothesis.

Fisher Versus Gosset. The p-value invented by Gosset measures statistical significance, which is very different from practical significance. Gosset warned against confusing the two from the beginning. Unfortunately, because it was a tool in Fisher’s war against Neyman-Pearson, Fisher pushed the p-value to the hilt. This led to a fundamental misunderstanding of the role and importance of p-values in statistical research, which persists to this day. The damage inflicted by these misguided statistical procedures has been documented by Stephen T. Ziliak and Deirdre N. McCloskey in The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives.

Perhaps of even greater fundamental importance was the battle between Fisher and Sewall Wright. Sewall Wright invented path analysis – a method for assessing CAUSAL effects. If this method had been understood and adopted, modern statistics would be entirely different. Unfortunately, Sewall Wright had a fight with Fisher over an obscure genetics controversy related to EUGENICS. As a result, Fisher ignored, neglected, and criticized all contributions to, and attempts at, developing a theory of causality. To be fair, this was not entirely Fisher’s fault. The theories of knowledge then in vogue, based on logical positivism, suggested that unobservables cannot be part of scientific theories. This led to difficulties in understanding causality, because causality is never directly observable, and is always based on an understanding of unobservable real-world mechanisms. Over the past few decades, there have been revolutionary advances in the understanding of causality, made by Judea Pearl and his students, which build on causal path analysis similar to the methods of Sewall Wright. Unfortunately, statisticians and econometricians have mostly failed to learn from these methods, because they go against decades of indoctrination against such methods.

The failure to understand causality continues to be a serious problem for statistics. One of the most dramatic illustrations was the controversy about cigarettes and cancer in the middle of the 20th Century. For more details about this controversy, see Pearl & Mackenzie, The Book of Why (Chapter 5), and also Walter Bodmer, RA Fisher, statistician and geneticist extraordinary: a personal view. A friendly relationship turned into enmity when Bradford Hill and Richard Doll published an extensive empirical study documenting the effect of smoking on cancer. This conflicted with Fisher’s view that correlations cannot prove causation, and also with his libertarian ideology. These convictions led Fisher to deny the empirical evidence regarding the link between smoking and cancer long after it had become overwhelming. Because of his enormous prestige, his obstinate refusal to accept strong statistical evidence in conflict with his ideologies delayed recognition of the link and the necessary policy response, and probably led to a substantial loss of lives due to lung cancer.

What lessons can be learned from this personal history of the founder of modern statistics? Islam teaches us a lot about the search for knowledge. See Principles of Islamic Education for a detailed discussion. Here we briefly discuss some of the required attitudes for Seekers of Truth. We must learn to value knowledge as the most precious treasure of God, seeking it with passion, energy, and utmost effort. This was one of the keys to how Islamic teachings made world leaders out of ignorant and backwards Bedouin. We must also understand that knowledge, or insight, is a GIFT of God. We must learn to take small steps, and be grateful for small advances in understanding. Knowledge is like a castle constructed brick-by-brick from small elements. We must acquire patience for the long haul, instead of expecting quick results. The knowledge we acquire does not come from our personal capabilities; it is a gift of God. We cannot take pride in discoveries, because they are not due to our own genius. The pride of Qaroon is condemned – the claim that my wealth is due to my own wisdom and capabilities, and therefore I do not recognize the rights of others. We must learn humility and gratitude: I have been given knowledge beyond what I deserve, and beyond my capabilities. Furthermore, because of our limited capabilities, we can often make mistakes, fail to recognize the truth, and confuse it with falsehood. Thus, we must ask Allah to show us the difference:

اللهم ارنا الحق حقاً وارزقنا اتباعه وارنا الباطل باطلا وارزقنا اجتنابه
(“O Allah, show us the truth as truth and grant us the ability to follow it, and show us falsehood as falsehood and grant us the ability to avoid it.”)


Human knowledge is a social construct. That is, we create knowledge collectively, and validity is created by consensus. This makes it tremendously important to have the right rules for discussion and argumentation. Islam has a well developed collection of rules which govern the Etiquette of Discourse. Unfortunately, these are no longer studied and taught, much less practiced. Among the rules is the prohibition of Debate – arguing with a view to proving ME right and YOU wrong, and to WIN arguments. This just feeds the ego and creates pride. Instead, arguments are SUPPOSED to be a cooperative search for truth. To explain this point, a teacher told his student never to debate. The student was surprised and said: “But we have seen you debate!”. The teacher explained: in the earlier generations, people entering into an argument would make dua, “O Allah, make the Haq clear to us, and let the truth and clarity come from the OTHER person.” In my generation, people would ask for the Haq to be made clear, but also ask that the truth should come from MYSELF. In your generation, people have stopped caring about the truth, and only seek to win arguments!

Conclusions

To summarize this discussion, we conclude that modern conventional statistics is based on fundamentally flawed methods. Fisher created a method for reducing data based on an imaginary infinite population. Because this population is imaginary, there is no possibility of assessing the truth of such a hypothesis. However, if it is strongly in conflict with apparent characteristics of the data, the p-value can be used to reject such a null hypothesis. A vast class of null hypotheses will fail to be in manifest conflict with the data, and failure to reject is the closest we can get to truth in the Fisherian methodology. This was useful for the cause of Eugenics, since it allows us to ‘prove’ MANIFESTLY false null hypotheses. The methodology for statistics started out in the wrong place, partly due to its Eugenic roots, but also due to computational limitations and to the personality of Fisher. However, the persistence of these flaws across time was due to failures in the ethics of dialog – the rules for the social construction of knowledge. The followers of Fisher failed to consider the origins of the methods, and imitated them without understanding them. In particular, they failed to understand the radical revisions of methodology made possible and necessary by the amazing advances in computing capabilities. This is why it has become necessary to re-build statistics on new foundations, abandoning the Fisherian methodology. We will discuss some of the reforms required in later portions of this lecture.

Links to Related Materials:

Previous portions of this Lecture 5 of DSIA discuss the racist origins of statistics: Part A: Malthus and the Birth of Statistics, Part B: Galton: Eugenicist Founder of Statistics.

The basic ideas behind the course: WHY an “Islamic Approach” to Statistics?

How to Launch an Islamic Revival?

[bit.ly/azlir] This post provides an outline of an upcoming online course. To register for the free course, offered in Urdu and separately in English, register yourself on the appropriate Google Sheet: Registration for Islamic Revival (URDU) [bit.ly/gfu2e] or Islamic Revival (English) [bit.ly/ytlie]


For Muslims, the current conditions of the Ummah are heartbreaking. The question of how we can transform these conditions is of burning importance. This sequence of eight lectures is designed to address the core elements required for positive change. We can organize our thoughts by asking: “How did Islam lead ignorant & backwards Bedouin to World Leadership?” The revolution launched by the teachings of Islam 1450 years ago is a unique event in human history.

  • CAN this revolution be replicated today?
  • DO Islamic teachings have the power to AGAIN take ignorant and backwards Muslims TODAY from the bottom of world civilizations to the top?

Our Hearts and our Faith say YES. The message of the Quran was meant for the eternal guidance of mankind. However, our Minds and the Empirical Evidence say NO! If the Islamic teachings had this revolutionary power, Muslims would not be in the condition in which we see them today. They are backwards in all dimensions of existence, both material and spiritual.

How can we resolve this PARADOX and CONFLICT? How can we reconcile the HEAD and the HEART and get them to agree? This online course will attempt to answer these fundamental questions in 8 Lectures over Two Months. The Eight Lectures are listed below:

L1: Syed Abul Hassan Ali Nadwi – What the World LOST due to the DECLINE of Islamic Civilization

We have forgotten our history and our achievements. What were the great gifts that Islam gave to the world? How did the decline of Islam adversely affect the world? Because we have learned to view the world through Western glasses, we fail to appreciate the vital contributions of Islam to world civilization. Maulana Nadwi’s book is an essential antidote to this blindness to our own history. We must learn to appreciate the contributions of Islam, before we can learn how to recreate them today. For a detailed discussion, see: What the World Lost Due to the Decline of the Islamic Civilization. Shortlink: bit.do/asrdm

L2: Developing an Islamic Worldview

A Western education indoctrinates us into a Eurocentric worldview, according to which all great accomplishments of humanity are due ONLY to Europeans, and occurred only in the past four centuries of the rise of Europe. This is dramatically in conflict with Islamic teachings which tell us that the most important development in the history of humanity was the advent of Islam 14 centuries ago. How can we learn to see the world with Islamic eyes, instead of European ones? To free our minds from enslavement to these Eurocentric theories, we must understand that Wealth and Power are NOT the measures of progress. Rather, it is the development of human beings. When viewed through this lens, Islam shines brightly, while what Europeans call progress has led to decline of humanity. For details, see Developing an Islamic WorldView: An Essential Component of An Islamic Education. Shortlink: bit.ly/iwv4az.

L3: Origins of Western Social Sciences

Our worldview is created by the Western Social Sciences, which claim to be objective, impartial, and rational. However, these sciences are based on the denial of religion, God, the afterlife, and other central Islamic (and Christian) concepts. They redefine the purpose of human life to be the pursuit of pleasure and power, and reduce human beings to animals. They give materialistic and secular answers to fundamental questions like the purpose of the creation of the universe and mankind. We need to learn to see through these deceptions. Building an Islamic Social Science requires recognizing and rejecting these atheistic foundations of modern Western Social Science. For more details, see Origins of Western Social Sciences. Shortlink: [bit.do/azowss]

L4: Islamic Knowledge: Still Revolutionary after 1440 Years!

14 Centuries ago, Islamic teachings led ignorant and backwards Bedouin to world leadership. What was the secret of these teachings, and do they still have the same power today? Why do Muslims no longer trust these teachings, and instead hope to revive Islamic societies by using Western teachings? The Ummah as a whole believes that Islamic revival requires acquiring the knowledge of the West, and not a deeper knowledge of the teachings of Islam. This is an illusion. The path to revolution lies in learning how to become HUMAN BEINGS – Ahsan-e-Taqwim, the BEST of the creations of God. This is what the Prophet SAW taught, and this is what launched the Islamic Revolution. Western education teaches the external sciences, but offers nothing in the human dimension. Instead, the West only teaches us how to be Asfal-a-Safeleen, the WORST of the creations. For more details, see Islamic Knowledge: Still Revolutionary after 1440 Years. Shortlink: bit.do/aznww

L5: Rebuilding Islamic Societies

How to launch an Islamic Revolution?

  1. Revolutionary Strategies ask us to tear down existing secular, modern, capitalistic institutions and rebuild from scratch on new Islamic foundations.
  2. Evolutionary Strategies work on making gradual changes to existing systems to transform them towards the Islamic ideals.

Both have strengths and weaknesses, and success requires combining elements from both strategies while abandoning the weaknesses of both. For more details, see Rebuilding Islamic Societies. Shortlink: bit.do/azttp

L6: Changing Our Development Paradigms

The West was built on the wealth and power created by global colonization. This has also become the meaning of development in Muslim minds. If we strive for wealth and power, we can only arrive at these illusions. We must change our goals of development, in line with Islamic teachings, before we can achieve the great successes of our ancestors. To create an Islamic Revolution, we must focus on the development of human beings. In particular, the transformation from Asfal-a-Safeleen (lowest of the low) to Ahsan-ul-Taqweem (best of the creations) must be the goal of development. For more details, see Changing Our Development Paradigms. Shortlink: bit.do/azwid

L7: Three Mega-Events Which Have Shaped Our Minds

The MAIN obstacle to an Islamic Revival comes from the COLONIZATION of minds which took place when the West conquered 85% of the Globe, including most Islamic countries. The ways of thinking of the West have been shaped by major historical events, and we have absorbed these same lessons via our Western education. To liberate our minds, we must learn the history which shaped Western minds, and how it has shaped our own thoughts. Three major historical experiences of the West have shaped Western thought. Understanding this historical experience, and its impact on minds and hearts, gives us the clarity required to liberate ourselves from the chains of colonial thought. For more discussion, see Three Mega-Events Which Shape Our Minds. Shortlink: bit.do/azgt4

L8: The Ghazali Project for the Revival of Deen

The Mu’tazila were deeply impressed with Greek Philosophy and wanted to give it status EQUAL to the WAHY. Today, Muslims have given Western knowledge status even greater than the Quran and our intellectual traditions. To UNDO this damage, we must take the three steps taken by Imam Ghazali to counter the Greek influence:

  1. Deliverance from Error: Acquire certainty and clarity about the teaching of our faith. Remove confusion and doubt about Islam
  2. Tahafatul Falasafa: Show the deep contradictions and confusions within the bodies of Western knowledge, especially their human sciences.
  3. Ihya Uloom ud Deen: Show how Islamic teachings provide us with deep guidance not just on the Akhira but also directly relevant to our modern problems.

For more discussion, see The Ghazali Project for the Revival of Deen Shortlink: bit.do/azgt6

RELATED Materials: A later post on Essential Aspects of Islamic Revival

Galton: Eugenicist Founder of Statistics

Bit.ly/dsia05b – Part B of Lec 5: Descriptive Statistics – An Islamic Approach. This portion discusses the racist ideas of Galton, and some statistical tools he developed in his attempts to prove them.

The Islamic tradition asks us to look at both the nature of the knowledge, as well as the character and intentions of the transmitters of knowledge. In this lecture, we will look at Sir Francis Galton, the Founder of Eugenics. The following quote from his student and admirer Karl Pearson (1930, p. 220) explains Eugenics:

“The garden of humanity is very full of weeds, nurture will never transform them into flowers; the eugenist calls upon the rulers of mankind to see that there shall be space in the garden, freed of weeds, for individuals and races of finer growth to develop with the full bloom possible to their species.”

Looking through the metaphor of flowers (Europeans) and weeds (others), Eugenics calls for the EXTERMINATION or STERILIZATION of inferior races, as well as of inferior specimens within the Aryan (White) race. This reflected WIDELY HELD views among Europeans. NOTE the conflict with the WISDOM of the Quran: all human beings are brothers and sisters, sons and daughters of Adam and Eve. Furthermore, ALL human lives are infinitely precious – each life counts as heavily as all of humankind.

The value of actions depends on intentions. An essential part of the Islamic approach to the acquisition of knowledge involves making the intention of serving the creation of God, out of the love of the Creator. Evil intentions lead to bad outcomes. In particular, some evil intentions are mentioned in the Hadeeth as follows: He who seeks knowledge to argue with Experts, to dispute with the ignorant, or to attract attention (seek popularity), all his deeds will be in vain.

We can document that the intentions of Galton were exactly the ones which are forbidden for Muslims. He sought popularity and fame, entering into disputes with the experts to impress the ignorant. For documentation, see Becoming a Darwinian: The Micro-Politics of Sir Francis Galton’s Scientific Career 1859–65. These intentions had a strong effect on the quality of knowledge produced – creating wrong paths of research and wrong ways of thinking, which continue to push the production of knowledge in the field of statistics in harmful directions. We discuss this in somewhat greater detail below.

Sir Francis Galton was the founder of Eugenics, while both Sir Karl Pearson and Sir Ronald Fisher were prominent members of this deeply racist movement. Their research was meant to create a scientific basis for racism. This had two aspects.

  1. Positive Eugenics: Use of breeding to create a superior stock of human beings.
  2. Negative Eugenics: Extermination or Specialization of Inferior Races. Specialization refers to assigning some subhuman and subservient role to a race – such as assigning blacks to be slaves of the Master Races.

The consequences of these ideas were horrific, leading to sterilization of thousands, culminating in the brutal killing of millions of Jews by Hitler. Because of these shameful consequences, even the name of the field has been erased from history. For one look at the evil consequences of the idea (which is current and dominant) that man is just another species of animal, see The Darwin Effect by Jerry Bergmann. The idea that mass killing of inferior peoples is actually necessary to reach an advanced state of civilization has been expressed by Karl Pearson, successor of Galton, and one of the eminent founders of statistics, as follows:

“History shows me one way, and one way only, in which a high state of civilization has been produced, namely, the struggle of race with race, and the survival of the physically and mentally fitter race. If you want to know whether the lower races of man can evolve a higher type, I fear the only course is to leave them to fight it out among themselves, and even then the struggle for existence between individual and individual, between tribe and tribe, may not be supported by that physical selection due to a particular climate on which probably so much of the Aryan’s success depended.” (Karl Pearson, 1901, pp. 19-20)

What were the statistical tools created to support the cause of Eugenics, or the mass extermination of “inferior” people? Galton invented “correlation”. Why was Galton trying to measure the relationship between the heights of fathers and sons? He wanted to prove that heredity was all-important in determining the heights of sons. But how does height relate to personality and intelligence? We must understand the strong hold of materialism on 19th Century minds. Everything was matter, and was observable. Things which could not be observed did not exist. Weighing a body before and after death, scientists concluded that the soul did not exist, because the weight did not change. Thoughts were considered to be fluid secretions of the brain. Phrenology was in vogue – this pseudo-scientific field took measurements of the skull to determine personality. So if physical characteristics were hereditary, this would be sufficient to establish that personality, intelligence, character, etc. were also hereditary. This was the contribution of Galton – whereas Darwin’s followers had only considered physical characteristics, Galton argued that intelligence and personality were also hereditary, and subject to evolutionary pressures. This is also called “Social Darwinism”: the argument that human societies could evolve to become better via the ruthless survival-of-the-fittest mechanism, which eliminates the weakest members to strengthen the race.

We look briefly at some technical details of Galton’s ideas about correlation and heredity. For any characteristic – like IQ, height, strength, etc. – we can subdivide the population into three groups: High, Medium, and Low. The question of the correlation between fathers and sons was the subject of intensive research by Galton and his followers. The following diagram shows PERFECT correlation, the type of result that they were hoping to find:

[Diagram: perfect correlation – High parents have High children, Medium have Medium, Low have Low]

This diagram shows 100% Correlation: High IQ fathers always have High IQ sons. Similarly, the characteristics of the parents are transmitted to the children perfectly, for mid-level and low-level IQ as well. If the effect of heredity is strong, this justifies BREEDING human beings, like dogs and horses, to create a stock of SUPERMEN. Select High IQ people, ensure that they mate with each other, and sterilize or exterminate the rest of the population. This was one of the major GOALS of Eugenics.

To achieve clarity in understanding any concept, it is always useful to look at the opposite alternative: NO inheritance. This can be represented in the following diagram:

[Diagram: zero correlation – parents of each type (High, Medium, Low) have children equally divided among all three types]

Here, regardless of parentage, the children are equally divided among the three categories: there is no effect of heredity. Note that even when there is 100% correlation, we cannot establish that it is due to heredity, because smarter and wealthier parents are able to provide a better educational environment for their children. The Eugenics argument becomes much weaker if the effects of heredity are weak. If the children of any type of parents can achieve any level of intelligence via training and education, then breeding for intelligence becomes impossible. Eugenicists were stubbornly opposed to this idea, and resisted any interpretation of the empirical evidence against heredity and in favor of environment.

The teachings of Islam, and the example of our Prophet Mohammad SAW, show us that every human being is valuable, because everyone has a soul, and the capability to know God. With appropriate training, everyone can achieve high levels of spirituality. No one is born with these traits, and it requires struggle against the desires of the Nafs to advance from the primitive spiritual stages to the higher ones. Anyone with some level of spiritual development would have known that the idea of breeding human beings was patently ridiculous. The higher stages of human development can only be achieved by struggle, and all human beings are born with the capability of carrying out this struggle. The research program of the Eugenicists was possible only because of the low level of spiritual development of the founders of statistics. Human beings who make their desires their God reduce themselves to the lowest of the low, and make themselves similar to animals. Only then does it become possible to consider breeding humans like animals, and killing them like animals. This was the effect of widespread materialism in Europe.

The weaknesses of the tools in statistics arose because the Eugenicists were out to prove something which was not true. Only bad tools could accomplish this goal – if statistics had been constructed on the correct foundations, they would have been unable to prove their favorite theses. We provide some further description of these twisted tools developed to achieve racist goals.

We have shown the graphical version of perfect versus zero correlation in inherited characteristics. An alternative way of showing the same thing quantitatively is via Markov Transition Probabilities. The first table below shows the case of perfect correlation:

Father \ Son      High      Medium    Low
High              100%      0%        0%
Medium            0%        100%      0%
Low               0%        0%        100%

The rows show the characteristic of the fathers, while the columns are the sons. The transition probabilities show that 100% of High IQ fathers have High IQ sons, and similarly for Medium and Low IQ. There is perfect correlation between fathers and sons. The next table shows the case of ZERO correlation:

Father \ Son      High      Medium    Low
High              33.33%    33.33%    33.33%
Medium            33.33%    33.33%    33.33%
Low               33.33%    33.33%    33.33%

In this case, regardless of the type of the father, the sons are equally divided among the three possibilities of High, Medium, and Low IQ. Empirical evidence regarding heights shows roughly a 50% correlation, which corresponds to a 50-50 mix of the two polar cases of perfect and zero correlation. This can be displayed in the following table:

Father \ Son      High      Medium    Low
High              66.67%    16.67%    16.67%
Medium            16.67%    66.67%    16.67%
Low               16.67%    16.67%    66.67%

Here the majority (2/3) of the children fall in the same category as the father, while the rest are equally divided among the other two categories – this equal division may be replaced by other assumptions, and does not matter for what follows. What we would like to show is that, in all cases OTHER than perfect correlation, we will see the phenomenon of “Regression Towards the Mean”. Note that if High IQ parents do not have 100% High IQ children, then they will necessarily have some children of LOWER intelligence. That is, the HIGH intelligence group will be PULLED DOWN towards the average, or medium, intelligence. Similarly, in the LOW category, if all children are not LOW IQ, then some will have HIGHER IQs, and therefore move UP towards the mean. That is, there is a tendency of both extremes to move towards the middle. This “Regression towards the Mean” causes problems for Eugenics. We cannot rely on High IQ parents to produce High IQ children based on heredity. ALSO, there is no reason to sterilize or exterminate Low IQ parents, because their children may move UP the IQ scale. Nonetheless, deprived of the light of the message of the Quran about human equality and brotherhood, Eugenicists continued to misinterpret the empirical evidence, and to advocate the selective breeding of superiors and the extermination of inferiors. The horrors of the Holocaust, where millions of innocent men, women, and children were killed, eventually discredited the cause of Eugenics.
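
The argument can be checked numerically. The sketch below (our own illustration) builds the 50-50 mixture matrix shown above and follows the descendants of an all-High starting group across generations:

```python
import numpy as np

# The two polar cases as transition matrices (rows: fathers, columns: sons).
perfect = np.eye(3)                    # 100% correlation
zero    = np.full((3, 3), 1.0 / 3.0)   # 0% correlation

# A 50-50 mix reproduces the 66.67% / 16.67% table above.
mix = 0.5 * perfect + 0.5 * zero
print(mix)

# Regression towards the mean: start with ONLY High-IQ parents and
# track the distribution of their descendants.
dist = np.array([1.0, 0.0, 0.0])       # generation 0: all High
for gen in range(1, 6):
    dist = dist @ mix                  # distribution of the next generation
    print(f"generation {gen}: High={dist[0]:.3f}, "
          f"Medium={dist[1]:.3f}, Low={dist[2]:.3f}")
```

The distribution converges to (1/3, 1/3, 1/3): the extremes melt back towards the average, which is exactly why selective breeding cannot lock in the High category.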


Conclusions

Ideas are far more powerful than atom bombs – after all, it was ideas which led to the creation of atom bombs. The idea that man is an ANIMAL (Darwinism) has done a lot of damage. By denying the existence of spirituality, it has made spiritual progress impossible. This idea has penetrated the minds of Muslims through Western education, which is purely materialistic. The potential for excellence in man is unlimited, and can only be achieved through spiritual training – it CANNOT be inherited. Failure to recognize this potential reduces us to animals. This conception of man as an animal is at the heart of the Social Sciences developed in the West over the past few centuries. Because of this fundamental flaw at the roots, all productions of knowledge in this intellectual tradition are tainted. Statistics forms one part of the tools used to try to validate the racist ideas of the originators of the subject. Although the field has moved far from these roots, and has many valuable accomplishments and ideas to its credit, its current shape continues to be affected by its origins, as we will show in later lectures.

RELATED: See previous post on Malthus and the Birth of Statistics

Malthus and the Birth of Statistics

[bit.ly/dsia05a] Part A of Lec 5: Descriptive Statistics: An Islamic Approach.

Preliminary Remarks:

  1. I no longer believe in the myth of Objective Knowledge, which I was trained to believe in during the course of my Western education. When knowledge is objective, it is the same for everyone, and has no relationship to personal experiences and subjectivity.
  2. Secular Modernity is the most powerful world religion known to mankind. It has defined the goals of life, and the ways of living as individuals, communities, and nations, for all of us. Because it defines itself as OBJECTIVE, it has deceived us into accepting hidden normative frameworks.
  3. It claims objectivity, neutrality, and rationality for itself, and a position of privilege in arbitrating disputes among others. Anyone who opposes it is automatically being subjective, emotional, and irrational. Thus it is important to SEE THROUGH this pretense of objectivity of secular modernity.
  4. My own position is based on an Islamic Perspective. ALL analysis must HAVE a perspective; no neutral, objective perspectives exist. Conventional Statistics is really Statistics from a Secular, Modern perspective (one religion). Here I am offering an alternative perspective.

The methodology being used here is that of Michel Foucault, called the Archaeology of Knowledge. We were taught the BINARY theory of truth: statements are TRUE or FALSE – Objective Knowledge. INSTEAD, this methodology looks at the EMERGENCE of ideas – IDEAS are the MOST POWERFUL tools in the arsenal of mankind – at how these ideas were used to shape history, and at how they evolved and changed as a consequence of historical forces. This methodology leads to unique insights not available by any other method. In this sequence of lessons we dig into the foundations of the ideas which led to the creation of modern statistics.

Statistics is presented as an objective and ethically neutral body of tools and techniques. However, it was developed for a very clear, and evil, purpose. There are two polar views on the transmission of knowledge:

  • Look at the CHARACTER & Purpose of the transmitters of knowledge
  • Look at the CONTENTS – the body of knowledge transmitted

The fundamental underlying question is: “Can the THOUGHTS be separated from the THINKER?”. The balanced position is that BOTH are necessary. We need to Look at the BODY of KNOWLEDGE being transmitted. Also look at the CHARACTER and the PURPOSE of the transmitter/creator of knowledge. In contrast, the Western Intellectual Tradition says: “Look at the SUBJECT MATTER ONLY”. As opposed to this, a popular strand of the EASTERN tradition says: “look at the AUTHORITY of the TRANSMITTER only.” In the West, the MEANING of the word “Probability” changed to reflect this transition – Initially, the word referred to the authority of the transmitter. Later the word was used to mean the weight of the evidence for the matter (see Ian Hacking: The Emergence of Probability).

The Western intellectual tradition defines knowledge as purely objective, and excludes subjective and personal experiences from the realm of knowledge. This leads to a clear NO answer to the fundamental questions: “Do the INTENTIONS of producers of knowledge matter?”, and “Does the CHARACTER of producers of knowledge matter?”. This is in dramatic contrast with Islamic views, according to which the ‘Value of Actions Depends on Intentions’. It is easy to see that the nature of the scientific knowledge produced depends on the intentions and purposes for which the knowledge is being produced. This topic is now known as the sociology of knowledge, and many simple examples can be given as proof:

The strong drive for profits led to multi-million dollar investments in high-yield varieties of wheat, genetically modified to have terminating seeds. This enables corporations to sell the same seed year after year. If the intention had been to feed the hungry, different types of technologies and seeds would have been developed.

  • “Orphan drugs” refers to drugs which provide cures to afflictions affecting masses of the poor, who are unable to pay enough to generate a profit on the production of the drug.
  • While there is little work on these, massive efforts are being made on designer-drugs, personalized and individually tailored to the genetic structures of billionaires.
  • The search for power has shaped the development of war technology – bombs, missiles, and so much else. A humanistic bent would have led to the development of science in different directions.

With this as preliminary, we consider the personalities and intentions of some of the major founding fathers of modern statistics: Sir Francis Galton, Sir Karl Pearson, and Sir Ronald Fisher. Today it has been discredited and forgotten, but in the early 20th Century Eugenics was an EMINENT and RESPECTABLE field of knowledge, taught at universities, with prominent and influential supporters. All three of these founding fathers were big names in Eugenics, and statistics was developed by them as a tool to support their Eugenicist views. Eugenics asserts the racial superiority of Whites and the inferiority of the other races. Even more, it asserts that the elite classes are genetically superior to the commoners. According to Eugenics, the only path to progress lies in the extermination of inferior races (negative Eugenics), and the increasing growth of the superior race (positive Eugenics). Another way to deal with the inferior races was “specialization” – giving them roles to fulfill which would fit their limited, genetically determined capabilities (for example, making slaves out of Blacks to do menial chores not demanding intelligence). A brief sketch of the background which led to the emergence of Eugenics as a prominent field of “science” is given below.

A convenient point to start is the pair of questions: “Why is there poverty?” and “How can we reduce it?”. It may surprise the reader to learn that these are NEW questions. Even though poverty is an age-old phenomenon, the idea that it is a social problem which can be, and should be, remedied, is new. For details see An Islamic Approach to Inequality and Poverty. In the pre-capitalist world, poverty was not seen as a social problem. Social responsibility, and the idea of society as one body, where we must take care of each other, was sufficient to deal with the problems of poverty. The emphasis on charity in Islamic teachings led to extraordinarily high levels of spending on the poor, especially via the WAQF, which created endowments to deal with different social problems. But similar patterns of charitable institutions for taking care of the poor are seen in all pre-modern societies.

The driver of major change in Europe was the Industrial Revolution, which started in England in the 18th Century. The complex circumstances which created this change, and its effects on the transformation of economic, political, and social institutions, are described in The Great Transformation by Karl Polanyi (and many other books). Of relevance to our current discussion is the fact that industry requires a Labor Force – lives for rent and sale for money. In a capitalist society (like our society today), education is meant to CREATE mindsets suitable for the labor force. This means training students to make the pursuit of pleasure, power, and wealth the goal of life. For this purpose, students are trained to pursue CAREER over other concerns like family, society, spiritual growth, or excellence in any human dimension. For more discussion of this fundamental problem with modern education, see Learn Who You Are!. Social change resulted from the chosen solution to the fundamental problem created by the Industrial Revolution: “How to create a labor market?”

The solution created by Europeans was deeply racist. We must believe in TWO classes of people: REAL human beings, capable of enjoying the finer things of life, and LOW-level humans, who may LOOK like us, but do not have rich inner lives – they cannot think and feel as deeply. They are closer to animals than to humans. Eugenics is based on the idea that the aristocratic elites have genes superior to those of the common masses. A little more historical detail is useful in understanding the emergence of these ideas.

A Tale of Two Cities by Charles Dickens opens with a scene where the carriage of a French aristocrat, speeding through the crowded streets of Paris, crushes a poor child. The aristocrat tosses a few coins to the bereaved mother, and continues on his way without further concern. The extreme inequality, and the oppression of the poor by the elites, led to the French Revolution, which changed the course of European history. As a consequence of the revolution, there was considerable debate about policies to help the poor, with a view to preventing a similar revolution in England. At this crucial juncture, the ideas of Malthus regarding the causes of, and solution to, the problem of poverty had a dramatic impact on the policy debate. Humanitarian and compassionate solutions were replaced by cruel and harsh measures to punish the poor for their poverty. Malthus argued that poverty was due to a poor genetic endowment, and was inherited. The problem of poverty arose from the fact that the poor BREED faster than the rich. Thus poverty is inherited, and being kind to the poor, by providing social services, is counterproductive: it will only increase the rate of growth of the population of the poor. INSTEAD, we should sterilize the poor, keep them in crowded conditions, and encourage the spread of disease among them, to keep their numbers low.

This theme became linked with the emerging theories of evolution and Mendelian genetics. How much of our makeup comes from inheritance, and how much is due to the environment and education we receive? This is often called the Nature versus Nurture DEBATE. The dominant and widely accepted point of view leaned heavily in favor of Nature (heredity) having an overwhelming effect. This means that superior races will remain superior, and it is not possible to educate the inferior races to bring them up to the standards of the white people. The inferior people can either be exterminated or enslaved. The Malthusian approach to poverty was strongly based on the “nature” point of view. That is, poverty was due to bad genes, which led to poor character and intelligence, and there was nothing we could do to change this, in the form of education or other interventions.

The wrong theories of Malthus led to a dramatically wrong approach to poverty. Malthusian theory suggested that providing support to the poor would only increase poverty, since that would allow them to breed faster. Thus social support for the poor was made deliberately humiliating and degrading, to discourage all but the extremely needy from resorting to the poorhouses. These theories and policies stand in stark contrast to Islamic teachings, which urge us to provide support to the poor without humiliating them in the process. Furthermore, Islam teaches us that every human life – whether poor or rich, black or white, Arab or other – is equally precious. Indeed, each human life counts as heavily as all of humankind. There were similar humanitarian streams of thought in the European Christian heritage, but Malthusian views came to dominate policy.

Even though nearly all of the predictions of Malthus turned out to be wrong, his theories had a tremendous impact on thinking about population. As we will see in the next portion of this lecture, the founding fathers of modern statistics invented the subject, its tools, and its techniques, in an attempt to prove the theories of Malthus. Our main goal is to show that the tools developed have been influenced by the underlying agenda, and are not neutral and objective. Malthus created his theories without a shred of empirical support, purely from his imagination. Since then, the theories of Malthus have been decisively proven wrong by the empirical evidence. The article Malthus: the False Prophet, from the Economist, documents some of the major errors made by Malthus.

  1. Malthus argued that the population would increase geometrically. However, over the 20th Century, a Demographic Transition was observed: increasing prosperity, and the increasing likelihood of survival of children, led to a reduction in birth rates and stable population sizes.
  2. Malthus argued that food supplies would increase only linearly, leading to shortages. However, a continuing sequence of technological advances in agriculture has led to increasing food supplies per capita on a global basis.
  3. There were many other false predictions, based on the first two. For example, he argued in 1798 that Britain’s population would quadruple in 50 years to 28 million, while food supplies would only be sufficient for 21 million, leading to a crisis. But nothing remotely resembling this happened.

Despite numerous fallacies and failed forecasts, the ideas of Malthus continue to be exceedingly popular. WHY? The simple answer is that these theories are ALIGNED with the class interests of the RICH. The POOR are to blame for their poverty. Furthermore, the rich have no responsibility to help the poor, because helping them only increases their breeding rate, leading to increased poverty, as well as an increasing stock of bad genes in the human population. These deeply mistaken ideas, strongly in conflict with Islamic teachings, have had a deep and disastrous impact on human history, adding to the misery of millions. They continue to guide the thinking and policy of an influential minority of economists and politicians. In the next portion of this lecture, we look at the development of statistics as a tool of Eugenics, a field of study built on the foundational ideas of Malthus and Darwin.

Fractional Reserve Banking: A Central Issue

The credit creation process in the current monetary system empowers banks to benefit the rich at the expense of the many in society. It is the root cause of increasing inequality and massive social deprivation. But the Islamization of the banking system has addressed only superficial modifications, rather than the structural flaws. This note discusses fundamental yet overlooked aspects of the system of money creation.

[Chart: bank deposits grew by 9.42% YoY / 2.87% MoM in November 2019]

There are several complexities in the modern monetary system, but here we view the origins of the problem in a simple setting. There are two official measures of money in the economy: M0, the currency in circulation (also called high-powered money), and M1, which is M0 plus demand deposits at banks. The amount of M1 in the economy is many times larger than M0. It is worth noting that currently used textbooks ignore this significant difference between M0 and M1 and discuss the two as if they were alike. Because economists do not differentiate between the natures of these two moneys, Islamic scholars have likewise focused only on the legitimacy of unbacked fiat currency (M0) and have ignored demand deposits. In fact, these deposits are radically different in nature and comprise the major portion of legal money in the economy. Once this difference is taken into account, it is highly likely that bank-created money (deposits) would not be a permissible currency under Islamic law. This would require reforming the entire banking system according to Islamic rules.
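As a minimal sketch of the definitions, the snippet below uses purely hypothetical figures (not drawn from any official statistics) to illustrate the point that deposits, not currency, dominate the money stock:

```python
# Hypothetical figures (in billions), chosen only to illustrate the definitions.
currency_in_circulation = 100.0   # M0: high-powered money issued by the state
demand_deposits = 900.0           # money created by banks as deposit entries

M0 = currency_in_circulation
M1 = M0 + demand_deposits         # the broader measure includes bank deposits

print(f"M0 = {M0:.0f}, M1 = {M1:.0f}, M1/M0 = {M1 / M0:.0f}x")
```

With these illustrative numbers, nine-tenths of the money people actually use is bank-created deposit money rather than state-issued currency.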

We consider the fractional reserve system in its original form. In the 16th century, people deposited gold with goldsmiths in exchange for paper certificates, which could be used in trade. The goldsmiths lent out the deposits at interest, without the knowledge of the depositors. This system closely resembles current reserve banking. Within an Islamic framework, the status of such transactions is dubious for several reasons. First, there is an element of fraud: goldsmiths did not hand gold to borrowers, but issued certificates promising to provide it on demand. They actually lent more than their deposits and could not have served all claimants at once. Second, interest was charged on the basis of a promise, without giving anything real to the borrower. These certificates circulated in real exchange just like gold, so the goldsmiths were also creating money, whereas Sharia law recognizes only money created by a central issuing authority. Moreover, interest-based lending was essential to keep this system functioning.

Modern banks keep cash (M0) as reserves instead of gold and function exactly like the goldsmiths. In a fractional reserve system, the liquidity (reserve) ratio enables banks to extend loans many times larger than their cash reserves. This is possible because people rarely withdraw large amounts of cash, and interbank transfers balance out on average, so net transfers create only short-term fluctuations for any one bank. In the long run the bank remains solvent, since it earns interest and its loans are repaid. Short-term liquidity shortfalls can be covered by borrowing from other banks or from the central bank; overall, these transactions cancel out, because banks borrow when short and lend when in surplus. The whole process runs on confidence, and a banking crisis emerges when trust disappears. Maintaining public confidence is therefore essential to avoid panic in a reserve banking system.
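The multiplication of deposits can be seen in a minimal simulation. This is a sketch under strong simplifying assumptions: a fixed 10% reserve ratio, every loan returning to the banking system as a new deposit (no cash leakage), and a hypothetical initial cash injection of 100.

```python
# Simulate the deposit -> loan -> redeposit cycle of fractional reserve banking.
reserve_ratio = 0.10   # assumed: banks keep 10% of each deposit as cash reserves
initial_cash = 100.0   # hypothetical injection of M0 into the banking system

deposits = 0.0
new_deposit = initial_cash
rounds = 0
while new_deposit > 0.01:                     # stop once new deposits are negligible
    deposits += new_deposit
    loan = new_deposit * (1 - reserve_ratio)  # bank keeps 10%, lends the rest
    new_deposit = loan                        # the loan is spent and redeposited
    rounds += 1

print(f"after {rounds} rounds: total deposits = {deposits:.1f}")
print(f"theoretical limit: {initial_cash / reserve_ratio:.1f}")  # 100 / 0.10 = 1000
```

The total converges to initial_cash / reserve_ratio, the textbook money multiplier. This is why M1 can be many times larger than M0, and why the whole arrangement depends on depositors never demanding their cash all at once.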

Now we consider some important issues in this context. First, is it permissible for banks to lend money that they do not own? Clearly, banks cannot meet all claims on the demand deposits they create. Second, private creation of money by banks increases the money supply in the economy; is it justified to give this power to private institutions? Banks are tempted to create more money and maximize profit through interest. Islamic banking addresses the issue of interest and provides permissible alternatives to it, but it too uses depositors' money to create risk-free loans. Islamic scholars should therefore not bypass these issues; they must understand the effects and harms of the current banking system. This will help us see that it is useless to create an Islamic version of a system that is not beneficial for society.

Note: This is the second in a sequence of posts based on a section of an article by Dr. Asad Zaman. For the complete article and the references mentioned, see "On the Nature of Modern Money"; to read the previous post, visit here.