About Asad Zaman

Asad Zaman [BS Math MIT (1974), Ph.D. Econ Stanford (1978)] has taught at leading universities like Columbia, U. Penn., Johns Hopkins and Cal. Tech. Currently he is Vice Chancellor of the Pakistan Institute of Development Economics. His textbook Statistical Foundations of Econometric Techniques (Academic Press, NY, 1996) is widely used in advanced graduate courses. His research on Islamic economics is widely cited, and has been highly influential in shaping the field. His publications in top ranked journals like Annals of Statistics, Journal of Econometrics, Econometric Theory, Journal of Labor Economics, etc. have more than a thousand citations as per Google Scholar.

Sultan Tipu: Victim of British Propaganda

An essay in imperial villain-making

I am reprinting an old article by William Dalrymple, which illustrates how minds are shaped by narratives created by the needs of power.

By the end of the 90s, the hardliners calling for regime change in the east found that they had a powerful ally in government. This new president was not prepared to wait to be attacked: he was a new sort of conservative, aggressive in foreign policy, bitterly anti-French, and intent on turning his country into the unrivalled global power. It was best, he believed, simply to remove any hostile Muslim regime that presumed to resist the west.

There was no doubt who would be the first to be targeted: a Muslim dictator whose family had usurped power in a military coup. According to British sources, this chief of state was an “intolerant bigot”, a “furious fanatic” with a “rooted and inveterate hatred of Europeans”, who had “perpetually on his tongue the projects of jihad”. He was also deemed to be “oppressive and unjust … [a] sanguinary tyrant, [and a] perfidious negotiator”.

It was, in short, time to take out Tipu Sultan of Mysore. The president of the board of control, Henry Dundas, the minister who oversaw the East India Company, had just the man for the job. Richard Wellesley was sent out to India in 1798 as governor general with specific instructions to effect regime change in Mysore and replace Tipu with a western-backed puppet. First, however, Wellesley and Dundas had to justify to the British public a policy whose outcome had long been decided in private.

Wellesley therefore began a campaign of vilification against Tipu, portraying him as an aggressive Muslim monster who divided his time between oppressing his subjects and planning to drive the British into the sea. This essay in imperial villain-making opened the way for a lucrative conquest and the installation of a more pliable regime that would, in the words of Wellesley, allow the British to give the impression they were handing the country back to its rightful owners while in reality maintaining firm control.

It is a truth universally acknowledged that a politician in search of a war is not over-scrupulous with matters of fact. Until recently, the British propaganda offensive against Tipu has determined the way that we – and many Indians – remember him. But, as with more recent dossiers produced to justify pre-emptive military action against mineral-rich Muslim states, the evidence reveals far more about the desires of the attacker than it does about the reality of the attacked.

Recent work by scholars has succeeded in reconstructing a very different Tipu to the one-dimensional fanatic invented by Wellesley. Tipu, it is now clear, was one of the most innovative and far-sighted rulers of the pre-colonial period. He tried to warn other Indian rulers of the dangers of an increasingly arrogant and aggressive west. “Know you not the custom of the English?” he wrote in vain to the nizam of Hyderabad in 1796. “Wherever they fix their talons they contrive little by little to work themselves into the whole management of affairs.”

What really worried the British was less that Tipu was a Muslim fanatic, something strange and alien, but that he was frighteningly familiar: a modernising technocrat who used the weapons of the west against their inventors. Indeed, in many ways, he beat them at their own game: the Mysore sepoy’s flintlocks – as the examples for sale in an auction of Tipu memorabilia at Sotheby’s tomorrow demonstrate – were based on the latest French designs, and were much superior to the company’s old matchlocks.

Tipu also tried to import industrial technology through French engineers, and experimented with harnessing water-power to drive his machinery. He sent envoys to southern China to bring back silkworm eggs and established sericulture in Mysore – an innovation that still enriches the region today. More remarkably, he created what amounted to a state trading company with its own ships and factories dotted across the Gulf. British propaganda might portray Tipu as a savage barbarian, but he was something of a connoisseur, with a library of about 2,000 volumes in several languages.

Moreover, contrary to the propaganda of the British, Tipu – far from being some sort of fundamentalist – continued the Indo-Islamic tradition of syncretism. He certainly destroyed temples in Hindu states that he conquered in war, but temples lying within his domains were viewed as protected state property and generously supported with lands and gifts of money and even padshah lingams – a unique case of a Muslim sultan facilitating the Shaivite phallus veneration. When the great Sringeri temple was destroyed by a Maratha raiding party, Tipu sent funds for its rebuilding. “People who have sinned against such a holy place,” wrote a solicitous Tipu, “are sure soon to suffer the consequences of their misdeeds.”

Tipu knew what he was risking when he took on the British, but he said, “I would rather live a day as a tiger than a lifetime as a sheep.” As the objects in tomorrow’s sale show, the culture of innovation Tipu fostered in Mysore stands record to a man very different from that imagined by the Islamophobic propaganda of the British – and the startling inaccuracy of Wellesley’s “dodgy dossier” of 1799. The fanatical bigot and savage was in fact an intellectual.

The whole episode is a sobering reminder of the degree to which old-style imperialism has made a comeback under Bush and Blair. There is nothing new about the neocons. Not only are westerners again playing their old game of installing puppet regimes, propped up by western garrisons, for their own political and economic ends but, more alarmingly, the intellectual attitudes that buttressed and sustained such imperial adventures remain intact.

Despite over 25 years of assault by Edward Said and his followers, old-style Orientalism is alive and kicking, its prejudices intact, with columnists such as Mark Steyn and Andrew Sullivan in the role of the new Mills and Macaulays. Through their pens – blissfully unencumbered by any knowledge of the Muslim world – the old colonial idea of the Islamic ruler as the decadent, destructive, degenerate Oriental despot lives on and, as before, it is effortlessly projected on a credulous public by western warmongers in order to justify their own imperial projects. Dundas and Wellesley were certainly more intelligent and articulate than Bush or Rumsfeld, but they were no less cynical in their aims, nor less ruthless in the means they employed to effect them.

  • William Dalrymple is the author of White Mughals

www.williamdalrymple.com

Quantity Theory of Money: Australia

bit.ly/dsia04g – This is Part G, the seventh and final part of Lec 4 on Descriptive Statistics: An Islamic Approach. This lecture looks at the relationship between the Inflation Rate and Monetary Growth in WDI data for Australia.

First we look at the data itself – This is Australian Data, taken from WDI (World Development Indicators) a World Bank data set going from 1960 to 2011 (Updated Data sets are available, going up to 2019).

The table just illustrates the nature of the data. The main idea of Descriptive Statistics is to teach students how to LOOK at the Data. For this purpose, we PLOT the key data series relevant to the QTM.

[Figure 4gInfVM1: price index and M1, with M1 rescaled to 100 in 2010]

Note that the price index varies from around 10 to 110, while M1 goes from 3.5 Billion AUD to 4 Billion. To put both on the same graph, we need to re-scale M1. We did this by dividing all entries in the M1 column by the value AU$ 4.15 Billion, the value in 2010. This makes the entry for 2010 equal to 100% just like the price index, and makes the two graphs comparable. Similarly, the graph below shows the behavior of the price level as compared to M2, rescaled to equal 100% in 2010.

[Figure 4GInfVM2: price index and M2, with M2 rescaled to 100 in 2010]
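The rescaling step described above can be sketched in a few lines of Python. The series values here are made-up placeholders, not the WDI figures, and the function name `rescale_to_base` is only an illustrative choice.

```python
def rescale_to_base(series, base_year):
    """Divide every entry by the base-year value so the base year equals 100."""
    base_value = series[base_year]
    return {year: 100.0 * value / base_value for year, value in series.items()}

# Hypothetical money series in arbitrary units (NOT the actual WDI data).
m1 = {2008: 3.2, 2009: 3.7, 2010: 4.0}

m1_index = rescale_to_base(m1, base_year=2010)
print(m1_index[2010])  # 100.0
```

After this step the money series and the price index share the same base (100 in 2010), so the two can be drawn on one axis.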

These pictures show that the price index and money (M1 and M2) follow rather different patterns. Milton Friedman famously said that:  “Inflation is always and everywhere a monetary phenomenon”.  The previous two graphs show that Friedman is WRONG. In the early portion of the graph, money is increasing slowly, and prices increase rapidly. In a later portion, money rises sharply but prices rise slowly in comparison. From the graph, it is clear that Inflation – rate of change of price index – is NOT solely determined by rate of change of money. The picture is ENOUGH to show that. Fancy Statistics CANNOT change this basic conclusion. Because of COMPLEXITY of conventional statistics and econometrics, students think that fancy methods might REVEAL some hidden patterns in the data.  COMPLEXITY comes from making complex UNTRUE assumptions about the data. By making such assumptions, we can make data say something NOT in the graph.

To further confirm the failure of the QTM as an economic hypothesis (not as an accounting identity), we look at how the VELOCITY behaves. This is the ratio of nominal GDP to Money, and is assumed constant or exogenous. The graph shows the behavior of V1 and V2 the velocity for M1 and M2 respectively:

[Figure 4gV1aV2: velocity V1 (for M1) and V2 (for M2) on the same graph]

The Velocity follows different Patterns for M1 and M2. In the case of M1 it increases from 5 to 10 over the period from 1960 to 1986, and then declines from 10 to 3 from 1986 to 2010. The pattern for V2 will be discussed later. This change in velocity shows that there is no stable relationship between Money and Prices – all such relationships depend on the velocity. The idea that Velocity is constant or exogenous means that this factor can be put aside in studying the relationship. But the systematic and strong patterns in Velocity show that we cannot ignore velocity in any serious analysis of the relationship between M, P, and Q. Many economists wrote papers on the Breakdown of the Money Demand Function in the 1980s and 1990s. This corresponds to the change in behavior of V1 around 1989, from increasing to decreasing. In mathematical terms we can write the quantity equation MV = PQ as:

Log(M) + Log(V) = Log(P)+Log(Q)

By rearranging terms, we get:

Log(M) – Log(P) = Log(Q) – Log(V)

Log(M/P) = Real Money Demand = Log(Real GDP)– Error

In models like this, the assumption is that the error has no systematic patterns. However, it is clear that the error in the above relationship includes the velocity, and this DOES have systematic patterns. This is in violation of the assumptions of the regression model.
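The point that the "error" term is nothing other than Log(V) can be checked numerically. The sketch below uses hypothetical values for M, P and Q (they are not taken from the Australian data) and defines V through the identity MV = PQ.

```python
import math

# Hypothetical values for one year, chosen only to illustrate the algebra.
M = 400.0      # money stock
P = 1.25       # price level
Q = 1600.0     # real GDP

# The accounting identity M * V = P * Q defines velocity.
V = (P * Q) / M

real_money = math.log(M / P)           # left-hand side: Log(M/P)
residual = math.log(Q) - real_money    # the "Error" in Log(M/P) = Log(Q) - Error

# The residual equals Log(V), so any pattern in V is a pattern in the error.
print(abs(residual - math.log(V)) < 1e-12)
```

Whatever systematic movement the velocity series has is therefore carried directly into the regression error.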

The previous graph of V1 and V2 shows that V2 is flatter, showing less variation than V1. This might lead to the hope that the QTM will work for Broad Money, even though it fails for standard Money. To examine this idea, we look at V2, the Velocity for Broad Money, on a separate graph:

[Figure 4GV2: velocity V2 (for M2) alone, Y-axis from 0 to 3]

There is an important lesson about graphs: Scale Can Create or Hide Patterns. Previously, with both V2 and V1, the Y-axis went from 0 to 12 because V1 had values over 10. Now, with only V2, the scale goes from 0 to 3 only. With this magnification, many details are much more clearly visible than they were on the previous graph. In particular, it is clear that the velocity for M2 is also unstable. From 1960 to 1988 the range of V2 is between 2 and 2.5. In this period, the assumption of constant velocity or exogenous velocity might work. After this period, V2 shows a strong and systematic decline going from 2 to 1.

What do we learn from these graphs? We can see that the Velocity behaves in a systematic and predictable way over time. We can see that Velocity is NOT constant. If Velocity is EXOGENOUS, then it MATTERS. It obviously affects the relationship between Money, Prices and Quantity. Any theory which ignores the systematic changes in velocity will be unable to explain the relationship between M, P, and Q.

Next we turn to the ACTUAL assertion of the Quantity theorists. This is that the RATE of increase in money is proportional to the rate of growth of prices. Previous graphs were about the LEVEL of prices and money. As discussed in the previous lecture, we measure the growth rates as follows:

  • %P = log( P(t)/P(t-1) )
  • %M1 = log( M1(t)/M1(t-1) )
  • %M2 = log( M2(t)/M2(t-1) )
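The growth-rate definitions listed above can be sketched as a small Python helper. The price index values are hypothetical and the function name `log_growth` is an illustrative choice, not part of the lecture materials.

```python
import math

def log_growth(series):
    """Return log(x(t) / x(t-1)) for each consecutive pair of values."""
    return [math.log(curr / prev) for prev, curr in zip(series, series[1:])]

# Hypothetical price index values for four consecutive years.
price_index = [100.0, 103.0, 108.0, 110.0]

growth_p = log_growth(price_index)
print([round(g, 4) for g in growth_p])
```

The same function applies unchanged to the M1 and M2 series. A convenient property of log growth rates is that they add up over time: the sum of the yearly rates equals the log of the ratio of the last value to the first.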

According to Friedman, %P is explained solely by %M1 or %M2. The graphs of these quantities below show clearly that Friedman is wrong.

[Figure 4GDpvDPM1: growth rate of prices (%P) versus growth rate of M1 (%M1)]

Here the %M1 (growth rate of M1) behaves very erratically compared to prices. The two series have completely different behaviors. It seems clear that knowing %M1 will not help us very much in understanding how %P behaves. A similar pattern holds for %P versus %M2 – there is no relationship between the two series.

[Figure 4GDPvDM2: growth rate of prices (%P) versus growth rate of M2 (%M2)]

Because standard statistical and econometric analysis is so complicated, students do not realize that “There is no MAGIC”. If the series LOOK unrelated, they ARE!

Critical Understanding: The GRAPH provides ALL the information that EXISTS. It is a COMPLETE picture of the data. THERE IS NO MORE INFORMATION AVAILABLE. A Mistaken Concept is created in the minds of students: FANCY TECHNIQUES can allow us to get MORE out of the data. Thus, if we do Limited Information Maximum Likelihood or Generalized Method of Moments, we might get some MORE information. Even if the picture reveals no relationship, we might FIND OUT that there is a strong relationship by applying some complicated techniques.

It is important to understand that ALL FANCY techniques are based on ADDING INVALID COMPLEX assumptions. The ADDITIONAL inferences come from these assumptions and NOT from the data. For example, a standard ARDL (Autoregressive Distributed Lag) analysis will lead us to the conclusion that Real Money Demand is explained by Real GNP, after including sufficient lags of both variables. This is WRONG because it is based on the ASSUMPTIONS of regression model, which are guaranteed to be wrong here. This is not realized by most, because many of the assumptions of the regression can never be tested and proven wrong conclusively.

Concluding Remarks. Pictures of the data provide us with complete information about the data. However, there is an art to drawing pictures. Using the right scale is just the beginning. There are many fancy techniques, now known as “Data Visualization” – these techniques will be an important part of future statistics and econometrics. Computers enable these techniques which were impossible at the time the subject was invented in the early 20th Century. The standard methodology for statistics and econometrics substituted ASSUMPTIONS for analysis, because that was the ONLY path available at that time. Given the difficulty of doing calculations and making graphs, convenient assumptions made analysis unnecessary. This is why a radical change in methods of analysis, and in ways of thinking about data, is necessary, now that extremely difficult computations and extremely complex graphics, can be created by a click.

LINKS to Related Materials

The previous lecture covers the theoretical background required to understand the empirical data analysis in this lecture: https://bit.ly/dsia04f – Theoretical Inflation of Macroeconomics. This completes the fourth lecture of Descriptive Statistics: An Islamic Approach (link to first lecture)

For an explanation of why regression models are based on false assumptions, and how this leads to wrong and misleading data analysis, see

Choosing the Right Regressors: This lecture explains how the choice of regressors is of vital importance, and essential for the validity of the regression model. Standard practice does not pay attention to the fact that regression results are invalid if the regressors are not correctly chosen. https://bit.ly/azcrr

A Realist Approach to Econometrics: Lecture explains how we must go beyond the numbers to look at the real world context which generates the numbers, in order to understand what the data are trying to say. https://bit.ly/azarae

Theoretical Inflation of Macroeconomics

[bit.ly/dsia04f] – Part F of Lec 4 on Descriptive Statistics: An Islamic Approach – The 20m video is followed by a 2500-word detailed explanation of the concept of Inflation in the context of the Quantity Theory of Money.

One of the CENTRAL concepts underlying this course is the idea that theory and practice cannot be separated. So far we have considered the measurement of inflation in the microeconomic context of how rising prices affect the budgets of households. In this and the next part, we consider the concept of inflation, as it arises in economic theory, from a macroeconomic perspective. As we will see, the calculations we need, and the definitions, change according to the real world context and purpose.

So far, we have considered Inflation as a measure of the effect of changing prices on budgets. In Economic Theory, the concept of Inflation is closely linked to money. The Quantity Theory of Money (QTM) formula states that MV = PT. Money times the Velocity of Circulation of Money equals the general price level times the number of transactions (sales/purchases) within the economy. We can explain this formula as follows.

We can classify monetary transactions into two types: transactions which contribute to the GDP and those which do not. In general, sales of products and services produced this year will be part of GDP, but sales of used goods (produced in previous years) or intermediate goods and raw materials will not be part of the GDP transactions. The details of how to classify transactions are complex, and not needed for our current purposes in this lecture. The Total amount of money USED in GDP transactions over one year can be written as:

M* = P1 x T1 + P2 x T2 + … + Pn x Tn

List all of the millions of transactions where a commodity or service is exchanged for money (in a GDP producing transaction). Then Quantity Sold x Price per unit is the amount of money which is transferred from one party to the other (used) in each transaction. If we sum ALL the transactions, we get M*, the total amount of money used to purchase GDP goods over the year. This M* is also the GDP, by definition. M* will typically be much larger than M, the total amount of money in existence, because money can be used many times over. Define V = Velocity of Money = How many times money is used, on the average (this average will include money never used as well as money used repeatedly). The equation MV = M* can be used to DEFINE V.
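As a small numerical sketch of these definitions, the Python snippet below sums price times quantity over a handful of transactions to get M*, and then backs out V from MV = M*. All numbers, including the money stock, are hypothetical.

```python
# Hypothetical GDP transactions as (price per unit, units sold) pairs.
transactions = [(10.0, 5), (4.0, 20), (25.0, 2), (8.0, 10)]

# M* = P1 x T1 + P2 x T2 + ... + Pn x Tn: total money used in GDP transactions.
m_star = sum(price * units for price, units in transactions)

# Hypothetical money stock M; velocity is then DEFINED by M x V = M*.
money_stock = 65.0
velocity = m_star / money_stock

print(m_star, velocity)  # 260.0 4.0
```

Here each unit of money is used four times on average, which is exactly what the identity forces once M* and M are given.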

Let {Q1, Q2, … , Qk} be a COMPOSITE GOOD: this is the list of all goods sold throughout the year. Note that the same good would be sold in many different transactions, so the list of goods Q1, …, Qk is much smaller than the number of transactions T1,…,Tn. It is convenient to introduce the Dot Product Notation: If P = (p1, p2, … , pn) is the Vector of Prices and Q = (q1, q2, … , qn) is the Vector of Quantities, the dot product of these two vectors is defined as

P ● Q = p1 x q1 + p2 x q2 + … + pn x qn

In EXCEL: SUMPRODUCT(A1:A6,B1:B6) takes the two columns A1:A6 and B1:B6, multiplies the corresponding entries and adds all of these products:

SUMPRODUCT(A1:A6,B1:B6) = A1 x B1 + A2 x B2 + A3 x B3 + A4 x B4 + A5 x B5 + A6 x B6
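For readers working outside EXCEL, the same operation can be sketched in Python. The helper name `sumproduct` and the two short vectors are illustrative assumptions.

```python
def sumproduct(xs, ys):
    """Multiply corresponding entries and add the products, like Excel's SUMPRODUCT."""
    if len(xs) != len(ys):
        raise ValueError("both columns must have the same length")
    return sum(x * y for x, y in zip(xs, ys))

prices = [2.0, 3.0, 5.0]      # hypothetical price vector P
quantities = [10, 4, 1]       # hypothetical quantity vector Q

print(sumproduct(prices, quantities))  # 37.0
```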

To explain the macroeconomic theory concept of inflation, we first review the concept of a price index. To calculate a Price Index, we first create a composite good. This is the set of ALL sales of ALL goods over a one year period, called the Base Year. The value of this composite good is equal to the GDP for that year, because this is the value of all the goods produced that year. Now we fix the composite good at the base year levels, and find out how much this combination of goods would cost in other years. This will require having information about the prices of each of the commodities in each year. Price information is much easier to get than the value of the entire amount sold throughout the year. The dot product of the prices for each year and the quantities for the base year is the price index. This is the price of the Composite Commodity in each year. We provide a simple illustrative example:

Item     QTY 2010   2010   2011   2012   2013
Wheat          20     37     53     73     96
Rice           15     56     64     74     98
Milk           25     45     42     67     64
Beef           10     75     86     85     92
PRICE               3455   3930   5095   5910
INDEX               1.00   1.14   1.47   1.71

Suppose the Base Year is 2010. We find out the entire annual sales of ALL the commodities produced. Suppose there are only the four commodities listed – Wheat, Rice, Milk, and Beef. The first column gives the annual sales: say 20 million tons of Wheat and 15 million tons of Rice. Similarly, 25 and 10 million tons of Milk and Beef were sold in 2010. The next column lists the price per unit in 2010. So Wheat costs 37 Thousand Rupees per ton, and so on for the other commodities. The total spending on the four commodities is SUMPRODUCT(B2:B5,C2:C5), assuming the above numbers are in an EXCEL table starting at A1. Multiplying the QTY by the Price for each commodity and adding up all four entries gives the PRICE of 3455 in the base year 2010. This is the GDP in 2010, at CURRENT prices, in Local Currency Units.

Now we repeat this calculation for 2011, using the prices for 2011, but keeping the quantities fixed at the base year levels in 2010. The value of these four commodities in 2011 is SUMPRODUCT(B2:B5,D2:D5), which comes out to be 3930. This is how much the commodity bundle of 2010 would have cost in 2011. It is a measure of how much the prices have changed from 2010 to 2011 – it does not measure GDP in 2011 because it has no information about the production levels in 2011. We can similarly compute the values of the base year quantities in 2012 and 2013. These numbers come out to be 3455, 3930, 5095 and 5910, in the second last row of the table, marked PRICE.

The units of the quantities are arbitrary, and it is only the percentage changes in these prices which matter for the index. So it is the convention to set the base year value to be 100%, and write the price index in the form of percentage points. This can be done by DIVIDING all of the numbers by 3455, the base year value. This gives the index value for the four years as: 100%, 114%, 147%, 171%. These numbers are the Price Index for the four years, and they measure how much prices have risen relative to the base year.
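The calculation in this example can be reproduced outside EXCEL. The following Python sketch uses the quantities and prices from the table; the dictionary layout and the function name `bundle_cost` are illustrative choices.

```python
# Base-year (2010) quantities and yearly prices from the illustrative table.
quantities_2010 = {"Wheat": 20, "Rice": 15, "Milk": 25, "Beef": 10}
prices = {
    2010: {"Wheat": 37, "Rice": 56, "Milk": 45, "Beef": 75},
    2011: {"Wheat": 53, "Rice": 64, "Milk": 42, "Beef": 86},
    2012: {"Wheat": 73, "Rice": 74, "Milk": 67, "Beef": 85},
    2013: {"Wheat": 96, "Rice": 98, "Milk": 64, "Beef": 92},
}

def bundle_cost(year_prices, quantities):
    """Cost of the fixed base-year bundle at one year's prices (a dot product)."""
    return sum(year_prices[item] * qty for item, qty in quantities.items())

# Value of the 2010 bundle at each year's prices.
costs = {year: bundle_cost(p, quantities_2010) for year, p in prices.items()}

# Divide by the base-year value so that the index is 1.00 (100%) in 2010.
index = {year: cost / costs[2010] for year, cost in costs.items()}

for year in sorted(costs):
    print(year, costs[year], round(index[year], 2))
```

Keeping the quantities fixed at base-year levels is what makes this a pure measure of price change: only the price dictionary varies from year to year.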

We will now look at the data we will use to calculate inflation and assess the validity of the Quantity Theory of Money as an economic hypothesis. As an accounting identity, QTM is true by definition. These two roles of the QTM – as an accounting identity, and as an economic hypothesis – have to be differentiated clearly to avoid confusion. The WDI (World Development Indicators) data set by the World Bank provides series for GDP Current LCU and GDP Constant LCU, and also the GDP Deflator. These are the series we need to construct the relevant macroeconomic inflation series. Below the explanation and names of the series required in the WDI data set are provided for reference:

NY.GDP.MKTP.CN  (GDP in Current LCU) GDP at purchaser’s prices is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current local currency.

NY.GDP.MKTP.KN  (GDP in Constant LCU) The definition is the same as the previous one, except for the last sentence:  “Data are in constant local currency”. The change from current price LCU to constant price is achieved by dividing the current price series by the GDP Deflator – which is exactly the price index for production we have discussed previously. This GDP Deflator series is also in the WDI Data Base as:

NY.GDP.DEFL.ZS  (GDP Deflator) The GDP implicit deflator is the ratio of GDP in current local currency to GDP in constant local currency. The base year varies by country.

Of the three series, only two are necessary since  GDP(Current LCU) = GDP(Constant LCU) x GDP Deflator. We can get the third series from any two. The process of calculation involves creating the Current LCU GDP and the GDP Deflator (or Price Index for GDP) and then dividing the GDP Current LCU by the deflator to get the GDP Constant LCU. We illustrate these data series for just one country, Australia, in the WDI data set.

Australia   Nominal GDP      Real GDP         GDP Deflator
CODE=       NY.GDP.MKTP.CN   NY.GDP.MKTP.KN   NY.GDP.DEFL.ZS
2005        9.20899E+11      1.12365E+12      81.95632788
2006        9.94803E+11      1.15778E+12      85.92309612
2007        1.08306E+12      1.20156E+12      90.13759578
2008        1.17595E+12      1.2469E+12       94.30988396
2009        1.25222E+12      1.26393E+12      99.07305287
2010        1.29338E+12      1.29338E+12      100
2011        1.39906E+12      1.31803E+12      106.15

The first column is the GDP in Current LCU for Australia for the years 2005 to 2011. This is also called the Nominal GDP. The second column is the GDP in constant LCU. This is also called the real GDP. The third column is the GDP Deflator, or the price index. Note that the index is 100 in 2010, which shows that 2010 is the base year for this index. Dividing the nominal GDP by the GDP Deflator (in percentages) will give us the real GDP for Australia.
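The division of nominal GDP by the deflator can be sketched in Python using three rows of the Australian table. The function name `real_gdp` is an illustrative choice; because the tabulated figures are rounded, the results match the Real GDP column only approximately.

```python
# Nominal GDP (current LCU) and GDP deflator for Australia, copied from the table.
nominal_gdp = {2005: 9.20899e11, 2010: 1.29338e12, 2011: 1.39906e12}
deflator = {2005: 81.95632788, 2010: 100.0, 2011: 106.15}

def real_gdp(nominal, deflator_pct):
    """Convert nominal GDP to constant prices; the deflator is in percent."""
    return nominal / (deflator_pct / 100.0)

for year in sorted(nominal_gdp):
    print(year, f"{real_gdp(nominal_gdp[year], deflator[year]):.5e}")
```

In the base year the deflator is 100, so real and nominal GDP coincide, as they do in the 2010 row of the table.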

In order to evaluate the QTM as an economic hypothesis, we need to partition economic Growth into two parts: Inflation and Real Growth. As we move from 2010 to 2011 both prices and quantities changed, and increased. We would like to separate the increase into two parts, one due to the increase in prices, and the other due to the increase in quantities. Here is the data for Australia for these two years:

2010     1.29338E+12   1.29338E+12   100
2011     1.39906E+12   1.31803E+12   106.15
Growth   1.0817        1.0191        1.0615

The second column gives us the Growth in Nominal GDP = 8.17%, calculated as B2/B1-1. From the last column we get Growth in Prices = 6.15% (D2/D1-1). Similarly, we can calculate the growth in real GDP to be 1.91% (C2/C1-1). So the 8.17% growth can be divided into two parts: 6.15% is due to the increase in prices, and 1.91% is due to the increase in quantities. To avoid confusion, it is important to note that Growth rates are Multiplicative. The nominal GDP in the Base Year is multiplied by the growth factor of prices and by the growth factor of quantities to get the nominal GDP for the next year. This means that 1.0191 x 1.0615 = 1.0817, whereas simply adding the percentages gives 6.15 + 1.91 = 8.06, not 8.17.
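The multiplicative decomposition can be verified with a short Python sketch that uses the 2010 and 2011 rows of the table above. The variable names are illustrative, and small discrepancies in the last digits come from rounding in the tabulated figures.

```python
import math

# Australia, 2010 and 2011, from the table above.
nominal = (1.29338e12, 1.39906e12)
real = (1.29338e12, 1.31803e12)
deflator = (100.0, 106.15)

g_nominal = nominal[1] / nominal[0]   # growth factor of nominal GDP
g_real = real[1] / real[0]            # growth factor of real GDP
g_price = deflator[1] / deflator[0]   # growth factor of prices

# Growth factors multiply; the simple percentages do not add up exactly.
print(round(g_nominal, 4), round(g_real * g_price, 4))

# Log growth rates, by contrast, are additive.
print(round(math.log(g_nominal), 4),
      round(math.log(g_real) + math.log(g_price), 4))
```

This is one reason for measuring growth with logarithms: the log growth of nominal GDP splits exactly into the log growth of prices plus the log growth of quantities.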

QTM as an Economic Hypothesis states that growth rates of money affect growth rates of prices, and do not affect growth rates of the real economy. The above three series allow us to compute the growth rates of prices and the real growth rates. To see the relationship between these and money, we need a data series for money. The WDI data set has many series for money, because money can be defined in different ways. Economists talk about Narrow Money (M0), Money (M1), Money+Quasi-Money (M2), and Broad Money (M3). We will not discuss the differences in detail, and we will just choose to work with two of these definitions, to assess the QTM. The series we have chosen from the WDI are listed below, with their names and definitions as given by the World Bank:

FM.LBL.BMNY.CN  Broad Money (M3) Broad money (IFS line 35L..ZK) is the sum of currency outside banks; demand deposits other than those of the central government; the time, savings, and foreign currency deposits of resident sectors other than the central government; bank and traveler’s checks; and other securities such as certificates of deposit and commercial paper.

FM.LBL.MONY.CN (Money, M1) Money is the sum of currency outside banks and demand deposits other than those of central government. This series, frequently referred to as M1, is a narrower definition of money than M2. Data are in current local currency.

To set up for assessment of the QTM, we need to write the Quantity Equation in Terms of Growth Rates:

M(t) x V(t) = P(t) x Q(t)     and  M(t-1) x V(t-1) = P(t-1) x Q(t-1)

Take LOGs to convert to ADDITIVE form

Log M(t) + Log V(t) = Log P(t) + Log Q(t)

Log M(t-1) + Log V(t-1) = Log P(t-1) + Log Q(t-1)

Subtract the second equation from the first

Log M(t) – Log M(t-1) = Log( M(t)/M(t-1) ) is approximately the growth rate of Money. We will write this as %M = Log( M(t)/M(t-1) ). Then we can write the difference of the two equations as:

Log( M(t)/M(t-1) ) + Log( V(t)/V(t-1) ) = Log( P(t)/P(t-1) ) + Log( Q(t)/Q(t-1) )

Growth Rates:        %M + %V = %P + %Q
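The growth-rate form of the identity can be checked numerically. In the Python sketch below the two-year values of M, P and Q are hypothetical, and V is computed from MV = PQ in each year, so the identity holds by construction.

```python
import math

def log_growth(prev, curr):
    """Log growth rate: log(x(t) / x(t-1))."""
    return math.log(curr / prev)

# Hypothetical two-year data: money stock M, price level P, real output Q.
M = (500.0, 560.0)
P = (1.00, 1.06)
Q = (1000.0, 1020.0)

# Velocity in each year is defined by M x V = P x Q.
V = tuple(p * q / m for m, p, q in zip(M, P, Q))

pct_M, pct_P, pct_Q, pct_V = (log_growth(*s) for s in (M, P, Q, V))

# The identity %M + %V = %P + %Q holds by construction.
print(abs((pct_M + pct_V) - (pct_P + pct_Q)) < 1e-12)
```

In this made-up example money grows faster than prices and output combined, so %V comes out negative: velocity absorbs the difference.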

This is an Accounting Identity. It DEFINES V = Velocity. That is, we can calculate %V from the above equation, and that will force the equation to be true. However, when we move from Accounting Identity to Economic Theory, the theory may not hold. In the first place, the economic theory says that Velocity is nearly constant – this we can check from the data, and we will find that it is not true. Nonetheless, the theory may still be valid if Velocity is EXOGENOUS: this means that V is not affected by M or P or Q. Learning how to find out about exogeneity is of crucial importance, but not taught in conventional statistics.

The KEY economic hypothesis is that Money affects Prices ONLY, NOT Quantity. This is called the Classical Dichotomy: “Money is a Veil”, or money is Neutral. It takes TWO forms. The STRONG Dichotomy holds in both the short and the long run. The WEAK Dichotomy holds only in the long run: short run effects of %M on %Q are allowed, but these disappear in the long run. This is OPPOSED to the idea of Keynes, who said that Money is NOT neutral in the short or the long run.

In the NEXT lecture, we will look at the data to see  whether it supports Keynes or the Chicago school of monetarists who believe in the QTM, using data on GDP and Money from the WDI data set.

Links to related materials

This Lecture: bit.ly/dsia04f

bit.ly/dsia01e: Comprehensive Review of Islamic Approach to teaching statistics

bit.ly/dsia02e: Assigning one (index) number to evaluate multidimensional qualities involves subjective choice of factors and weights.

bit.ly/dsia03e: Histograms with Varying Bin Sizes: The effects of changing bin size on histograms for Life Expectancy, and what we learn from these pictures.

bit.ly/dsia04d: Composite Commodities: Laspeyres and Paasche. Explains how different commodity bundles lead to different measures of inflation

bit.ly/dsia04e: Fear of Big Data: Many Inflations. Different people buy different commodity bundles and experience different inflation rates. These cannot be summarized or represented by one number.

4F: For Economic Theory, ONE number characterizes inflation experienced by the economy as a WHOLE, based on aggregate production levels in a base year.

Fear of Big Data: Many Inflations

[bit.ly/dsia04e] This is Part E of Lec 4: Descriptive Statistics: An Islamic Approach. We saw in the previous lecture that Inflation rates vary by household. The rate depends on what goods are purchased by the household. In this lecture we consider whether we can just look at ALL of the inflation rates, instead of trying to reduce them to one number.

The mindset of statistics is a reflection of Sir Ronald Fisher’s statement that Statistics is about the reduction of data. In his era it was impossible to directly analyze large data sets, and the only hope for analysis was to reduce large data sets to a small and manageable size. This is exactly the opposite of the Modern Mindset: we would like to analyze BIG Data Sets because they contain a lot more information, and WE CAN analyze them using currently available computer capabilities.

We have seen that each commodity bundle leads to a separate inflation rate. Can we reduce thousands – millions – of inflation numbers to ONE number? NO – not without serious loss of information. Conventional Statistics looks for Sufficient Statistics – a small set of numbers which summarizes ALL of the information in the data. These exist if the data density follows restrictive ASSUMPTIONS. GENERALLY speaking, it was ASSUMED that all data is NORMALLY distributed. When these assumptions are approximately valid, they allow us to reduce the data, which enabled analysis in the pre-computer era. Typically, these assumptions are FALSE, especially in BIG data sets. Because of advances in technology, it is NOW possible to DIRECTLY analyze big data sets, WITHOUT trying to reduce them. We need to learn NEW WAYS of thinking, and NEW techniques of analysis for large data sets. In this lecture, we will explain how to analyze Inflation for EACH household separately.

Actual analysis on real data sets can be done using surveys. For example, the Household Income and Expenditure Survey (HIES) in 2006 took a sample of 39,677 households (HH) in Pakistan. Even though 40,000 is a big number, it is a “small” sample chosen as representative of the entire population of around 180 Million at that time. It is now BECOMING possible to track each individual separately – China, USA, and Europe now have the technology available to keep track of all consumer purchases over one year for each household separately. However, this data is not currently collected in this format. The data requirements for tracking annual consumption bundles are LARGE. They are much lower for PRICES, which tend to remain stable over time and are the same for all individuals. That is why the Laspeyres index is typically used: you track consumption patterns only for one year, and then use that as representative for all years, while measuring price changes. Since real data of the type we need is not available, we will generate an artificial data set to illustrate the concepts. The GOAL is to understand what inflation numbers mean.

As we have seen, Inflation numbers vary by Household, and no ONE number represents this data. To illustrate this concept, consider an artificial data set for 10 households:

We consider 8 Essential Food Commodities listed above. The actual price inflation rates for each commodity going from 2018 to 2019 are listed in the first column. Different Households purchase different amounts of these commodities. Randomly chosen amounts ranging from 5 to 95 are listed under each household. This gives us consumption bundles for 10 households. We can now compute the inflation rate for each household. Multiply inflation rates by the units of quantity and sum, and divide this by the sum of the quantities. Assuming the data is entered in an EXCEL spreadsheet starting from A1, the formula which gives the inflation rate for HH 1 would be entered in cell C10 as: =SUMPRODUCT(B2:B9,C2:C9)/SUM(C2:C9). Similarly, we can compute inflation for each of the households separately. Pulse, Rice, and Milk have high inflation. So households which purchased more of these commodities would have high inflation numbers. Sugar and Oil have negative inflation, while Wheat and Chicken have the lowest positive inflation. So households that have more of these four goods in their consumption bundle would experience lower inflation. For the random data chosen above, the range of inflation is from 8.08% to 11.68%. No one number can summarize this data set. For 10 numbers, we can look at them directly and understand the whole data set. When the data set is larger, this is not possible. We examine this case next.
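The spreadsheet formula is a quantity-weighted average of the commodity inflation rates. A minimal Python sketch of the same calculation follows. The rates are the ones in the Part C commodity table (bit.ly/dsia04c), assumed here to be the eight food commodities meant above; the quantities and the function name `household_inflation` are hypothetical, since the lecture's 10-household table is not reproduced in the text.

```python
# Commodity inflation rates (percent), Oct 2018 to Oct 2019, from the
# commodity table in Part C of this lecture.
inflation = {
    "Wheat": 8.80, "Rice": 18.08, "Chicken": 8.05, "Milk": 16.26,
    "Oil": -3.14, "Pulse": 24.49, "Potatoes": 16.95, "Sugar": -8.67,
}

# Hypothetical quantities for one household (NOT the lecture's data).
quantities = {
    "Wheat": 40, "Rice": 25, "Chicken": 10, "Milk": 60,
    "Oil": 5, "Pulse": 30, "Potatoes": 20, "Sugar": 15,
}


def household_inflation(rates, qty):
    """Quantity-weighted average: SUMPRODUCT(rates, qty) / SUM(qty)."""
    total_qty = sum(qty.values())
    return sum(rates[item] * qty[item] for item in qty) / total_qty


hh_rate = household_inflation(inflation, quantities)
print(f"Household inflation: {hh_rate:.2f}%")
```

A household that buys more Pulse, Rice, and Milk gets a higher number from this function, and one that buys more Sugar and Oil gets a lower one.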

We generate Random Consumption Patterns for 200 HH & compute 200 Inflation numbers. These vary from a low of 5.2% to a high of 16.06%. Our minds are not built to process 200 numbers directly. Statistical graphs are aids which translate data into visual forms that can easily be understood. A Histogram gives a picture of the 200 inflation numbers:

[Figure: histogram of the 200 household inflation rates]
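The experiment can be sketched in Python as follows. This is an illustration under stated assumptions, not the lecture's actual data: the commodity rates are taken from the Part C table, the quantities are random integers from 5 to 95 as described above, and the seed and the choice of 9 equal-width bins are arbitrary. The resulting counts will differ from the lecture's histogram.

```python
import random

random.seed(0)  # arbitrary fixed seed so the sketch is repeatable

# Commodity inflation rates (percent) from the Part C table.
rates = [8.80, 18.08, 8.05, 16.26, -3.14, 24.49, 16.95, -8.67]


def random_household_inflation():
    # Random quantity between 5 and 95 for each commodity.
    qty = [random.randint(5, 95) for _ in rates]
    return sum(r * q for r, q in zip(rates, qty)) / sum(qty)


hh_inflation = [random_household_inflation() for _ in range(200)]

# Count households in equal-width bins, each labelled by its lowest point.
n_bins = 9
low, high = min(hh_inflation), max(hh_inflation)
width = (high - low) / n_bins
counts = [0] * n_bins
for x in hh_inflation:
    idx = min(int((x - low) / width), n_bins - 1)  # largest value goes in last bin
    counts[idx] += 1

# Text histogram: bin start, household count, one '#' per household.
for i, c in enumerate(counts):
    print(f"{low + i * width:6.2f}%  {c:3d}  {'#' * c}")
```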

From the histogram, we can learn a lot more about inflation for the 200 households. The Histogram Data can be given in a tabular form as follows:

Bin Start, Number of HHs, Reduced Number of HHs
5.20% 6 1
6.40% 15 3
7.60% 32 6
8.80% 51 10
10.00% 40 8
11.20% 30 6
12.40% 17 3
13.60% 8 2
14.80% 1 1

 

Each bin is represented by its lowest point. The first bin, which goes from 5.2% to 6.4%, has 6 HHs. The Modal Bin is the one with the largest number of households. There are 51 HHs in the range 8.8% to 10.0%, so this is the Mode for the data. We also note that the three central bins, going from 7.6% to 11.2%, contain 123 HHs, which is 61.5% of the total 200 HHs. In data reduction, we aim to find a smaller data set having nearly the same distribution as the larger one. One possibility is given in the third column of numbers above, which replaces every 5 HHs by one HH in the same category. The last column REDUCES the data from 200 to 40 HHs having NEARLY the same distribution.
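The figures quoted from the table can be checked with a short Python sketch. The bin data come from the table above; the rounding rule used for the reduced column (round to the nearest whole household, but keep at least one per bin) is an assumption that happens to reproduce the third column.

```python
# Histogram table from the text: (lowest point of bin in percent, household count).
bins = [(5.2, 6), (6.4, 15), (7.6, 32), (8.8, 51), (10.0, 40),
        (11.2, 30), (12.4, 17), (13.6, 8), (14.8, 1)]

total = sum(count for _, count in bins)

# Modal bin: the bin holding the most households.
mode_start, mode_count = max(bins, key=lambda b: b[1])

# Households in the three central bins, which together span 7.6% to 11.2%.
central = sum(count for start, count in bins if 7.6 <= start < 11.2)

# Reduced data set: about one household per five, at least one per bin
# (assumed rounding rule).
reduced = [max(1, round(count / 5)) for _, count in bins]

print(total, mode_start, mode_count)
print(central, central / total)
print(reduced, sum(reduced))
```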

Classical methods of data reduction rely on a measure of the centre of the distribution as a Representation of the data. In this context, let us examine the Mode & Mean of these 200 inflation numbers. We can calculate that the Mean = Average = 10.0%. Does this number provide a good Representation of the data? Does 10% represent the experience of many HHs? It is obvious that the answer is NO. There is a Technical Formula that relates the Mean to the Data: Add up all the numbers and divide by the total count (200 in this case). How does this number relate to the data set? The Meaning comes from a Theoretical Assumption. If the Data follow a Normal Distribution, then the average of the data is the best estimate of the mean of this normal distribution. Other interpretations can be made under different types of theoretical assumptions. These are always complex and depend on the validity of the assumptions.

The key feature which distinguishes Descriptive Statistics from Conventional Theoretical Statistics is the attempt to directly understand the data, without theoretical assumptions. The average of the data does not have a clear meaning without theoretical assumptions. A more directly comprehensible central value is the MODE of the data. This is the bin containing the largest amount of data. As already indicated, the Mode is the bin [8.8%, 10.0%]. The largest number of families (51/200) saw inflation in this range. This is a somewhat MISLEADING statement.

Compared to WHAT? We should ALWAYS ask this question. Here we are comparing to the OTHER BINS – but this is not mentioned in the statement, leading to possible confusion. If we want to capture the experience of most families, meaning the majority of them, the three central bins contain 123 families or 61.5% of all families. Thus we can say that most families (123/200) saw inflation rates between 7.6% and 11.2%. So if we want to REPRESENT the experience of most families, this range [7.6%, 11.2%] is good. A better way to do this is to look at the inter-quartile range, discussed later in this lecture.

The median and the average attempt to provide one number as a REPRESENTATION of the data. Another objective of a Central Measure is to provide a benchmark. This is best done by the MEDIAN. For this data set, MEDIAN Inflation = 9.9%. This means that

Half of the households had inflation rates below 9.9%. The other half experienced inflation at rates above 9.9%. This 9.9% is a benchmark. HH’s which experienced inflation above 9.9% experience HIGHER (than median) inflation, those below the benchmark experience LOWER (than median) inflation. This divides the HH’s into two equal groups of LOW and HIGH. How do you compute median?

  1. SORT the data from highest to lowest.
  2. Even data count (200): Median lies between the 100th and 101st data points = [9.93%, 9.94%]
  3. Odd data count: Middle Value exactly.

A Technical Definition of the median is that it satisfies the following two conditions:

  1. ≥ 50% of the data should be ≤ MEDIAN
  2. ≥ 50% of the data should be ≥ MEDIAN
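The sorting recipe and the two-part technical definition can both be sketched in Python. The function names `median_interval` and `is_median` and the six sample values are hypothetical. The data are sorted in ascending order here; sorting from highest to lowest picks out the same middle values.

```python
def median_interval(data):
    """Return (low, high): the middle value twice for an odd count,
    or the two middle values for an even count."""
    ordered = sorted(data)
    n = len(ordered)
    if n % 2 == 1:
        mid = ordered[n // 2]
        return mid, mid
    return ordered[n // 2 - 1], ordered[n // 2]


def is_median(data, m):
    """Check the technical definition: at least half of the data is <= m
    and at least half of the data is >= m."""
    n = len(data)
    at_or_below = sum(1 for x in data if x <= m)
    at_or_above = sum(1 for x in data if x >= m)
    return 2 * at_or_below >= n and 2 * at_or_above >= n


sample = [9.1, 12.4, 7.6, 10.3, 8.8, 11.0]  # hypothetical inflation rates
low, high = median_interval(sample)
print(low, high)
print(is_median(sample, (low + high) / 2))
```

For an even count, any value between the two middle data points satisfies both conditions, which is why the median for the 200 households is reported as a narrow interval.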

Concluding Remarks

  1. We have seen that the nature of Inflation is such that it is variable across HHs, and cannot be captured by ONE number. We have used histograms, modal bins, and median to try to describe some aspects of the larger data set.
  2. Use of one number leads to loss of credibility, because it does not match experience. To improve matters, we need to educate the public that inflation measures the average price increase for all goods, and that a few goods with spectacular price increases do not capture the average of everything. ALSO, we need to provide a range of values which captures the general experience more adequately than a single number can.
  3. Instead of using the average as a central value, the goal of REPRESENTATION can be better achieved by use of MODAL values in LARGE BINS. Large bins capture more of the data and hence do a better job of representation. Small Bins lead to MISLEADING MODES, as discussed earlier.
  4. In this context, the INTERQUARTILE RANGE provides a useful way to pick out the central half of the data. For the 200 numbers in the inflation data set, the MIDDLE HALF of the data is [8.66%, 11.41%]. This is obtained by sorting the data, and then looking at the range from the 50th to the 150th value. HALF of the households (100/200) experienced inflation in this range. 25% saw inflation below 8.66% and 25% saw inflation above 11.41%.
  5. The standard data summaries are Mean and Standard Deviation. These are useful IF Normality Assumption about data distribution holds, and USELESS otherwise. In many situations, small deviations from Normality can lead to very poor performance of the Mean as a data summary. In contrast, the MEDIAN is very robust, and works well for all data distributions.
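The middle-half calculation in remark 4 can be sketched as follows. The function name `middle_half` and the test data are hypothetical, and the positions used (the values at one quarter and three quarters of the way through the sorted list) follow the 50th-to-150th description above. Statistical software uses several slightly different quartile conventions, so other tools may return marginally different endpoints.

```python
def middle_half(data):
    """Return the values at the 1/4 and 3/4 positions of the sorted data.

    Assumes at least 4 data points."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[n // 4 - 1]       # 50th value when n = 200
    upper = ordered[3 * n // 4 - 1]   # 150th value when n = 200
    return lower, upper


# Hypothetical stand-in for the 200 inflation numbers: the integers 1..200.
data = list(range(1, 201))
print(middle_half(data))
```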

Related Materials

  1. bit.ly/dsia01d: Islamic Pedagogical Principles: education must engage head & heart, provide useful knowledge, relate to life experience, aim to serve humanity
  2. bit.ly/dsia02d: Goodhart’s Law – a treacherous aspect of statistics is that attempts to measure something change that thing – If we use publication count to measure research productivity, people being evaluated will increase their publication counts.
  3. bit.ly/dsia03d: Histograms for World Bank Life Expectancy Data – explains the process of making histograms.
  4. bit.ly/dsia04c: Measuring inflation using Sensitive Price Index – Index depends on purpose, one purpose can be to measure inflation of essential goods.
  5. bit.ly/dsia04d: Composite Commodities: Laspeyres and Paasche. Explains how different commodity bundles lead to different measures of inflation.


Composite Commodities, Laspeyres & Paasche Indices

BIT.LY/DSIA04D – Part D of Lec 4: Descriptive Statistics: An Islamic Approach

This lecture provides an alternative approach to computing price index. First we review the Method for computing inflation from the Previous Lecture (bit.ly/dsia04c):

Oct-18 Oct-19
Item QTY Price Price Inflation
Wheat Flour, Bag 10 kg 182.8 198.9 8.80%
Rice Basmati Broken 1 kg 37.8 44.6 18.08%
Chicken Farm,Live 1 kg 83.4 90.1 8.05%
Milk, Fresh, Unboiled 1 Ltr 30.5 35.4 16.26%
Cooking Oil, Tin, (SN) 2.5 Ltr 316.3 306.4 -3.14%
Pulse Masoor, Washed 1 kg 71.4 88.9 24.49%
Potatoes 1 kg 15.2 17.8 16.95%
Sugar, Refined 1 kg 27.9 25.5 -8.67%
Tea Prepared, Avg Hotel Cup 6.9 8.9 28.80%
Electricity (Average) Unit 4.4 5.3 21.56%
Petrol, Super 1 Ltr 57.8 55.8 -3.51%

 

We calculate Inflation for EACH good separately. Then we take a WEIGHTED average. The Weights are the PROPORTION of MONEY SPENT on the COMMODITY in the Fiscal Year 2008. These are the Aggregate Consumption numbers for the Nation as a whole. Inflation is relatively easy for ONE commodity – we just measure how much the price changes. It is difficult for Multiple Commodities because the price of each commodity changes in a different way. In the above table, the highest rate of inflation is 28.8% for Tea, and the lowest is -8.67% for Sugar – so how can we find ONE number to represent this entire range of price changes?

The standard method is to use a WEIGHTED Average to get to ONE number for inflation. The WEIGHTS are the MONEY SPENT on the commodity – this captures the IMPORTANCE of the commodity within budget. But there are many problems with this procedure.

PROBLEM 1: These weights keep changing with time. We used FY 2008, because data was gathered for this purpose. Change of BASE for price index is currently under way to update to 2015. But this is still behind current time patterns of consumption. How much sense does it make to use consumption patterns of 2008 to compute inflation rates in 2019?

PROBLEM 2: These are AGGREGATE weights for nation as a whole. These may not be representative of purchasing patterns of individuals, especially of the poor subgroups. This point will be made clearer in this lecture, which provides an Alternative Method for Computing Price Index and Inflation

We reproduce a table from the previous lecture (BIT.LY/DSIA04C) which shows how we compute weights for the different commodities:

 10/18 FY 2008 Composite
Item QTY Price Sales QTY
Wheat Flour, Bag 10 kg 182.8 10901.4 59.63
Rice Basmati Broken 1 kg 37.8 1903.2 50.39
Chicken Broiler, Live 1 kg 83.4 3558.8 42.68
Milk, Fresh, Unboiled 1 Ltr 30.5 16836.8 552.93
Cooking Oil, Tin 2.5 Ltr 316.3 2295.3 7.26
Pulse Masoor 1 kg 71.4 489.2 6.85
Potatoes 1 kg 15.2 1250.1 82.14
Sugar, Refined 1 kg 27.9 2734.0 97.92
Tea Avg Hotel Cup 6.9 791.3 114.52
Electricity (Avg) Unit 4.4 11512.8 2640.55
Petrol, Super 1 Ltr 57.8 5118.8 88.51

 

The weights are computed from the BUDGET shares, the amount of money spent on each commodity. We have data on the Price of these commodities on 25 Oct 2018. The table above lists the (hypothetical) MONEY SPENT on total sales of each commodity in FY 2008, this time in Thousands of PKR – the Previous Table was in Millions of PKR. Since Price x QTY = Money Spent, we can calculate the TOTAL QTY Purchased. This is given in the Last Column. The Quantity Purchased is obtained by dividing the Money Spent by the Price per unit. The last column can be thought of as a COMPOSITE GOOD. If we think of the ENTIRE nation as ONE household, then the data for 2008 shows that this nation purchased 59.63 thousand 10 Kg bags of Wheat, 50.39 thousand Kg of Rice, and so on. If we DEFINE consumption to be this FIXED basket of goods purchased in exactly these proportions, this basket of goods is called a composite good. Then, the Price Index is just the Price of the Composite Good. Also, Inflation just measures Increases in the Price of the CG. By creating a composite good, we have assembled all the different goods into one package, which allows us to measure inflation by pricing this package across time.
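The construction of the composite good can be sketched in Python using the prices and the (hypothetical) FY 2008 spending from the tables above. The sketch also checks a point the text relies on implicitly: pricing the fixed composite good in both years gives the same inflation number as the budget-share-weighted average of the individual commodity inflation rates. Because the published prices are rounded, the computed quantities agree with the table only to about two decimal places.

```python
# (item, Oct-18 price, Oct-19 price, hypothetical FY 2008 money spent)
data = [
    ("Wheat Flour", 182.8, 198.9, 10901.4),
    ("Rice", 37.8, 44.6, 1903.2),
    ("Chicken", 83.4, 90.1, 3558.8),
    ("Milk", 30.5, 35.4, 16836.8),
    ("Cooking Oil", 316.3, 306.4, 2295.3),
    ("Pulse Masoor", 71.4, 88.9, 489.2),
    ("Potatoes", 15.2, 17.8, 1250.1),
    ("Sugar", 27.9, 25.5, 2734.0),
    ("Tea", 6.9, 8.9, 791.3),
    ("Electricity", 4.4, 5.3, 11512.8),
    ("Petrol", 57.8, 55.8, 5118.8),
]

# Quantity of each good in the composite good = money spent / price.
qty = {name: spent / p18 for name, p18, _, spent in data}

# Price of the composite good in each year (quantities held fixed).
cost_18 = sum(qty[name] * p18 for name, p18, _, _ in data)
cost_19 = sum(qty[name] * p19 for name, _, p19, _ in data)
composite_inflation = cost_19 / cost_18 - 1

# The same number as a budget-share-weighted average of commodity inflation.
total_spent = sum(row[3] for row in data)
weighted_avg = sum((spent / total_spent) * (p19 / p18 - 1)
                   for _, p18, p19, spent in data)

print(f"Composite good inflation: {composite_inflation:.2%}")
print(f"Weighted average:         {weighted_avg:.2%}")
```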

The standard method for creating a composite good is to use commodity bundles purchased by the whole nation for a fixed Base Year. Until recently, the Pakistan Bureau of Statistics was using FY 2008 as the base year, and this has now been changed to 2015. This method is called the Laspeyres Price Index. This method fixes the consumption pattern, and the composite good, in some past year. The advantage of this method is that we only need to gather detailed micro-data on patterns of consumption for ONE year. After that, the pattern of consumption remains fixed, avoiding the need for a costly survey to find current consumption patterns. The disadvantage is that the weights are not representative of the current proportions of the goods in the consumption bundle today. We are using consumption patterns of 2008 to calculate inflation in 2018. A superior alternative is the Paasche Index. This method uses the Composite Good based on CURRENT consumption patterns. This is rarely used in practice because the information on current consumption bundles required to compute this index is rarely available.

We will now illustrate all of these ideas by a very specific, simple, and artificial example. A Price Index is just the Price of a Composite Good. A Composite Good is just A bundle of commodities, regarded as ONE good. To illustrate this consider FOOD as a composite good, which is made up of different quantities of Wheat, Rice, Milk, and Chicken.  Table below uses 30 Kg of Wheat, 10 kg of Rice, 50 Liters of Milk, and 10 Kg of Chicken as the composite good FOOD:

Item, Qty, Price 2018, Budget 2018, Price 2019, Budget 2019
Wheat 30 180 5400 198 5940
Rice 10 37 370 45 450
Milk 50 30 1500 35 1750
Chicken 10 80 800 88 880
Total 8070 9020

 

Using 2018 price, we calculate the amount spent on each commodity, and add them up to get the total budget required to purchase FOOD (the composite commodity). This is 8070 PKR in 2018 and increases to 9020 in 2019. Then we can calculate the inflation rate as 11.77% = (9020/8070 – 1).
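The FOOD calculation can be reproduced with a few lines of Python. The quantities and prices are those in the table above; only the variable names are introduced here.

```python
# item: (quantity, price 2018, price 2019), from the FOOD table above
food = {
    "Wheat": (30, 180, 198),
    "Rice": (10, 37, 45),
    "Milk": (50, 30, 35),
    "Chicken": (10, 80, 88),
}

# Cost of the fixed FOOD bundle at each year's prices.
budget_2018 = sum(q * p18 for q, p18, _ in food.values())
budget_2019 = sum(q * p19 for q, _, p19 in food.values())

# Inflation is the percentage increase in the price of the composite good.
inflation = budget_2019 / budget_2018 - 1
print(budget_2018, budget_2019, f"{inflation:.2%}")  # 8070 9020 11.77%
```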

While this gives a clear answer to the question of how to compute inflation, there remains the question of ‘How to choose the FOOD bundle?’ What are the quantities we should use for each of these commodities? As we will show, this choice matters a lot in computing the price index and the inflation rate. The Conventional Choice is the Laspeyres Index. That is, we fix ONE YEAR as the BASE for the price index. We measure AGGREGATE CONSUMPTION of the commodities in our bundle for the entire nation. Instead of treating the entire nation as ONE household, a more sensible alternative is to look at each household separately. This leads to much greater insight regarding the meaning of price index and inflation, as we now show.

The table below considers the consumption patterns of one (hypothetical) household in the two years 2018 and 2019:

Q 2018 P 2018 B 2018 Q 2019 P 2019 B 2019
Wheat 365 180 65700 400 198 79200
Rice 220 37 8140 200 45 9000
Milk 450 30 13500 550 35 19250
Chicken 120 80 9600 150 88 13200
Pulse 300 20 6000 250 40 10000
102940 130650

 

For this Household, the first column measures the Quantity of each of the FIVE goods, Wheat, Rice, Milk, Chicken, and Pulse, that the household ACTUALLY purchased over the entire year 2018. The second column gives the PRICES at which these commodities were purchased. Of course this price fluctuates over the year. For the purpose of computing the budget, we actually need the average price at which the household purchased these goods over the entire year. Let us assume that the Oct 2018 price we have is representative of this average price. Then the third column gives us the FOOD budget of this household in 2018. Now these same numbers are replicated for the year 2019 in the last three columns. The purchasing pattern of the household changes, with increases in Wheat, Milk, and Chicken purchases, and decreases in Rice and Pulse. The budget spent on food increases to 130,650 from 102,940. Does this increase of 26.9% (=130,650/102,940 – 1) represent INFLATION? NO – because the Composite Commodity CHANGED. We must keep Quantities FIXED to measure inflation. Part of the budget increase comes from increases (or changes) in purchased QUANTITIES. Another part comes from changes in prices. The figures above MIX both of these effects, due to price change and due to quantity change. We must keep Quantity FIXED in order to measure the effect of price change only – the inflation – on the budget. We have TWO CHOICES. Either we can keep quantities fixed at the PREVIOUS year 2018 levels – the Laspeyres method. Or we can use the Paasche method, which keeps quantities fixed at the CURRENT YEAR numbers. We now illustrate both of these methods.

The Laspeyres method uses the Previous Year Quantities as the Base. The actual quantities purchased in 2019 are replaced by the previous year quantities, highlighted in red. This change in quantities leads to a change in the BUDGET for 2019, which is also shown in red.

Q 2018 P 2018 B 2018 Q 2018 P 2019 B 2019
Wheat 365 180 65700 365 198 72270
Rice 220 37 8140 220 45 9900
Milk 450 30 13500 450 35 15750
Chicken 120 80 9600 120 88 10560
Pulse 300 20 6000 300 40 12000
102940 120480

 

If the household did not change its consumption pattern, and purchased exactly the same goods in 2019 that it did in 2018, then its budget would have been 120,480. Now, we can calculate inflation as 120480/102940-1 = 17.04%. This is the Correct measure of inflation for THIS Household, using the Laspeyres Index. Note that for a correct computation of the budget, PRICES should be averaged over the entire year, NOT fixed at one point in time like Oct 2019. This is a minor issue which we will ignore in the present lecture.
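A minimal Python sketch of this Laspeyres calculation, using the household's 2018 quantities and the two sets of prices from the table (the helper name `basket_cost` is introduced here for illustration):

```python
prices_2018 = {"Wheat": 180, "Rice": 37, "Milk": 30, "Chicken": 80, "Pulse": 20}
prices_2019 = {"Wheat": 198, "Rice": 45, "Milk": 35, "Chicken": 88, "Pulse": 40}
qty_2018 = {"Wheat": 365, "Rice": 220, "Milk": 450, "Chicken": 120, "Pulse": 300}


def basket_cost(qty, prices):
    """Cost of buying the given quantities at the given prices."""
    return sum(qty[good] * prices[good] for good in qty)


# Laspeyres: price the PREVIOUS year's basket in both years.
cost_then = basket_cost(qty_2018, prices_2018)
cost_now = basket_cost(qty_2018, prices_2019)
laspeyres = cost_now / cost_then - 1
print(cost_then, cost_now, f"{laspeyres:.2%}")  # 102940 120480 17.04%
```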

The alternative to Laspeyres is to use Paasche, which uses the CURRENT year as base. This calculation is shown below:

 

Q 2019 P 2018 B 2018 Q 2019 P 2019 B 2019
Wheat 400 180 72000 400 198 79200
Rice 200 37 7400 200 45 9000
Milk 550 30 16500 550 35 19250
Chicken 150 80 12000 150 88 13200
Pulse 250 20 5000 250 40 10000
112900 130650

 

Instead of using the actual quantities purchased by this household last year, we use the CURRENT year purchases, and price them at last year's prices. The first column of Q 2018 (actual purchases in 2018) is replaced by Q 2019, the actual purchases in 2019, as shown highlighted in red. This leads to a change in the BUDGET for 2018, again highlighted in red. We can now compute the Paasche measure of inflation as 130650/112900 – 1 = 15.72%. This is rather different from the Laspeyres measure of 17.04% computed earlier.
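The Paasche calculation differs only in which basket is held fixed. A minimal Python sketch with the 2019 quantities from the table (the helper name `basket_cost` is introduced here for illustration):

```python
prices_2018 = {"Wheat": 180, "Rice": 37, "Milk": 30, "Chicken": 80, "Pulse": 20}
prices_2019 = {"Wheat": 198, "Rice": 45, "Milk": 35, "Chicken": 88, "Pulse": 40}
qty_2019 = {"Wheat": 400, "Rice": 200, "Milk": 550, "Chicken": 150, "Pulse": 250}


def basket_cost(qty, prices):
    """Cost of buying the given quantities at the given prices."""
    return sum(qty[good] * prices[good] for good in qty)


# Paasche: price the CURRENT year's basket in both years.
cost_at_2018_prices = basket_cost(qty_2019, prices_2018)
cost_at_2019_prices = basket_cost(qty_2019, prices_2019)
paasche = cost_at_2019_prices / cost_at_2018_prices - 1
print(cost_at_2018_prices, cost_at_2019_prices, f"{paasche:.2%}")  # 112900 130650 15.72%
```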

There are many OTHER options. Instead of taking either of the two years as the base, we could take the average amount of purchases in both years, OR just average the Paasche and Laspeyres indices. The important question is: “Is there a CORRECT measure of inflation?” The answer is NO: We should think of inflation as a QUALITATIVE phenomenon, which we are trying to measure imperfectly with numbers. This means that A RANGE of numbers can be suitable, and no one measure can adequately describe inflation.
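The two "other options" mentioned here can be sketched in Python for the same household. This is an illustration only: the names `basket_inflation`, `average_basket`, and `average_of_indices` are introduced here, and neither compromise is claimed to be more correct than the other.

```python
prices_2018 = {"Wheat": 180, "Rice": 37, "Milk": 30, "Chicken": 80, "Pulse": 20}
prices_2019 = {"Wheat": 198, "Rice": 45, "Milk": 35, "Chicken": 88, "Pulse": 40}
qty_2018 = {"Wheat": 365, "Rice": 220, "Milk": 450, "Chicken": 120, "Pulse": 300}
qty_2019 = {"Wheat": 400, "Rice": 200, "Milk": 550, "Chicken": 150, "Pulse": 250}


def basket_inflation(qty):
    """Inflation measured by pricing one fixed basket in both years."""
    cost_2018 = sum(qty[g] * prices_2018[g] for g in qty)
    cost_2019 = sum(qty[g] * prices_2019[g] for g in qty)
    return cost_2019 / cost_2018 - 1


laspeyres = basket_inflation(qty_2018)  # previous-year basket
paasche = basket_inflation(qty_2019)    # current-year basket

# Option 1: use the average of the two years' purchases as the basket.
avg_qty = {g: (qty_2018[g] + qty_2019[g]) / 2 for g in qty_2018}
average_basket = basket_inflation(avg_qty)

# Option 2: simply average the two index-based inflation rates.
average_of_indices = (laspeyres + paasche) / 2

for name, value in [("Laspeyres", laspeyres), ("Paasche", paasche),
                    ("Average basket", average_basket),
                    ("Average of indices", average_of_indices)]:
    print(f"{name:20s} {value:.2%}")
```

Both compromises land between the Paasche and Laspeyres figures, which illustrates the point that a range of numbers, rather than a single one, describes this household's inflation.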

To see how the choice of commodity bundle matters, we will show that different households can have VERY DIFFERENT measures of inflation. Here we compare two different households. One of them eats only Rice and Pulses, R&P (Dal & Chawal in URDU). We look at the inflation rate using the budget of this household in the table below:

R&P P 2018 B 2018 R&P P 2019 B 2019
Wheat 0 180 0 0 198 0
Rice 500 37 18500 500 45 22500
Milk 0 30 0 0 35 0
Chicken 0 80 0 0 88 0
Pulse 500 20 10000 500 40 20000
28500 42500

 

For this R&P household, the Inflation rate is very high at 49.1%. This is because the price of pulse (Daal) has doubled from 20 PKR to 40 PKR, and this household spends a lot of money on Pulses. So its budget is very badly affected by inflation, going from 28,500 to 42,500 for a nearly 50% inflation rate.

Another household has very different consumption patterns, eating only Wheat, Milk, and Chicken – WMC:

WMC P 2018 B 2018 WMC P 2019 B 2019
Wheat 500 180 90000 500 198 99000
Rice 0 37 0 0 45 0
Milk 500 30 15000 500 35 17500
Chicken 500 80 40000 500 88 44000
Pulse 0 20 0 0 40 0
145000 160500

 

Because the prices of these commodities did not rise by too much, the budget of the WMC household increases only by 10.7%, going from 145,000 to 160,500.
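The contrast between the two households can be reproduced in Python from the tables above. The bundles are the ones given in the text; the function name `household_inflation` is introduced here for illustration.

```python
prices_2018 = {"Wheat": 180, "Rice": 37, "Milk": 30, "Chicken": 80, "Pulse": 20}
prices_2019 = {"Wheat": 198, "Rice": 45, "Milk": 35, "Chicken": 88, "Pulse": 40}

# Fixed consumption bundles of the two households (goods not listed are zero).
rice_pulse = {"Rice": 500, "Pulse": 500}
wheat_milk_chicken = {"Wheat": 500, "Milk": 500, "Chicken": 500}


def household_inflation(bundle):
    """Percentage increase in the cost of a fixed bundle from 2018 to 2019."""
    cost_2018 = sum(q * prices_2018[g] for g, q in bundle.items())
    cost_2019 = sum(q * prices_2019[g] for g, q in bundle.items())
    return cost_2019 / cost_2018 - 1


rp = household_inflation(rice_pulse)
wmc = household_inflation(wheat_milk_chicken)
print(f"R&P household: {rp:.1%}")   # 49.1%
print(f"WMC household: {wmc:.1%}")  # 10.7%
```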

Since different households will face different inflation rates, depending on their purchase patterns, we can ask what is the range of variation for this inflation rate? How much can it vary? To answer this question, we must look at the inflation rate for each commodity separately, as in the table below

 

P 2018 P 2019 Inflation
Wheat 180 198 10.0%
Rice 37 45 21.6%
Milk 30 35 16.7%
Chicken 80 88 10.0%
Pulse 20 40 100.0%

 

This shows the inflation rates for each of the five commodities in the food bundle separately. The range of inflation rates is between 10% and 100%. A Weighted Average can vary between a max of 100% and a min of 10%. If a household purchases a lot of Pulse, then it will see inflation near 100%. If another household purchases mostly Wheat and Chicken, then it will see the minimum possible Inflation of 10%. ALL combinations of weights will come out BETWEEN these two numbers. This is because a weighted average is always within the range of the numbers which are being averaged.
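The claim that every weighted average falls between the smallest and largest rate can be checked numerically. The sketch below draws 1000 sets of random non-negative weights (hypothetical budget weights, arbitrary seed) and records the resulting household inflation rates; it is a numerical illustration, not a proof.

```python
import random

# Commodity inflation rates (percent) from the table above.
rates = {"Wheat": 10.0, "Rice": 21.6, "Milk": 16.7, "Chicken": 10.0, "Pulse": 100.0}


def weighted_average(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


random.seed(1)  # arbitrary fixed seed
values = list(rates.values())
results = []
for _ in range(1000):
    weights = [random.random() for _ in values]  # hypothetical budget weights
    results.append(weighted_average(values, weights))

print(f"Smallest household inflation: {min(results):.1f}%")
print(f"Largest household inflation:  {max(results):.1f}%")
```

The reason is simple arithmetic: replacing every rate by the minimum can only lower the weighted sum, and replacing every rate by the maximum can only raise it, so the average is trapped between the two.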

Concluding Remarks

  1. Inflation varies by Household, according to their purchase patterns.
  2. Purchase pattern changes with time, so no single number measures inflation.
  3. Making ARBITRARY conventions allows us to ASSIGN a number, but this number is not an ACCURATE measure of inflation. It should be taken as an INDICATOR of a qualitative characteristic. A range of numbers may be suitable to describe inflation, even for a single household.
  4. No single number describes inflation for the country as a whole. The idea that taking the AGGREGATE consumption bundle in 2008 as the composite commodity leads to accurate measures is WRONG. This involves reducing the country to one household and ignoring diversity in consumption patterns across different income groups as well as regional groups.
  5. It is possible to get MORE accurate measures by STRATIFYING the population according to consumption patterns. Then we can assign different (approximate) inflation numbers to different subgroups of the population.
  6. The variations in consumption patterns are also ENDOGENOUS and CAN BE CHANGED by campaigns. Endogenous means that prices affect these patterns – if something becomes expensive, less use will be made of it. Also, tastes can be changed by many different factors, leading to changes in consumption patterns. Such changes can also affect the inflation rate.

Links to Related Materials

  1. This lecture: bit.ly/dsia04d
  2. Lecture 1c: ly/dsia01c – Difference between Western and Islamic Conceptions of Knowledge.
  3. Lecture 2c: What do College Rankings Mean – bit.ly/dsia02c – Shows that factors and weights are chosen arbitrarily, and by changing our subjective evaluation criteria, we can radically change college rankings.
  4. Lecture 3b: Computing Life Expectancies from Mortality Tables. bit.ly/dsia03b Lecture shows how mortality tables are used to compute life expectancies, assuming mortality rates in each group will remain the same in future.
  5. LY/DSIA04A: Recapitulates why an Islamic Approach is necessary.
  6. LY/DSIA04B:Data Reduction – When we summarize a lot of data by ONE number, we lose a lot of information.
  7. LY/DSIA04C: Explains how we use consumption weights to aggregate different inflation rates for each commodity into ONE measure of inflation.


Calculating a Price Index & Inflation

BIT.LY/DSIA04C – Part C of Lecture 4-Descriptive Statistics: An Islamic Approach Details of How We Compute A Price Index, and How We use it to Compute Inflation Rates. The 17m video is followed by a 2200 word summary of the lecture.

Before turning to details of how a price index is computed, we discuss some general principles involved in the acquisition of knowledge. There is an important distinction between the Analytic Versus Synthetic approach.

  1. Analytic Approach: Break everything into small parts, and study them separately. Part of a divide and conquer strategy towards knowledge.
  2. Synthetic approach: put all the separate pieces together to study them as a whole. Part of a holistic and integrated strategy for achieving global understanding.

In general, we need both approaches for acquiring a good understanding. We need to understand the pieces separately, and also how they work together. In general, the Western intellectual tradition is very strong on the analytical approach, and very weak on the synthetic. The most important loss from this weakness occurs in the context of EMERGENT properties. These are properties of the system as a whole which cannot be understood from studying the parts of the system in isolation. For example, studying properties of heart cells in isolation cannot lead to an understanding of HOW, and much less WHY, the heart pumps blood. The WHOLE is greater than the sum of its parts. The standard Approach to Statistics is isolationist and analytical. Islamic approach is HOLISTIC and synthetic, and starts with consideration of PURPOSE.

A price index reduces a large number of prices to ONE price. This is one type of data reduction. All Data Reduction HIGHLIGHTS something and IGNORES many things. To understand data reduction, we must understand “What is being highlighted?” and also “What is being ignored?”. We must understand the PURPOSE of data reduction, in order to be able to  highlight the right elements, and to understand what can safely be ignored.

The Sensitive Price Index (SPI) is meant to study how expensive it is for the masses to buy necessities. Increases in this index reflect increased difficulty in buying essentials. This general understanding is of great importance in deciding upon many details which come up in constructing the index. CLEARLY, one should pay attention to public provision of necessities – like social welfare programs, government hospitals, government provision of education. ALSO, we should pay attention to the availability of non-market goods, like self-grown food (in rural areas). All of these factors matter in terms of how much money is needed by the masses to enable them to buy essentials. However, these factors would not be considered in a standard statistics course. Going outside the study of numbers is strongly discouraged by the conventional approach, which is analytical. The statistician's job is confined to the analysis of numbers, and does not extend to the bigger real world context from which the numbers come.

After these preliminary remarks, we turn to the technical details of the calculation of a price index. The first Step involves choosing a Bundle of Goods which are “representative”. In real life, I was part of a committee of experts at the Pakistan Bureau of Statistics (PBS), which was called to decide on goods to be put in the new commodities basket for 2015. We had an initial list prepared by the PBS and subjectively modified it by adding and subtracting commodities the members felt were important. The larger the collection of commodities, the more representative they would be. But also, the process of gathering information on prices, and also on expenditure, would be more costly and time-consuming. So it was important to pick a small bundle, which would also be representative.

Turning to the specific details of the SPI, we note that there are 53 goods in the basket for the SPI. These goods are listed below:

Looking at the list shows that the goods reflect commodities of importance to the poor in Pakistan. But the list does not, and cannot, reflect regional and seasonal variations, and non-market effects. For example, firewood is important for heating in the lives of the poor in cold regions, but is not sold in normal marketplaces. People in rural areas can grow some of their own food. Most importantly, rent/housing is not included, even though it is a major component of the cost of living. This is because of CONVENTIONS regarding the Consumer Price Index in Pakistan. In other places, rent is included. For example, COLA is short for Cost of Living Adjustments made to retirement incomes in the USA, and this includes rental prices.

We now come back to the details of the calculations. The first step, choosing representative commodities for the PURPOSE of the index, has already been discussed. The second step is to Get Prices of the Chosen Commodities. We illustrate this by looking at a subset of 11 commodities chosen from among the 53 in the SPI list:

25/10/18
Item QTY Price
Wheat Flour, Bag 10 kg 182.8
Rice Basmati Broken, (AQ) 1 kg 37.8
Chicken Farm, Broiler, Live 1 kg 83.4
Milk, Fresh, Unboiled 1 Ltr 30.5
Cooking Oil, Tin, (SN) 2.5 Ltr 316.3
Pulse Masoor, Washed 1 kg 71.4
Potatoes 1 kg 15.2
Sugar, Refined 1 kg 27.9
Tea Prepared, Average Hotel Cup 6.9
Electricity Charges (Average) Unit 4.4
Petrol, Super 1 Ltr 57.8

 

Why did we choose 10Kg for Wheat, and 1Kg for Rice, 1 cup of Tea? The UNITS chosen are ARBITRARY. We will price the SAME bundle across time and look at changes. It is only the percentage change in prices which matters. So we choose UNITS on the basis of what is convenient and easy to price in the market. In general, there are many prices for the same good, so we need to make conventions about WHAT counts as THE price, and how to handle fluctuations in prices. It does not matter how we do this, as long as we choose a systematic method which remains the same over a long period of time.

Step 3: Recompute Prices after an interval. We then find out the prices of the SAME bundle of goods, using the SAME methodology for finding prices after some time. Generally this is done on a monthly or weekly basis. Below we give data over a one year interval.

| Item | Qty | Price Oct-18 | Price Oct-19 |
| --- | --- | --- | --- |
| Wheat Flour, Bag | 10 kg | 182.8 | 198.9 |
| Rice Basmati Broken | 1 kg | 37.8 | 44.6 |
| Chicken Farm, Broiler, Live | 1 kg | 83.4 | 90.1 |
| Milk, Fresh, Unboiled | 1 Ltr | 30.5 | 35.4 |
| Cooking Oil, Tin, (SN) | 2.5 Ltr | 316.3 | 306.4 |
| Pulse Masoor, Washed | 1 kg | 71.4 | 88.9 |
| Potatoes | 1 kg | 15.2 | 17.8 |
| Sugar, Refined | 1 kg | 27.9 | 25.5 |
| Tea Prepared, Average Hotel | Cup | 6.9 | 8.9 |
| Electricity Charges (Average) | Unit | 4.4 | 5.3 |
| Petrol, Super | 1 Ltr | 57.8 | 55.8 |

The table gives the prices of the same unit on 25 October 2018 and on 25 October 2019, one year later. The change in the price measures the “inflation” in that good. Note that there was deflation, that is, a reduction in prices, in cooking oil, sugar, and petrol, for reasons known to consumers in Pakistan.

Step 4: Compute Inflation in each category separately. Given the two prices, separated by one year of time, we can compute the annual inflation rate for each commodity separately. This is done in the table below:

| Item | Qty | Price Oct-18 | Price Oct-19 | Inflation |
| --- | --- | --- | --- | --- |
| Wheat Flour, Bag | 10 kg | 182.8 | 198.9 | 8.80% |
| Rice Basmati Broken | 1 kg | 37.8 | 44.6 | 18.08% |
| Chicken Farm, Broiler, Live | 1 kg | 83.4 | 90.1 | 8.05% |
| Milk, Fresh, Unboiled | 1 Ltr | 30.5 | 35.4 | 16.26% |
| Cooking Oil, Tin, (SN) | 2.5 Ltr | 316.3 | 306.4 | -3.14% |
| Pulse Masoor, Washed | 1 kg | 71.4 | 88.9 | 24.49% |
| Potatoes | 1 kg | 15.2 | 17.8 | 16.95% |
| Sugar, Refined | 1 kg | 27.9 | 25.5 | -8.67% |
| Tea Prepared, Average Hotel | Cup | 6.9 | 8.9 | 28.80% |
| Electricity Charges (Average) | Unit | 4.4 | 5.3 | 21.56% |
| Petrol, Super | 1 Ltr | 57.8 | 55.8 | -3.51% |

How do we compute this inflation number? The standard formula for Inflation is: Inflation = (Current Price – Previous Price) / Previous Price. This is also equal to the ratio of the current price to the previous price, minus 1. This means the BASE is the Oct 2018 Price. This is a CONVENTION. We could also make the base equal to the current price, or the average price over the two years, or many other possibilities. As long as we stick to one method, it usually does not matter very much.
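The formula can be sketched in a few lines of Python. This is only an illustration: the dictionary copies the rounded prices from the table, and the names `prices`, `inflation`, and `rates` are our own labels. Because the printed prices are rounded to one decimal place, the computed rates can differ by a fraction of a percentage point from the Inflation column, which was presumably computed from unrounded prices.

```python
# Prices in PKR on 25 Oct 2018 and 25 Oct 2019, copied from the table above
# (rounded to one decimal place, so the results are approximate).
prices = {
    "Wheat Flour, Bag (10 kg)": (182.8, 198.9),
    "Rice Basmati Broken (1 kg)": (37.8, 44.6),
    "Chicken Farm, Broiler, Live (1 kg)": (83.4, 90.1),
    "Milk, Fresh, Unboiled (1 Ltr)": (30.5, 35.4),
    "Cooking Oil, Tin (2.5 Ltr)": (316.3, 306.4),
    "Pulse Masoor, Washed (1 kg)": (71.4, 88.9),
    "Potatoes (1 kg)": (15.2, 17.8),
    "Sugar, Refined (1 kg)": (27.9, 25.5),
    "Tea Prepared, Average Hotel (cup)": (6.9, 8.9),
    "Electricity Charges, Average (unit)": (4.4, 5.3),
    "Petrol, Super (1 Ltr)": (57.8, 55.8),
}


def inflation(previous, current):
    """Inflation with the previous price as the base, as a fraction."""
    return (current - previous) / previous


rates = {item: inflation(old, new) for item, (old, new) in prices.items()}

for item, rate in rates.items():
    print(f"{item:38s} {rate:7.2%}")
```

Dividing by the current price instead of the previous price would change every number slightly, which is why the chosen convention has to be kept fixed over time.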

Step 5: Assign WEIGHTS to each commodity. It does not make sense to take the simple average of all 11 inflation rates, and call it the overall inflation. This is because not all the commodities are equally important. How can we assess the relative importance of the different commodities? Actually, there are many different sensible ways of doing this. The one we choose is not chosen because it is the best. Rather, it is the most convenient to use. PBS uses the volume of TOTAL SALES of the commodity in Fiscal Year 2007-8 as the weight for that commodity. Ideally, the total sales in the current year should be used, but data on this is not easily available, and is expensive to gather. That is why the BASE of the price index is calculated once, and then changed after long intervals. Currently, PBS is in the process of shifting the base of the price index from 2007-8 to 2015. This involves calculating the weights for all of the commodities in the price index, by calculating the total volume of sales for each commodity in Fiscal Year 2015.

The table below gives ARTIFICIAL numbers for total sales for the chosen 11 SPI commodities, to illustrate the process of calculating weights. We can take these numbers as sales in Millions of PKR.

| Item | Inflation | Sales FY 2007 | Percent | Product |
| --- | --- | --- | --- | --- |
| Wheat Flour, Bag | 8.80% | 10.9 | 18.99% | 1.67% |
| Rice Basmati Broken | 18.08% | 1.9 | 3.32% | 0.60% |
| Chicken Broiler, Live | 8.05% | 3.6 | 6.20% | 0.50% |
| Milk, Fresh, Unboiled | 16.26% | 16.8 | 29.34% | 4.77% |
| Cooking Oil, Tin, (SN) | -3.14% | 2.3 | 4.00% | -0.13% |
| Pulse Masoor, Washed | 24.49% | 0.5 | 0.85% | 0.21% |
| Potatoes | 16.95% | 1.3 | 2.18% | 0.37% |
| Sugar, Refined | -8.67% | 2.7 | 4.76% | -0.41% |
| Tea Prep, Avg Hotel | 28.80% | 0.8 | 1.38% | 0.40% |
| Electricity Avg | 21.56% | 11.5 | 20.06% | 4.32% |
| Petrol, Super | -3.51% | 5.1 | 8.92% | -0.31% |
| Total | | 57.4 | 100% | 11.99% |

The third column of numbers gives the percentage of sales, as a proportion of the total sales of all 11 commodities. Sales of Wheat flour were 10.9 Million PKR, while Total Sales for all 11 commodities were 57.4 Million PKR. So the proportion of Wheat was 10.9/57.4 = 18.99%. Similarly, we can calculate the percentage weight of each of the commodities by looking at how much money was spent on that commodity, as a percentage of the total expenditure.
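The weight calculation can be sketched as follows. The sales figures are the ARTIFICIAL numbers from the table, and the variable names are our own. Since the printed sales are rounded to one decimal place, some computed shares can differ from the Percent column by a few hundredths of a percentage point.

```python
# ARTIFICIAL sales figures (millions of PKR) copied from the table above.
sales = {
    "Wheat Flour, Bag": 10.9,
    "Rice Basmati Broken": 1.9,
    "Chicken Broiler, Live": 3.6,
    "Milk, Fresh, Unboiled": 16.8,
    "Cooking Oil, Tin": 2.3,
    "Pulse Masoor, Washed": 0.5,
    "Potatoes": 1.3,
    "Sugar, Refined": 2.7,
    "Tea Prepared, Avg Hotel": 0.8,
    "Electricity, Avg": 11.5,
    "Petrol, Super": 5.1,
}

total_sales = sum(sales.values())

# Weight of a commodity = its sales as a share of total sales.
weights = {item: amount / total_sales for item, amount in sales.items()}

for item, weight in weights.items():
    print(f"{item:26s} {weight:7.2%}")
print(f"{'Total':26s} {sum(weights.values()):7.2%}")
```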

Step 6: Compute the WEIGHTED Average of the Inflation rates. This consists of two steps. The first step is to multiply the Inflation for each commodity by the PROPORTION of the COMMODITY in the OVERALL Budget for the Nation. This is listed in the last column of numbers above. For Wheat, 8.8% Inflation multiplied by the 18.99% Share gives a 1.67% contribution to inflation. The inflation rate for each of the commodities is multiplied by its percentage weight to get its contribution. The largest contributions to inflation come from Milk (4.77%) and Electricity (4.32%). At the same time, Cooking Oil, Sugar, and Petrol contribute negatively, and actually bring down the inflation. The second step is to ADD up all of the contributions to get total inflation of 11.99% in the SPI.

This completes the technical details of how inflation is computed from the SPI – the sensitive price index. This number is supposed to measure increases in the Cost of Living for the Poor, in terms of essential commodities. But it does not include expenses on rent, education, and health, which are actually very important. So it is a very imperfect index, and there is substantial room for creating better measures of how expensive life is for the poor. We will discuss some alternatives in later lectures.
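The two steps of the weighted average can be sketched as below. The inflation rates and weights are copied from the Inflation and Percent columns of the table (the weights rest on ARTIFICIAL sales figures), and the variable names are our own. For comparison, the sketch also computes the simple unweighted average of the 11 rates.

```python
# (inflation %, weight %) pairs copied from the Inflation and Percent columns
# of the table above. The weights come from ARTIFICIAL sales figures.
table = {
    "Wheat Flour, Bag": (8.80, 18.99),
    "Rice Basmati Broken": (18.08, 3.32),
    "Chicken Broiler, Live": (8.05, 6.20),
    "Milk, Fresh, Unboiled": (16.26, 29.34),
    "Cooking Oil, Tin": (-3.14, 4.00),
    "Pulse Masoor, Washed": (24.49, 0.85),
    "Potatoes": (16.95, 2.18),
    "Sugar, Refined": (-8.67, 4.76),
    "Tea Prepared, Avg Hotel": (28.80, 1.38),
    "Electricity, Avg": (21.56, 20.06),
    "Petrol, Super": (-3.51, 8.92),
}

# Step 1: contribution of each commodity = inflation rate x weight
# (both converted from percent to fractions).
contributions = {
    item: (rate / 100) * (weight / 100) for item, (rate, weight) in table.items()
}

# Step 2: add up the contributions.
weighted_inflation = sum(contributions.values())

# For comparison: the simple average that ignores the weights.
simple_average = sum(rate / 100 for rate, _ in table.values()) / len(table)

for item, c in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:26s} {c:7.2%}")
print(f"Weighted average: {weighted_inflation:.2%}")
print(f"Simple average:   {simple_average:.2%}")
```

With these numbers the weighted figure comes out higher than the simple average, because the two most heavily weighted items, Milk and Electricity, both have above-average inflation.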

Concluding Remarks: It seems that numbers are objective, but our goal has been to show that there is a lot of subjectivity in the construction of index numbers. In particular, the following aspects involve some subjectivity:

  1. Choice of Commodities has SOME flexibility – not arbitrary, but not fixed
  2. Finding the Prices has some flexibility
  3. Assigning Weights is really important, and is NOT done well. We are using 2007-8 weights, which may have changed a lot.

I have NOT explained the method of calculation in the standard way. I just chose the easiest way to understand. OTHER methods for same calculation will be discussed later.

Links to Related Materials

  1. Lecture 1a: Why an Islamic Approach to Statistics? bit.ly/dsia01a –Islamic approach requires consideration of PURPOSE of study, and how this relates to the PURPOSE of our lives. Relate Knowledge to Life Experience.
  2. Lecture 2a: Comparing Numbers – bit.ly/dsia02a – Qualitative and Subjective comparisons are made to appear objective by assigning numerical measures to intelligence, quality, wealth. Lecture shows how many arbitrary subjective judgements are involved in the process of assigning numbers to unmeasurables.
  3. Lecture 3a: Life Expectancies. bit.ly/dsia03a Lecture analyzes Life Expectancies in the World Bank Data Set using histograms. Some basic facts about how these have evolved over time, and how the data shows these changes are discussed.
  4. Lecture 4a: bit.ly/dsia04a – Recapitulates why an Islamic Approach is necessary.
  5. Lecture 4b: bit.ly/dsia04b – General Discussion of How and Why we REDUCE data, using ONE number to represent a large data set. One Inflation number represents many different price changes throughout the economy. WHY?

Inflation: Reducing Data to ONE Number

[bit.ly/dsia04b] – Part B of Lec 4 in Descriptive Statistics, An Islamic Approach. This topic is generally discussed under the heading of “Measures of Central Tendency”. The idea is to represent the entire data set by just one number. Typically this number will be a ‘central’ number within the data set. There are many different ways to formalize the concept of “central”. We reject this approach, because this type of data reduction cannot be done without knowing the purpose of this reduction. To learn the purpose, we need to go beyond the data set, to the real world problem we are trying to solve, with the help of the data set. We will illustrate this issue by discussing “inflation”, which is one number that represents a lot of different price changes. The 20m video is followed by a detailed discussion and explanation.

The Traditional Approach to data reduction was created by Sir Ronald Aylmer Fisher, known as the father of modern statistics. He assumed that the data are generated from a “nice” theoretical density, like the normal density. If this is true, then the best use of the data is to discover the density. It is possible to show that the Data = Theoretical Density Plus Random Errors (Noise). We can reduce data by eliminating noise. Under convenient ASSUMPTIONS about the underlying density which generates the data, Fisher showed that we can find SUFFICIENT STATISTICS – one (or more) numbers which provide ALL the information that the data has about the underlying density. This provides a theoretical justification for taking a large data set and reducing it to just a few numbers.

However, WHAT if data is NOT generated by underlying density? Then it cannot be reduced to one number, and we must deal with the entire data set. What is the JUSTIFICATION for making the assumptions which allow us to reduce the data to sufficient statistics? The ONLY justification is that it allows a theoretically valid reduction. We can NEVER prove that the assumptions regarding nice underlying density are valid. In the pre-computer era, it was convenient to make assumptions which allowed reduction of the data, but there is no longer any need to make fairy-tale assumptions just for convenience, so that we do not have to deal with large amounts of data.

What is the ALTERNATIVE to making assumptions? This involves Narratives – using data to tell coherent, explanatory, causal, stories. We will discuss how this process works in this course.

An Islamic Approach involves looking through the appearances to recognize the underlying reality. In particular, we want to go beyond the numbers to the real world which is being measured by these numbers. In the context of data reduction, the questions of importance are:

  1. WHY do we want one number to describe the entire data set?
  2. Assuming this is needed, HOW can we compute this number?
  3. WHAT does this number mean – with reference to data, AND with reference to the real world?

To elaborate on question 3 above: the REAL WORLD generates DATA, which is reduced to One Number. What is the relationship of the one number to the real world?

  4. What will be the effect of this data reduction, of replacing the whole data set with one number?

Before explaining the answers to these questions in the context of inflation, we explain a general Pedagogical Principle: Understand Abstraction by reduction to the CONCRETE. When we have Theorems, Principles, Philosophies, we can only understand them by asking: HOW do these apply in simple real world examples? It is necessary to take abstractions and understand how they work in the real world, in particular in special and SIMPLE examples. By understanding SEVERAL such applications, one gets an idea about how the general principle works. In this lecture, we study INFLATION – ONE number to represent hundreds of changes in prices. What does it MEAN to talk about the general level of inflation, when the prices of different things change in different ways? To understand this general question, we look at a specific example on a real data set. There are twelve general categories of goods which are used to calculate the inflation numbers. These are listed below.

A basket of goods representing average household expenditure on that category is priced in May 2019. For example, the chosen food basket costs PKR 118.58 in May 2019. The same basket is priced in May 2020 and now costs PKR 131.55. This is an increase of 10.9%, which is the inflation rate for the food category. Note that by changing the composition of the food basket we could change this rate. The same kind of calculation is done for each category of goods, so that we get 12 different rates of inflation, one for each category. These rates are listed in the last column. The highest rate is 19% for Alcoholic Beverages and Tobacco, while the lowest rate is -6.3% for Transportation. This negative rate is due to the unusual behavior of oil prices, which declined over this period of time.

To get a SINGLE number, we could just average these 12 numbers. If we do this, we get 6.9%. However, it does not seem reasonable to take a simple average, because some of these categories are far more important than others. Using weights that we will discuss in much greater detail later, we find that the Rate of Inflation is 7.3% on a Year-on-Year (YoY) basis from May 2019 to May 2020. We could also look at Month-on-Month (MoM) inflation by comparing price changes on a monthly basis. There are many different ways to come up with an inflation number. Before we can proceed further, we must ask: What does this number (7.3%) MEAN? In order to understand its meaning, we must look at the following questions:

  1. How is this 7.3% computed from this table? As we will see, this is based on a choice of weights, which is somewhat arbitrary, but not entirely so.
  2. Is it REASONABLE to even TRY to reduce such a diverse collection of prices changes to ONE Number? What is the compulsion to do such a reduction? Why cannot we just use 12 different rates of inflation, one for each group?
  3. What is the EFFECT of making such a reduction?

As we will see, even though there are many arbitrary decisions which go into the calculation, this number (7.3%) is called the HEADLINE Inflation! This number enters into many different decisions of economic policy. Before crunching numbers, we need to know WHY? WHY do we want ONE number to represent/summarize all of these movements? What are the EFFECTS of reducing complexity and diversity, and condensing all price variation into a single number: the headline inflation?

Answers to WHY questions have many dimensions. One dimension is the history: why did people start calculating these numbers? Why did people think we could find one number which would represent the general price level and the general rate of increase in prices? To answer this, we need to discuss some Economic Theory.  The Quantity Theory of Money (QTM): Suppose we add a ZERO to all notes.  Quantity of Money is now multiplied by 10. ALSO, all prices are multiplied by 10. But, there is no change in real economy.

QTM: Suppose MONEY STOCK grows by 15% and there are no REAL changes in the economy. THEN all prices will increase by 15%, so rate of inflation will be 15%.
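One consequence of the QTM scenario is worth spelling out: if every price rises by the same proportion, then the choice of weights no longer matters, because any weighted average of identical rates is that same rate. Below is a minimal sketch, with hypothetical prices and arbitrarily chosen weights (none of these numbers come from actual data).

```python
def weighted_inflation(old_prices, new_prices, weights):
    """Weighted average of per-good inflation rates (weights need not sum to 1)."""
    total_weight = sum(weights)
    return sum(
        w * (new - old) / old
        for old, new, w in zip(old_prices, new_prices, weights)
    ) / total_weight


# Hypothetical prices for four goods, before the money stock grows.
old_prices = [180.0, 40.0, 85.0, 30.0]

# QTM thought experiment: money grows by 15%, nothing real changes,
# so every price is scaled by the same factor.
growth = 0.15
new_prices = [p * (1 + growth) for p in old_prices]

# Three very different, arbitrarily chosen sets of weights.
weight_sets = [
    [1, 1, 1, 1],
    [0.7, 0.1, 0.1, 0.1],
    [5, 3, 1, 1],
]

results = [weighted_inflation(old_prices, new_prices, w) for w in weight_sets]
for weights, result in zip(weight_sets, results):
    print(weights, f"{result:.4f}")
```

Every set of weights gives the same answer here. It is only when prices move in different directions, as in the actual data, that the choice of weights starts to drive the result.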

If all prices rise roughly in the same proportion, then inflation makes sense. If inflation is occurring due to increasing quantities of money, then we would expect to see this pattern. However, our data set shows that this is NOT the case. Even though the rate of inflation is calculated to be 7.3%, inflation in different categories of goods ranges from a high of 19% to a low of -6%. So the question arises:

Can we use ONE number (7.3%) to REPRESENT, or STAND for, many numbers – the whole set of inflation numbers from 19% to -6% across the different categories of goods?  Again, this depends on the PURPOSE for which we are doing representation.

Inflation numbers serve many purposes, among which THREE BROAD Purposes can be identified as follows:

  1. Evaluating Government Performance (also SBP, Treasury, Monetary Policy). Governments attempt to provide stability of prices, and keeping inflation low is a policy target.
  2. Making Monetary Policy DECISIONS: The inflation number is very important in deciding upon appropriate monetary policy. The State Bank of Pakistan (and all Central Banks) look at a broad range of inflation forecasts, and use them in MODELS of monetary policy.
  3. Public plans for the future depend on inflation forecasts. Consumers worry about inflation because they need to use money to purchase goods. The Business Sector needs to adjust wages of laborers to keep in line with inflation, and to set prices so that they cover rising costs due to inflation. The financial sector must set interest rates to cover inflationary costs. For all of these purposes, inflation numbers are needed. But can one number be suitable for all, or would different numbers be useful for different purposes?

In fact, many different inflation numbers are computed for different purposes. There are Consumer Price Indices, Wholesale Prices for Business, Core Inflation numbers, and many other types. In the remaining lecture, we will discuss ONE of these numbers in greater detail: the Sensitive Price Index or SPI. What is the purpose of the SPI? Actually there are many:

  1. Judge impact of inflation on the general public. In Pakistan, the masses are poor and need to buy essential goods, so inflation in these goods affects their lives more.
  2. Big changes in SPI lead to Political Unrest, loss of Political Support for the party in power, and have impact on Election outcomes.
  3. Labor demand for wages, and salary increases of Government Servants require consideration of inflation in essential goods.
  4. Many types of Social Service programs provide money to the poor. The question of “How much money should be provided” requires knowing the prices of essential goods.
  5. If we want to know the level of income required to be able to purchase basic needs, we need to know about the SPI.
  6. Where to draw the poverty line, such that those below the line are counted as poor and eligible for support? There are many possible uses and purposes for a poverty line, and these also impact on how we should compute the SPI numbers.

How can we adjust the SPI to take into account these different concerns? There are two main areas where we have some flexibility. We can make a CHOICE of goods, and also a CHOICE of weights. To see how this works, let us look at  the Actual Data for the SPI in Pakistan.

Sensitive Price Index Data

The quantity unit chosen is arbitrary – for example, we use 10Kg for Wheat and 1Kg for Rice – it does not really matter which unit we choose, because we will compare it with the prices of the SAME unit later to compute inflation. So units are chosen for convenience, as amounts readily available in the marketplace and therefore easy to price. By looking at the list of 53 goods chosen, the student can get an idea of why this is called the “Sensitive Price Index”. The goods are the ones that are important in the lives of the masses of the public. Next we provide a first round explanation of how the SPI is computed. Instead of the full 53 goods, we simply use an artificial example based on a small subset of these goods, to make it easier to understand.

From the 53 Sensitive Commodities, we have chosen 11 to illustrate how the SPI is computed. AFTER the commodities have been chosen, the main problem is how to assign weights to each commodity. One way to do this is to look at how much money is spent on each commodity. The third column gives the sales in millions of rupees for each of these 11 commodities (this is hypothetical, not actual data). These sales provide an indication of how important this commodity is in the budget of the consumers. We take the amount spent on the commodity, as a proportion of the total amount spent on all 11 commodities, as the WEIGHT for the commodity. Once the weight is decided, it becomes easy to compute the SPI and the rate of inflation of the SPI. The 2nd column gives the price on 25 Oct 2018 for each of these 11 commodities. Averaging these prices using the weights given in the 5th column of numbers gives the SPI in Oct 2018. We can now get the prices of these same 11 commodities on 25 Oct 2019 to compute the annual inflation rate. For each of these commodities, the inflation rate is listed in the last column. These inflation rates vary from a high of 29% for tea to a low of -3.5% for Petrol. If we AVERAGE all of these inflation rates after applying the WEIGHTS given in the 5th Column, then we get the inflation rate of the SPI as 12%. There are many other ways to do this same calculation, as we shall discuss later.

POSTSCRIPT: Full Data Set for 13 Months is embedded below for reference – not actually used above

Inflation Data for Urban Prices

Links to Related Materials

Lecture 1a: Why an Islamic Approach to Statistics? bit.ly/dsia01a – Parts a,b,c,d,e,f, explain why it is necessary to take an Islamic approach to an apparently objective and factual subject like statistics.

Lecture 2a: Comparing Numbers – bit.ly/dsia02a – Parts a,b,c,d,e,f explain that it is impossible to do an OBJECTIVE ranking of colleges, students, wealth, cars, etc. whenever multiple dimensions of performance are involved. Index Numbers involve subjective choices of factors and weights.

Lecture 3a: Life Expectancies. bit.ly/dsia03a Parts a,b,c,d,e,f This lecture explains the concept of histograms, and how they help us to visualize large data sets.

Lecture 4a: RECAP – Why an Islamic Approach? bit.ly/dsia04a – Further discussion of topic of first lecture, in light of previous lectures, on how and why we need an Islamic approach to statistics.

 

Reflections on my MIT Education – 1

[bit.ly/mit4az1] I recently gave a ZOOM talk to fellow MIT Alumni entitled “Lessons MIT did not teach me” (shortlink: bit.ly/mit4az), which consisted of reflections on my teenage (16-19) experience at MIT from my present perspective at 65. The 45m talk + 45m Q&A was a personal reflection on my experiences. In this sequence of posts, I plan to distil the main lessons that emerge from these experiences, and to try to do so with minimal overlap with the original talk. The following 12m video, discussed in detail in the 1300 word writeup which follows, is the first segment of the talk:

When I landed in Boston as a Freshman at MIT in September 1971, America was another world, which no longer exists. America was the self-confident leader of the World. What won my heart was the friendliness of everyone, from top to bottom. There was so much to admire about the society that I had just entered. It appeared to be Open, Tolerant, Pluralistic, Respectful of Values, Passionate about Justice and Equality for all. I received respect, attention, camaraderie, freedom to pursue my own desires and dreams. In light of subsequent developments – like Trump – many detractors say that what I saw was just a surface appearance, and the rot manifested in “Black Lives Matter” and the brutal murder of George Floyd, was always present underneath this attractive surface. I will not enter into this debate, but I believe that having high ideals can be a source of positive social change. For reasons too complex to discuss here, this potential for positive change was not realized, and instead, the path taken by society led to moral degeneration. See “Social Revolutions” for a discussion of one aspect of this.

As I came to realize much later in life, my real education at MIT was NOT the subjects which I was taught, but the subject matter NOT COVERED in the MIT curriculum. By failing to cover questions of central importance to our lives, we students were led to believe that these questions were not important, and we were left to find answers to them on our own. When we asked about the answers to the big questions, we were told that learning the answer to these important questions depended on our first learning the answers to the small questions which we were being taught. This is an example of bait-and-switch, where the customer is lured with an attractive offer which is not available, and then switched to purchase an inferior product (see: Islamic Knowledge: Still Revolutionary After 1440 Years). The real MYSTERY is: Why did our education ignore ALL of the most important questions we human beings face in our lives? We will sketch an answer to this later.

First, let us discuss what are the important questions that MIT completely ignored and bypassed. What was it that I, as a young, innocent, and ignorant, teenager, needed to learn? The most important questions are, undoubtedly:

  1. What is the purpose of my existence? What is the “Meaning of Life”?
  2. What are the capabilities that I have? Who Am I? What makes me special?
  3. WHICH of them should I work on developing? What can I become?
  4. How can I become the best that it is possible for me to be? What SHOULD I become?
  5. How can I use these precious few moments of life in the best possible way? How can I lead a rich and fulfilling life?

Admittedly these are difficult questions, which have been debated by philosophers and intellectuals for millennia, with no clear-cut consensus answers. But does this mean that we should completely ignore thinking about these issues? That seems like searching for the keys under the lamp-post – instead of looking for answers to deep, difficult, and important questions, let us switch to answering easy and unimportant ones, because at least we know how to do that! Actually, it is not the job of an education to provide us with ready-made answers. Rather, a good education should teach us about the major points of view, discussions, and debates which have occurred over the centuries regarding these questions. A good education should attempt to teach me “HOW can I learn the answers to these questions?”. Some of the central precepts of Ancient Wisdom, which have been dropped from modern education, are quoted below:

  1. Socrates: “To know thyself is the beginning of wisdom”.
  2. Socrates: “A life unexamined is not worth living”
  3. Aristotle: Knowledge comes from knowing WHY – this requires the four causes: Material, Form, Agent, PURPOSE (Telos)
  4. Lao Tzu: Loving gives strength, being loved gives you courage. See The Secrets of Happiness.
  5. Motto Engraved on Turkish Madrassa: Here we do not teach fish to fly, nor do we teach birds to swim. That is, students are given individualized training, designed to develop their special talents and capabilities best suited to their personalities. See “Teaching Fish to Fly”.

But none of these questions were part of our MODERN curriculum at MIT. Instead, we were taught Physics, Chemistry, Math, Computer Programming, and other technical subjects. Why these questions no longer form part of a Western education is a complex question, discussed at book length by Harvard professor Julie Reuben in her book: “The Making of the Modern University: Intellectual Transformation and the Marginalization of Morality”. Very briefly, in the early 20th Century, college catalogs wrote that the goal of education is to build character, groom personality, develop leadership skills, and create awareness of social responsibilities. However, by the late 20th Century, none of the college catalogs wrote about development of character as a goal of education. Instead, all aimed to provide a PURELY technical education. For more details see “The Higher Goals of Education”. The main reason for this change is that Character, Morality, and ALL knowledge of the INTERNAL world of man – the heart and soul – were declared to be OUTSIDE the boundaries of knowledge. The definition of KNOWLEDGE itself changed in the early 20th Century, due to the spectacular rise of the philosophy of Logical Positivism (see The Emergence of Logical Positivism). The main idea of this philosophy is that SCIENCE is the only source of reliable knowledge. This means that “Knowledge” is only objective knowledge of the external world. Personality, Character, Heart, Soul, Intentions, etc. are personal characteristics related to our life-experiences, which are highly unscientific, and hence not part of “knowledge” according to positivists.

The arguments of positivists became widely accepted, and due to their influence, discussion of these questions, of central importance for our human lives, was gradually excluded from the curriculum. For an example of how moral knowledge is discredited, see the hugely popular lecture by Michael Sandel: Justice: What is the Right Thing To Do?. The lecture teaches students that there is no such thing as justice and morality, because we have been debating these central questions for centuries, going around in circles, and never reaching firm conclusions. This is so different from science, where knowledge has accumulated dramatically. This argument is appealing and attractive but deeply and dangerously flawed. Knowledge of the external world is extremely different from our personal knowledge of our internal lives. Knowledge of chemistry and mathematics is not of the same type as that required to “Know Thyself”. Every human must RE-DISCOVER and LEARN what it means to be human. This Internal Knowledge is different from, and FAR MORE IMPORTANT than, external knowledge, because it can teach us HOW TO LIVE! Central to this type of knowledge is Life Experience – we can learn a lot from the experiences of people who have tried to make sense of our brief existence. However, this is not SCIENTIFIC knowledge because all human beings are unique and different, and have experiences which we cannot replicate in laboratory conditions. Thus, according to positivism, life experience is not knowledge!! This fundamental fallacy is at the heart of Western education today.

RELATED MATERIALS: See also Recovering from a Western Education and The Education of an Economist, which describe my educational experiences. Logical Positivism creates a strong distinction between objective and subjective knowledge and gives the objective a very high status, while devaluing the subjective human experience. This is a dangerous delusion, as explained in Linguistic Paradoxes & Limits of Human Knowledge

WHY an “Islamic Approach” to Statistics?

[bit.ly/dsia04a] This first part of 4th Lec on Descriptive Statistics: An Islamic Approach, revisits the “Islamic” Angle. 15m Video is followed by a detailed 1200 word writeup.

There are some strong OBJECTIONS to taking an Islamic Approach to an apparently neutral topic like “Statistics”. We first list these objections, before answering them.

  1. It is entirely possible to present all the statistical and technical material within a secular, materialistic, framework, aligned with atheism and modernism.
  2. It is also possible to re-create this approach within Christian, Hindu, Buddhist, Confucian, or any other philosophical or cultural tradition.

THEN the question arises with GREATER urgency:

  1. Why am I using an Islamic approach? This terminology restricts the audience, repels many potential students, and, seemingly, offers no obvious advantages.
  2. FURTHERMORE – it appears dishonest – packaging of modern mathematical subjects within ancient frameworks unrelated to the topics under discussion.

 

A DECEPTION created by Modern Secular thought is that we have an OPTION to keep religion out of knowledge. In fact, Modern Secular Thinking is ITSELF a RELIGION, which presents itself as neutral, unbiased, and objective. There are HIDDEN NORMATIVE assumptions upon which the entire structure of Western Social Science is built. These assumptions TELL us about the purpose of life, without explicit articulation.

In the realm of statistics, most numbers are claimed to be objective, but are full of subjective assumptions and moral values built into their foundations. This appearance of objectivity, and claim of neutrality, creates a powerful deception. This is what enables HOW TO LIE WITH STATISTICS, and Economic Hit-Men. Widely believed deceptions which drive policy all over the globe include College Rankings and Student Evaluations via Objective Quizzes. The most important deception is the use of the GNP per capita as a measure of development. The use of this number represents a SUBJECTIVE opinion about how to measure prosperity and wealth, and the hidden values embodied in this number drive economic policy all over the planet.

Before approaching any subject matter, the First Question we must ask is: WHY? What is the PURPOSE of studying statistics? In the modern secular approach, this QUESTION is BYPASSED. BUT this question CANNOT be bypassed. PURPOSE is IMPLICIT, HIDDEN in the way the subject is presented. When we ask about the PURPOSE of Study, we must ask about the Purpose of LIFE itself. ESSENTIAL QUESTIONS are:

What is the purpose of my existence? How will study of statistics HELP me in achieving my life goals? Our “objective” approach hides the important fact that All Knowledge COMES from LIFE EXPERIENCES – these are distilled into a form which makes them appear objective. Useful Knowledge MUST relate to LIFE EXPERIENCES. But no mention is made of LIFE and of EXPERIENCE in the conventional approach – WHY? These questions were at the heart of the Wisdom of the Ancients, which has been Forgotten by the Moderns:

  1. Socrates: “To know thyself is the beginning of wisdom”.
  2. Socrates: “A Life Un-examined is NOT worth Living”
  3. Aristotle: KNOWLEDGE comes from knowing WHY. This involves FOUR Causes:

Consider a TABLE in the dining room:

  1. Material Cause: Wood – material used to create table
  2. Formal Cause: The FORM or DESIGN with four legs, flat surface, raised.
  3. Efficient Cause: Carpenter who took materials & shaped it into design.
  4. FINAL CAUSE: the PURPOSE of creating a table.

It seems OBVIOUS that Knowledge MUST be related to PURPOSE. But the conventional approach DOES NOT discuss Purpose: WHY? This is because of the HIDDEN ASSUMPTION of Modern Secular Thought: There is no purpose to life.

One of the leading atheists and influential philosophers of the 20th Century, Bertrand Russell, expresses this explicitly in his essay on “A Free Man’s Worship”:

That man is the product of causes which had no prevision of the end they were achieving; that his origin, his growth, his hopes and fears, his loves and his beliefs, are but the outcome of accidental collocations of atoms; that no fire, no heroism, no intensity of thought and feeling, can preserve an individual life beyond the grave; that all the labors of the ages, all the devotion, all the inspiration, all the noonday brightness of human genius, are destined to extinction in the vast death of the solar system …

According to the modern secular approach: All human effort is MEANINGLESS – there is no purpose to life, and no purpose for knowledge. Islamic Teachings Strongly Reject This:

  1. (75:36) Does Man think he was created without purpose?
  2. (67:2) (HE) created death and life (as a trial) to see who is the BEST in deeds –
  3. (51:56) We have been created only for WORSHIP.

Islamic Approach: HOW to study statistics as an act of worship? How to make this study the best of deeds, so that the ink of the scholars becomes as precious as the blood of the martyrs? This is REQUIRED for us as Muslims. We have NO OPTION but to CREATE an Islamic Approach. This would have happened naturally, but Islam came as a stranger, and has become a stranger. The process of colonization has nearly destroyed the Islamic Intellectual traditions and heritage. As a result, a GREAT EFFORT is required to create a new approach to modern subjects, aligned with the traditions of Islam.

We have no option but to reject the Secular Modern Approach: Life is NOT Meaningless. We CAN create alternatives based on OTHER meanings. All religions and philosophies prescribe MEANING, and make search for the Meaning of Life the MOST IMPORTANT problem. All other problems are SUBORDINATE to this. The search for meaning in our lives is CLOSELY related to IDENTITY: Who Am I? How can I approach the study of statistics so that it enhances my life experiences, and helps me learn who I am?

According to the Islamic Approach: We acquire knowledge to RECOGNIZE the SIGNS of GOD. HE is hidden in the wonders of the world around us. We can also recognize HIM in our internal world: studying our own selves leads to a recognition of God. Islam provides us with a high and inspiring purpose for our lives. The BEST of DEEDS: To serve mankind PURELY for the sake of the LOVE of Allah, without intention for fame, popularity, recognition, reward, thanks.

While we must reject the modern secular approach because it says that life is meaningless, there can be Alternative Approaches. We must SPECIFY PURPOSE (for our lives, and for the role knowledge must play in our life experiences). Then we must Adapt our study of statistics to that purpose. Alternatively, we can SEARCH for PURPOSE. The Modern Secular Approach CLOSES the opportunity for meaningful discussion on the most important questions we all face:

  1. HOW can I lead a meaningful life?
  2. HOW can I learn WHO I AM?
  3. HOW can I know of the HIDDEN potential buried within me?
  4. HOW can I develop my talents for EXCELLENCE?
  5. In WHICH direction should I develop my capabilities?
  6. WHAT should be my priorities in LIFE?

It is a CRIME to teach students subjects WITHOUT discussing these Central Questions. Failing to discuss these questions leads to wasted lives spent on futile efforts on irrelevant goals.

Concluding Remarks: Purpose is MISSING from the standard approach. Islam provides a clear-cut purpose which is DIRECTLY opposed to the HIDDEN message of the secular modern approach: meaninglessness of life and of knowledge. The conventional approach makes no distinction between useful and useless knowledge, because all effort is meaningless. When we study statistics with a PURPOSE, this actually CHANGES the subject: we cannot study numbers in the abstract, or theories without context. Statistics is based on creating narratives supported by numbers. The essential role of the NARRATIVE is NOW coming to be recognized. For convincing narratives and good rhetoric, numbers must be studied WITHIN the real world context from which they originate. This STRONGLY differentiates a PURPOSEFUL Islamic approach from the modern secular approach.

Links to Related Materials

  1. Learn Who You Are: The search for identity. bit.do/azwya
  2. Reaching Beyond the Stars: Aim high to get great results. bit.do/azrbs
  3. How to Inspire and Motivate Students. When we relate our subjects to life experiences, and to a great vision for serving mankind, it becomes possible to inspire and motivate students: bit.do/azhims
  4. The Ways of the Eagles: Education teaches us to think low, like crows. Instead, learn vision & purpose to soar the skies, like eagles: bit.do/azwoe
  5. Three Mega Events Which Shape Our Thoughts: How to free our minds from chains created by powerful historical forces: bit.do/azgt4

Probability Histograms

Bit.ly/dsia03f: The goal of this course is to teach about simple descriptive statistics, which allow us to look at and understand the data set. The central link between this, and more advanced concepts, is the random sample. This part of Lecture 3 explains some of the most elementary concepts in connection with random samples from populations of objects in the real world.

Concept 1: Choosing one member of the population “at random”. A dictionary defines “random” as: “made, done, happening, or chosen without method or conscious decision”. For example, I could choose a country at random by throwing a dart at a map of the world. However, statisticians use the word “random” in a technical sense, which is very different from the standard English language usage. Choosing at random means that all countries in the population must have exactly equal chances of being chosen. Throwing a dart does not allow us to calculate the probabilities of selection for each country because there are too many variable and subjective factors, including the fact that different countries have different sizes. A reasonable way to do random choice among the 190 countries in the WDI list is to choose a random number between 1 and 190 – for example, via the EXCEL function RANDBETWEEN(1,190). This function gives an equal chance to all numbers and, if repeated often, will make all numbers turn up about equally often in large samples. It is common to confuse the two senses of the word: the haphazard choice of the English language (E-random), and the systematic choice of Statistics, which gives equal probability to all possibilities (S-random).
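The claim that S-random choice makes all numbers turn up about equally often can be checked with a small simulation. This is only a sketch: Python's random.randint is used as a stand-in for the EXCEL function RANDBETWEEN(1,190), and the function name, the seed, and the number of draws are illustrative assumptions, not part of the lecture.

```python
import random
from collections import Counter

def s_random_country_index(rng, n_countries=190):
    """Pick an index from 1..n_countries, each with equal probability."""
    return rng.randint(1, n_countries)  # both endpoints are included

rng = random.Random(0)      # fixed seed, only so that the run is repeatable
n_draws = 190_000           # hypothetical number of repetitions
counts = Counter(s_random_country_index(rng) for _ in range(n_draws))

expected = n_draws / 190    # 1000 draws per country if perfectly equal
lowest, highest = min(counts.values()), max(counts.values())
print(f"expected about {expected:.0f} per country; "
      f"observed range {lowest} to {highest}")
```

No such check is possible for the dart throw, because the chance of hitting each country is unknown. That is exactly the difference between E-random and S-random choice.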

Concept 2: To relate S-random choice to Histograms, consider a three bin histogram for the 190 countries, discussed in the previous lecture. The LOW bin goes from 52.8 to 63.5, and has 27 countries. The MID bin goes from 63.5 to 74.2 and has 71 countries. The HIGH bin goes from 74.2 to 84.9, and has 92 countries. The probability Histogram answers the following question: If a country is chosen at random from this population of 190 countries, what is the PROBABILITY that it belongs to any one of the three bins? The answer is obvious. The LOW bin has 27 countries, and so a probability 27/190 of being chosen. The MID and HIGH bins have probabilities 71/190 and 92/190 respectively. This histogram is pictured below. It is EXACTLY the same as the 3-Bin histogram in previous lecture (DSIA03E) EXCEPT for the labels on the Vertical Axis. The COUNT histogram gives us the count of the number of countries within each category. The PROBABILITY histogram replaces the count by the Percentage of countries within each bin. In the COUNT histogram, the axis labels were 10,20,…,100, corresponding to the number of countries. In the current Probability histogram, these numbers have been replaced by percentages corresponding to 10/190=5.3%, 20/190=10.5%, 30/190=15.8%,…,80/190=42.1%, 90/190=47.4%, 100/190=52.6%. The number of countries has been divided by 190, the total number, to give percentages in each category.

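[Figure: probability histogram of Life Expectancy for the 190 countries, three bins, with percentages on the vertical axis]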
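The arithmetic behind the probability histogram can be reproduced in a few lines. This is a minimal sketch that uses only the bin counts quoted above; the variable names are illustrative.

```python
# Bin counts quoted in the text for the 190 countries.
bin_counts = {
    "LOW (52.8 to 63.5)": 27,
    "MID (63.5 to 74.2)": 71,
    "HIGH (74.2 to 84.9)": 92,
}

total = sum(bin_counts.values())  # 190 countries in all

# Probability of each bin = count in the bin / total count.
bin_probs = {label: n / total for label, n in bin_counts.items()}
for label, n in bin_counts.items():
    print(f"{label}: {n}/{total} = {bin_probs[label]:.1%}")

# Relabelling the vertical axis: count ticks become percentage ticks.
count_ticks = range(10, 101, 10)
percent_ticks = [round(100 * c / total, 1) for c in count_ticks]
print(percent_ticks)
```

The shape of the histogram does not change, since every bar is divided by the same number, 190. Only the labels on the vertical axis are different.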

Let C be a randomly chosen country from among the 190 countries in the sample. Then the probability histogram above plots the following three probabilities: P(52.8 ≤ LE(C) ≤ 63.5), P(63.5 < LE(C) ≤ 74.2), and P(74.2 < LE(C) ≤ 84.9). These are the probabilities of the randomly chosen country having life expectancy within the range which defines the bins. These probabilities show the “distribution” of the random variable Life Expectancy – the height of the graph tells us the chance of the random variable being in the specified range. For theoretical purposes, the CUMULATIVE Distribution turns out to be a very important concept. Instead of looking at the probability of a bin – P(a < LE(C) ≤ b) – the cumulative distribution looks at ALL the probability up to a given number: CDF(x) = P(LE(C) ≤ x).
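The link between bin probabilities and the cumulative distribution can be illustrated with the three-bin numbers from this lecture. This is a sketch under the assumption that only the bin edges and counts quoted above are available, so the CDF is evaluated only at the upper edge of each bin. The variable names are illustrative.

```python
from itertools import accumulate

upper_edges = [63.5, 74.2, 84.9]   # upper edge of the LOW, MID, HIGH bins
bin_counts = [27, 71, 92]          # countries in each bin (from the text)
total = sum(bin_counts)

# Running total of countries with LE at or below each upper edge.
cumulative_counts = list(accumulate(bin_counts))
cdf_at_edges = [c / total for c in cumulative_counts]

for edge, c, p in zip(upper_edges, cumulative_counts, cdf_at_edges):
    print(f"CDF({edge}) = P(LE(C) <= {edge}) = {c}/{total} = {p:.1%}")

# A bin probability is the difference of two CDF values:
# P(63.5 < LE(C) <= 74.2) = CDF(74.2) - CDF(63.5)
mid_prob = cdf_at_edges[1] - cdf_at_edges[0]
print(f"P(63.5 < LE(C) <= 74.2) = {mid_prob:.1%}")
```

The CDF adds up the bin probabilities from the left, and any bin probability can be recovered by subtracting two CDF values.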

To explain the cumulative distribution function, it is useful to look at a case where there are only a small number of countries, as it makes the concepts easier to understand and visualize.

No.   Country       Code   Life Expectancy
25    Qatar         QAT    81.0
84    Turkey        TUR    74.5
125   Egypt         EGY    69.9
169   Afghanistan   AFG    60.6
183   Sudan         SDN    56.1

Assume that these five countries, chosen “E-randomly” (in the English language sense) from the 190 countries, are the full population of countries. We will choose a country at S-random from these 5, and call it C. LE(C) represents the Life Expectancy of the randomly chosen country. A characteristic of a randomly chosen country, like LE, is called a random variable. For any real number x, we want to measure P(LE(C) ≤ x) – this is, by definition, the cumulative distribution function of the random variable LE(C). This can easily be plotted as follows.

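[Figure: CDF of Life Expectancy for the five countries, a step graph rising by 20% at each country's LE]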

For all values of x less than 56.1, the probability that Life Expectancy of the randomly chosen country is at most x is 0%. The graph shows that P(LE(C) ≤ x) = 0% in the first step of the graph. This is because Sudan has LE of 56.1, which is the smallest LE in the data set. For values of x between 56.1 and 60.6, P(LE(C) ≤ x) is 20%. There is only one country (Sudan) which has LE at or below this range of values. The chances of randomly choosing Sudan are 1 out of 5 or 20%. Similarly, for values of x between 60.6 and 69.9, P(LE(C) ≤ x) = 40%. For values of x in this range, there are two countries, Sudan and Afghanistan, which have LE’s at or below x. The chances of choosing either one of these two countries from a random choice between 5 countries are exactly 2/5 or 40%. Similarly, the graph jumps by 20% at the Life Expectancy of each of the five countries in the data set.

This brief description provides a preliminary introduction to the concept of cumulative distribution function. More details will be discussed in a subsequent course, where the ideas of random sampling, which are at the heart of statistical theory, will be developed. In the current course, we are only concerned with different graphs which represent the data in different ways. The CDF of the data, pictured above, is also one of these ways to make a graph of the data.
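The step pattern described above can be computed directly from the five life expectancies. This is a minimal sketch, assuming each of the five countries is chosen with probability 1/5. The function name cdf and the test points are illustrative.

```python
from bisect import bisect_right

# Life expectancies of the five countries in the small example.
life_expectancy = {
    "Qatar": 81.0,
    "Turkey": 74.5,
    "Egypt": 69.9,
    "Afghanistan": 60.6,
    "Sudan": 56.1,
}
sorted_le = sorted(life_expectancy.values())

def cdf(x):
    """P(LE(C) <= x) when C is chosen S-randomly from the five countries."""
    # bisect_right counts how many of the sorted values are <= x.
    return bisect_right(sorted_le, x) / len(sorted_le)

for x in [50, 56.1, 58, 60.6, 65, 69.9, 72, 74.5, 80, 81.0, 85]:
    print(f"CDF({x}) = {cdf(x):.0%}")
```

The function is flat between two neighbouring life expectancies and jumps by 1/5 = 20% exactly at each country's value, which is the staircase shape of the graph.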

RELATED MATERIALS:

Lecture 1: Distinguishing features of an Islamic Approach to Statistics– In four parts: bit.ly/dsia01a , b, c, d, with two supplements e, f

Lecture 2: Comparing Numbers: Comparing multidimensional qualities necessarily involves values, and hence most rankings are subjective, not objective measures of external reality. In six parts: bit.ly/dsia02a, b, c, d, e, f

Lecture 3: Life Expectancies. Part A explains that Life Expectancy is a one-dimensional numerical measure, and hence objective. Part B describes how LE is computed in detail, and what these numbers mean. Part C makes a start on analyzing World Bank WDI data for 190 countries from 1960 to 2017 on Life Expectancies. Part D constructs, analyzes and interprets HISTOGRAMS for this data set. Part E analyzes effects of changing bin size on Histograms. Shortlinks are bit.ly/dsia03a, b, c, d, e