**General Introduction: Meanings and Philosophy**

In lecture 8A, we gave a new definition of probability. Each random event has many possible outcomes. One of these possibilities is realized, at which point all other possibilities become “what might have happened”, while the realized outcome acquires 100% probability. Probability is about FUTURE POSSIBILITIES. Everything which can happen creates possible future worlds.

In the early 20th century, there was a huge debate between two different conceptions of uncertainty. On the one hand, Keynes and Knight held that the future was fundamentally uncertain: we do not know the range of possibilities, and we do not know the probabilities to be assigned to these possibilities. As opposed to this, Ramsey and de Finetti argued that rational decision making requires knowledge of all possible future outcomes, as well as their probabilities. This second conception, that we have knowledge of future outcomes and their probabilities, eventually won out. Modern theories of decision making under uncertainty rely on the Ramsey-de Finetti approach, while the Keynes-Knight approach has been marginalized. The Global Financial Crisis, and many other events, prove conclusively that the theory of rational expectations is wrong. Although of tremendous importance, this debate is not the topic of our study here.

Probability is a mental MODEL of structures of external reality. Following the Keynes-Knight approach, the future is fundamentally uncertain. We can never know the hidden structures of reality, and we can never know the future. We ONLY have models of future possible outcomes & probabilities. We NEVER have “Knowledge” in the sense of JTB: Justified True Belief. The Western intellectual tradition put the bar for knowledge too high to allow for the definition of probability that we are using here. Our models are just best guesses based on our experience, and will forever remain unverifiable. There is a subtle distinction between subjective & objective models here. Even though it is important, we will not discuss these philosophical aspects pertaining to the meaning of probability further. Rather, we will work with probability in cases where there is substantial consensus about its meaning, and avoid controversy.

**Probability Models are Trees in Time**

We define a probability model to be a tree which grows branches as time progresses:

A probability tree, like the one above, consists of nodes and branches. Each branch has a probability. Each node is a possible outcome, situated in time. Probabilities of nodes vary with time. At any node, the sum of probabilities going forward from that node should equal 100%. That is, the set of nodes at the next time level should cover all possibilities – and one of these possibilities must happen. If we go forward two steps in time, then the probability of a node is obtained by multiplying the probabilities of the two branches which lead to the node.

The most important feature of this model is systematically time-varying probabilities. In the above diagram, at T=1, the three blue circles are in the future, and have probability weights given on their branches. At T=2, one of the three events OCCURS. At this point, the other two events become unrealized alternatives, worlds which could have been, but never will be. Beyond this time, the event which occurs has 100% probability. All alternatives on all other branches have become impossible, and their probabilities are now set to 0%.

**Outcomes**: An Outcome is a Node, which is positioned at a specific time T on the time-branching probability tree. From time T onwards, the outcome which occurs at time T has probability 100%, and all others have probability 0%. At time T-1, the probability of the outcome is the probability assigned to the branch which leads to this outcome. At time T-2, the probability of the outcome is the product of the probabilities of the two branches which lead to the outcome. And so on.

**Events**: Events are things which can happen in multiple ways. Thus there are many different nodes at which the event occurs. The probability of a future event is the SUM of the probabilities of all the possible ways the event can happen. That is, we compute the probabilities of each of the nodes at which the event occurs, and add these probabilities to get the probability of the event. Nodes on different branches of the tree are mutually exclusive: if one happens then the other cannot. Probabilities can be added for such nodes. If two nodes lie along the same branch, then one cannot add their probabilities. Rather, one must determine the first node at which the event occurs on that branch, and add only that probability to those of the nodes on other branches.
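The two rules above (multiply branch probabilities along a path; add across mutually exclusive nodes) can be sketched in a few lines of code. The tree below is a hypothetical two-step tree with made-up branch probabilities, written as nested dictionaries; the labels and numbers are our own illustration of the bookkeeping, not any example from this lecture.

```python
from fractions import Fraction

# A hypothetical two-step tree: each key is an outcome label, each value is
# (branch probability, subtree). At every node the outgoing branch
# probabilities sum to 1, as the rules require.
tree = {
    "U": (Fraction(2, 5), {"X": (Fraction(1, 2), {}), "Y": (Fraction(1, 2), {})}),
    "D": (Fraction(3, 5), {"X": (Fraction(1, 3), {}), "Y": (Fraction(2, 3), {})}),
}

def path_probability(tree, path):
    """Multiply the branch probabilities along a path of outcome labels."""
    prob = Fraction(1)
    for label in path:
        branch_prob, tree = tree[label]
        prob *= branch_prob
    return prob

def event_probability(tree, paths):
    """Sum path probabilities over the mutually exclusive nodes where an event occurs."""
    return sum(path_probability(tree, p) for p in paths)

# Node reached by U followed by X: 2/5 * 1/2 = 1/5
print(path_probability(tree, ["U", "X"]))
# Event "X at the second step" occurs on two mutually exclusive branches
print(event_probability(tree, [["U", "X"], ["D", "X"]]))
```

Exact fractions are used instead of floating point so that the computed probabilities match the hand calculations digit for digit.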

We now illustrate the use of these rules of probability in some simple examples.

**Drawing Balls from Urns**: An urn contains 4 balls: two are red, and two are black. We make two draws at random from this urn, without replacement. The time-branching probability graph below is a probability model for this situation:

At the first draw, there are equal numbers of red and black balls, so there are two possible events, black or red, and both have equal probability. At T=1, if R was drawn, the probability of drawing R on the 2nd draw is now 1/3, while that of drawing B is 2/3. This is because after a red draw, one red ball and two black balls remain in the urn. Similarly, after an initial black draw at T=1, P[T=2:R]=2/3 and P[T=2:B]=1/3. It is essential to use the time index, since probabilities change with time, depending on which outcome occurred in the past.

Now consider asking, at time T=0, the probability of a Red ball at time 2: P[T=0]{T=2:R}. There are two outcomes with a red ball at T=2. We can write them as [T=1:R]=>[T=2:R] and [T=1:B]=>[T=2:R]. Here the => indicates the time sequencing and can be read as “followed by”. A draw of R at T=1 followed by another R at T=2 has probability 1/2 × 1/3 = 1/6, by multiplying the branch probabilities. We can write this symbolically as P{[T=1:R]=>[T=2:R]}=1/6. Similarly, the probability of Black at T=1 followed by Red at T=2 can be computed by multiplying the branch probabilities: P{[T=1:B]=>[T=2:R]}=1/2 × 2/3 = 2/6. There are two ways to draw Red at T=2, and we ADD these two probabilities to get the desired probability:

P[T=0]{T=2:R} =

P[T=0]{[T=1:R]=>[T=2:R]} + P[T=0]{[T=1:B]=>[T=2:R]} =

(1/2 × 1/3) + (1/2 × 2/3) = 1/6 + 2/6 = 3/6 = 1/2
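Since the result P[T=0]{T=2:R} = 1/2 may seem surprising, it can be checked by simulation. The sketch below (the function name and defaults are our own) repeatedly shuffles an urn of two red and two black balls and counts how often the second draw is red; the relative frequency should settle near 1/2.

```python
import random

def second_draw_red(trials=100_000, seed=0):
    """Estimate P(second draw is Red) for an urn of 2 red and 2 black balls."""
    rng = random.Random(seed)
    red_second = 0
    for _ in range(trials):
        urn = ["R", "R", "B", "B"]
        rng.shuffle(urn)          # a shuffle gives two draws without replacement
        if urn[1] == "R":         # urn[0] is the first draw, urn[1] the second
            red_second += 1
    return red_second / trials

print(second_draw_red())  # close to 1/2, matching the tree calculation
```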

**Drawing Straws**: As a second example of how we create probability models as branching trees, and use them to compute probabilities, we consider the “drawing straws” example discussed in the previous lecture on the definition of probability. There are N people and N straws. One straw is short. All others are long. Straws are held so that the ends are concealed, and all straws look alike to the person who is drawing. People draw straws in sequence. Are the early people MORE likely to draw the short straw? Or are they LESS likely to draw the short straw? Or are the probabilities the same for all people? To answer this question, we do the calculations. As a general rule, it is best to start with simple cases; this builds understanding. The case of two people and two straws is trivial; it is left for the student. We go to the next case of 3 people A, B, C with 3 straws. Label the three straws as L1, L2, S3; here L1 and L2 are the two long straws and S3 is the short straw. A probability model for this situation is pictured below:

At T=0, there are three equally likely choices: A can draw L1, L2, or S3. If A draws S3, he is chosen, and the probability of this event at T=0 is P[T=0]{T=1:A}=1/3. If we are at T=1, and A has not been chosen, then A has drawn either L1 or L2. In both of these cases, B has probability 50% of drawing the short straw at T=2. To compute P[T=0]{T=2:B}, we need to add up the probabilities of the two ways that B can be chosen at T=2. These two ways are [T=1:A=L1]=>[T=2:B=S3] and [T=1:A=L2]=>[T=2:B=S3]. Both of these paths, which lead to the event “B is chosen at T=2”, have probability 1/3 × 1/2 = 1/6. Adding the two probabilities gives P[T=0]{T=2:B}=1/6+1/6=1/3. Similarly, we can calculate that the probability at T=0 of choosing C at T=2 is also 1/3. Thus, at time T=0, all three choices A, B, C are equally likely with probability 1/3 each.

Probabilities change across time. At T=1, there are three possible branches. On one branch A has been chosen with 100% probability. On the other two branches, A has chosen the long straw and has been eliminated. That is, the probability of A being chosen is now 0%. Also, at T=1, we cannot ask the question of what is the probability that B will draw the short straw and be chosen (P[T=1]{T=2:B=S3}), WITHOUT specifying what happened at T=1. First draw probabilities have been extinguished. We must know which node we are on, in order to calculate the probabilities going forward into the second stage draw. These probabilities can be called CONDITIONAL PROBABILITIES – that is, we must specify the CONDITION (what happened at T=1) in order to compute future probabilities. This conception of conditional probability as chronological, referring to sequencing in time, is new to this definition, and different from classical definitions.

**Four or More Straws**: Once we have solved the 3/3 case, with 3 persons and 3 straws, it is easy to solve the 4/4 case. Suppose we have 4 people (A, B, C, D) and 4 straws, with one short straw. We already know the probabilities in the 3/3 case, so we can draw the probability model for the 4/4 situation as below. At the first draw, at T=0, there is a 1/4 chance that A is chosen. In the other 3 out of the 4 possible draws, A is eliminated, and one straw is also removed. This means that we are back to the 3/3 case, with 3 persons and 3 straws. This case we have already solved: each of the 3 remaining people has an equal chance of 1/3 of being chosen. Since each of them is then chosen with probability 3/4 × 1/3 = 1/4, all four people have equal chances. It is similarly easy to go forward and show that the same holds for any number N, with N persons and N straws.
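The equal-chances conclusion for N persons and N straws can also be checked by simulation. The sketch below (function name and defaults are our own illustration) shuffles N straws, one of them short, and records which position in the drawing sequence receives the short straw; every position should come out near 1/N.

```python
import random

def short_straw_probabilities(n, trials=50_000, seed=1):
    """Estimate, for each of the n drawers in sequence, the chance of the short straw."""
    rng = random.Random(seed)
    counts = [0] * n
    for _ in range(trials):
        straws = ["L"] * (n - 1) + ["S"]   # n - 1 long straws and one short straw
        rng.shuffle(straws)                 # drawing in sequence = a random permutation
        counts[straws.index("S")] += 1      # position that drew the short straw
    return [c / trials for c in counts]

print(short_straw_probabilities(4))  # each entry close to 1/4
```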

**Housing Lottery**: When I was an undergraduate at MIT in the early 1970s, there was a wide variety of housing choices, and some were more popular than others. In order to distribute housing equitably, students were randomly assigned a number which would determine their priority in housing choice. The student with ticket #1 would get to choose first, #2 would choose second, and so on. To model this situation, suppose that the Housing Office has tickets marked from 1 to 1000, one for each student. Students show up at random. Each one is given a ticket chosen at random from the ones which remain. After all tickets are assigned, students choose housing in sequence according to the ticket number. It used to happen that students would line up in the morning to get the early tickets, for fear that the good numbers would be gone by the afternoon. Is this fear justified?

Our analysis of the short straw enables us to answer this question. Think of Ticket #1 as the short straw. All people have the same probability of drawing the short straw. Thus, all students, regardless of when they show up at the housing office, have equal probability of drawing Ticket #1. Similarly, for ANY ticket in the housing lottery, all students have equal chances of drawing THAT ticket. Thus all students have equal chances for all tickets.

**Concluding Remarks**: The central feature of our model is that probabilities exist ONLY for Future Events. This lecture was mainly about the rules for calculating probabilities when they are modelled by time-branching trees. The simple rules can be summarized as follows. At each NODE, probabilities on branches going forward are CONDITIONAL on getting to that NODE. Probabilities going multiple steps forward are calculated by multiplying probabilities on branches. Probabilities for events which can occur on multiple branches are obtained by ADDING the probabilities of all the nodes at which the event occurs. As time advances, some probabilities are extinguished, and branches corresponding to those possibilities are removed from the tree. These are things that might have been, but are now forever impossible. This time-varying feature is an essential aspect of probability, but it is not captured by conventional models of probability.

**Exercise**: To get practice in applying these rules, we describe a simple probability problem. Remember to solve the simplest cases first: N=2 is trivial, N=3 is simple. An airplane has N seats and N passengers. Each passenger has been assigned one seat: P1 => 1, P2 => 2, and so on, PN => N. The FIRST passenger ignores his assigned seat and sits down at random on any one of the seats 1, 2, …, N; ALL seats have equal probability on this first choice. All other passengers go to their ASSIGNED seats. HOWEVER, if the seat is already occupied, they choose a seat at RANDOM from all unoccupied seats. What is the probability that the LAST passenger gets to sit in his ASSIGNED seat?
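Without giving the answer away, here is a minimal simulation harness (our own sketch, with hypothetical names) that the student can use to check an analytical answer against small cases before tackling general N:

```python
import random

def last_gets_own_seat(n, trials=20_000, seed=2):
    """Estimate the probability that passenger n ends up in his assigned seat n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        free = list(range(1, n + 1))
        free.remove(rng.choice(free))          # passenger 1 sits at random
        for p in range(2, n):                  # passengers 2 .. n-1 in sequence
            if p in free:
                free.remove(p)                 # assigned seat is free: take it
            else:
                free.remove(rng.choice(free))  # occupied: pick a random free seat
        hits += (n in free)                    # is seat n left for the last passenger?
    return hits / trials

# Start with the simplest cases, as the exercise suggests:
print(last_gets_own_seat(2), last_gets_own_seat(3))
```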

**Links to Related Materials**:
- Previous Lecture: A New Definition of Probability: http://bit.ly/rsia08a
- Writeup of this lecture: http://bit.ly/rsia08b
- Free Online Course on Real Statistics: An Islamic Approach: http://bit.ly/dsia786
