Unit 04 - Probability - 1 Per Page

1

Stat E100

Unit 4: Probability

IPS Chapter 4

2

Unit 4 Outline: Probability

• Definitions and Concepts in Probability

• Rules of Probability

• Independence

• Conditional Probability

• Introduction to Random Variables

• Discrete vs. Continuous

• Means (Expected Values), Variances, and Correlations

3

Probability Terminology

Terminology

• random phenomenon – a phenomenon whose individual outcomes are uncertain, but whose outcomes follow a regular distribution over a large number of repetitions.

Examples:

• Coin tossing and dice rolling

• The lottery and other games of chance

• Drawing a random sample from some population

• outcome: the value of one replication of a random experiment or phenomenon.

Coin Tossing:

• H with one toss of a coin

• HTT with three tosses

4

Probability Terminology (cont.)

• sample space (labeled S): the set of all possible outcomes of a random phenomenon

• Examples:

1. Toss a coin three times: S = {HHH,THH,HTH,…,TTT}

2. Face showing when rolling a six-sided die: S = {1,2,3,4,5,6}

3. Pick a real number between 1 and 20: S = [1,20]

• event (labeled A): a set of outcomes of a random phenomenon.

• Examples:

1. The event A that exactly two heads are obtained when a coin is tossed three times: A={HHT,HTH,THH}

2. The result of the toss of a fair die is an even number: A = {2,4,6}

3. The number chosen from the set of all real numbers between 1 and 20 is at most 8.23: A = [1,8.23]

5

Simple Probability

• The probability of an event can be thought of as long run

proportion/frequency

• For a random phenomenon, if the sample space is finite and if all of the individual outcomes have the same probability, then the probability of an event A (written P(A)) is the ratio

$$P(A) = \frac{\#\text{ of elements in } A}{\#\text{ of elements in } S}$$

Use this formula to determine the probability of getting two heads in three tosses of a coin. What is the probability of getting an even number in one roll of a die?
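The counting rule for equally likely outcomes can be checked by listing the sample space directly. The sketch below is only an illustration: the helper name `prob` is made up here, and "two heads" is read as exactly two heads, matching the earlier event A = {HHT,HTH,THH}.

```python
from fractions import Fraction
from itertools import product


def prob(event, sample_space):
    """P(A) = (# of elements in A) / (# of elements in S).

    Only valid when every outcome in the sample space is equally likely.
    """
    favorable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favorable), len(sample_space))


# Sample space for three tosses of a coin: HHH, HHT, ..., TTT (8 outcomes)
three_tosses = ["".join(tosses) for tosses in product("HT", repeat=3)]
p_two_heads = prob(lambda outcome: outcome.count("H") == 2, three_tosses)

# Sample space for one roll of a six-sided die
die = list(range(1, 7))
p_even = prob(lambda face: face % 2 == 0, die)

print(p_two_heads)  # 3/8
print(p_even)       # 1/2
```

Using `Fraction` keeps the answers exact, so they can be compared with the hand calculation.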

6

Events in Sample Spaces

(more Terminology)

• The union of two events A and B is the event that either A occurs or B occurs or both occur: (A or B) = A ∪ B

• The intersection of two events A and B is the event that both A and B occur: (A and B) = A ∩ B = AB

Set theory notation (∩ and ∪) is widely used, but not needed in this class

• The complement of an event A, A^c, is the event that A does not occur and thus consists of the outcomes that are not in A. The text just calls it “not A”

7

Rules of Probability

Rule 1: The probability of any event A satisfies

0 ≤ P(A) ≤ 1

Rule 2: P(S) = 1. The probability of the sample space S is 1

Rule 3: P(A^c) = 1 – P(A). The probability of an event not

happening is 1 – (Probability of event happening)

Rule 4: If A and B are disjoint events then

P(A or B) = P(A) + P(B).

Note: the general rule is P(A or B) = P(A) + P(B) – P(A and B)

Justification for these can be seen in Venn Diagrams…

8

Venn Diagram (sample space S)

Rule 3. For any event A, P(A^c) = 1 - P(A)

9

Venn Diagram (sample space S)

Rule 4. If A and B are disjoint events then

P(A or B) = P(A) + P(B)

A and B are disjoint (mutually exclusive):

- No outcomes in common

- Cannot happen at the same time

10

Venn Diagram (sample space S)

In general for any events A and B:

P(A or B) = P(A) + P(B) – P(A and B)
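The complement rule and the general addition rule can be verified on a small sample space. The events below (A = even roll, B = roll of at most 3, on one fair die) are an assumed illustration, not an example from the slides.

```python
from fractions import Fraction

# One roll of a fair six-sided die; all outcomes equally likely
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # assumed event: the roll is even
B = {1, 2, 3}   # assumed event: the roll is at most 3


def P(event):
    """Probability of an event as a share of equally likely outcomes."""
    return Fraction(len(event), len(S))


# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_union_direct = P(A | B)
p_union_by_rule = P(A) + P(B) - P(A & B)

# Complement rule: P(A^c) = 1 - P(A)
p_complement = P(S - A)

print(p_union_direct, p_union_by_rule)  # 5/6 5/6
print(p_complement, 1 - P(A))           # 1/2 1/2
```

Because A and B share the outcome 2, simply adding P(A) + P(B) would count it twice, which is why P(A and B) is subtracted.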

11

Independent events

• There is a 5th ‘rule’ for ‘independent’ pairs of events

• Motivation on next slide, using coin tossing

• Rule 5: Two events A and B are independent if and only if

knowing that one event occurs does not change the probability

that the other event occurs. If A and B are independent, then:

P(A and B) = P(A)×P(B)

Sometimes called the multiplication rule for independent

events.

• Does knowing the results of flipping a fair coin once affect the

chances of heads on a 2nd flip?

Note: Independence can’t be easily drawn in a Venn Diagram…

12

An Example

• There is a bag with 3 balls in it: 1 is red, and 2 are black

• You draw two balls out of the bag, one at a time (without

replacement). Define the events:

A: the first ball drawn is black

B: the second ball drawn is black

• Are A and B independent?
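One way to answer the question is to list every ordered draw and compare P(A and B) with P(A)×P(B). This is a minimal sketch; the ball labels `R`, `B1`, `B2` and the helper names are made up for illustration.

```python
from fractions import Fraction
from itertools import permutations

# 1 red ball and 2 black balls; two are drawn in order without replacement
balls = ["R", "B1", "B2"]
draws = list(permutations(balls, 2))  # 6 equally likely ordered draws


def P(event):
    """Share of the equally likely ordered draws in which the event occurs."""
    return Fraction(sum(1 for draw in draws if event(draw)), len(draws))


def first_black(draw):
    return draw[0].startswith("B")


def second_black(draw):
    return draw[1].startswith("B")


p_a = P(first_black)                                          # 2/3
p_b = P(second_black)                                         # 2/3
p_a_and_b = P(lambda d: first_black(d) and second_black(d))   # 1/3

# Rule 5: independent if and only if P(A and B) = P(A) x P(B)
independent = p_a_and_b == p_a * p_b
print(p_a, p_b, p_a_and_b, independent)  # 2/3 2/3 1/3 False
```

Since 1/3 is not equal to (2/3)×(2/3) = 4/9, A and B are not independent under this enumeration.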

13




• Independence





14

Conditional probability

• conditional probability: the probability of one event occurring

under the condition that we know the outcome of another event

• Let A and B be two events in a sample space, with P(A) > 0. The conditional probability of event B, given that A has occurred, written P(B|A), is

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

• P(B|A) is read as “the probability of B, given that A has happened,” or the probability of B if A is true.

• Note that if A and B are independent, P(B|A) = P(B)

• Conditional probability can get tricky!!!

“Simple” Example

• There is a bag with 3 balls in it: 1 is red, and 2 are black

• You draw two balls out of the bag, one at a time (without replacement). Define the events:

A: the first ball drawn is black

B: the second ball drawn is black

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)} = \frac{P(\text{Both Balls are Black})}{P(\text{First Ball is Black})} = \frac{2/6}{4/6} = \frac{1}{2}$$

[Tree diagram: First Ball, Second Ball]

16

Very tricky…

• The Monty Hall Problem

There are prizes behind 3 doors: two are ‘worthless’ (an

ant farm) and one is expensive (like a new car)

You are asked to choose one of the 3 doors

Then, Monty Hall (from Let’s Make a Deal) opens one

of the other 2 doors and shows you a worthless prize

• Should you switch doors?

• NYTimes take: http://www.nytimes.com/2008/04/08/science/08monty.html
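The problem can be worked out exactly by enumerating every case. The sketch below rests on assumptions that the slide does not state outright: the car is equally likely to be behind each door, your first pick is independent of where the car is, and Monty always opens a worthless door other than yours (choosing at random when he has two options).

```python
from fractions import Fraction

doors = (0, 1, 2)
stay_wins = Fraction(0)
switch_wins = Fraction(0)

for car in doors:
    for pick in doors:
        # Car location and first pick are uniform and independent: 1/9 per case
        p_case = Fraction(1, 9)
        # Monty may only open a door that is neither your pick nor the car
        openable = [d for d in doors if d != pick and d != car]
        for opened in openable:
            p = p_case / len(openable)
            # Switching means taking the one door that is neither picked nor opened
            switched_to = next(d for d in doors if d not in (pick, opened))
            if pick == car:
                stay_wins += p
            if switched_to == car:
                switch_wins += p

print(stay_wins)    # 1/3
print(switch_wins)  # 2/3
```

Under these assumptions, switching wins whenever the first pick was wrong, which happens 2/3 of the time.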

17

A general multiplication rule

(from conditional probability)

• Suppose A and B are two events in a sample space (not

necessarily independent). Then

P(A and B) = P(B | A) × P(A)

P(A and B and C) = P(C | A and B) × [P(B | A) × P(A)]

The first relationship is a simple algebraic rearrangement of the definition of conditional probability:

$$P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}$$

18

A Simple Example

It is known that approximately 20% of men and 3% of women are taller

than 6 feet in the US.

Let F = the event that someone is female and T = taller than 6 feet.

a) What is P(T | F)? What is P(T | F^c)?

b) What is the probability that the next person walking through the door

is a woman and 6 feet tall?

c) What is the probability that the next person walking through the door

is 6 feet tall?

19

Simple Example (cont.)

c) What is the probability that the next person walking through the

door is 6 feet tall?

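Two ways for this to happen: (T and F) or (T and F^c) [Think Venn Diagrams]

$$\begin{aligned}
P(T) &= P(T \text{ and } F) + P(T \text{ and } F^c) \\
     &= P(T \mid F)\,P(F) + P(T \mid F^c)\,P(F^c) \\
     &= 0.03 \times 0.50 + 0.20 \times 0.50 = 0.115
\end{aligned}$$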

2-way tables can help organize your thinking…

Female | Tall (6' or more): Yes | Tall (6' or more): No

Yes | P(F)*P(T | F) = (0.5)*(0.03) = 0.015 | P(F)*P(not T | F) = (0.5)*(0.97) = 0.485

No | P(not F)*P(T | not F) = (0.5)*(0.20) = 0.100 | P(not F)*P(not T | not F) = (0.5)*(0.80) = 0.400

Note: P(F)*P(T | F) = P(F and T)

21

Bayes’ Rule

• Bayes’ rule (formula) provides a way to go from P(B | A) to P(A | B) (they are in general not equal…)

• If A and B are two events whose probabilities are not 0 or 1, then

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)}$$

22

Conditional probability and Bayes’ Rule

d) What is the probability that a randomly selected 6 foot tall person is female?

$$P(F \mid T) = \frac{P(F \text{ and } T)}{P(T)} = \frac{P(T \mid F)\,P(F)}{P(T \mid F)\,P(F) + P(T \mid F^c)\,P(F^c)} = \frac{0.03 \times 0.50}{0.03 \times 0.50 + 0.20 \times 0.50} = \frac{0.015}{0.115} = 0.130$$

Or, you can just use the 2x2 table from two slides earlier.
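The height example can be recomputed numerically as a check. This is a rough sketch: the function names are made up, and P(F) = 0.50 is the value used in the slides' own arithmetic.

```python
def total_probability(p_b_given_a, p_a, p_b_given_not_a):
    """P(B) = P(B | A) P(A) + P(B | A^c) P(A^c)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def bayes(p_b_given_a, p_a, p_b_given_not_a):
    """P(A | B) = P(B | A) P(A) / P(B), with P(B) from the formula above."""
    p_b = total_probability(p_b_given_a, p_a, p_b_given_not_a)
    return p_b_given_a * p_a / p_b


p_f = 0.50              # P(F): person is female (value used in the slides)
p_t_given_f = 0.03      # P(T | F): taller than 6 feet, given female
p_t_given_not_f = 0.20  # P(T | F^c): taller than 6 feet, given male

p_t = total_probability(p_t_given_f, p_f, p_t_given_not_f)
p_f_given_t = bayes(p_t_given_f, p_f, p_t_given_not_f)

print(round(p_t, 3))          # 0.115
print(round(p_f_given_t, 3))  # 0.13
```

Note how different P(F | T) ≈ 0.13 is from P(T | F) = 0.03: the two conditional probabilities answer different questions.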

23




• Independence





24

Random Variables

• A random variable is a variable whose value is a

numerical outcome of a random phenomenon.

• Usually denoted by capital letters X, Y or Z

• Example: toss a coin three times, let X = number

of observed heads.

S = {TTT,HTT,THT,TTH,THH,HTH,HHT,HHH}

values for X = {0, 1, 2, 3}

25

Random variables versus data

• There is an important distinction between random variables (Xi) and realized observed data (xi)

• A random variable is theoretical and has not yet been observed, but it has the potential to take different values with certain probabilities

• The larger the number of observations (n), the more the histogram of the observed data x1, x2, ..., xn resembles the probability distribution of the (theoretical) random variable (Xi)

• We have been using random variables all along, but we called them ‘variables’: responses to a random sample/poll, public transit data, toddlers’ nutritional intake, etc.

26

Discrete Random Variables

• In the coin tossing example, the number of heads, X, is called a discrete random variable: a variable with a finite number of distinct values

• X has a simple distribution of values with associated

probabilities. This is the theoretical analogue of the

observed frequency distribution in a set of numbers.

• For X, we can summarize its distribution in a table or

histogram

27

Formal Terminology

• A Discrete Random Variable X can take on values in a finite set with k members. We denote these possible outcomes by x1, x2, x3, x4, x5, ...,xk

• The Probability Distribution of X is specified by assigning probabilities to each of the possible outcomes. The probability distribution tells us how likely each outcome is.

• The probabilities assigned to each outcome must satisfy the following conditions:

For each xi, P(X = xi) is between 0 and 1.

• That is, 0 ≤ P(X = xi) ≤ 1 for all possible values of xi

The sum of the probabilities equals 1.

• That is, P(X = x1) + P(X = x2) + …+ P(X = xk) = 1

28

A coin tossing random variable

• Let X be the number of heads when a fair coin is tossed

four times (whoa, way more complex!?)

S = {HHHH, HHHT, HHTH, ..., TTTT}

P(HHHH) = P(HHHT) = ... = P(TTTT) = 1/16

• X is a discrete random variable with 5 possible values

{0, 1, 2, 3, 4}

• Its probability distribution is:

Value 0 1 2 3 4

Probability 1/16 4/16 6/16 4/16 1/16
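The table above can be reproduced by listing all 16 equally likely outcomes and counting heads. A minimal sketch, with variable names chosen only for illustration:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^4 = 16 equally likely outcomes of four tosses of a fair coin
outcomes = list(product("HT", repeat=4))

# X = number of heads in each outcome; count how often each value occurs
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (# outcomes with x heads) / 16
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    # Fractions print in lowest terms, e.g. 4/16 appears as 1/4
    print(x, p)
```

The probabilities sum to 1, as required of any probability distribution.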

29

Possible outcomes in 4 tosses of a coin

30

Probability distribution for X, the number of

heads when a coin is tossed 4 times

31

Defining a Discrete Random Variable’s

Probability Distribution

The probability distribution for a discrete random variable X can be completely defined when depicted in any of 3 forms:

1) Tabular form (shown 3 slides earlier)

2) Graphically (last slide)

3) By a formula (example coming in next unit)

Note: When k gets large, the formula based approach is often

the only feasible choice.

32

Expected value (aka mean) of a discrete random variable

• Sometimes called average or mean value of a random variable

• Motivated by X = number of heads in 4 tosses of a coin: |S| = 16.

S = {HHHH,THHH,HTHH,HHTH,HHHT,TTHH,…,TTTT}

Average value of X, using its population of values

• 1/16 × (4 + 3 + 3 + 3 + 3 + 2 + … + 0), or

• (4 × 1/16) + (3 × 4/16) + (2 × 6/16) + (1 × 4/16) + (0 × 1/16) = 32/16 = 2.0

• Can be expressed as:

$$\mu_X = \sum_{i} x_i\,P(X = x_i) = x_1 p_1 + x_2 p_2 + \dots + x_k p_k$$

(Center at 2)

33

Mean of a discrete random variable

• If X is a discrete random variable with k possible values, its probability distribution is:

Value x1 x2 x3 ... xk

Probability p1 p2 p3 ... pk

• Mean (or “expected value”) of X, denoted μX, is given by:

$$\mu_X = E(X) = \sum_{i=1}^{k} x_i\,P(X = x_i) = x_1 p_1 + x_2 p_2 + \dots + x_k p_k$$

If X represents a measure on some member of a population, then E(X) is the population mean of this measure

34

Variance of a discrete random variable

• If X is a discrete random variable with k possible values, its probability distribution is:

Value x1 x2 x3 ... xk

Probability p1 p2 p3 ... pk

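• The variance of X, denoted σ²X, is given by

$$\sigma_X^2 = (x_1 - \mu_X)^2 p_1 + (x_2 - \mu_X)^2 p_2 + \dots + (x_k - \mu_X)^2 p_k$$

This is sometimes written as:

$$\mathrm{Var}(X) = E[(X - \mu_X)^2] = \sum_{i=1}^{k} (x_i - \mu_X)^2 p_i$$

Standard deviation for a RV is the square root of variance:

$$SD(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{E[(X - \mu_X)^2]}$$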
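As a worked check, the mean, variance, and standard deviation formulas can be applied to X = number of heads in four tosses of a fair coin, using the distribution from the earlier table. The function names are illustrative only.

```python
import math
from fractions import Fraction

# Probability distribution of X = number of heads in 4 tosses of a fair coin
dist = {
    0: Fraction(1, 16),
    1: Fraction(4, 16),
    2: Fraction(6, 16),
    3: Fraction(4, 16),
    4: Fraction(1, 16),
}


def mean(dist):
    """E(X) = sum of x_i * p_i over all possible values."""
    return sum(x * p for x, p in dist.items())


def variance(dist):
    """Var(X) = sum of (x_i - mu)^2 * p_i over all possible values."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())


mu_x = mean(dist)         # 2
var_x = variance(dist)    # 1
sd_x = math.sqrt(var_x)   # 1.0

print(mu_x, var_x, sd_x)  # 2 1 1.0
```

The mean of 2 agrees with the 32/16 = 2.0 computed by hand on the expected value slide.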

35

Correlation between 2 Discrete RV’s

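Recall that the correlation for two variables, when we have observations on the variables, is

$$r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$$

The definition of correlation for two random variables X and Y is:

$$\rho_{XY} = E\left[\left(\frac{X - \mu_X}{\sigma_X}\right) \left(\frac{Y - \mu_Y}{\sigma_Y}\right)\right]$$

**We’ll never have you calculate this by hand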

36

Correlation (ρ) of Random Variables

Correlation for random variables has the same properties and interpretation as with data

ρXY > 0 means that when X is larger than its mean, Y tends to be larger than its mean

ρXY < 0 → when X > μX, Y tends to be smaller than its mean

-1 ≤ ρXY ≤ 1

The use of correlation in applications is more important than the formula/definition of ρ…

37

Continuous random variables

• A continuous random variable is a variable taking all

possible values in an interval of numbers

• If X is a continuous random variable, its probability

distribution is described by a density curve (defined

graphically or as a formula)

• The probability of an event is the area under the

curve above the values of X that make up the event

• What density curve have we already talked about in

detail?

39

Rules for variances (IPS pg. 271)

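• Rule 1: If X is a random variable with variance σ²X and a and b are fixed constants, then

$$\mathrm{var}(a + bX) = b^2 \sigma_X^2$$

• Rule 2: If X and Y have correlation ρ then:

$$\mathrm{var}(X + Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho\,\sigma_X \sigma_Y$$

$$\mathrm{var}(X - Y) = \sigma_X^2 + \sigma_Y^2 - 2\rho\,\sigma_X \sigma_Y$$

$$\mathrm{var}(aX - bY) = a^2 \sigma_X^2 + b^2 \sigma_Y^2 - 2(a)(b)\rho\,\sigma_X \sigma_Y$$

• If X and Y are not correlated, ρ = 0 and the above simplifies to:

$$\mathrm{var}(X + Y) = \mathrm{var}(X - Y) = \sigma_X^2 + \sigma_Y^2$$

$$\mathrm{var}(aX - bY) = a^2 \sigma_X^2 + b^2 \sigma_Y^2$$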

40

Example – linear combo of random variables

Yearly vet costs for cats (X) have an average of $200 and a standard deviation of $100. For dogs (Y), these costs average $250 with a standard deviation of $150. You can expect these costs to be independent.

a) What is the expected total yearly vet cost for someone who owns one cat and one dog?

Mean(1X + 1Y) = 1(μX) + 1(μY) = 200 + 250 = $450

b) What is the standard deviation of the total yearly vet cost for someone who owns one cat and one dog?

$$\begin{aligned}
\mathrm{Var}(1X + 1Y) &= 1^2 \sigma_X^2 + 1^2 \sigma_Y^2 + 2(1)(1)\rho\,\sigma_X \sigma_Y \\
&= 1^2 (100^2) + 1^2 (150^2) + 2(0)(1)(1)(100)(150) = 32500
\end{aligned}$$

SD(X + Y) = √32500 = $180.28
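The vet cost arithmetic can be reproduced with a small helper for var(aX + bY). This is a sketch under the example's own assumption that the costs are independent, so ρ = 0; the function name `var_linear_combo` is made up for illustration.

```python
import math


def var_linear_combo(a, sd_x, b, sd_y, rho=0.0):
    """var(aX + bY) = a^2 sd_x^2 + b^2 sd_y^2 + 2 a b rho sd_x sd_y."""
    return a**2 * sd_x**2 + b**2 * sd_y**2 + 2 * a * b * rho * sd_x * sd_y


# Cats: mean $200, SD $100. Dogs: mean $250, SD $150.
mean_total = 1 * 200 + 1 * 250

# Independent costs are uncorrelated, so rho = 0
var_total = var_linear_combo(1, 100, 1, 150, rho=0.0)
sd_total = math.sqrt(var_total)

print(mean_total)          # 450
print(var_total)           # 32500.0
print(round(sd_total, 2))  # 180.28
```

Note that the standard deviations do not simply add (100 + 150 = 250); it is the variances that add when ρ = 0.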
