Probability space
A probability space is a triple $(\Omega, \mathcal{A}, P)$ with:
• $\Omega$ – the set of elementary events
• $\mathcal{A}$ – a $\sigma$-algebra
• $P$ – a probability measure
A $\sigma$-algebra over $\Omega$ is a system of subsets, i.e. $\mathcal{A} \subseteq \mathcal{P}(\Omega)$ ($\mathcal{P}$ denotes the power set), with:
• $\Omega \in \mathcal{A}$
• $A \in \mathcal{A} \Rightarrow \Omega \setminus A \in \mathcal{A}$
• $A_i \in \mathcal{A},\ i = 1 \ldots \infty \ \Rightarrow\ \bigcap_{i=1}^{\infty} A_i \in \mathcal{A}$
$\mathcal{A}$ is closed with respect to complement and countable intersection. It follows that $\emptyset \in \mathcal{A}$ and that $\mathcal{A}$ is also closed with respect to countable union (by De Morgan's laws).
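For a finite $\Omega$, the axioms are easy to check mechanically. A minimal Python sketch (the helper name is my own, not from the slides) that tests a candidate family of subsets, here the minimal $\sigma$-algebra $\{\emptyset, A, \Omega \setminus A, \Omega\}$ from the examples below:

    # Check the sigma-algebra axioms for a family of subsets of a finite Omega.
    # For finite Omega, closure under pairwise intersection gives closure under
    # any finite (hence "countable") intersection by induction.
    from itertools import combinations

    def is_sigma_algebra(omega, family):
        family = {frozenset(s) for s in family}
        if frozenset(omega) not in family:                           # Omega must be in A
            return False
        if any(frozenset(omega) - s not in family for s in family):  # complements
            return False
        return all(a & b in family for a, b in combinations(family, 2))  # intersections

    omega = {1, 2, 3, 4}
    A = {1, 2}
    print(is_sigma_algebra(omega, [set(), A, omega - A, omega]))  # True
    print(is_sigma_algebra(omega, [set(), A, omega]))             # False: misses omega - A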
Probability space
Examples:
• $\mathcal{A} = \{\emptyset, \Omega\}$ (smallest) and $\mathcal{A} = \mathcal{P}(\Omega)$ (largest) $\sigma$-algebras over $\Omega$
• the minimal $\sigma$-algebra over $\Omega$ containing a particular subset $A \subseteq \Omega$ is $\mathcal{A} = \{\emptyset, A, \Omega \setminus A, \Omega\}$
• $\Omega$ discrete and finite: $\mathcal{A} = 2^{\Omega}$
• $\Omega = \mathbb{R}$: the Borel algebra (contains, among others, all intervals)
• etc.
Probability measure
$P: \mathcal{A} \to [0,1]$ is a "measure", with the normalization $P(\Omega) = 1$.
$\sigma$-additivity: let $A_i \in \mathcal{A}$ be pairwise disjoint subsets, i.e. $A_i \cap A_{i'} = \emptyset$ for $i \neq i'$; then
$$P\Big(\bigcup_i A_i\Big) = \sum_i P(A_i)$$
Note: there are sets for which no measure can be defined. Examples: the set of irrational numbers, function spaces $\mathbb{R}^{\infty}$, etc.
See also the Banach–Tarski paradox (on Wikipedia).
(For us) practically relevant cases
• The set $\Omega$ is "good-natured", e.g. $\mathbb{R}^n$, discrete finite sets, etc.
• $\mathcal{A} = \mathcal{P}(\Omega)$, i.e. the $\sigma$-algebra is the power set
• We often consider a (composite) "event" $A \subseteq \Omega$ as the union of elementary ones
• The probability of an event is
$$P(A) = \sum_{\omega \in A} p(\omega)$$
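A minimal sketch of this rule for a fair die (the helper names are my own); exact fractions make the additivity check from the previous slide exact:

    # P(A) as the sum of elementary probabilities p(omega); fair-die example.
    from fractions import Fraction

    p = {w: Fraction(1, 6) for w in range(1, 7)}

    def prob(event):
        return sum(p[w] for w in event)

    even, odd = {2, 4, 6}, {1, 3, 5}
    print(prob(even))                                    # 1/2
    print(prob(even | odd) == prob(even) + prob(odd))    # True: additivity for disjoint events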
Random variables
Here we consider a special case: real-valued random variables.
A random variable $X$ on a probability space $(\Omega, \mathcal{A}, P)$ is a mapping $X: \Omega \to \mathbb{R}$ satisfying
$$\{\omega : X(\omega) \le r\} \in \mathcal{A} \quad \forall r \in \mathbb{R}$$
(this always holds if $\mathcal{A}$ is the power set)
Note: elementary events are not numbers – they are elements of a general set $\Omega$. The values of random variables, in contrast, are numbers, i.e. they can be added, subtracted, squared, etc.
Distributions
Cumulative distribution function of a random variable $X$:
$$F_X(r) = P(\{\omega : X(\omega) \le r\})$$
Probability distribution of a discrete random variable $X: \Omega \to \mathbb{Z}$:
$$p_X(r) = P(\{\omega : X(\omega) = r\})$$
Probability density of a continuous random variable $X: \Omega \to \mathbb{R}$:
$$p_X(r) = \frac{dF_X(r)}{dr}$$
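In the discrete case the cumulative distribution function is just a cumulative sum of $p_X$; a small sketch (names assumed, not from the slides):

    # F_X(r) = P({omega : X(omega) <= r}) for a discrete random variable.
    from fractions import Fraction

    p_X = {v: Fraction(1, 6) for v in range(1, 7)}   # fair-die distribution

    def cdf(r):
        return sum(p for value, p in p_X.items() if value <= r)

    print(cdf(3))     # 1/2
    print(cdf(4.5))   # 2/3 -- F_X is a step function defined for every real r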
Distributions
Why is this indirection (through the cumulative distribution function) necessary?
Example: a Gaussian.
The probability of any particular real value is zero, so a "direct" definition of a "probability distribution" is meaningless for continuous variables.
Through the cumulative distribution function, however, the definition works.
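A numerical illustration of the relation $p_X = dF_X/dr$ for the standard Gaussian (a sketch; the step size $h$ is an arbitrary choice):

    # The density as the derivative of the cumulative distribution function.
    import math

    def gauss_cdf(r):
        # standard Gaussian CDF via the error function
        return 0.5 * (1.0 + math.erf(r / math.sqrt(2.0)))

    def gauss_pdf(r):
        return math.exp(-r * r / 2.0) / math.sqrt(2.0 * math.pi)

    h = 1e-5
    for r in (-1.0, 0.0, 2.0):
        numeric = (gauss_cdf(r + h) - gauss_cdf(r - h)) / (2.0 * h)  # dF/dr
        print(abs(numeric - gauss_pdf(r)) < 1e-8)                    # True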
Mean
A mean (expectation, average, …) of a random variable $X$ is
$$\mathbb{E}(X) = \sum_{\omega \in \Omega} X(\omega) \cdot p(\omega) = \sum_r \; \sum_{\omega : X(\omega) = r} p(\omega) \cdot r = \sum_r p_X(r) \cdot r$$
The arithmetic mean is a special case:
$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i = \sum_i p_X(x_i) \cdot x_i$$
with the data values $x_i$ understood as values of $X$ and $p_X(x_i) = 1/N$ (the uniform probability distribution).
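A small sketch of this special case (the toy data values are my own): grouping equal values turns the arithmetic mean into the sum $\sum_r p_X(r) \cdot r$ over values.

    # The arithmetic mean as an expectation under the uniform distribution.
    from fractions import Fraction

    data = [2, 3, 3, 7]                   # arbitrary toy sample
    N = len(data)
    arithmetic = Fraction(sum(data), N)

    # treat each datum as a value of X with probability 1/N; repeated values add up
    p_X = {}
    for x in data:
        p_X[x] = p_X.get(x, 0) + Fraction(1, N)
    expectation = sum(x * p for x, p in p_X.items())

    print(arithmetic, expectation)        # 15/4 15/4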
Mean
The probability of an event $A \subseteq \Omega$ can be expressed as the mean value of a corresponding "indicator" variable:
$$P(A) = \sum_{\omega \in A} p(\omega) = \sum_{\omega \in \Omega} I(\omega) \cdot p(\omega)$$
with
$$I(\omega) = \begin{cases} 1 & \text{if } \omega \in A \\ 0 & \text{otherwise} \end{cases}$$
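In code (a sketch reusing the fair-die setup from above):

    # P(A) as the expectation of an indicator variable.
    from fractions import Fraction

    omega = range(1, 7)
    p = {w: Fraction(1, 6) for w in omega}
    A = {2, 4, 6}

    I = {w: 1 if w in A else 0 for w in omega}   # the indicator of A
    print(sum(I[w] * p[w] for w in omega))       # 1/2, i.e. P(A)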
Often, the set of elementary events can be associated with a random variable (just enumerate all $\omega \in \Omega$).
Then one can speak of a "probability distribution over $\Omega$" (instead of a probability measure).
Example 1 – the numbers of a die
The set of elementary events: $\Omega = \{a, b, c, d, e, f\}$
Probability measure: $P(a) = 1/6$, $P(\{a, b\}) = 1/3$, …
Random variable (the number shown by the die): $X(a) = 1$, $X(b) = 2$, …, $X(f) = 6$
Cumulative distribution: $F_X(3) = 1/2$, $F_X(4.5) = 2/3$, …
Probability distribution: $p_X(1) = p_X(2) = \ldots = p_X(6) = 1/6$
Mean value: $\mathbb{E}(X) = 3.5$
Another random variable (the squared number of the die):
$X'(a) = 1$, $X'(b) = 4$, …, $X'(f) = 36$
Mean value: $\mathbb{E}(X') = 91/6 \approx 15.17$
Note: $\mathbb{E}(X') \neq \mathbb{E}(X)^2$
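A sketch reproducing these numbers:

    # Expectations for the die example.
    from fractions import Fraction

    faces = "abcdef"
    p = {w: Fraction(1, 6) for w in faces}
    X = {w: i + 1 for i, w in enumerate(faces)}    # X(a)=1, ..., X(f)=6

    E_X = sum(X[w] * p[w] for w in faces)
    E_X2 = sum(X[w] ** 2 * p[w] for w in faces)    # E(X') with X' = X^2
    print(E_X, E_X2)                               # 7/2 91/6
    print(E_X2 == E_X ** 2)                        # False: E(X') != E(X)^2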
Example 2 – two independent dice
The set of elementary events ($6 \times 6$ face combinations):
$$\Omega = \{a, b, c, d, e, f\} \times \{a, b, c, d, e, f\}$$
Probability measure: $P(ab) = 1/36$, $P(\{ab, ba\}) = 1/18$, …
Two random variables:
1) the number of the first die: $X_1(ab) = 1$, $X_1(af) = 1$, $X_1(ea) = 5$, …
2) the number of the second die: $X_2(ab) = 2$, $X_2(ac) = 3$, $X_2(af) = 6$, …
Probability distributions:
$$p_{X_1}(1) = p_{X_1}(2) = \dots = p_{X_1}(6) = 1/6$$
$$p_{X_2}(1) = p_{X_2}(2) = \dots = p_{X_2}(6) = 1/6$$
Example 2 – two independent dice
Consider a new random variable: $S = X_1 + X_2$
The probability distribution $p_S$ is not uniform anymore:
$$p_S \propto (1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) \quad \text{for the values } 2, 3, \ldots, 12$$
Mean value: $\mathbb{E}(S) = 7$
In general, for mean values:
$$\mathbb{E}(X_1 + X_2) = \sum_{\omega \in \Omega} p(\omega) \cdot \big(X_1(\omega) + X_2(\omega)\big) = \mathbb{E}(X_1) + \mathbb{E}(X_2)$$
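A sketch that builds the product space and checks both claims:

    # Distribution and mean of the sum of two independent dice.
    from collections import Counter
    from fractions import Fraction

    faces = range(1, 7)
    omega = [(i, j) for i in faces for j in faces]   # 36 elementary events
    p = Fraction(1, 36)                              # uniform measure

    p_S = Counter()
    for i, j in omega:
        p_S[i + j] += p                              # S = X1 + X2
    print([int(p_S[s] * 36) for s in range(2, 13)])  # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

    E_S = sum(s * pr for s, pr in p_S.items())
    print(E_S)                                       # 7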
Random variables of higher dimension
Analogously: let $X: \Omega \to \mathbb{R}^n$ be a mapping (take $n = 2$ for simplicity), with $X = (X_1, X_2)$, $X_1: \Omega \to \mathbb{R}$ and $X_2: \Omega \to \mathbb{R}$.
Cumulative distribution function:
$$F_X(r, s) = P\big(\{\omega : X_1(\omega) \le r\} \cap \{\omega : X_2(\omega) \le s\}\big)$$
Joint probability distribution (discrete):
$$p_{X=(X_1,X_2)}(r, s) = P\big(\{\omega : X_1(\omega) = r\} \cap \{\omega : X_2(\omega) = s\}\big)$$
Joint probability density (continuous):
$$p_{X=(X_1,X_2)}(r, s) = \frac{\partial^2 F_X(r, s)}{\partial r \, \partial s}$$
Independence
Two events $A \in \mathcal{A}$ and $B \in \mathcal{A}$ are independent if
$$P(A \cap B) = P(A) \cdot P(B)$$
Interesting: the events $A$ and $\bar{B} = \Omega \setminus B$ are independent if $A$ and $B$ are independent.
Two random variables are independent if
$$F_{X=(X_1,X_2)}(r, s) = F_{X_1}(r) \cdot F_{X_2}(s) \quad \forall r, s$$
It follows (example for continuous $X$):
$$p_X(r, s) = \frac{\partial^2 F_X(r, s)}{\partial r \, \partial s} = \frac{dF_{X_1}(r)}{dr} \cdot \frac{dF_{X_2}(s)}{ds} = p_{X_1}(r) \cdot p_{X_2}(s)$$
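A sketch checking the factorization criterion on the two-dice example: $X_1$ and $X_2$ are independent, while $X_1$ and $S = X_1 + X_2$ are not (discrete case, so distributions are compared instead of densities):

    # Independence as factorization of the joint distribution (discrete case).
    from fractions import Fraction

    faces = range(1, 7)
    omega = [(i, j) for i in faces for j in faces]
    p = Fraction(1, 36)

    def X1(w): return w[0]
    def X2(w): return w[1]
    def S(w): return w[0] + w[1]

    def distribution(f):                        # p_f(r)
        table = {}
        for w in omega:
            table[f(w)] = table.get(f(w), 0) + p
        return table

    def joint(f, g):                            # p_{(f,g)}(r, s)
        table = {}
        for w in omega:
            table[(f(w), g(w))] = table.get((f(w), g(w)), 0) + p
        return table

    def independent(f, g):
        j, pf, pg = joint(f, g), distribution(f), distribution(g)
        return all(j.get((a, b), 0) == pa * pb
                   for a, pa in pf.items() for b, pb in pg.items())

    print(independent(X1, X2))   # True
    print(independent(X1, S))    # False: the sum depends on the first die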
Conditional probabilities
Conditional probability:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Independence (almost equivalent, since conditioning requires $P(B) > 0$): $A$ and $B$ are independent if
$$P(A \mid B) = P(A) \quad \text{and/or} \quad P(B \mid A) = P(B)$$
Bayes' theorem (formula, rule):
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Further definitions (for random variables)
Shorthand: $p(x, y) \equiv p_X(x, y)$
Marginal probability distribution:
$$p(x) = \sum_y p(x, y)$$
Conditional probability distribution:
$$p(x \mid y) = \frac{p(x, y)}{p(y)}$$
Note: $\sum_x p(x \mid y) = 1$
Independent probability distributions: $p(x, y) = p(x) \cdot p(y)$
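A sketch of marginalization and conditioning on a joint table (the joint values are an arbitrary toy example, not from the slides):

    # Marginal and conditional distributions from a joint table p(x, y).
    from fractions import Fraction

    joint = {(0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
             (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 8)}

    def p_x(x):
        return sum(p for (xx, _), p in joint.items() if xx == x)

    def p_y(y):
        return sum(p for (_, yy), p in joint.items() if yy == y)

    def p_x_given_y(x, y):
        return joint[(x, y)] / p_y(y)

    print(p_x(0))                                   # 1/2
    print(p_x_given_y(0, 1) + p_x_given_y(1, 1))    # 1: normalization check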
Example
Let the probability of being ill be
$$P(\text{ill}) = 0.02$$
Let the conditional probability of having a temperature in that case be
$$P(\text{temp} \mid \text{ill}) = 0.9$$
However, one may also have a temperature without being ill:
$$P(\text{temp} \mid \overline{\text{ill}}) = 0.05$$
What is the probability of being ill, given that one has a temperature?
Example
Bayes' rule:
$$P(\text{ill} \mid \text{temp}) = \frac{P(\text{temp} \mid \text{ill}) \cdot P(\text{ill})}{P(\text{temp})}$$
(with the marginal probability in the denominator)
$$= \frac{P(\text{temp} \mid \text{ill}) \cdot P(\text{ill})}{P(\text{temp} \mid \text{ill}) \cdot P(\text{ill}) + P(\text{temp} \mid \overline{\text{ill}}) \cdot P(\overline{\text{ill}})} = \frac{0.9 \cdot 0.02}{0.9 \cdot 0.02 + 0.05 \cdot 0.98} \approx 0.27$$
→ not as high as one might expect; the reason is the very low prior probability of being ill.
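The same computation as a sketch:

    # Bayes' rule for the illness example.
    p_ill = 0.02
    p_temp_given_ill = 0.9
    p_temp_given_healthy = 0.05

    # marginal probability of having a temperature
    p_temp = p_temp_given_ill * p_ill + p_temp_given_healthy * (1.0 - p_ill)
    p_ill_given_temp = p_temp_given_ill * p_ill / p_temp
    print(round(p_ill_given_temp, 2))   # 0.27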
Further topics
The model
Let two random variables be given:
• The first one is typically discrete ($k \in K$) and is called the "class"
• The second one is often continuous ($x \in X$) and is called the "observation"
Let the joint probability distribution $p(x, k)$ be "given".
As $k$ is discrete, it is often specified as $p(x, k) = p(k) \cdot p(x \mid k)$ (see the sketch after the list below).
The recognition task: given $x$, estimate $k$.
Usual problems (questions):
• How can $k$ be estimated from $x$?
• The joint probability is not always specified explicitly.
• The set $K$ is sometimes huge.
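As referenced above, a minimal sketch of this setting with an assumed toy model (two classes, a one-dimensional Gaussian observation model per class; all numbers are illustrative, not from the slides):

    # Recognition: posterior p(k | x) from p(k) and p(x | k).
    import math

    p_k = {0: 0.3, 1: 0.7}                 # prior over classes
    means = {0: -1.0, 1: 2.0}              # per-class Gaussian observation model
    sigma = 1.0

    def p_x_given_k(x, k):
        z = sigma * math.sqrt(2.0 * math.pi)
        return math.exp(-(x - means[k]) ** 2 / (2.0 * sigma ** 2)) / z

    def posterior(x):
        joint = {k: p_k[k] * p_x_given_k(x, k) for k in p_k}  # p(x, k) = p(k) p(x|k)
        marginal = sum(joint.values())                        # p(x)
        return {k: v / marginal for k, v in joint.items()}

    post = posterior(0.5)
    print(max(post, key=post.get))   # the most probable class for x = 0.5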
Further topics
The learning task:
Often (almost always) the probability distribution is known only up to some free parameters. How should they be chosen (learned from examples)?
Next classes:
1. Recognition, Bayesian decision theory
2. Probabilistic learning, the maximum-likelihood principle
3. Discriminative models, recognition and learning
…