4190.408 Artificial Intelligence (2015-Spring)
Bayesian Networks – 3, 4: Inference with Probabilistic Graphical Models
Byoung-Tak Zhang
Biointelligence Lab
Seoul National University
What is Machine Learning?
• Learning system:
– A system that automatically constructs a model M from empirical data D acquired through interaction with an environment E, thereby improving its own performance P
• Self-improving Systems (artificial intelligence perspective)
• Knowledge Discovery (data mining perspective)
• Data-Driven Software Design (software engineering perspective)
• Automatic Programming (computer engineering perspective)
Machine Learning as Automatic Programming
Traditional Programming: Data + Program → Computer → Output
Machine Learning: Data + Output → Computer → Program
Machine Learning (ML): Three Tasks
• Supervised Learning (see the fitting sketch after this list)
– Estimate an unknown mapping from known input and target output pairs
– Learn fw from training set D = {(x, y)} s.t. fw(x) ≈ y = f(x)
– Classification: y is discrete
– Regression: y is continuous
• Unsupervised Learning
– Only input values are provided
– Learn fw from D = {(x)} s.t. fw(x) ≈ x
– Density estimation and compression
– Clustering, dimension reduction
• Sequential (Reinforcement) Learning
– No targets; rewards (critiques) are provided "sequentially"
– Learn a heuristic function fw(st, at, rt) from Dt = {(st, at, rt) | t = 1, 2, …}
– With respect to the future, not just the past
– Sequential decision-making
– Action selection and policy learning
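As a minimal illustration of the supervised case, a least-squares fit of fw on synthetic data (the linear model and the data are assumptions for illustration only):

    import numpy as np

    rng = np.random.default_rng(0)

    # Training set D = {(x, y)}: noisy observations of a linear target f.
    x = rng.uniform(-1, 1, size=(100, 1))
    y = 3.0 * x[:, 0] + 0.5 + 0.1 * rng.standard_normal(100)

    # Learn w for fw(x) = w0 + w1*x by least squares, so that fw(x) ≈ y.
    X = np.hstack([np.ones((100, 1)), x])      # prepend a bias column
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(w)  # ≈ [0.5, 3.0]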
Machine Learning Models
• Supervised learning models
– Neural Nets
– Decision Trees
– K-Nearest Neighbors
– Support Vector Machines
• Unsupervised learning models
– Self-Organizing Maps
– Clustering Algorithms
– Manifold Learning
– Evolutionary Learning
• Probabilistic graphical models
– Bayesian Networks
– Markov Networks
– Hidden Markov Models
– Hypernetworks
• Dynamical system models
– Kalman Filters
– Sequential Monte Carlo
– Particle Filters
– Reinforcement Learning
Outline
• Bayesian Inference
– Monte Carlo
– Importance Sampling
– MCMC
• Probabilistic Graphical Models
– Bayesian Networks
– Markov Random Fields
• Hypernetworks
– Architecture and Algorithms
– Application Examples
• Discussion
Bayes Theorem
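In the usual notation (h a hypothesis, D the observed data), Bayes' theorem reads:

$$P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}$$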
MAP vs. ML
• What is the most probable hypothesis given the data?
• From Bayes Theorem:
• MAP (Maximum A Posteriori)
• ML (Maximum Likelihood)
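The standard definitions being referred to are:

$$h_{MAP} = \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} P(D \mid h)\, P(h)$$

$$h_{ML} = \arg\max_{h \in H} P(D \mid h)$$

The ML hypothesis drops the prior P(h), i.e., it treats all hypotheses as equally probable a priori.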
Bayesian Inference
(Source: Prof. Schrater's lecture notes, Univ. of Minnesota)
Monte Carlo (MC) Approximation
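As a minimal sketch of the idea, assuming a standard-normal target and f(x) = x² (both illustrative assumptions), a plain Monte Carlo estimate of E[f(x)] is the sample average over draws from the target:

    import numpy as np

    rng = np.random.default_rng(0)

    def mc_expectation(f, sampler, n=100_000):
        """Plain Monte Carlo: E[f(x)] ≈ (1/N) Σ f(x_i), with x_i drawn from p."""
        samples = sampler(n)
        return f(samples).mean()

    # Example: E[x²] under N(0,1) equals the variance, 1.
    est = mc_expectation(lambda x: x**2, lambda n: rng.standard_normal(n))
    print(est)  # ≈ 1.0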
Markov chain Monte Carlo
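A minimal random-walk Metropolis sketch, assuming a one-dimensional unnormalized target density p_tilde and a Gaussian proposal (both illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    def metropolis(p_tilde, x0=0.0, n=50_000, step=1.0):
        """Random-walk Metropolis: the chain's stationary distribution is p ∝ p_tilde."""
        x, chain = x0, np.empty(n)
        for t in range(n):
            x_new = x + step * rng.standard_normal()
            # The proposal is symmetric, so the acceptance ratio is p_tilde(x')/p_tilde(x).
            if rng.random() < p_tilde(x_new) / p_tilde(x):
                x = x_new
            chain[t] = x
        return chain

    # Example: sample from an unnormalized N(0,1) and check its moments.
    chain = metropolis(lambda x: np.exp(-0.5 * x**2))
    print(chain[1000:].mean(), chain[1000:].std())  # ≈ 0, ≈ 1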
MC with Importance Sampling
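A minimal importance-sampling sketch, assuming target p = N(0,1), proposal q = N(0, 2²), and f(x) = x² (all assumptions for illustration):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    def importance_sampling(f, p_pdf, q_pdf, q_sampler, n=100_000):
        """E_p[f(x)] ≈ (1/N) Σ w_i f(x_i), weights w_i = p(x_i)/q(x_i), x_i ~ q."""
        x = q_sampler(n)
        w = p_pdf(x) / q_pdf(x)
        return (w * f(x)).mean()

    # Example: E[x²] under N(0,1), sampling from the wider proposal N(0, 2²).
    est = importance_sampling(
        lambda x: x**2,
        p_pdf=norm(0, 1).pdf,
        q_pdf=norm(0, 2).pdf,
        q_sampler=lambda n: rng.normal(0, 2, n),
    )
    print(est)  # ≈ 1.0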
Graphical Models
(Figure: a taxonomy of graphical models)
• Graphical Models (GM) divide into: Causal Models, Chain Graphs, Directed GMs, Dependency Networks, Undirected GMs, and Other Semantics
• Directed GMs: Bayesian Networks, including DBNs, FSTs, and HMMs (Factorial HMMs, Mixed-Memory Markov Models, BMMs), Kalman filters, Segment Models, Mixture Models, Decision Trees, and Simple Models (PCA, LDA)
• Undirected GMs: Markov Random Fields / Markov networks, Gibbs/Boltzmann distributions
BAYESIAN NETWORKS
Bayesian Networks
• Bayesian network
– DAG (Directed Acyclic Graph)
– Expresses dependence relations between variables
– Can incorporate prior knowledge about the data (parameters)
(Figure: a five-node DAG with edges A→B, A→D, B→C, B→D, B→E, C→E, D→E)

$$P(\mathbf{X}) = \prod_{i=1}^{n} P(X_i \mid \mathbf{pa}(X_i))$$

P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D)
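A minimal sketch of this factorization in code; the graph structure comes from the slide, while the CPT numbers are made up for illustration:

    from itertools import product

    # CPTs as functions returning P(node = True | parents); numbers are illustrative.
    p_a = lambda: 0.3
    p_b = lambda a: 0.8 if a else 0.1                 # P(B=T | A)
    p_c = lambda b: 0.7 if b else 0.2                 # P(C=T | B)
    p_d = lambda a, b: {(1, 1): 0.9, (1, 0): 0.6, (0, 1): 0.5, (0, 0): 0.1}[a, b]
    p_e = lambda b, c, d: 0.95 if (b and c and d) else 0.3

    def bern(p, v):
        """P(X = v) for a binary variable with P(X = True) = p."""
        return p if v else 1.0 - p

    def joint(a, b, c, d, e):
        """P(A,B,C,D,E) = P(A) P(B|A) P(C|B) P(D|A,B) P(E|B,C,D)."""
        return (bern(p_a(), a) * bern(p_b(a), b) * bern(p_c(b), c)
                * bern(p_d(a, b), d) * bern(p_e(b, c, d), e))

    # Sanity check: the joint sums to 1 over all 2^5 assignments.
    print(sum(joint(*v) for v in product([0, 1], repeat=5)))  # 1.0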
Representing Probability Distributions
• A probability distribution assigns a probability to each combination of values of the attributes
• Naïve representations (such as tables) run into trouble:
– 20 binary attributes already require more than 2^20 ≈ 10^6 parameters
– Real applications usually involve hundreds of attributes
Hospital patients described by
• Background: age, gender, history of diseases, …
• Symptoms: fever, blood pressure, headache, …
• Diseases: pneumonia, heart attack, …
Bayesian Networks - Key Idea
• Key idea: utilize conditional independence
• Graphical representation of conditional independence and of "causal" dependencies
Exploit regularities!
Bayesian Networks
1. Finite, directed acyclic graph
2. Nodes: (discrete) random variables
3. Edges: direct influences
4. Associated with each node: a table representing a conditional probability distribution (CPD), quantifying the effect the parents have on the node
(Figure: the classic burglary network: Earthquake (E) and Burglary (B) point to Alarm (A), which points to JohnCalls (J) and MaryCalls (M))
Bayesian Networks
(Figure: a three-node network X1 → X3 ← X2, with priors P(X1) = (0.2, 0.8) and P(X2) = (0.6, 0.4), and the CPD of X3 given its parents:)

X1      X2   P(X3 | X1, X2)
true    1    (0.2, 0.8)
true    2    (0.5, 0.5)
false   1    (0.23, 0.77)
false   2    (0.53, 0.47)
Example: Use a DAG to model the causality
(Figure: a DAG over ten nodes: TrainStrike, MartinOversleep, NormanOversleep, BossFailure-in-Love, MartinLate, NormanLate, NormanUntidy, ProjectDelay, OfficeDirty, and BossAngry, modeling the causal chain from root causes to BossAngry)
Example: Attach prior probabilities to all root nodes
(Same network, with prior probability tables attached to the root nodes:)

MartinOversleep:      P(T) = 0.01, P(F) = 0.99
TrainStrike:          P(T) = 0.1,  P(F) = 0.9
NormanOversleep:      P(T) = 0.2,  P(F) = 0.8
BossFailure-in-Love:  P(T) = 0.01, P(F) = 0.99
Example: Attach conditional probability tables to non-root nodes
(Same network; conditional probability tables for NormanUntidy and MartinLate:)

P(NormanUntidy | NormanOversleep):
                  NormanOversleep=T   NormanOversleep=F
NormanUntidy=T    0.6                 0.2
NormanUntidy=F    0.4                 0.8

P(MartinLate | TrainStrike, MartinOversleep):
                TrainStrike=T            TrainStrike=F
                MO=T        MO=F         MO=T        MO=F
MartinLate=T    0.95        0.8          0.7         0.05
MartinLate=F    0.05        0.2          0.3         0.95

Each column sums to 1.
Example: Attach conditional probability tables to non-root nodes (continued)
(Same network; conditional probability table for BossAngry, whose parents are BossFailure-in-Love, ProjectDelay, and OfficeDirty:)

P(BossAngry | BossFailure-in-Love, ProjectDelay, OfficeDirty):
          Failure-in-Love=T                    Failure-in-Love=F
          Delay=T          Delay=F             Delay=T          Delay=F
          Dirty=T Dirty=F  Dirty=T Dirty=F     Dirty=T Dirty=F  Dirty=T Dirty=F
very      0.98    0.85     0.6     0.5         0.3     0.2      0        0.01
mid       0.02    0.15     0.3     0.25        0.5     0.5      0.2      0.02
little    0       0        0.1     0.25        0.2     0.3      0.7      0.07
no        0       0        0       0           0       0        0.1      0.9

Each column sums to 1.
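With these tables, inference reduces to sums of products of CPT entries. For instance, the marginal P(MartinLate = T) is obtained by enumerating its two parents (a minimal sketch; every number below comes from the slides):

    # P(MartinLate=T) by enumerating TrainStrike (TS) and MartinOversleep (MO).
    p_strike = 0.1        # P(TrainStrike = T)
    p_oversleep = 0.01    # P(MartinOversleep = T)
    p_late = {(1, 1): 0.95, (1, 0): 0.8, (0, 1): 0.7, (0, 0): 0.05}  # P(ML=T | TS, MO)

    p_ml = sum(
        p_late[ts, mo]
        * (p_strike if ts else 1 - p_strike)
        * (p_oversleep if mo else 1 - p_oversleep)
        for ts in (0, 1) for mo in (0, 1)
    )
    print(p_ml)  # ≈ 0.131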
Inference
MARKOV RANDOM FIELDS (MARKOV NETWORKS)
Graphical Models
Directed Graph (e.g., Bayesian Network)
Undirected Graph (e.g., Markov Random Field)
Bayesian Image Analysis
(Figure: the original image passes through a noisy transmission (degradation) process, yielding the degraded, observed image)

$$\underbrace{\Pr(\text{Original} \mid \text{Degraded})}_{\text{a posteriori probability}} = \frac{\overbrace{\Pr(\text{Degraded} \mid \text{Original})}^{\text{degradation process (likelihood)}}\ \overbrace{\Pr(\text{Original})}^{\text{a priori probability}}}{\underbrace{\Pr(\text{Degraded})}_{\text{marginal}}}$$
Image Analysis
• We could thus represent both the observed image (X) and the true image (Y) as Markov random fields.
• And invoke the Bayesian framework to find P(Y|X)
X – observed image
Y – true image
Details
• Remember
• P(Y|X) is proportional to P(X|Y) P(Y)
– P(X|Y) is the data model
– P(Y) models the label interaction
• Next we need to compute the prior P(Y=y) and the likelihood P(X|Y).
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)} \propto P(X \mid Y)\, P(Y)$$
Back to Image Analysis
• The likelihood P(X|Y) can be modeled as a mixture of Gaussians.
• The prior potential is modeled to capture domain knowledge. One common choice is the Ising model, with pairwise potentials of the form βyiyj (sketched below).
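A minimal sketch of such an Ising-style prior energy on a binary label image (labels in {−1, +1}, 4-neighborhoods; the smoothness weight β is an illustrative assumption):

    import numpy as np

    def ising_energy(y, beta=1.0):
        """U(y) = -β Σ y_i y_j over 4-neighbor pairs; lower energy
        (higher prior probability) when neighboring labels agree."""
        horiz = (y[:, :-1] * y[:, 1:]).sum()   # left-right neighbor pairs
        vert = (y[:-1, :] * y[1:, :]).sum()    # up-down neighbor pairs
        return -beta * (horiz + vert)

    # A constant image has lower energy than a random one.
    rng = np.random.default_rng(0)
    smooth = np.ones((8, 8), dtype=int)
    noisy = rng.choice([-1, 1], size=(8, 8))
    print(ising_energy(smooth), ising_energy(noisy))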
Bayesian Image Analysis
• Let X be the observed image = {x1, x2, …, xmn}
• Let Y be the true image = {y1, y2, …, ymn}
• Goal: find Y = y* = {y1*, y2*, …} such that P(Y = y*|X) is maximum
• A labeling problem with a search space of size |L|^(mn)
– L is the set of labels
– There are m×n observations
Unfortunately
(Figure: the observed image and its segmentations by SVM and by MRF)
Markov Random Fields (MRFs)
• Introduced in the 1960s as a principled approach to incorporating context information
• Incorporates domain knowledge
• Works within the Bayesian framework
• Widely studied in the 1970s, faded during the 1980s, and made a big comeback in the late 1990s
Markov Random Field
• Random Field: Let F = {F1, F2, …, FM} be a family of random variables defined on the set S, in which each random variable Fi takes a value fi in a label set L. The family F is called a random field.
• Markov Random Field: F is said to be a Markov random field on S with respect to a neighborhood system N if and only if the following two conditions are satisfied:
– Positivity: $P(f) > 0,\ \forall f \in \mathbb{F}$
– Markovianity: $P(f_i \mid f_{S \setminus \{i\}}) = P(f_i \mid f_{N_i})$
Inference
• Find the optimal y* such that P(Y = y*|X) is maximum
• The search space is exponential
• Exponential-time algorithm: simulated annealing (SA)
• Greedy algorithm: iterated conditional modes (ICM)
• There are also more advanced graph-cut-based strategies
Sampling and Simulated Annealing
• Sampling
– A way to generate random samples from a (potentially very complicated) probability distribution
– Gibbs/Metropolis
• Simulated annealing
– A schedule for modifying the probability distribution so that, at "zero temperature", you draw samples only from the MAP solution
• If you can find the right cooling schedule, the algorithm will converge to a global MAP solution
• Flip side: it is SLOW, and finding the correct schedule is non-trivial (see the sketch below)
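A minimal simulated-annealing sketch for an energy such as the Ising one above, using single-site flips and a geometric cooling schedule (the schedule constants are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    def anneal(energy, y0, t0=4.0, cooling=0.999, n_steps=20_000):
        """Accept uphill moves with probability exp(-ΔU/T) while T shrinks toward zero."""
        y, temp = y0.copy(), t0
        u = energy(y)
        for _ in range(n_steps):
            i, j = rng.integers(y.shape[0]), rng.integers(y.shape[1])
            y[i, j] *= -1                     # propose flipping one label
            du = energy(y) - u
            if du <= 0 or rng.random() < np.exp(-du / temp):
                u += du                       # accept the flip
            else:
                y[i, j] *= -1                 # reject: undo the flip
            temp *= cooling                   # cool down
        return y

A practical implementation would compute ΔU locally from the flipped site's neighborhood rather than re-evaluating the full energy at every step.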
Iterated Conditional Modes
• Greedy strategy with fast convergence
• The idea is to iteratively maximize the local conditional probabilities, given an initial solution (see the sketch below)
• Equivalent to simulated annealing at T = 0
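A minimal ICM sketch; cond_prob(y, i, j, label), giving P(y_ij = label | rest of y) up to a constant, is a hypothetical helper the caller must supply (the slide only names the algorithm):

    import numpy as np

    def icm(cond_prob, y0, labels=(-1, 1), max_sweeps=20):
        """Iterated conditional modes: sweep the sites, setting each to the label
        that maximizes its local conditional probability, until nothing changes."""
        y = y0.copy()
        for _ in range(max_sweeps):
            changed = False
            for i in range(y.shape[0]):
                for j in range(y.shape[1]):
                    best = max(labels, key=lambda l: cond_prob(y, i, j, l))
                    if best != y[i, j]:
                        y[i, j] = best
                        changed = True
            if not changed:        # a local maximum has been reached
                break
        return y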
Parameter Learning
• Supervised learning (easiest case)
• Maximum likelihood: $\theta^* = \arg\max_{\theta} P(f \mid \theta)$
• For an MRF: $P(f \mid \theta) = \frac{1}{Z(\theta)}\, e^{-U(f \mid \theta)/T}$
Pseudo Likelihood
• The partition function Z(θ) makes the exact likelihood intractable, so we approximate it by the pseudo-likelihood:

$$PL(f) = \prod_{i \in S} P(f_i \mid f_{N_i}) = \prod_{i \in S} \frac{e^{-U(f_i, f_{N_i})}}{\sum_{f'_i \in L} e^{-U(f'_i, f_{N_i})}}, \qquad U(f) = \sum_{i} U(f_i, f_{N_i})$$

• Large lattice theorem: in the large-lattice limit (M → ∞), the PL estimate converges to the ML estimate.
• It turns out that a local learning method like pseudo-likelihood, combined with a local inference method such as ICM, does quite well, giving close-to-optimal results (see the sketch below).
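A minimal sketch of evaluating the negative log pseudo-likelihood for the Ising energy used earlier, with β as the parameter to be learned (illustrative assumptions throughout):

    import numpy as np

    def neg_log_pl(y, beta):
        """-log PL(y) for an Ising MRF on a 4-neighborhood lattice: each site
        contributes -log P(y_ij | neighbors), a two-label softmax over
        -U(l, f_N) = β·l·s, where s is the sum of the site's neighbor labels."""
        m, n = y.shape
        total = 0.0
        for i in range(m):
            for j in range(n):
                s = sum(y[a, b] for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                        if 0 <= a < m and 0 <= b < n)
                logits = {l: beta * l * s for l in (-1, 1)}
                log_z = np.logaddexp(logits[-1], logits[1])
                total -= logits[y[i, j]] - log_z
        return total

    # β can then be fit by minimizing neg_log_pl over a training image,
    # e.g. with a grid search or scipy.optimize.minimize_scalar.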