Deep Learning for Dialogue Modeling - NTHU
-
Upload
yun-nung-vivian-chen -
Category
Technology
-
view
158 -
download
0
Transcript of Deep Learning for Dialogue Modeling - NTHU
Applied Deep LearningYun-Nung (Vivian) Chen
Deep Learning for Dialogue ModelingYUN-NUNG (VIVIAN) CHEN 陳縕儂VIVIANCHEN.IDV.TW
Mar 29th, 2017
@NTHU
2
Outline
IntroductionFramework
Natural/Spoken Language Understanding
Dialogue Management
Output Generation
ChallengeRecent TrendConclusion
3
Outline
IntroductionFramework
Natural/Spoken Language Understanding
Dialogue Management
Output Generation
ChallengeRecent TrendConclusion
4
Apple Siri (2011)
Google Now (2012)
Facebook M & Bot (2015)
Intelligent Assistants
Google Home (2016)
Microsoft Cortana (2014)
Amazon Alexa/Echo (2014)
5
Why We Need?
Daily Life Usage◦ Weather◦ Schedule◦ Transportation◦ Restaurant Seeking
6
Why We Need?– Get things done
• E.g. set up alarm/reminder, take note
– Easy access to structured data, services and apps• E.g. find docs/photos/restaurants
– Assist your daily schedule and routine• E.g. commute alerts to/from work
– Be more productive in managing your work and personal life
7
Why Natural Language?• Global Digital Statistics (2015 January)
Global Population
7.21B
Active Internet Users
3.01B
Active Social Media Accounts
2.08B
Active Unique Mobile Users
3.65B
The more natural and convenient input of devices evolves towards speech.
8
Intelligent Assistant Architecture
Reactive Assistance
ASR, LU, Dialog, LG, TTS
Proactive Assistance
Inferences, User Modeling, Suggestions
Data Back-end Data
Bases, Services and Client Signals
Device/Service End-points(Phone, PC, Xbox, Web Browser, Messaging Apps)
User Experience“restaurant suggestions”“call taxi”
9
• Spoken dialogue systems are intelligent agents that are able to help users finish tasks more efficiently via spoken interactions.
• Spoken dialogue systems are being incorporated into various devices (smart-phones, smart TVs, in-car navigating system, etc).
Spoken Dialogue System (SDS)
JARVIS – Iron Man’s Personal Assistant Baymax – Personal Healthcare Companion
Good dialogue systems assist users to access information conveniently and finish tasks efficiently.
10
APP BOT
Seamless and automatic information transferring across domains reduce duplicate information and interaction
• A bot is responsible for a “single” domain, similar to an app
愛食記地圖LINE
Goal: Schedule a lunch with Vivian
KKBOX
11
Outline
IntroductionFramework
Natural/Spoken Language Understanding
Dialogue Management
Output Generation
ChallengeRecent TrendConclusion
12
System Framework
Speech Recognition
Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling
Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy
Decision
Output Generation
Hypothesisare there any action movies to see this weekend
Semantic Framerequest_moviegenre=action, date=this weekend
System Action/Policyrequest_location
Text responseWhere are you located?
Screen Displaylocation?
Text InputAre there any action movies to see this weekend?
Speech Signal current bottleneck error propagation
13
Interaction Example
User
Intelligent Agent Q: How does a dialogue system process this request?
Good Taiwanese eating places include Din Tai Fung, Boiling Point, etc. What do you want to choose? I can help you go there.
find a good eating place for taiwanese food
14
System Framework
Speech Recognition
Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling
Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy
Decision
Output Generation
Hypothesisare there any action movies to see this weekend
Semantic Framerequest_moviegenre=action, date=this weekend
System Action/Policyrequest_location
Text responseWhere are you located?
Screen Displaylocation?
Text InputAre there any action movies to see this weekend?
Speech Signal
15
1. Domain IdentificationRequires Predefined Domain Ontology
find a good eating place for taiwanese foodUser
Organized Domain Knowledge (Database)Intelligent Agent
Restaurant DB Taxi DB Movie DB
Classification!
16
2. Intent DetectionRequires Predefined Schema
find a good eating place for taiwanese foodUser
Intelligent Agent
Restaurant DB
FIND_RESTAURANTFIND_PRICEFIND_TYPE
:
Classification!
17
3. Slot FillingRequires Predefined Schema
find a good eating place for taiwanese foodUser
Intelligent Agent
Restaurant DB
Restaurant Rating TypeRest 1 good TaiwaneseRest 2 bad Thai
: : :
FIND_RESTAURANTrating=“good”type=“taiwanese”
SELECT restaurant { rest.rating=“good” rest.type=“taiwanese”}Semantic Frame Sequence Labeling
O O B-rating O O O B-type O
18
System Framework
Speech Recognition
Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling
Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy
Decision
Output Generation
Hypothesisare there any action movies to see this weekend
Semantic Framerequest_moviegenre=action, date=this weekend
System Action/Policyrequest_location
Text responseWhere are you located?
Screen Displaylocation?
Text InputAre there any action movies to see this weekend?
Speech Signal
19
State TrackingRequires Hand-Crafted States
User
Intelligent Agent
find a good eating place for taiwanese food
location rating type
loc, rating rating, type loc, type
all
i want it near to my office
NULL
20
State TrackingRequires Hand-Crafted States
User
Intelligent Agent
find a good eating place for taiwanese food
location rating type
loc, rating rating, type loc, type
all
i want it near to my office
NULL
21
State TrackingHandling Errors and Confidence
User
Intelligent Agent
find a good eating place for taixxxx food
FIND_RESTAURANTrating=“good”type=“taiwanese”
FIND_RESTAURANTrating=“good”type=“thai”
FIND_RESTAURANTrating=“good”
location rating type
loc, rating rating, type loc, type
all
NULL
?
?
22
Policy for Agent Action• Inform
– “The nearest one is at Taipei 101”
• Request– “Where is your home?”
• Confirm– “Did you want Taiwanese food?”
• Database Search
• Task Completion / Information Display– ticket booked, weather information
Din Tai Fung::
23
System Framework
Semantic Framerequest_moviegenre=action, date=this weekend
Speech Recognition
Language Understanding (LU)• Domain Identification• User Intent Detection• Slot Filling
Dialogue Management (DM)• Dialogue State Tracking• System Action/Policy
Decision
Hypothesisare there any action movies to see this weekend
Text InputAre there any action movies to see this weekend?
Speech Signal
Output Generation
System Action/Policyrequest_location
Text responseWhere are you located?
Screen Displaylocation?
24
Output / NL Generation• Inform
– “The nearest one is at Taipei 101” v.s.
• Request– “Where is your home?” v.s.
• Confirm– “Did you want Taiwanese food?”
25
Outline
IntroductionFramework
Natural/Spoken Language Understanding
Dialogue Management
Output Generation
ChallengeRecent TrendConclusion
Challenge• Predefined semantic schemaChen et al., “Matrix Factorization with Knowledge Graph Propagation for Unsupervised Spoken Language Understanding,” in ACL-IJCNLP, 2015.
• Data without annotationsChen et al., “Zero-Shot Learning of Intent Embeddings for Expansion by Convolutional Deep Structured Semantic Models,” in ICASSP, 2016.
• Semantic concept interpretationChen et al., “Deriving Local Relational Surface Forms from Dependency-Based Entity Embeddings for Unsupervised Spoken Language Understanding,” in SLT, 2014.
• Predefined dialogue statesChen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
• Error propagationHakkani-Tur et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
• Cross-domain intention/bot hierarchySun et al., “An Intelligent Assistant for High-Level Task Understanding,” in IUI, 2016.Sun et al., “AppDialogue: Multi-App Dialogues for Intelligent Assistants,” in LREC, 2016.Chen et al., “Leveraging Behavioral Patterns of Mobile Applications for Personalized Spoken Language Understanding,” in ICMI, 2016.
• Cross-domain information transferringKim et al., “New Transfer Learning Techniques For Disparate Label Sets,” in ACL-IJCNLP, 2015.
FIND_RESTAURANTrating=“good” rating=5? 4?
HotelRest Flight
Travel
Trip Planning
26
27
Outline
IntroductionFramework
Natural/Spoken Language Understanding
Dialogue Management
Output Generation
ChallengeRecent TrendConclusion
A Single Neuron
z
1w
2w
Nw…
1x
2x
Nx
b
z z
zbias
y
zez
1
1
Sigmoid function
Activation function
1
w, b are the parameters of this neuron28
A Single Neuron
z
1w
2w
Nw…
1x
2x
Nx
bbias
y
1
5.0"2" 5.0"2"
ynotyis
A single neuron can only handle binary classification29
MN RRf :
A Layer of Neurons
• Handwriting digit classification MN RRf :
A layer of neurons can handle multiple possible output,and the result depends on the max one
…
1x
2x
Nx
1
1y
……“1” or not
“2” or not
“3” or not
2y
3y
10 neurons/10 classes
Which one is max?
Deep Neural Network (DNN)
• Fully connected feedforward network
1x
2x
……
Layer 1
……
1y
2y
……
Layer 2…
…Layer L
……
……
……
Input Output
MyNx
vector x
vector y
Deep NN: multiple hidden layers
MN RRf :
32
RNN for SLU• IOB Sequence Labeling for Slot Filling
• Intent Classification
𝑤0 𝑤1 𝑤2 𝑤𝑛
h0𝑓 h1
𝑓 h2𝑓 h𝑛
𝑓
h0𝑏 h1
𝑏 h2𝑏 h𝑛
𝑏
𝑦 0 𝑦 1 𝑦 2 𝑦 𝑛
(a) LSTM (b) LSTM-LA (c) bLSTM-LA
(d) Intent LSTM
intent
𝑤0 𝑤1 𝑤2 𝑤𝑛
h0 h1 h2 h𝑛
𝑦 0 𝑦 1 𝑦 2 𝑦 𝑛
𝑤0 𝑤1 𝑤2 𝑤𝑛
h0 h1 h2 h𝑛
𝑦 0 𝑦 1 𝑦 2 𝑦 𝑛
𝑤0 𝑤1 𝑤2 𝑤𝑛
h0 h1 h2 h𝑛
33
RNN for SLU• Joint Multi-Domain Intent Prediction and Slot Filling
– Information can mutually enhanced
semantic frame sequence
ht-1 ht+1htW W W W
taiwanese
B-type
U
food
U
please
U
VO
VO
V
hT+1
EOS
U
FIND_RESTV
Slot Tagging Intent Prediction
Hakkani-Tur, et al., “Multi-Domain Joint Semantic Frame Parsing using Bi-directional RNN-LSTM,” in Interspeech, 2016.
34
just sent email to bob about fishing this weekend
O O O OB-contact_name
OB-subject I-subject I-subject
U
S
I send_emailD communication
send_email(contact_name=“bob”, subject=“fishing this weekend”)
are we going to fish this weekend
U1
S2 send_email(message=“are we going to fish this weekend”)
send email to bob
U2
send_email(contact_name=“bob”)
B-messageI-message
I-message I-message I-messageI-message I-message
B-contact_nameS1
Single Turn
Multi-Turn
Domain Identification Intent Prediction Slot Filling
Contextual SLU (Chen et al., 2016)
35
u
Knowledge Attention Distributionpi
mi
Memory Representation
Weighted Sum h
∑ Wkg
oKnowledge Encoding
Representationhistory utterances {xi}
current utterance
c
Inner Product
Sentence EncoderRNNin
x1 x2 xi…
Contextual Sentence Encoder
x1 x2 xi…
RNNmem
slot tagging sequence y
ht-1 ht
V V
W W W
wt-1 wt
yt-1 yt
U U
RNN Tagger
M M
Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
1. Sentence Encoding 2. Knowledge Attention 3. Knowledge Encoding
Contextual SLU (Chen et al., 2016)
Idea: additionally incorporating contextual knowledge during slot tagging track dialogue states in a latent way
36
E2E Supervised Dialogue System
Wen, et al., “A Network-based End-to-End Trainable Task-Oriented Dialogue System,” arXiv.:1604.04562v2.
0 0 0 … 0 1
Database Operator
Copy field
…
Database
Seven daysCurry PrinceN
irala
Royal StandardLittle Seuol
DB pointer
Can I have korean
Korean 0.7British 0.2French 0.1
…
Belief Tracker
Intent Network
Can I have <v.food>
Generation Network <v.name> serves great <v.food> .
Policy Network
zt
ptxt
MySQL query:“Select * where food=Korean”
qt
37
InfoBot: E2E Dialogue System with Supervised & Reinforcement Learning
Dhingra, et al., “End-to-End Reinforcement Learning of Dialogue Agents for Information Access,” arXiv.:1609.00777v2.
Movie=?; Actor=Bill Murray; Release Year=1993
Find me the Bill Murray’s movie.
I think it came out in 1993.
When was it released?
Groundhog Day is a Bill Murray movie which came out in 1993.
KB-InfoBotUser
(Groundhog Day, actor, Bill Murray)(Groundhog Day, release year, 1993)(Australia, actor, Nicole Kidman)(Mad Max: Fury Road, release year, 2015)
Knowledge Base (head, relation, tail)
Idea: differentiable database for propagating the gradients
TC-Bot: E2End Task-Completion Bot
38
wi
B-type
wi+1 wi+2
O O
EOS
<intent>
wi
B-type
wi+1 wi+2
O O
EOS
<intent>
Semantic Framerequest_moviegenre=action, date=this weekend
System Action / Policyrequest_location
User Dialogue ActionInform(location=San Francisco)
Time t-1
wi
<slot>
wi+1 wi+2
O O
EOS
<intent>
Language Understanding (LU)
Time t-2
Time t
Dialogue Management
(DM)
w0 w1 w2
Natural Language Generation (NLG)EOSUser
Goal
User Agenda Modeling
User SimulatorEnd-to-End Neural Dialogue System
Text InputAre there any action movies to see this weekend?
Idea: supervised learning for each component and reinforcement learning for end-to-end training the neural dialogue system
Li, et al., “End-to-End Task-Completion Neural Dialogue Systems,” arXiv.:1703.01008.
39
RL TC-BotUser goal: Two tickets for “the witch” tomorrow 9:30 PM at regal meridian 16, Seattle.
RULE BASED SYSTEMUser: Can I get 2 tickets for the witch?Agent: Which theater do you want?User: Regal meridian 16Agent: What time would you like to see it?User: 9:30 pmAgent: What date would you like to see it?User: Tomorrow!Agent: How many people are you?User: Just two adults.Agent: What is your location?User: Seattle!Agent: What kind of video format do you like?User: Oh, any kind is okay. Agent: Do you have any price constraint?User: No, I don’t care.Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.User: Thank you.Agent: Thank you!
REINFORCEMENT LEARNING SYSTEMUser: Can I get 2 tickets for the witch?Agent: What date would you like to see it?User: TomorrowAgent: Which theater do you prefer?User: Regal meridian 16Agent: What is your location?User: SeattleAgent: What time would you like to see it?User: 9:30 pm.Agent: Great - I was able to purchase 2 tickets for you to see the witch tomorrow at regal meridian 16 theater in Seattle at 9:30 pm.User: Thanks.Agent: Thanks!
The system can learn how to efficiently interact with users for task completion
40
Outline
IntroductionFramework
Natural/Spoken Language Understanding
Dialogue Management
Output Generation
ChallengeRecent TrendConclusion
41
Conclusion• The conversational systems can manage information access
via spoken interactions• A domain is usually constrained by the backend service
– Semantic schema should be predefined– Cross-domain knowledge and intention is difficult to handled
• NN-Based Dialogue System– Pipeline outputs are represented as vectors distributional
• Semantic frames as vectors to encode confidence
• Implicitly represent dialogue states in hidden vectors
– The execution is constrained by backend services symbolic
42
Q & AT H A N K S F O R Y O U R AT T E N T I O N !