Automatic generation of domain models for call centers
![Page 1: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/1.jpg)
Automatic Generation of Domain Models for Call Centers from
Noisy Transcriptions
David Przybilla
Knowledge Representation Seminar
WS 2012/2013
![Page 2: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/2.jpg)
Outline
1. The Problem
2. Proposed Solution
• Using Speech Recognition
• Feature Engineering (NLP Component)
• Taxonomy Builder
• Model Builder
3. Application
4. Results
5. Conclusions
![Page 3: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/3.jpg)
1. The Problem
Different Domains
• Mobile Phones
• Apparel
• Services...
Domain Model
emails
Speech Audio
Taxonomy
Useful for
• Evaluate Agents
• Identify Key Problems
• Aid Agents
• Efficiency
Unsupervised
![Page 4: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/4.jpg)
2. Solution: Automatically Building a
Domain Model
![Page 5: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/5.jpg)
2.1 Automatic Speech Recognition
● Trained an ASR system using “more than 2000 calls”
– 125 of these have topic annotations
![Page 6: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/6.jpg)
Automatic Speech Recognition
● Issues
– Different Accents
– Error rate for phone calls around 40%
● Deletion of words
● Wrong words are inserted
● Wrong speaker is assigned
– Noise:
● No punctuation marks, silence periods
● No sentence boundaries.
● False starts
● Filler words (“umm”, “uhh”)
![Page 7: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/7.jpg)
2.2 Feature Engineering Component (NLP Component)
![Page 8: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/8.jpg)
2.2 Feature Engineering Component (NLP Component)
Conversation Transcriptions → Stop Words Removal → Stemmer → Extract n-grams → Feature Vectors
![Page 9: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/9.jpg)
Feature Engineering Component (NLP Component)
Stop Words Removal
• Remove function words, e.g. ‘the’, ‘a’, ‘for’, ‘at’…
• Remove filler words, e.g. “mm”, “uhh”
• Leaves more discriminative dimensions
Stemmer
• Get the root of each word, e.g. worked → work, bunnies → bunny…
• Variants such as “worked”, “works”, “working” map to the same dimension of the feature vector (D1, D2, …, Dn)
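A minimal sketch of these two steps, assuming whitespace-tokenized transcripts. The word lists hold only the examples from the slides, and `toy_stem` is a hypothetical stand-in for a real stemmer (the slides do not say which stemmer was used).

```python
# Word lists contain only the examples given on the slides; a real system
# would use a much larger stop-word list.
STOP_WORDS = {"the", "a", "for", "at"}
FILLER_WORDS = {"mm", "umm", "uhh"}


def toy_stem(word):
    """Very rough suffix stripping (hypothetical helper, not a real stemmer)."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of stem remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(transcript):
    """Lowercase, drop stop and filler words, then stem what is left."""
    tokens = transcript.lower().split()
    kept = [t for t in tokens if t not in STOP_WORDS | FILLER_WORDS]
    return [toy_stem(t) for t in kept]
```

For example, `preprocess("umm the printer worked for a day")` keeps only the content words and reduces "worked" to "work".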
![Page 10: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/10.jpg)
Extract N-grams
• N-gram: a sequence of n items. In this experiment, items are words.
• Discarding N-grams
• N-gram examples: “lotus notes”, “expense reimbursement”…
Feature Vectors → Clusterer
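A small sketch of word n-gram extraction. The slide does not state how n-grams are discarded, so the frequency cutoff `min_count` is an assumption, as are the function name and the default values.

```python
from collections import Counter


def extract_ngrams(token_lists, n=2, min_count=2):
    """Count word n-grams over all calls and keep those seen at least min_count times.

    token_lists: one list of (already preprocessed) words per call.
    min_count is an assumed discard criterion, not taken from the slides.
    """
    counts = Counter()
    for tokens in token_lists:
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return {gram: count for gram, count in counts.items() if count >= min_count}
```

The surviving n-grams can then serve as dimensions of the feature vectors, next to single words.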
![Page 11: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/11.jpg)
Clusterer
● Clustering: Repeated Bisection
– Cosine similarity
– Top Down Approach
Feature Vectors → Set of Clusters
![Page 12: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/12.jpg)
Clustering: Repeated Bisection
[Figure: the document collection is split in two at each step (Step 0, Step 1, Step 2, …)]
Do this iteratively until there are K clusters
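The top-down procedure can be sketched as follows. The slides only state that the method is repeated bisection with cosine similarity; splitting the largest cluster first and bisecting it with a cosine-based 2-means are assumptions of this sketch, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def bisect(vectors, ids, iterations=10):
    """Split one cluster (a list of at least two document ids) into two."""
    # Deterministic seeds: the first document and the one least similar to it.
    first = ids[0]
    far = min(ids[1:], key=lambda i: cosine(vectors[first], vectors[i]))
    centers = [vectors[first], vectors[far]]
    # Fallback split, used if the 2-means assignment degenerates.
    left, right = ids[: len(ids) // 2], ids[len(ids) // 2:]
    for _ in range(iterations):
        new_left = [i for i in ids
                    if cosine(vectors[i], centers[0]) >= cosine(vectors[i], centers[1])]
        new_right = [i for i in ids if i not in new_left]
        if not new_left or not new_right:
            break
        left, right = new_left, new_right
        centers = [centroid([vectors[i] for i in left]),
                   centroid([vectors[i] for i in right])]
    return left, right


def repeated_bisection(vectors, k):
    """Start with one cluster holding everything; bisect until there are k clusters."""
    clusters = [list(range(len(vectors)))]
    while len(clusters) < k:
        largest = max(clusters, key=len)  # assumed choice of cluster to split
        if len(largest) < 2:
            break
        clusters.remove(largest)
        clusters.extend(bisect(vectors, largest))
    return clusters
```

Each returned cluster is a list of indices into `vectors`, so the same documents can be traced across runs with different K.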
![Page 13: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/13.jpg)
Repeated Bisection with Different K Values
• Repeated Bisection: K=5
• Repeated Bisection: K=10
• …
• Repeated Bisection: K=100
More granularity of topics
![Page 14: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/14.jpg)
Extract N-grams
• N-gram: a sequence of n items. In this experiment, items are words.
• Discarding N-grams
• N-gram examples: “lotus notes”, “expense reimbursement”…
Feature Vectors → Taxonomy Builder
![Page 15: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/15.jpg)
Taxonomy Builder
Set of Clusters → Taxonomy
![Page 16: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/16.jpg)
Taxonomy Builder
● Discard clusters with fewer than T elements
Creating the Taxonomy
● Each node in the taxonomy is a cluster.
● Node A is linked to node B (A → B) when:
– There is at least one common document between A and B.
– B was created during a finer-granularity call to Repeated Bisection (RB).
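The linking rule can be sketched as below. Linking only clusters from adjacent granularity levels is an assumption of this sketch (the slide only says that B comes from a finer-granularity run); `build_taxonomy`, `levels` and `min_size` are hypothetical names, with `min_size` playing the role of T.

```python
def build_taxonomy(levels, min_size):
    """Build taxonomy nodes and edges from clusterings of increasing granularity.

    levels: list of clusterings ordered from coarse (small K) to fine (large K);
            each clustering is a list of sets of document ids.
    min_size: clusters with fewer documents are discarded (T on the slide).
    """
    nodes = [
        (depth, frozenset(cluster))
        for depth, clustering in enumerate(levels)
        for cluster in clustering
        if len(cluster) >= min_size
    ]
    edges = []
    for a_depth, a in nodes:
        for b_depth, b in nodes:
            # Assumed rule: B is one level finer than A and shares a document with A.
            if b_depth == a_depth + 1 and a & b:
                edges.append(((a_depth, a), (b_depth, b)))
    return nodes, edges
```

With `levels` taken from runs such as K=5, K=10, …, each edge points from a broader topic to a more specific one.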
![Page 17: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/17.jpg)
Taxonomy Builder
![Page 18: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/18.jpg)
Model Builder
![Page 19: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/19.jpg)
Add/Organize Information in the Node
[Figure: a taxonomy node with its default properties]
![Page 20: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/20.jpg)
Model Builder
● Extend each node with additional information:
– Typical actions
– Typical Q&A
– Call statistics
– Style of the agents (for opening and closing)
● Tiled: merge ‘repeated questions’
● Ordered: show them in the order they appear
![Page 21: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/21.jpg)
Typical Actions
● Actions occur around topic features
● Apparently the authors take topic features as input
● Use a 10-word window around topic vocabulary
● Discard n-grams below a threshold
● e.g. “Click the font color button”
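One possible reading of this procedure, as a sketch: collect word n-grams that fall inside a window around each topic word and drop the infrequent ones. How the 10-word window is laid out (here: half on each side), the n-gram length and the count threshold are all assumptions; `typical_actions` is a hypothetical name.

```python
from collections import Counter


def typical_actions(calls, topic_vocab, window=10, n=3, min_count=2):
    """Return frequent n-grams found near topic words, most frequent first.

    calls: one list of words per call.
    topic_vocab: set of topic words (the "topic features").
    """
    counts = Counter()
    half = window // 2  # assumed layout: half of the window on each side
    for tokens in calls:
        for pos, token in enumerate(tokens):
            if token not in topic_vocab:
                continue
            span = tokens[max(0, pos - half): pos + half + 1]
            counts.update(
                " ".join(span[i:i + n]) for i in range(len(span) - n + 1)
            )
    return [gram for gram, count in counts.most_common() if count >= min_count]
```

Note that windows around neighbouring topic words overlap, so an n-gram can be counted more than once per call in this simple version.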
![Page 22: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/22.jpg)
How to Extract Q&A?
● Look for patterns such as:
– “how”, “what”, “can I”, “were there”, etc.
● Answers are the sentences following the question.
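A rough sketch of this pattern matching. It assumes the transcript has already been split into utterances (the ASR output itself has no sentence boundaries) and takes only the single next utterance as the answer; the pattern list holds just the examples from the slide, and the function names are hypothetical.

```python
# Question openers listed on the slide; the real pattern set is longer ("etc.").
QUESTION_PATTERNS = ("how", "what", "can i", "were there")


def is_question(utterance):
    """True if the utterance starts with one of the question patterns as whole words."""
    text = utterance.lower().strip()
    return any(text == p or text.startswith(p + " ") for p in QUESTION_PATTERNS)


def extract_qa(utterances):
    """Pair each detected question with the utterance that follows it."""
    return [
        (question, answer)
        for question, answer in zip(utterances, utterances[1:])
        if is_question(question)
    ]
```

Matching whole words avoids treating an utterance such as "however it failed" as a "how" question.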
![Page 23: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/23.jpg)
Call Statistics
● Average Call Duration
● Average Transcription length
● Average number of speaker turns
● Number of calls
● How Agents usually start/end a call
● These statistics allowed the authors to compare call durations across different topics.
![Page 24: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/24.jpg)
Assessing the Results (?)
● ‘Almost all issues from the labeled calls’ have been captured in the Q&A and taxonomy.
● The phrases captured for the Q&A and actions are well formed in spite of ASR issues.
● Tiling merged questions and actions. However, semantically similar phrases were not merged.
● “The list of topic specific phrases matched and at times was more similar than hand generated sets”
![Page 25: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/25.jpg)
Application
How to access the knowledge in the
taxonomy?
Topic Identification
![Page 26: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/26.jpg)
Topic Identification
Identify the topic of a call by listening to the initial part of the call
Discriminative Features
![Page 27: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/27.jpg)
Topic Identification
Variation: check how good the prediction is with certain clusters.
![Page 28: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/28.jpg)
Conclusions
• Automating part of the construction of a knowledge representation is possible
• Performance could probably be improved by extracting relations, taking topic vocabulary from manuals, and using external knowledge
• Semantic-level processing tools can be used to improve the given method
• The application side apparently showed that the created taxonomy is good enough to actually solve problems in the call center
![Page 29: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/29.jpg)
Critical review
– How to assess the goodness/correctness of a taxonomy
– How to compare human-generated vs. machine-generated taxonomies
– Given the pipeline and the good results, do ASR issues really matter?
– Possibility of adding extra knowledge from topic articles, manuals, etc.
– The ‘performance’ depends on text clustering -> goodness of each node.
![Page 30: Automatic generation of domain models for call centers](https://reader031.fdocument.pub/reader031/viewer/2022020207/559765aa1a28aba3558b456f/html5/thumbnails/30.jpg)
Thank you for your time