Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification



Page 1:

Incorporating In-domain Confidence and Discourse Coherence Measures in Utterance Verification
(Detection of speech recognition errors using in-domain confidence and discourse coherence)

Ian R. Lane, Tatsuya Kawahara
Spoken Language Communications Research Laboratories, ATR
School of Informatics, Kyoto University

Page 2:

Introduction

• Current ASR technologies not robust against:
– Acoustic mismatch: noise, channel, speaker variance
– Linguistic mismatch: disfluencies, OOV, OOD

• Assess confidence of recognition hypothesis and detect recognition errors
→ Effective user feedback
• Select recovery strategy based on type of error and specific application

Page 3:

Previous Work on Confidence Measures

• Feature-based
– [Kemp] word duration, AM/LM back-off
• Explicit model-based
– [Rahim] likelihood-ratio test against cohort model
• Posterior probability
– [Komatani, Soong, Wessel] estimate posterior probability given all competing hypotheses in a word graph

Approaches limited to “low-level” information available during ASR decoding

Page 4:

Proposed Approach

• Exploit knowledge sources outside the ASR framework for estimating recognition confidence, e.g. knowledge about the application domain and discourse flow

→ Incorporate CMs based on "high-level" knowledge sources

• In-domain confidence
– degree of match between utterance and application domain
• Discourse coherence
– consistency between consecutive utterances in dialogue

Page 5:

Utterance Verification Framework

CMin-domain(Xi): in-domain confidence
CMdiscourse(Xi|Xi-1): discourse coherence
CM(Xi): joint confidence score, combining the above with the generalized posterior probability CMgpp(Xi)

[Block diagram: each input utterance Xi passes through the ASR front-end, topic classification, and in-domain verification to give CMin-domain(Xi); the distance dist(Xi, Xi-1) to the previous utterance Xi-1 (processed the same way) gives CMdiscourse(Xi|Xi-1); these are combined with CMgpp(Xi) into the joint score CM(Xi), which is used for out-of-domain detection.]

Page 6:

In-domain Confidence

• Measure of topic consistency with the application domain
– Previously applied to out-of-domain utterance detection

Examples of errors detected via in-domain confidence
(REF: correct transcription, ASR: speech recognition hypothesis)

Mismatch of domain
REF: How can I print this WORD file double-sided
ASR: How can I open this word on the pool-side
→ hypothesis not consistent by topic → in-domain confidence low

Erroneous recognition hypothesis
REF: I want to go to Kyoto, can I go by bus
ASR: I want to go to Kyoto, can I take a bath
→ hypothesis not consistent by topic → in-domain confidence low

Page 7:

In-domain Confidence

[Flow diagram: the input utterance Xi (recognition hypothesis) is transformed to a vector-space feature vector; classification against multiple topics with SVMs (1~m) produces topic confidence scores (C(t1|Xi), ..., C(tm|Xi)); in-domain verification then computes Vin-domain(Xi), giving the in-domain confidence CMin-domain(Xi).]

Page 8:

In-domain Confidence, e.g. 'could I have a non-smoking seat'

[Flow diagram, worked example: the word features (a, an, …, room, …, seat, …, I+have, …) give the binary vector (1, 0, …, 0, …, 1, …, 1, …); topic classification yields confidence scores for accom., airplane, airport, … (0.05, 0.36, 0.94, …); in-domain verification gives an in-domain confidence of 90%.]
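
As a rough illustration only (not the authors' code), the sketch below mimics this step under stated assumptions: a toy vocabulary, three of the topic classes, and dummy stand-ins for the trained per-topic SVM classifiers; it maps a recognition hypothesis to a binary word-feature vector and scores it against each topic.

    import numpy as np

    vocabulary = ["a", "an", "room", "seat", "i+have"]      # toy word/word-pair features
    topics = ["accommodation", "airplane", "airport"]       # 3 of the 14 topic classes

    def to_feature_vector(hypothesis_words):
        # Binary vector: 1 if the feature occurs in the hypothesis, else 0
        words = set(hypothesis_words)
        return np.array([1.0 if w in words else 0.0 for w in vocabulary])

    # Dummy stand-ins for trained SVM topic classifiers (random weights, illustration
    # only); each returns a topic confidence score C(t_j|X_i) in [0, 1].
    rng = np.random.default_rng(0)
    topic_classifiers = {
        t: (lambda v, w=rng.random(len(vocabulary)): float(1.0 / (1.0 + np.exp(-v @ w))))
        for t in topics
    }

    x = to_feature_vector("could i have a non-smoking seat".split() + ["i+have"])
    topic_scores = {t: clf(x) for t, clf in topic_classifiers.items()}
    print(topic_scores)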

Page 9:

In-domain Verification Model

• Linear discriminant verification model applied
– λ1, …, λm trained on in-domain data using "deleted interpolation of topics" and GPD [Lane '04]

$CM_{\text{in-domain}}(X_i) = \mathrm{sigmoid}\bigl(V_{\text{in-domain}}(X_i)\bigr)$

$V_{\text{in-domain}}(X_i) = \sum_{j=1}^{m} \lambda_j \, C(t_j \mid X_i)$

C(tj|Xi): topic classification confidence score of topic tj for input utterance Xi
λj: discriminant weight for topic tj
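
A minimal sketch of this computation, assuming the topic scores C(t_j|X_i) and the trained weights λ_j are given; the sigmoid parameters here are placeholders standing in for values trained on the development set.

    import numpy as np

    def sigmoid(v, alpha=1.0, beta=0.0):
        # Sigmoid transform to [0, 1]; alpha/beta are assumed trained scaling parameters
        return 1.0 / (1.0 + np.exp(-alpha * (v - beta)))

    def in_domain_confidence(topic_scores, weights, alpha=1.0, beta=0.0):
        # CM_in-domain(X_i) = sigmoid( sum_j lambda_j * C(t_j|X_i) )
        v_in_domain = float(np.dot(weights, topic_scores))
        return sigmoid(v_in_domain, alpha, beta)

    # Illustrative values only (3 topics)
    scores  = np.array([0.05, 0.36, 0.94])   # C(t_j|X_i)
    lambdas = np.array([0.30, 0.30, 0.40])   # trained discriminant weights (hypothetical)
    print(in_domain_confidence(scores, lambdas))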

Page 10:

Discourse Coherence

• Topic consistency with preceding utterance

Examples of errors detected via discourse coherence
(REF: correct transcription, ASR: speech recognition hypothesis)

Erroneous recognition hypothesis
Speaker A: previous utterance [Xi-1]
REF: What type of shirt are you looking for?
ASR: What type of shirt are you looking for?
Speaker B: current utterance [Xi]
REF: I'm looking for a white T-shirt.
ASR: I'm looking for a white teacher.
→ topic not consistent across utterances → discourse coherence low

Page 11:

Discourse Coherence

• Euclidean distance between current (Xi) and previous (Xi-1) utterances in topic confidence space

• CMdiscourse is large when Xi and Xi-1 are topically related, low when they differ

$CM_{\text{discourse}}(X_i \mid X_{i-1}) = \mathrm{sigmoid}\bigl(-\,dist_{\text{Euclidean}}(X_i, X_{i-1})\bigr)$

$dist_{\text{Euclidean}}(X_i, X_{i-1}) = \sqrt{\sum_{j=1}^{m} \bigl|\, C(t_j \mid X_i) - C(t_j \mid X_{i-1}) \,\bigr|^{2}}$
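
A corresponding sketch for the discourse coherence measure, assuming the topic confidence vectors for the current and previous utterances are available; the sigmoid parameters are again placeholders.

    import numpy as np

    def discourse_coherence(scores_cur, scores_prev, alpha=1.0, beta=0.0):
        # CM_discourse(X_i|X_i-1) = sigmoid( -dist_Euclidean(X_i, X_i-1) ),
        # with the distance taken in the topic confidence space (C(t_1|X), ..., C(t_m|X))
        dist = float(np.linalg.norm(np.asarray(scores_cur) - np.asarray(scores_prev)))
        return 1.0 / (1.0 + np.exp(alpha * (dist - beta)))  # decreasing in dist

    # Topically similar utterances -> small distance -> high coherence
    print(discourse_coherence([0.1, 0.8, 0.1], [0.2, 0.7, 0.1]))
    # Topically different utterances -> large distance -> low coherence
    print(discourse_coherence([0.1, 0.8, 0.1], [0.9, 0.05, 0.05]))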

Page 12:

Joint Confidence Score

Generalized Posterior Probability
• Confusability of the recognition hypothesis against competing hypotheses [Lo & Soong]

• At utterance level:

$CM_{\text{gpp}}(X) = \sum_{j=1}^{l} GWPP(x_j)$

GWPP(xj): generalized word posterior probability of xj
xj: j-th word in the recognition hypothesis of X

Page 13:

Joint Confidence Score

$CM(X_i) = \lambda_{\text{gpp}}\, CM_{\text{gpp}}(X_i) + \lambda_{\text{in-domain}}\, CM_{\text{in-domain}}(X_i) + \lambda_{\text{discourse}}\, CM_{\text{discourse}}(X_i \mid X_{i-1})$

where $\lambda_{\text{gpp}} + \lambda_{\text{in-domain}} + \lambda_{\text{discourse}} = 1$

• For utterance verification, compare CM(Xi) to a threshold (θ)
• Model weights (λgpp, λin-domain, λdiscourse) and threshold (θ) trained on development set
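
A minimal sketch (not the authors' implementation) of the joint score and the accept/reject decision; all weight and threshold values below are illustrative placeholders standing in for values trained on the development set.

    def joint_confidence(cm_gpp, cm_in_domain, cm_discourse,
                         w_gpp=0.5, w_in_domain=0.3, w_discourse=0.2):
        # Weighted combination; the weights are assumed to sum to 1
        assert abs(w_gpp + w_in_domain + w_discourse - 1.0) < 1e-9
        return w_gpp * cm_gpp + w_in_domain * cm_in_domain + w_discourse * cm_discourse

    def verify_utterance(cm, threshold=0.6):
        # Accept the hypothesis if the joint confidence reaches the trained threshold,
        # otherwise reject it (and, in the dialogue system, prompt the user to rephrase)
        return "accept" if cm >= threshold else "reject"

    cm = joint_confidence(cm_gpp=0.72, cm_in_domain=0.85, cm_discourse=0.60)
    print(round(cm, 3), verify_utterance(cm))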

Page 14:

Experimental Setup

• Training set: ATR BTEC (Basic Travel Expression Corpus)
– ~400k sentences (Japanese/English pairs)
– 14 topic classes (accommodation, shopping, transit, …)
– Used to train the topic-classification and in-domain verification models

• Evaluation data: ATR MAD (Machine Aided Dialogue)
– Natural dialogues between English and Japanese speakers via the ATR speech-to-speech translation system
– Dialogue data collected based on a set of pre-defined scenarios
– Development set: 270 dialogues; Test set: 90 dialogues

On the development set, train the CM sigmoid transforms, the CM weights (λgpp, λin-domain, λdiscourse), and the verification threshold (θ)

Page 15:

Speech Recognition Performance

                     Development     Test
# dialogues               270          90

Japanese side
  # utterances           2674        1011
  WER                   10.5%       10.7%
  SER                   41.9%       42.3%

English side
  # utterances           3091        1006
  WER                   17.0%       16.2%
  SER                   63.5%       55.2%

• ASR performed with ATRASR; 2-gram LM applied during decoding, rescore lattice with 3-gram LM

Page 16:

Evaluation Measure

• Utterance-based verification
– No definite "keyword" set in S-2-S translation
– If a recognition error occurs (one or more errors), prompt the user to rephrase the entire utterance

• CER (confidence error rate)
– FA: false acceptance of an incorrectly recognized utterance
– FR: false rejection of a correctly recognized utterance

$CER = \frac{\#FA + \#FR}{\#\text{utterances}}$
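
For concreteness, a small sketch of the CER computation on hypothetical verification decisions:

    def confidence_error_rate(decisions, correctly_recognized):
        # CER = (#FA + #FR) / #utterances
        # decisions:            'accept'/'reject' per utterance
        # correctly_recognized: True if the ASR hypothesis of that utterance had no errors
        fa = sum(d == "accept" and not ok for d, ok in zip(decisions, correctly_recognized))
        fr = sum(d == "reject" and ok for d, ok in zip(decisions, correctly_recognized))
        return (fa + fr) / len(decisions)

    # Toy example with four utterances (illustrative only)
    print(confidence_error_rate(["accept", "accept", "reject", "reject"],
                                [True, False, True, False]))   # -> 0.5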

Page 17:

GPP-based Verification Performance

• Accept All: assume all utterances are correctly recognized
• GPP: generalized posterior probability

→ Large reduction in verification errors compared with the "Accept All" case: CER 17.3% (Japanese) and 15.3% (English)

[Bar chart: CER (%) for "Accept All" vs. "GPP" verification on the Japanese and English sides; y-axis 0–60%.]

Page 18:

Incorporation of IC and DC Measures (Japanese)

GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence

→ CER reduced by 5.7% and 4.6% (relative) for the "GPP+IC" and "GPP+DC" cases
→ CER 17.3% → 15.9% (8.0% relative reduction) for the "GPP+IC+DC" case

[Bar chart: CER (%) on the Japanese side for GPP, GPP+IC, GPP+DC, and GPP+IC+DC; y-axis 12–18%.]

Page 19:

Incorporation of IC and DC Measures (English)

GPP: generalized posterior probability; IC: in-domain confidence; DC: discourse coherence

→ Similar performance on the English side: CER 15.3% → 14.4% for the "GPP+IC+DC" case

[Bar chart: CER (%) on the English side for GPP, GPP+IC, GPP+DC, and GPP+IC+DC; y-axis 12–18%.]

Page 20:

Conclusions

• Proposed a novel utterance verification scheme incorporating "high-level" knowledge
– In-domain confidence: degree of match between the utterance and the application domain
– Discourse coherence: consistency between consecutive utterances

• The two proposed measures are effective
– Relative reduction in CER of 8.0% and 6.1% (Japanese/English)

Page 21:

Future work

• "High-level" content-based verification
– Ignore ASR errors that do not affect translation quality
→ Further improvement in performance

• Topic switching
– Determine when users switch task (currently a single task per dialogue session is assumed)