Page 1

Speech Assessment: Methods and Applications for Spoken Language Learning

(Methods, Applications, and Sharing of Speech Scoring)

J.-S. Roger Jang (張智星)

jang@cs.nthu.edu.tw

http://www.cs.nthu.edu.tw/~jang

Multimedia Information Retrieval Lab

CS Dept, Tsing Hua Univ, Taiwan

Page 2

Outline

Introduction to speech assessment
Methods
Using learning to rank for speech assessment
Demos
Conclusions

Page 3

Intro. to Speech Assessment

Goal
  Evaluate a person’s utterance based on acoustic features, for language learning

Also known as
  Pronunciation scoring
  CAPT (computer-assisted pronunciation training)

Page 4

Four Aspects of Language Learning

Media \ Skills   Receptive Skill    Productive Skill
Speech           Listening          Speaking (SA!)
Text             Reading            Writing
                 Easier for CALL    Harder for CALL

Page 5

Speech Assessment

Characteristics of ideal SA
  Assessment levels: as detailed as possible
    Syllables, words, sentences, paragraphs
  Assessment criteria: as many as possible
    Timbre, tone, energy, rhythm, co-articulation, …
  Feedback: as specific as possible
    High-level corrections and suggestions

Page 6

Basic Assessment Criteria

Timbre (articulation / voice quality)
  Based on acoustic models
Tone (tone / pitch)
  Based on tone recognition (for tonal languages)
  Based on pitch similarity with the target utterance
Rhythm (prosody / duration)
  Based on duration comparison with the target utterance
Energy (intensity / volume)
  Based on energy comparison with the target utterance

Page 7

Additional Assessment Criteria

English
  Stress
    Levels (word or sentence)
  Intonation (sentence-level pitch)
    Declarative sentence
    Interrogative sentence
  Co-articulation (linking)
    A red apple.
    Did you call me?
    Won’t you go?
    Raise your hand.
Mandarin
  Tone
  Retroflex
  Co-articulation (linking)
  Erhua (兒化音, r-colored finals)
Others
  Pause

Page 8

Types of SA

Types of SA (ordered by difficulty)
  Type 1: with target text, with target utterance
  Type 2: with target text, without target utterance
  Type 3: without target text, with target utterance
  Type 4: without target text, without target utterance

We are focusing on types 1 and 2.

Page 9

Type 1: With Target Text, With Target Utterance

Method
  Using a speech recognition engine, perform forced alignment (FA) between the speech and the text, then score each phone by its similarity.
Scoring
  Timbre: compared against the acoustic models of the recognition engine
  Tone, rhythm, energy: compared against the target utterance
Characteristics
  Because FA is highly accurate, the resulting scores tend to be consistent.
Examples
  myET (艾爾實驗室): www.myet.com
  Saybot (說寶堂): www.saybot.com

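Page 10

Type 2: With Target Text, Without Target Utterance

Method
  Using a speech recognition engine, perform forced alignment (FA) between the speech and the text, then score each phone by its similarity.
Scoring
  Timbre: compared against the acoustic models of the recognition engine
  Tone: for Mandarin, the canonical tones can be derived from the text, followed by tone recognition and scoring on the speech. No comparable method exists for English.
  Rhythm, energy: cannot be compared
Characteristics
  Because FA is highly accurate, the resulting scores tend to be consistent.
  Teaching materials are easier to prepare.
  Rhythm and volume cannot be scored.
Examples
  Speak & Score by 階梯英文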

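Page 11

Type 3: Without Target Text, With Target Utterance

Method
  Using a speech recognition engine, perform free syllable decoding (FSD) on the speech, then score by the similarity of the resulting syllable strings.
Scoring
  Timbre: compared against the syllable string of the target utterance
  Tone, rhythm, energy: compared using the syllables produced by FSD
Characteristics
  Because the recognition rate of FSD is only 60% to 70%, consistent scores are harder to obtain.
  DTW can be used directly for the comparison instead, but differences in individual timbre make the scores less consistent.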
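The DTW alternative mentioned for type 3 can be sketched as follows. This is only a minimal illustration, assuming each utterance has already been converted into a sequence of feature vectors (for example, per-frame acoustic features) and that Euclidean distance is used between frames. The function name `dtw_distance` and the toy sequences are hypothetical and not part of the original system.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: minimum accumulated cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# Toy one-dimensional "features": the learner utterance is a time-stretched
# copy of the target, so the warped distance is zero.
target = [(0.0,), (1.0,), (2.0,), (1.0,)]
learner = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
print(dtw_distance(target, learner))
```

Because DTW compares raw features directly, two speakers with different timbre can get a large distance even when both pronounce the sentence correctly, which is the consistency problem noted above.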

Page 12

Our Approach

Basic approach to timbre assessment
  Lexicon net construction (usually a sausage net)
  Forced alignment to identify phone boundaries
  Phone scoring based on several criteria, such as ranking, histograms, posterior probability, etc.
  Weighted average to get syllable/sentence scores
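The last step, combining phone scores into a syllable or sentence score, might look like the sketch below. The slides do not say which weights are used, so weighting by phone duration in frames is an assumption here, and `weighted_score` is a hypothetical helper name.

```python
def weighted_score(scores, weights):
    """Weighted average of phone (or syllable) scores."""
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(scores, weights)) / total_weight

# Phone scores for one syllable, with assumed durations (in frames) as weights.
phone_scores = [50, 100, 100]
durations = [10, 10, 10]  # assumption: equal durations, i.e. a plain average
print(round(weighted_score(phone_scores, durations)))
```

The same function can be applied a second time to syllable scores to obtain a sentence score.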

Page 13

Lexicon Net Construction

Lexicon net for “what are you allergic to?”
  Sausage net with all possible (and correct) multiple pronunciations
  Optional sil between words

Page 14

Lexicon Net with Confusing Phones

Common errors for Japanese learners of Chinese
  ㄖ → ㄌ
    Example: 天氣熱 → 天氣樂
  ㄑ → ㄐ
    Example: 打哈欠 → 打哈見
  ㄘ → ㄗ
    Example: 一次旅行 → 一字旅行
  ㄢ → ㄤ
    Example: 晚安 → 晚ㄤ

Rule-based approach to creating confusing syllables (phonological rules!)
  Rules
    Rule 1: re → le
    Rule 2: qi → ji
    Rule 3: ci → zi
    Rule 4: an → ang
  Example
    欠 (qian) → 見 (jian), 嗆 (qiang), 降 (jiang)
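The rule-based expansion can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: rules 1 to 3 are treated as substitutions at the start of a pinyin syllable, rule 4 as a substitution at the end, and a candidate is kept only if it is a valid Mandarin syllable. The `VALID` set below is a tiny hypothetical subset standing in for the full syllable inventory.

```python
# Rules from the slide, split by where they apply (assumption).
PREFIX_RULES = [("re", "le"), ("qi", "ji"), ("ci", "zi")]
SUFFIX_RULES = [("an", "ang")]

# Tiny illustrative subset of valid syllables; a real system would use
# the complete Mandarin syllable list.
VALID = {"qian", "jian", "qiang", "jiang", "tian", "re", "le"}

def confusable(syllable, valid=VALID):
    """Return the syllable and every valid variant reachable via the rules."""
    variants = {syllable}
    stack = [syllable]
    while stack:
        cur = stack.pop()
        candidates = [dst + cur[len(src):]
                      for src, dst in PREFIX_RULES if cur.startswith(src)]
        candidates += [cur[:-len(src)] + dst
                       for src, dst in SUFFIX_RULES if cur.endswith(src)]
        for cand in candidates:
            if cand in valid and cand not in variants:
                variants.add(cand)
                stack.append(cand)
    return sorted(variants)

print(confusable("qian"))  # ['jian', 'jiang', 'qian', 'qiang']
print(confusable("tian"))  # ['tian'] because "tiang" is not in VALID
```

Applying the rules repeatedly is what produces 降 (jiang) from 欠 (qian): it needs both the qi → ji and the an → ang substitutions.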

Page 15

Example of Japanese Learners Speaking Chinese

去年夏天熱死了 (Last summer was unbearably hot): Example 1, Example 2
晚安 (Good night): Example 1, Example 2
坐下來、慢慢吃 (Sit down and take your time eating): Example 1
他不住的打哈欠 (He keeps yawning): Example 1
一次旅行 (One trip): Example 1
起風 (The wind is rising): Example 1
休息 (Rest): Example 1

Page 16

Lexicon Net with Confusing Phones

Lexicon net for “天氣熱、打哈欠”
  Canonical form: tian qi re da ha qian
  16 variant paths in the net
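One way to arrive at the 16 paths is to give each syllable slot its list of alternatives and take the Cartesian product. The per-slot variants below are an assumption that is consistent with the four rules (re → le, qi → ji, ci → zi, an → ang) and with the stated count: "tian" is left unchanged on the assumption that "tiang" is not a valid Mandarin syllable.

```python
from itertools import product

# Alternatives for each syllable slot of the canonical form (assumed).
slots = [
    ["tian"],
    ["qi", "ji"],
    ["re", "le"],
    ["da"],
    ["ha"],
    ["qian", "jian", "qiang", "jiang"],
]

paths = [" ".join(p) for p in product(*slots)]
print(len(paths))  # 16
print(paths[0])    # tian qi re da ha qian
```

Every path other than the canonical one corresponds to a specific combination of pronunciation errors, so the path chosen by forced alignment directly identifies which syllables were confused.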

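Page 17

Automatic Confusing Syllable Id.

Input: corpus of Japanese learners of Chinese
Step 1: Run forced alignment to obtain an initial segmentation.
Step 2: Compare against the 411 Mandarin syllables to find the confusing syllables for each syllable.
Step 3: Add the confusing syllables to the recognition net, then rerun forced alignment and segmentation.
Step 4: Has the segmentation stopped changing? If no, repeat from step 2. If yes, output the confusing syllables and the recognition net.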

Page 18

Error Pattern Identification (EPI)

Common insertions/deletions from users, with “朝辭白帝彩雲間” as the target sentence
  Ending at an arbitrary position, e.g., “朝辭白帝”
  Starting at an arbitrary position, e.g., “彩雲間”
  Starting and ending at arbitrary positions, e.g., “白帝彩雲”
  Starting and ending at arbitrary positions with skipped characters, e.g., “白彩雲”
  Repeated character, e.g., “朝…朝辭白帝彩雲間”
  Repeated word, e.g., “朝辭…朝辭白帝彩雲間”
  Repeated character with a changed pronunciation, e.g., “朝 (cao)…朝 (zhao) 辭白帝彩雲間”
  Two characters swapped, e.g., “朝辭彩帝白雲間”
  Wrong characters, e.g., “朝辭白帝黑山間”

Page 19

Lexicon Net for EPI (I)

Detects utterances that start from the beginning and end at an arbitrary position

Page 20

Lexicon Net for EPI (II)

Detects utterances that start at an arbitrary position and end at the end

Page 21

Lexicon Net for EPI (III)

Detects utterances that start and end at arbitrary positions (skipping characters is not allowed)

Page 22

Lexicon Net for EPI (IV)

Detects utterances that start and end at arbitrary positions, with skipping of characters allowed
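To make the difference between the four EPI nets concrete, the sketch below enumerates the character sequences each net would accept for the target sentence 朝辭白帝彩雲間. It only models the accepted strings (prefixes, suffixes, contiguous substrings, and subsequences); the function names are hypothetical and the actual nets operate on acoustic models rather than on text.

```python
from itertools import combinations

def prefixes(s):       # net I: start at the beginning, end anywhere
    return {s[:j] for j in range(1, len(s) + 1)}

def suffixes(s):       # net II: start anywhere, end at the end
    return {s[i:] for i in range(len(s))}

def substrings(s):     # net III: start and end anywhere, no skipping
    return {s[i:j] for i in range(len(s)) for j in range(i + 1, len(s) + 1)}

def subsequences(s):   # net IV: start and end anywhere, skipping allowed
    return {"".join(s[k] for k in idx)
            for r in range(1, len(s) + 1)
            for idx in combinations(range(len(s)), r)}

target = "朝辭白帝彩雲間"
print(len(prefixes(target)), len(suffixes(target)),
      len(substrings(target)), len(subsequences(target)))  # 7 7 28 127
```

The rapid growth from 7 to 127 accepted sequences shows why a more permissive net catches more error patterns but also gives the recognizer more ways to be wrong.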

Page 23

Design Philosophy of Lexicon Nets

We need to strike a balance between recognition rate and the coverage of the lexicon net.
  In the extreme, we can have a net for free syllable decoding to catch all error patterns.
  The flexibility of free syllable decoding is offset by its not-so-high recognition rate.

Page 24

Scoring Methods for Speech Assessment

Five phone-based scoring methods
  Duration-distribution scores (durDis)
  Log-likelihood scores (hmmLike)
  Log-posterior scores (hmmPost)
  Log-likelihood-distribution scores (likeDis)
  Rank ratio scores (rkRatio)

All based on forced alignment to segment phones

Page 25

Method 1: Duration-distribution Scores

PDF of phone duration
  Obtained from forced alignment
  Normalized by speech rate
  Fitted by log-normal PDF
  Max PDF → score 100
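A duration score of this kind could be computed as below. The sketch assumes the log-normal parameters `mu` and `sigma` have already been fitted to speech-rate-normalized durations of one phone, and that the score scales linearly with the PDF value so that the peak of the PDF maps to 100. The linear scaling is an assumption; the slide only states that the maximum PDF corresponds to a score of 100.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))

def duration_score(duration, mu, sigma):
    """Score in [0, 100]; the mode of the fitted PDF scores exactly 100."""
    if duration <= 0:
        return 0.0
    mode = math.exp(mu - sigma ** 2)  # location of the PDF maximum
    return 100.0 * lognormal_pdf(duration, mu, sigma) / lognormal_pdf(mode, mu, sigma)

# Hypothetical parameters for one phone (normalized duration in seconds).
mu, sigma = math.log(0.1), 0.4
for d in (0.05, 0.085, 0.1, 0.3):
    print(d, round(duration_score(d, mu, sigma), 1))
```

Durations much shorter or much longer than typical for the phone fall in the tails of the PDF and therefore receive low scores.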

Page 26

Method 2: Log-likelihood Scores

Log-likelihood of phone $q_i$ with duration of $d$ frames:

$$\hat{l}_i = \frac{1}{d} \sum_{t=t_0}^{t_0+d-1} \log p(y_t \mid q_i)$$

where $p(y_t \mid q_i)$ is the likelihood of frame $t$ with the observation vector $y_t$

Page 27

Method 3: Log-posterior Scores

Log-posterior of phone $q_i$ with duration $d$:

$$\hat{\rho}_i = \frac{1}{d} \sum_{t=t_0}^{t_0+d-1} \log P(q_i \mid y_t)$$

where

$$P(q_i \mid y_t) = \frac{p(y_t \mid q_i)\, P(q_i)}{\sum_{j=1}^{m} p(y_t \mid q_j)\, P(q_j)}$$
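Methods 2 and 3 both average a per-frame quantity over the frames that forced alignment assigns to the phone: the log-likelihood of the frame under the phone model, or the log of the phone's posterior obtained from Bayes' rule over all phone models. The sketch below shows that computation, assuming the per-frame log-likelihoods under every phone model are already available as a list of rows. The function names and the toy numbers are hypothetical.

```python
import math

def log_likelihood_score(frame_loglikes, i):
    """Average over the phone's frames of log p(y_t | q_i)."""
    return sum(row[i] for row in frame_loglikes) / len(frame_loglikes)

def log_posterior_score(frame_loglikes, i, log_priors):
    """Average over the phone's frames of log P(q_i | y_t)."""
    total = 0.0
    for row in frame_loglikes:
        joint = [ll + lp for ll, lp in zip(row, log_priors)]
        peak = max(joint)  # log-sum-exp trick for numerical stability
        log_evidence = peak + math.log(sum(math.exp(v - peak) for v in joint))
        total += joint[i] - log_evidence
    return total / len(frame_loglikes)

# Two frames, two phone models; row[j] = log p(y_t | q_j).
frames = [[-1.0, -3.0], [-2.0, -2.0]]
priors = [math.log(0.5), math.log(0.5)]
print(log_likelihood_score(frames, 0))
print(log_posterior_score(frames, 0, priors))
```

The posterior score is normalized by the competing phones, which makes it less sensitive than the raw likelihood to speaker and channel effects that shift all likelihoods together.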

Page 28

Method 4: Log-likelihood-distribution Scores

Use the CDF of a Gaussian for the log-likelihood
CDF = 1 → score = 100
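A possible reading of this method is sketched below: a Gaussian is fitted to the log-likelihoods of a phone, and the CDF value of a new log-likelihood is scaled so that a CDF of 1 gives a score of 100. The mean and standard deviation used here are hypothetical placeholders, and fitting them per phone on training data is an assumption.

```python
from statistics import NormalDist

def like_dis_score(log_like, mean, stdev):
    """Map a log-likelihood to [0, 100] through a Gaussian CDF."""
    return 100.0 * NormalDist(mean, stdev).cdf(log_like)

# Hypothetical Gaussian for one phone: mean -5.0, standard deviation 1.0.
for ll in (-7.0, -5.0, -3.0):
    print(ll, round(like_dis_score(ll, -5.0, 1.0), 1))
```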

Page 29

Method 5: Rank Ratio Scores

Rank ratio

$$rr(q_j) = \frac{rank(q_j) - 1}{\#\text{ of competing phones} - 1}$$

RR to score conversion

$$score(q_j; a, b) = \frac{100}{1 + \left( rr(q_j) / a \right)^{b}}$$

where parameters a, b are phone specific.

Possible sets of competing phones for x+y: *+y, *+*
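The rank ratio idea can be sketched as follows: the target phone is ranked by likelihood among a set of competing phones, the rank is normalized to [0, 1], and the ratio is converted to a score with the sigmoid-like mapping 100 / (1 + (rr / a)^b). The competing-phone log-likelihoods and the values of `a` and `b` below are hypothetical; in the slides, `a` and `b` are phone specific.

```python
def rank_ratio(loglikes, target):
    """(rank - 1) / (N - 1), where rank 1 means the target phone scored best."""
    rank = 1 + sum(1 for v in loglikes.values() if v > loglikes[target])
    return (rank - 1) / (len(loglikes) - 1)

def rr_to_score(rr, a, b):
    """Convert a rank ratio in [0, 1] to a score in (0, 100]."""
    return 100.0 / (1.0 + (rr / a) ** b)

# Log-likelihood of one segment under each competing phone model (toy values).
loglikes = {"s": -40.0, "sh": -35.0, "z": -42.0, "c": -50.0, "x": -38.0}
rr = rank_ratio(loglikes, "s")       # two phones beat "s", so rank 3 of 5
print(rr, rr_to_score(rr, a=0.5, b=2))
```

A phone that ranks first gets a rank ratio of 0 and a score of 100, regardless of the absolute likelihood values.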

Page 30

Examples of Rank Ratio Scores

[Figure: waveform of an example utterance with forced-alignment labels and scores, overall score = 91.49; below it, the pitch contour (Pitch1: unbroken, Pitch2: segmented).]

Syllable     Score   Phone scores
(sil)        -1
一 (yi)      100     i 100
寸 (cun)     83      c 50, u 100, nn 100
想 (xiang)   100     x 100, i 100, a 100, ng 100
思 (si)      63      s 13, ii 100
(sil)        -1
一 (yi)      100     i 100
寸 (cun)     100     c 100, u 100, nn 100
灰 (hui)     100     h 100, u 100, e 100, i 100
(sil)        -1

Page 31

Demo of Our Prototype

ASR toolbox http://mirlab.org/jang/matlab/toolbox/asr

Command: goDemoSa.m

Page 32

Intro. to Learning to Rank

Learning to rank
  A supervised learning algorithm which generates a ranking model based on a training set of partially ordered items. (A task somewhat between classification and regression.)

[Figure: a rank function takes a set of items and outputs them ordered by preference.]

Page 33

Learning to Rank: Methods and App.

Methods
  Pointwise (e.g., Pranking)
  Pairwise (e.g., RankSVM, RankBoost, RankNet)
  Listwise (e.g., ListNet)
Applications
  Webpage ranking
  Machine translation
  Protein structure prediction

Page 34

Application of LTR to SA

Why use LTR for SA?
  Human scoring is rank-based
    Tsing Hua’s grading system is moving from scores (0~100) to ranks (A, B, C, D, …).
  Combination of features (scores)
    Features are complementary.
  Effective determination of ranking
    LTR only generates numerical output with a ranking order as close as possible to the correct order.
    An optimal DP-based approach is proposed.

Page 35

LTR Score Segmentation

Given: LTR scores $s = [s_1, s_2, \ldots, s_n]$ (sorted) and desired ranks $r = [r_1, r_2, \ldots, r_n]$

We want to find the separating scores $\theta = [\theta_1, \theta_2, \ldots, \theta_{m-1}]$ with score-to-rank function $s2r(s; \theta)$

Such that $J(\theta) = \sum_{i=1}^{n} \left| r_i - s2r(s_i; \theta) \right|$ is minimized.

[Figure: the score axis divided by the separating scores $\theta_1, \ldots, \theta_4$ into five intervals, Rank 1 to Rank 5.]

Page 36

LTR Score Segmentation by DP (I)

Formulate the problem in DP framework
  Optimum-value function $D(i,j)$: the minimum cost of mapping $s_1, s_2, \ldots, s_i$ to ranks $1, 2, \ldots, j$
  Recurrent equation: $D(i,j) = |r_i - j| + \min\{D(i-1,j),\ D(i-1,j-1)\}$
  Boundary condition: $D(1,j) = |r_1 - j|,\ j \in [1, m]$
  Optimum cost: $D(n,m)$

Page 37

LTR Score Segmentation by DP (II)

Recurrent formula: $D(i,j) = |r_i - j| + \min\{D(i-1,j),\ D(i-1,j-1)\}$

Local constraint: $D(i,j)$ is reached from either $D(i-1,j)$ or $D(i-1,j-1)$.

[Figure: DP grid with the desired ranks $r_1, r_2, \ldots, r_{13}$ along the horizontal axis and the computed ranks 1 to 5 along the vertical axis. The separating scores are the midpoints where the optimal path moves up one rank.]

$$\theta_1 = \frac{s_2 + s_3}{2}, \quad \theta_2 = \frac{s_6 + s_7}{2}, \quad \theta_3 = \frac{s_8 + s_9}{2}, \quad \theta_4 = \frac{s_{11} + s_{12}}{2}$$
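The DP formulation can be sketched in code as follows, using the recurrence D(i, j) = |r_i - j| + min(D(i-1, j), D(i-1, j-1)) with boundary D(1, j) = |r_1 - j| and optimum cost D(n, m). The separating scores are taken as midpoints between the two neighboring scores where the computed rank changes. The function name `segment_scores` and the toy data are hypothetical; the toy ranks are chosen so that the boundaries fall after the 2nd, 6th, 8th, and 11th scores.

```python
def segment_scores(scores, ranks, m):
    """scores: sorted ascending; ranks: desired rank (1..m) of each score.
    Returns (optimum cost, computed ranks, separating scores)."""
    n = len(scores)
    inf = float("inf")
    D = [[inf] * m for _ in range(n)]    # D[i][j]: score i mapped to rank j+1
    back = [[0] * m for _ in range(n)]   # predecessor rank index
    for j in range(m):
        D[0][j] = abs(ranks[0] - (j + 1))
    for i in range(1, n):
        for j in range(m):
            stay = D[i - 1][j]
            step = D[i - 1][j - 1] if j > 0 else inf
            prev = j if stay <= step else j - 1
            D[i][j] = abs(ranks[i] - (j + 1)) + min(stay, step)
            back[i][j] = prev
    # Backtrack from D(n, m) to recover the computed rank of each score.
    path = [m - 1]
    for i in range(n - 1, 0, -1):
        path.append(back[i][path[-1]])
    computed = [j + 1 for j in reversed(path)]
    separators = [(scores[i] + scores[i + 1]) / 2
                  for i in range(n - 1) if computed[i] != computed[i + 1]]
    return D[n - 1][m - 1], computed, separators

scores = [float(k) for k in range(1, 14)]
ranks = [1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5]
print(segment_scores(scores, ranks, m=5))
```

Because the path can only stay on the same rank or move up by one, the computed ranks are always non-decreasing in the score, which is what makes the result usable as a set of thresholds.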

Page 38

LTR Score Segmentation with DP (III)

[Figure: two panels. “Data distribution”: class (vertical axis) versus x1 (horizontal axis). “DP path”: vec1 versus vec2, with DP total distance = 23.]

Page 39

Flow Charts of Our Experiment

Page 40

Corpora for Experiments

WSJ0
  8000 training utterances, 84 speakers
  For training biphone acoustic models for forced alignment
MIR-SD
  Recordings of about 4000 multi-syllable English words by 22 students (12 females and 10 males) with an intermediate competence level
  Originally designed for stress detection
  Available at http://mirlab.org/dataSet/public

Page 41

Human Scoring of MIR-SD

Human scoring
  Only 50 utterances from each speaker of MIR-SD are scored by 2 humans, making a total of 1100 utterances
  Human scores are consistent:

Correlation      Inter-rater   HR1-GT   HR2-GT
Word-based       0.58          0.84     0.89
Speaker-based    0.78          0.96     0.93

Score        1     2     3     4     5
Frequency    110   198   259   409   124
Percentage   10%   18%   24%   37%   11%

Page 42

Examples of MIR-SD

Level 5: apparent, paragraphic, constellation

Level 3: additive, timorous, availably

Level 1: ambiguity, auxiliary, anachronism

Page 43

Performance Indices

Performance indices used in the literature
  Example: hr = [1 3 5 4 2 2] (human ranks), cr = [2 3 5 2 1 4] (computed ranks)
  Recognition rate (rRate) = 33.33%
  Recognition rate with tolerance 1 (rRateT1) = 66.67%
  Average absolute difference (AADiff) = 1
  Correlation coefficient (Corr) = 0.54
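The four indices can be reproduced for the example vectors with a few lines of code. This is a sketch with a hypothetical helper name, `performance_indices`; the correlation is the standard Pearson coefficient.

```python
import math

def performance_indices(hr, cr):
    """Return (rRate, rRateT1, AADiff, Corr) for human and computed ranks."""
    n = len(hr)
    diffs = [abs(h - c) for h, c in zip(hr, cr)]
    r_rate = sum(d == 0 for d in diffs) / n      # exact matches
    r_rate_t1 = sum(d <= 1 for d in diffs) / n   # matches within 1 rank
    aa_diff = sum(diffs) / n                     # average absolute difference
    mh, mc = sum(hr) / n, sum(cr) / n
    cov = sum((h - mh) * (c - mc) for h, c in zip(hr, cr))
    var_h = sum((h - mh) ** 2 for h in hr)
    var_c = sum((c - mc) ** 2 for c in cr)
    corr = cov / math.sqrt(var_h * var_c)        # Pearson correlation
    return r_rate, r_rate_t1, aa_diff, corr

hr = [1, 3, 5, 4, 2, 2]
cr = [2, 3, 5, 2, 1, 4]
print([round(v, 4) for v in performance_indices(hr, cr)])
```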

Page 44

Performance Evaluation of Different Scoring Methods

Method    Index     Raw score   DP-based            k-means
                                inside   outside    inside   outside
durDis    Corr      0.209       0.217    0.189      0.202    0.194
          rRate     -           0.342    0.309      0.281    0.276
          rRateT1   -           0.783    0.771      0.701    0.696
          AADiff    -           0.906    0.942      1.109    1.122
hmmLike   Corr      0.120       0.168    0.102      0.144    0.154
          rRate     -           0.325    0.306      0.258    0.255
          rRateT1   -           0.780    0.757      0.692    0.689
          AADiff    -           0.928    0.973      1.158    1.165
hmmPost   Corr      0.084       0.297    0.265      0.192    0.216
          rRate     -           0.344    0.330      0.170    0.162
          rRateT1   -           0.811    0.798      0.565    0.561
          AADiff    -           0.862    0.893      1.494    1.499
likeDis   Corr      0.141       0.160    0.125      0.141    0.143
          rRate     -           0.316    0.308      0.247    0.247
          rRateT1   -           0.789    0.774      0.665    0.671
          AADiff    -           0.924    0.948      1.207    1.203
rkRatio   Corr      0.240       0.232    0.198      0.229    0.236
          rRate     -           0.333    0.316      0.269    0.268
          rRateT1   -           0.789    0.779      0.699    0.698
          AADiff    -           0.898    0.929      1.120    1.124

Page 45

LTR Combination of Scores

Features for LTR
  durDis and rkRatio: raw scores
  hmmLike, hmmPost, likeDis: DP segmentation
LTR
  RankSVM
  Linear kernel
Baseline
  hmmPost with DP-based segmentation
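RankSVM is a pairwise method: it is commonly described as training a linear classifier on feature-difference vectors of item pairs with different ranks. The sketch below shows only that pairwise transformation, not the SVM training itself, and is an illustration rather than the setup used in the experiment. The feature vectors (five scores per utterance) and the helper name `pairwise_transform` are hypothetical.

```python
def pairwise_transform(features, ranks):
    """Build (difference vector, label) pairs from items with different ranks.
    Label +1 means the first item of the pair has the higher rank."""
    pairs = []
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] == ranks[j]:
                continue  # ties carry no ordering information
            diff = [a - b for a, b in zip(features[i], features[j])]
            label = 1 if ranks[i] > ranks[j] else -1
            pairs.append((diff, label))
    return pairs

# Toy data: each row holds five scores for one utterance
# (durDis, hmmLike, hmmPost, likeDis, rkRatio), with a human rank from 1 to 5.
features = [
    [60.0, 2.0, 3.0, 2.0, 55.0],
    [80.0, 4.0, 4.0, 3.0, 90.0],
    [75.0, 4.0, 5.0, 4.0, 85.0],
]
ranks = [2, 4, 4]
for diff, label in pairwise_transform(features, ranks):
    print(label, diff)
```

A linear model fitted to these pairs yields a weight vector, and the dot product of that vector with an utterance's features gives the numerical LTR output that is then segmented into ranks.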

Page 46

Overall Performance Comparison

Legends
  Score segmentation
    Circles: DP
    Triangles: k-means
  Inside/outside tests
    Solid lines: inside
    Dashed lines: outside
  Black lines: baselines

Page 47

Summary of the Experiment

Segmentation
  DP (supervised learning) is better than k-means (unsupervised learning)
Performance indices
  Correlation coefficient is not intuitive (consider [4 5 4] and [1 2 1], which have a correlation of 1 although no rank matches)
  Recog. rate and sum of abs. diff. can be optimized by LTR and DP segmentation

Page 48

Demo: Practice of Mandarin Idioms of Length 4 (一語中的)

Level (difficulty) of an idiom is based on its frequency via Google search:
  孤掌難鳴 ===> 260,000
  鶼鰈情深 ===> 43,300
  亡鈇意鄰 ===> 22,700
  舉案齊眉 ===> 235,000

Can be adapted for English learning

Next step: multi-threading, fast decoding via FSM

Page 49

Demo: Recitation Machine (唸唸不忘)

Supports Mandarin & English

Supports user-defined recitation scripts

Next step: multithreading for recording & recognition

Page 50

Licensing for PC Applications

For Mandarin, English, Japanese

Page 51

SA for Embedded Systems

Embedded platforms: PMP, iPhone, Android

Page 53

Tools and Tutorials

Tools
  DCPR toolbox: http://mirlab.org/jang/matlab/toolbox/dcpr
  SAP toolbox: http://mirlab.org/jang/matlab/toolbox/sap
  ASR toolbox: http://mirlab.org/jang/matlab/toolbox/asr
Tutorials
  Data clustering and pattern recognition: http://mirlab.org/jang/books/dcpr
  Audio signal processing: http://mirlab.org/jang/books/audioSignalProcessing
Lab page (with demos): http://mirlab.org

Page 54

Other SA Issues to be Addressed

Core technology
  Other acoustic features for scoring
    Pitch (tone/intonation), volume, duration, pause, coarticulation
  Error pattern identification
  Credit assignment for sentence-level scores
  Lack of labeled corpora!
Application side
  Multimodal GUI
  Extensions
    Slight adaptation
    Paragraph-level SA
    Text-free SA
  Beyond pronunciation
    Translation + recognition + assessment
  Microphone types

Page 55

Examples

Coarticulation
  Knock it off!
  Mom woke her up
Consonant+consonant
  Bus stop
  Push Shirley
  Ask question
  Jeff flew south through Tainan
  Exception
    Change jobs
    Which chair

Ref: “和英文系學生一起上英語聽說課” (Taking English Listening and Speaking Classes with English Majors), by 黃玟君

Page 56

Examples

Changes due to coarticulation
  Would you like it?
  Won’t you go?
  Raise your hand.
  It makes you look younger.
Softened sounds
  Junction
  Popcorn
  Fruitful
Can and can’t
  I can read the letter.
  I can’t read the letter.
d and t
  Better
  Cider

Page 57

Most Likely to be Mispronounced

Within Taiwan
  Pleasure/pressure
  World/war/word
  Shirt/short
  Walk/work
  Flesh/fresh
  Supply/surprise
  Some/son
  Confirm/conform
  Cancel/cancer
  Mouth/mouse
  Measure/major
  Police/please
  Version/virgin

Page 58

Conclusions

Conclusions
  SA calls for more cues than ASR
  SA requires techniques from ML/IR
  Multi-modal approach to SA is a must
    “Popcorn”, “Thursday”
On-going & future work
  Tone recognition & assessment
  Reliable error pattern identification

Page 59

References

Witt, S. M. and Young, S. J., “Phone-level Pronunciation Scoring and Assessment for Interactive Language Learning”, Speech Communication 30, 95-108, 2000.

Kim, Y., Franco, H., and Neumeyer, L., “Automatic Pronunciation Scoring of Specific Phone Segments for Language Instruction”, in Proceedings of the 4th European Conference on Speech Communication and Technology (Eurospeech ’97), pp. 649-652, Rhodes, 1997.

Neumeyer, L., Franco, H., Digalakis, V., and Weintraub, M., “Automatic Scoring of Pronunciation Quality”, Speech Communication 30, 83-93, 2000.

Franco, H., Neumeyer, L., Digalakis, V., and Ronen, O., “Combination of Machine Scores for Automatic Grading of Pronunciation Quality”, Speech Communication 30, 121-130, 2000.

Cincarek, T., Gruhn, R., Hacker, C., Nöth, E., and Nakamura, S., “Automatic Pronunciation Scoring of Words and Sentences Independent from the Non-Native’s First Language”, Computer Speech and Language 23, 65-88, 2009.

Crammer, K. and Singer, Y., “Pranking with Ranking”, in Proceedings of the Conference on Neural Information Processing Systems (NIPS), 2001.

Joachims, T., “Optimizing Search Engines using Clickthrough Data”, in Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD), ACM, 2002.

Freund, Y., Iyer, R., Schapire, R. E., and Singer, Y., “An Efficient Boosting Algorithm for Combining Preferences”, in Proceedings of ICML, pp. 170-178, 1998.

Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., and Hullender, G., “Learning to Rank using Gradient Descent”, in Proceedings of ICML, pp. 89-96, 2005.

Cao, Z., Qin, T., Liu, T. Y., Tsai, M. F., and Li, H., “Learning to Rank: From Pairwise Approach to Listwise Approach”, in Proceedings of the 24th International Conference on Machine Learning, pp. 129-136, Corvallis, OR, 2007.

Chen, L.-Y. and Jang, J.-S. R., “Automatic Pronunciation Scoring using Learning to Rank and DP-based Score Segmentation”, submitted to Interspeech 2010.