Endpoint Detection ( 端點偵測 )

Post on 13-Mar-2016


Jyh-Shing Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept., National Taiwan Univ., Taiwan

-2-

Intro to Endpoint Detection
  Endpoint detection (EPD, 端點偵測)
    Goal: Determine the start and end of voice activity
    Also known as voice activity detection (VAD)
  Importance
    Acts as a preprocessing step for speech-based applications
    Should require as little computing power as possible
  Two recording modes for speech-based applications
    Push to talk: offline EPD
      Example: Voice command
    Continuously listening: online EPD
      Example: Dialog system, such as Siri

Quiz!

Cell phone too!

-3-

Types of Features for EPD
  Time-domain
    Volume only
    Volume and ZCR (zero crossing rate)
    Volume and HOD (high-order difference)
    …
  Frequency-domain
    Variance of spectrum
    Entropy of spectrum
    Spectrum
    MFCC
    …

Some features belong to both!

-4-

Typical Approaches to EPD
  Thresholding
    Simple thresholding
      Compute a feature (e.g., volume) from each frame
      Select a threshold vth to identify frames of voice activity
    Combined thresholding
      Use two features (e.g., volume and ZCR) to make the decision
  Static classification
    Extract features
    Perform binary classification
      Negative: silence or noise
      Positive: voice activity
  Sequence alignment
    Use hidden Markov models (HMM) for sequence alignment
  You need to use these approaches in the EPD program competition.

-5-

Performance Evaluation for EPD (1/2)
  Two types of errors (typical for all binary classification)
    False negative (aka false rejection): a positive classified as negative
    False positive (aka false acceptance): a negative classified as positive
  Confusion matrix/table

Quiz!

-6-

Performance Evaluation for EPD (2/2)
  Typical methods
    Start & end position accuracy
    Frame-based accuracy
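As a rough illustration of frame-based evaluation, the sketch below counts the four confusion-matrix cells from per-frame labels and derives a frame-based accuracy. It is plain Python rather than the MATLAB used by the scripts named on the slides; the function names and the 1 = voice, 0 = silence/noise label encoding are assumptions made for this example only.

```python
def confusion_counts(truth, predicted):
    """Count (TP, FP, FN, TN) over per-frame labels (assumed 1 = voice, 0 = silence/noise)."""
    if len(truth) != len(predicted):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false acceptance
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false rejection
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def frame_accuracy(truth, predicted):
    """Fraction of frames whose predicted label matches the ground truth."""
    tp, fp, fn, tn = confusion_counts(truth, predicted)
    return (tp + tn) / len(truth)


# Hypothetical labels for eight frames.
truth = [0, 0, 1, 1, 1, 1, 0, 0]
predicted = [0, 1, 1, 1, 1, 0, 0, 0]
print(confusion_counts(truth, predicted))
print(frame_accuracy(truth, predicted))
```

Start and end position accuracy would instead compare the detected boundary positions with the labeled ones, typically within some tolerance.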

Quiz!

-7-

-8-

EPD by Volume Thresholding
  The simplest method for EPD
    Volume is the sum of the absolute values of the samples in a frame.
  Four intuitive ways to select vth:
    vth = vmax*
    vth = vmedian*
    vth = vmin*
    vth = v1*
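A minimal sketch of the volume feature, assuming the signal is already a list of samples. The name `frame_volumes` and the `frame_size`/`hop_size` parameters are hypothetical, and no DC-offset removal is applied before taking absolute values.

```python
def frame_volumes(samples, frame_size, hop_size):
    """Volume of each frame: the sum of the absolute sample values."""
    volumes = []
    # Trailing samples that do not fill a whole frame are dropped.
    for start in range(0, len(samples) - frame_size + 1, hop_size):
        frame = samples[start:start + frame_size]
        volumes.append(sum(abs(s) for s in frame))
    return volumes


# Hypothetical signal: 8 samples, frames of 4 samples with a hop of 2.
samples = [0, 1, -1, 5, -6, 4, 0, 1]
print(frame_volumes(samples, frame_size=4, hop_size=2))
```

Each of the four threshold choices above would then scale one statistic of this volume list (its maximum, median, minimum, or first value).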

-9-

How Do They Fail?
  Unfortunately…
    All the thresholds fail one way or another.
    Under what situations do they fail?
      vth = vmax*: plosive sounds
      vth = vmedian*: silence too long
      vth = vmin*: total-zero frame
      vth = v1*: unstable frame
  We need a better strategy…

-10-

A Better Strategy for Threshold Finding
  A presumably better way to select vth
    vlower = 3rd percentile of volumes
    vupper = 97th percentile of volumes
    vth = (vupper-vlower)*+vlower
  Why do we need to use percentiles?
    To deal with plosive sounds
    To deal with total-zero frames
  Does it fail?
    Yes, still, in certain situations…
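The percentile-based threshold can be sketched as follows. The scaling factor is written as a parameter named `ratio`, and both that name and the 0.1 default are assumptions for illustration, since the slide does not give a value. The nearest-rank percentile used here is only one of several common percentile definitions.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (assumed definition)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def volume_threshold(volumes, ratio):
    """vth placed a fraction `ratio` of the way from vlower up to vupper."""
    v_lower = percentile(volumes, 3)
    v_upper = percentile(volumes, 97)
    return (v_upper - v_lower) * ratio + v_lower


def detect_voice_frames(volumes, ratio=0.1):
    """True for every frame whose volume exceeds the threshold."""
    vth = volume_threshold(volumes, ratio)
    return [v > vth for v in volumes]


# Hypothetical volumes 0..100: a single huge spike or an all-zero frame
# would barely move the 3rd/97th percentiles, unlike the max or min.
volumes = list(range(101))
print(volume_threshold(volumes, 0.1))
print(sum(detect_voice_frames(volumes)))
```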

-11-

Example: EPD by Volume
  epdByVol01.m
  [Figure: top panel "Waveform and EP (method=vol)", amplitude vs. time; bottom panel "Volume", volume vs. time]

-12-

How to Enhance EPD by Volume?

  Major problems of EPD by volume
    Threshold is hard to determine
      Corpus-based fine-tuning
    Unvoiced parts are likely to be ignored
  We need a feature to enhance the unvoiced parts
    This can be achieved by ZCR or HOD

-13-

-14-

ZCR for Unvoiced Sound Detection
  ZCR: zero crossing rate
    No. of zero crossings in a frame
    ZCRvoiced < ZCRsilence < ZCRunvoiced
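A small sketch of counting zero crossings in one frame. Here a crossing is counted only when two neighboring samples have strictly opposite signs, so samples that are exactly zero are never counted; that convention is an assumption, and other definitions treat zeros differently.

```python
def zero_crossings(frame):
    """Number of sign changes between consecutive samples in a frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)


# Hypothetical frame: sign changes at 1/-1, -1/2, 3/-4 and -1/2.
frame = [1, -1, 2, 3, -4, -1, 2]
print(zero_crossings(frame))
```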

  Example: epdShowZcr01.m
  [Figure: top panel "SingaporeIsAFinePlace.wav", amplitude vs. time (sec); bottom panel "ZCR", count vs. time (sec)]

  Quiz: If frame=[-1 2 -2 3 5 2 -2 1], what is its ZCR?

Quiz!

-15-

EPD by Volume and ZCR
  1. Determine initial endpoints by the upper volume threshold u
  2. Expand the initial endpoints based on the lower volume threshold l
  3. Further expand the endpoints based on the ZCR threshold zc
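The three steps can be sketched as below, under simplifying assumptions: the per-frame volume and ZCR are already computed, only a single voiced segment is returned, and the parameter names (`vol_upper`, `vol_lower`, `zcr_th`) are hypothetical stand-ins for the thresholds on the slide.

```python
def epd_vol_zcr(volume, zcr, vol_upper, vol_lower, zcr_th):
    """Return (start, end) frame indices of one detected segment, or None."""
    # Step 1: initial endpoints from frames above the upper volume threshold.
    above = [i for i, v in enumerate(volume) if v > vol_upper]
    if not above:
        return None
    start, end = above[0], above[-1]
    last = len(volume) - 1
    # Step 2: expand while neighboring frames stay above the lower volume threshold.
    while start > 0 and volume[start - 1] > vol_lower:
        start -= 1
    while end < last and volume[end + 1] > vol_lower:
        end += 1
    # Step 3: expand further while neighboring frames have a high ZCR,
    # which is typical of unvoiced sounds at the edges of an utterance.
    while start > 0 and zcr[start - 1] > zcr_th:
        start -= 1
    while end < last and zcr[end + 1] > zcr_th:
        end += 1
    return start, end


# Hypothetical per-frame features for eight frames.
volume = [1, 3, 8, 9, 7, 3, 1, 1]
zcr = [5, 40, 10, 10, 10, 10, 35, 5]
print(epd_vol_zcr(volume, zcr, vol_upper=6, vol_lower=2, zcr_th=30))
```

In this toy input the volume thresholds alone give frames 1 to 5; the ZCR step then adds frame 6, a quiet frame with many zero crossings.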

-16-

Example: EPD by Volume and ZCR
  epdByVolZcr01.m
  [Figure: four panels vs. time: "Waveform and EP (method=volZcr)" (amplitude), "Volume" (volume), "Zero crossing rate" (ZCR), "Waveform after EPD" (amplitude)]

-17-

-18-

EPD by Volume and HOD
  Another feature to enhance unvoiced sounds: high-order difference (HOD)
    Order-1 HOD = sum(abs(diff(s)))
    Order-2 HOD = sum(abs(diff(diff(s))))
    Order-3 HOD = sum(abs(diff(diff(diff(s)))))
    …
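The same definitions in a short Python sketch, where `diff` mirrors the adjacent-difference operation used in the formulas above; the helper names are chosen for this example.

```python
def diff(x):
    """Differences between consecutive elements (one element shorter than x)."""
    return [b - a for a, b in zip(x, x[1:])]


def hod(frame, order):
    """Order-n high-order difference: sum of |.| after applying diff n times."""
    d = list(frame)
    for _ in range(order):
        d = diff(d)
    return sum(abs(v) for v in d)


# Hypothetical frame: diff -> [3, -2, 5], diff twice -> [-5, 7], three times -> [12].
frame = [1, 4, 2, 7]
print(hod(frame, 1), hod(frame, 2), hod(frame, 3))
```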

  Quiz: If frame=[-1 2 -2 3 -3 2 -2 1], what is its order-n HOD when n is 1, 2, and 3?

-19-

Example: Plots of Volume and HOD
  highOrderDiff01.m
  [Figure: top panel "Waveform", amplitude vs. time (sec); bottom panel with curves Volume, Order-1 diff, Order-2 diff, Order-3 diff, Order-4 diff vs. time (sec)]

-20-

Example: EPD by Vol. and HOD
  epdByVolHod01.m
  [Figure: three panels vs. time: "Waveform and EP (method=volHod)" (amplitude), "Volume & HOD" (curves Volume and HOD), "VH"]

-21-

Hard Example: EPD by Vol. and HOD
  A hard example: epdByVolHod02.m
  [Figure: three panels vs. time: "Waveform and EP (method=volHod)" (amplitude), "Volume & HOD" (curves Volume and HOD), "VH"]

-22-

-23-

Spectrogram
  Goal
    Describe the energy distribution in each frame along time
  MATLAB command
    [S,F,T] = spectrogram(signal, frameSize, overlap, fftSize, fs);
  Facts
    Real signals for FFT
      Complex conjugate spectrum
      Take the first fftSize/2+1 points when we consider magnitude only
    Use zero padding to have a larger fftSize and thus a finer frequency grid
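To make these facts concrete, the sketch below uses a naive DFT written with Python's `cmath` module (not the MATLAB `spectrogram` function, and far slower than a real FFT). It checks the conjugate symmetry of a real frame's spectrum and keeps only the first `fft_size // 2 + 1` magnitudes, with optional zero padding. The function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def magnitude_spectrum(frame, fft_size=None):
    """One-sided magnitude spectrum of a real frame, zero-padded to fft_size."""
    if fft_size is None:
        fft_size = len(frame)
    if fft_size < len(frame):
        raise ValueError("fft_size must be at least the frame length")
    padded = list(frame) + [0.0] * (fft_size - len(frame))
    spectrum = dft(padded)
    # For real input the rest of the spectrum mirrors these points.
    return [abs(c) for c in spectrum[: fft_size // 2 + 1]]


frame = [1.0, 0.0, -1.0, 0.0]
full = dft(frame)
# Conjugate symmetry: X[k] equals conj(X[n-k]) for a real signal.
symmetric = all(abs(full[k] - full[-k].conjugate()) < 1e-9 for k in range(1, len(frame)))
print(symmetric)
print(len(magnitude_spectrum(frame)), len(magnitude_spectrum(frame, fft_size=8)))
```

Zero padding from 4 to 8 points raises the number of one-sided bins from 3 to 5; it samples the same underlying spectrum more densely rather than adding new information.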

-24-

EPD by Spectrum
  epdShowSpec01.m, epdShowSpec02.m
  [Figure: "SingaporeIsAFinePlace.wav", amplitude vs. time (sec), with its spectrogram, frequency (Hz, 0 to 8000) vs. time (sec)]
  [Figure: "noisy4epd.wav", amplitude vs. time (sec), with its spectrogram, frequency (Hz) vs. time (sec)]

-25-

How to Aggregate Spectrum?
  How to aggregate the spectrum into a single feature which is larger (or smaller) when the spectral energy distribution is diversified?
    Entropy function
    Geometric mean over arithmetic mean

-26-

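Entropy Function (1/2)
  Entropy function
    $$\mathbf{p} = (p_1, p_2, \ldots, p_n), \quad p_i \ge 0 \;\; \forall i, \quad \sum_{i=1}^{n} p_i = 1$$
    $$\mathrm{entropy}(\mathbf{p}) = -\sum_{i=1}^{n} p_i \ln p_i$$
  Property
    $\mathrm{entropy}(\mathbf{p})$ achieves its minimum 0 when one of the $p_i$ is 1.
    $\mathrm{entropy}(\mathbf{p})$ achieves its maximum when $p_1 = p_2 = \cdots = p_n = 1/n$.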
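A short numerical check of the entropy function and its two extreme cases, as a sketch in plain Python. Terms with p_i = 0 are skipped, which follows the usual convention 0 ln 0 = 0; the tolerance used to validate the input is an arbitrary choice.

```python
import math


def entropy(p):
    """Entropy -sum(p_i * ln p_i) of a discrete probability vector."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be non-negative and sum to 1")
    # Skip zero entries: by convention 0 * ln 0 contributes 0.
    return -sum(x * math.log(x) for x in p if x > 0)


print(entropy([1.0, 0.0, 0.0]))          # one p_i equal to 1: the minimum
print(entropy([0.25] * 4), math.log(4))  # uniform: the maximum, ln n
print(entropy([0.7, 0.2, 0.1]))          # in between
```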

Quiz!

-27-

Entropy Function (2/2)
  Proof by taking derivative
    $\mathrm{entropy}(\mathbf{p})$ achieves its minimum 0 when one of the $p_i$ is 1.
    $\mathrm{entropy}(\mathbf{p})$ achieves its maximum when $p_1 = p_2 = \cdots = p_n = 1/n$.
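One possible way to carry out the derivative-based proof of the maximum is with a Lagrange multiplier for the constraint that the probabilities sum to 1. The slide does not say which route it takes, so the multiplier $\lambda$ and the function $L$ below are introduced here only for illustration.

```latex
\begin{align*}
L(\mathbf{p}, \lambda) &= -\sum_{i=1}^{n} p_i \ln p_i
  + \lambda \Bigl( \sum_{i=1}^{n} p_i - 1 \Bigr) \\
\frac{\partial L}{\partial p_i} &= -\ln p_i - 1 + \lambda = 0
  \;\Rightarrow\; p_i = e^{\lambda - 1} \quad \text{(the same value for every } i \text{)} \\
\sum_{i=1}^{n} p_i = 1 &\;\Rightarrow\; p_i = \frac{1}{n}, \qquad
  \mathrm{entropy}(\mathbf{p}) = -\sum_{i=1}^{n} \frac{1}{n} \ln \frac{1}{n} = \ln n
\end{align*}
```

Because the entropy function is concave, this stationary point is the maximum. For the minimum, each term $-p_i \ln p_i$ is non-negative on $[0, 1]$ (taking $0 \ln 0 = 0$) and vanishes only at $p_i = 0$ or $p_i = 1$, so the entropy is 0 exactly when one $p_i$ equals 1.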

Quiz!

-28-

Plots of Entropy Function
  entropyPlot.m
  [Figure: plots of the entropy function for n=2 and n=3]

-29-

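Spectral Entropy
  PDF: normalization
    $$p_i = \frac{s(f_i)}{\sum_{k=1}^{N} s(f_k)}, \quad i = 1, \ldots, N$$
    $s(f_i) = 0$ if $f_i < 250\,\mathrm{Hz}$ or $f_i > 6000\,\mathrm{Hz}$
    $p_i = 0$ if $p_i$ falls outside preset lower and upper bounds
  Spectral entropy:
    $$H = -\sum_{k=1}^{N} p_k \log p_k$$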

Reference: Jialin Shen, Jeihweih Hung, Linshan Lee, “Robust entropy-based endpoint detection for speech recognition in noisy environments”, International Conference on Spoken Language Processing, Sydney, 1998
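A hedged sketch of the spectral entropy computation for one frame: band-limit the spectrum to 250 to 6000 Hz, normalize it into a probability distribution, then take the entropy. Several details are assumptions rather than the paper's exact method: the natural logarithm is used, the band edges are treated as inclusive, the additional bounds on the individual probabilities are omitted, and the function and parameter names are hypothetical.

```python
import math


def spectral_entropy(freqs, spectrum, f_low=250.0, f_high=6000.0):
    """Entropy of the band-limited, normalized spectrum of one frame."""
    # Zero out components outside the band of interest.
    s = [v if f_low <= f <= f_high else 0.0 for f, v in zip(freqs, spectrum)]
    total = sum(s)
    if total == 0:
        return 0.0  # no in-band energy: treat as zero entropy
    p = [v / total for v in s]
    return -sum(v * math.log(v) for v in p if v > 0)


# Hypothetical five-bin spectra; the first and last bins lie outside the band.
freqs = [100.0, 500.0, 1000.0, 2000.0, 7000.0]
flat = [9.0, 1.0, 1.0, 1.0, 9.0]    # energy spread evenly in-band
peaky = [0.0, 5.0, 0.0, 0.0, 0.0]   # energy concentrated in one bin
print(spectral_entropy(freqs, flat), spectral_entropy(freqs, peaky))
```

A flat in-band spectrum gives the largest entropy (ln 3 here), while a spectrum concentrated in one bin gives 0.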

-30-

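Geometric/Arithmetic Means
  Arithmetic & geometric means
    $$\mathbf{p} = (p_1, p_2, \ldots, p_n), \quad p_i \ge 0 \;\; \forall i$$
    $$am(\mathbf{p}) = \frac{1}{n} \sum_{i=1}^{n} p_i, \qquad gm(\mathbf{p}) = \sqrt[n]{\prod_{i=1}^{n} p_i}$$
  Property
    $am(\mathbf{p}) \ge gm(\mathbf{p})$
    $gm(\mathbf{p}) / am(\mathbf{p})$ achieves its maximum when $p_1 = p_2 = \cdots = p_n$.
  Proof…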
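The ratio of geometric to arithmetic mean can be checked numerically with the sketch below. The geometric mean is computed through logarithms to avoid overflow on long vectors; returning 0 when any entry is zero and rejecting an all-zero vector are choices made for this example.

```python
import math


def gm_over_am(p):
    """Ratio gm(p)/am(p) for a non-empty vector of non-negative values."""
    if not p or any(x < 0 for x in p):
        raise ValueError("p must be non-empty and non-negative")
    am = sum(p) / len(p)
    if am == 0:
        raise ValueError("ratio is undefined when every entry is zero")
    if any(x == 0 for x in p):
        return 0.0  # the product, and hence the geometric mean, is zero
    gm = math.exp(sum(math.log(x) for x in p) / len(p))
    return gm / am


print(gm_over_am([2.0, 2.0, 2.0]))  # equal entries: ratio reaches its maximum of 1
print(gm_over_am([1.0, 4.0]))       # gm = 2, am = 2.5
```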

Quiz!

-31-

-32-

Classification Based EPD
  Classify each frame into silence or not
    Feature of a frame
      Magnitude/power spectrum
      Others: ZCR, HOD, entropy, gm/am, …
    Static classifiers to detect S from UV
      KNNC, NBC, SVM, NN, …
    Sequence aligner to find boundaries of S→UV & UV→S
      HMM, CRF, …

Use Machine Learning Toolbox!
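As a toy illustration of static classification, the sketch below labels a frame with a k-nearest-neighbor vote over (volume, ZCR) feature pairs. It is written in plain Python and does not use the Machine Learning Toolbox; the training data, the feature choice, and the 0 = silence, 1 = voice activity encoding are all made up for this example, and the features are left unscaled for brevity.

```python
def knn_classify(train_features, train_labels, query, k=3):
    """Majority vote among the k training frames nearest to the query frame."""
    ranked = sorted(
        (sum((a - b) ** 2 for a, b in zip(feat, query)), label)
        for feat, label in zip(train_features, train_labels)
    )
    votes = [label for _, label in ranked[:k]]
    return 1 if 2 * sum(votes) > len(votes) else 0


# Hypothetical labeled frames: (volume, ZCR) with 0 = silence, 1 = voice activity.
train_features = [(1, 5), (2, 6), (1, 4), (20, 10), (25, 12), (22, 60)]
train_labels = [0, 0, 0, 1, 1, 1]

print(knn_classify(train_features, train_labels, (21, 11)))
print(knn_classify(train_features, train_labels, (1.5, 5)))
```

A classifier like this decides each frame independently; a sequence aligner would additionally use the ordering of frames to locate the boundaries.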