ASR PPT

7/29/2019 ASR PPT

V.Satish Kumar

Y. Anil Kumar

S.Durga Prasad

Automatic speech recognition

What is the task?

What are the main difficulties?

How is it approached?

How good is it?

How much better could it be?

How do humans do that?

Articulation produces sound waves, which the ear conveys to the brain for processing

Getting a computer to understand spoken language

By "understand" we might mean:

React appropriately

Convert the input speech into another medium, e.g. text

Several variables impinge on this

Digitization: Converting analogue signal into digital representation

Signal processing: Separating speech from background noise

Phonetics: Variability in human speech

Phonology: Recognizing individual sound distinctions (similar phonemes)

Lexicology and syntax: Disambiguating homophones; features of continuous speech

Syntax and pragmatics: Interpreting prosodic features

Pragmatics: Filtering of performance errors (disfluencies)
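
The digitization step can be illustrated with a minimal sketch: sample a continuous signal at fixed intervals and quantize each sample to a signed integer. The function name `digitize`, the 440 Hz test tone, the 16 kHz sample rate and the 16-bit depth are all assumptions chosen for illustration; the slides do not specify any of them.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous signal and quantize each sample to a signed integer.

    `signal` is any function of time (seconds) returning an amplitude
    that is expected to lie in [-1.0, 1.0].
    """
    max_level = 2 ** (bits - 1) - 1              # e.g. 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # sampling: discrete time steps
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip out-of-range values
        samples.append(round(amplitude * max_level))  # quantization
    return samples

def tone(t):
    """Hypothetical analogue input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)

pcm = digitize(tone, duration_s=0.01, sample_rate_hz=16000, bits=16)
print(len(pcm), pcm[:5])
```

A higher sample rate captures higher frequencies, and more bits per sample reduce quantization error; both increase the amount of data to process.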

Example: "go home"

Phones: g o h o m

Acoustic observations: x0 x1 x2 x3 x4 x5 x6 x7 x8 x9

Markov model backbone composed of phones (hidden because we don't know the correspondences)

Each line in the diagram represents a probability estimate (more later)
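
One common way to recover the hidden correspondences between phones and acoustic observations is Viterbi alignment over a left-to-right hidden Markov model. The sketch below is an illustration under stated assumptions, not the method from the slides: the function name `viterbi_align`, the stay probability of 0.5 and the toy emission scores (0.7 for the intended phone, 0.1 for each other phone) are all made up for the example.

```python
import math

def viterbi_align(phones, frame_scores, p_stay=0.5):
    """Assign each frame to a phone in a left-to-right phone sequence.

    frame_scores[t][phone] is an assumed estimate of
    P(observation at frame t | phone). At each frame the model either
    stays in the current phone or advances to the next one.
    """
    n_states, n_frames = len(phones), len(frame_scores)
    if n_frames < n_states:
        raise ValueError("need at least one frame per phone")
    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)
    neg_inf = float("-inf")
    best = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    best[0][0] = math.log(frame_scores[0][phones[0]])  # must start in first phone
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1][s] + log_stay
            advance = best[t - 1][s - 1] + log_next if s > 0 else neg_inf
            if advance > stay:
                best[t][s], back[t][s] = advance, s - 1
            else:
                best[t][s], back[t][s] = stay, s
            best[t][s] += math.log(frame_scores[t][phones[s]])
    # Trace back from the last phone at the last frame.
    s = n_states - 1
    path = [s]
    for t in range(n_frames - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    return [phones[s] for s in path]

phones = ["g", "o", "h", "o", "m"]
# Toy observations x0..x9: each frame favours one phone (hypothetical values).
intended = ["g", "g", "o", "o", "h", "h", "o", "o", "m", "m"]
frame_scores = [
    {p: (0.7 if p == label else 0.1) for p in ("g", "o", "h", "m")}
    for label in intended
]
alignment = viterbi_align(phones, frame_scores)
print(alignment)
```

In a real recognizer the emission scores would come from an acoustic model rather than a hand-written table, and the transition probabilities would be estimated from training data.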

Different types of tasks with different difficulties

Speaking mode (isolated words/continuous speech)

Speaking style (read/spontaneous)

Enrollment (speaker-independent/dependent)

Vocabulary (small < 20 words/large > 20k words)

Language model (finite state/context sensitive)

Perplexity (small < 10/large > 100)

Signal-to-noise ratio (high > 30 dB/low < 10 dB)

Transducer (high quality microphone/telephone)
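
Two of the quantities above have simple standard definitions, sketched here for reference. The function names `snr_db` and `perplexity` and the sample inputs are assumptions for illustration. Signal-to-noise ratio in decibels is ten times the base-10 logarithm of the power ratio, and perplexity is two raised to the average negative log2 probability the language model assigns to each word.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power measurements."""
    return 10.0 * math.log10(signal_power / noise_power)

def perplexity(word_probs):
    """Perplexity of a language model over a word sequence.

    word_probs holds the probability the model assigned to each word
    given its context; all values must be greater than zero.
    """
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** avg_neg_log2

# A signal 1000 times more powerful than the noise sits at the
# "high" threshold of about 30 dB.
print(snr_db(1000.0, 1.0))

# If the model gives every word probability 0.1, it is as uncertain as a
# uniform choice among 10 words, so perplexity is about 10.
print(perplexity([0.1] * 5))
```

Read this way, perplexity is roughly the average number of equally likely words the recognizer must choose between at each point, which is why a larger value makes the task harder.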

Health care

Military

Air traffic controller

Mobile telephony

Voice User interface

Speech to text

Speech recognition works best if the microphone is close to the user (e.g. in a phone, or if the user is wearing a microphone).

More distant microphones (e.g. on a table or wall) will tend to increase the number of errors.

Users may speak different languages.

Local accents may not be recognized.

Encouraged by some innovative models, developments in ASR appear to be accelerating. The outlook is optimistic that future applications of automatic speech recognition will contribute substantially to the quality of life among deaf children and adults, and others who share their lives, as well as public and private sectors of the business community who will benefit from this technology.
