ASR PPT
-
7/29/2019 ASR PPT
1/13
V.Satish Kumar
Y. Anil Kumar
S.Durga Prasad
-
Automatic speech recognition
-
What is the task?
What are the main difficulties?
How is it approached?
How good is it?
How much better could it be?
ASR
-
How do humans do that?
Articulation produces sound waves, which the ear conveys to the brain for processing
Getting a computer to understand spoken language
By "understand" we might mean:
React appropriately
Convert the input speech into another medium, e.g. text
Several variables impinge on this
-
Digitization: converting the analogue signal into a digital representation
Signal processing: separating speech from background noise
Phonetics: variability in human speech
Phonology: recognizing individual sound distinctions (similar phonemes)
Lexicology and syntax: disambiguating homophones; features of continuous speech
Syntax and pragmatics: interpreting prosodic features
Pragmatics: filtering of performance errors (disfluencies)
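To make the digitization step concrete, here is a minimal sketch of sampling and quantizing an analogue signal. The 16 kHz sample rate, 16-bit depth, and 440 Hz test tone are illustrative assumptions, not values from the slides, and `digitize` and `tone` are hypothetical helper names.

```python
import math

def digitize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous-time signal and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1               # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                    # time of the n-th sample in seconds
        x = max(-1.0, min(1.0, signal(t)))        # clip amplitude to [-1, 1]
        samples.append(round(x * max_level))      # quantize to the nearest integer level
    return samples

def tone(t):
    """Stand-in for the analogue input: a 440 Hz sine at half amplitude (assumption)."""
    return 0.5 * math.sin(2 * math.pi * 440.0 * t)

pcm = digitize(tone, duration_s=0.01, sample_rate_hz=16000, bits=16)
```

A real recognizer would read samples from a microphone or audio file; the sine wave only stands in for the analogue input.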
-
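Example utterance: go home
Phones (hidden states): g o h o m
Acoustic observations: x0 x1 x2 x3 x4 x5 x6 x7 x8 x9
The Markov model backbone is composed of phones, which are hidden because we don't know which observations correspond to which phone
Each line in the diagram represents a probability estimate (more later)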
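The hidden-phone model for "go home" can be sketched as a toy left-to-right hidden Markov model decoded with the Viterbi algorithm, which recovers the most likely phone for each observation. Everything numeric here is an assumption for illustration: the observations are simplified to discrete labels rather than real acoustic feature vectors, and the probabilities (`P_STAY`, `P_NEXT`, `P_MATCH`, `P_OTHER`) are invented, not taken from the slides.

```python
import math

PHONES = ["g", "o", "h", "o", "m"]   # hidden states for "go home", as on the slide
P_STAY, P_NEXT = 0.5, 0.5            # assumed transition probabilities (stay / advance)
P_MATCH, P_OTHER = 0.7, 0.1          # assumed emission probabilities over 4 toy labels

def emit_logp(phone, obs):
    """Log-probability that a state for `phone` emits the discrete label `obs`."""
    return math.log(P_MATCH if obs == phone else P_OTHER)

def viterbi_align(phones, observations):
    """Return the best state index per observation and the log-score of that path."""
    n_states, n_obs = len(phones), len(observations)
    if n_obs < n_states:
        raise ValueError("need at least one observation per phone")
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_obs)]
    back = [[0] * n_states for _ in range(n_obs)]
    score[0][0] = emit_logp(phones[0], observations[0])   # must start in the first phone
    for t in range(1, n_obs):
        for s in range(n_states):
            stay = score[t - 1][s] + math.log(P_STAY)
            move = score[t - 1][s - 1] + math.log(P_NEXT) if s > 0 else neg_inf
            if move > stay:
                best, back[t][s] = move, s - 1
            else:
                best, back[t][s] = stay, s
            if best > neg_inf:
                score[t][s] = best + emit_logp(phones[s], observations[t])
    path = [n_states - 1]                                  # must end in the last phone
    for t in range(n_obs - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][-1]

# Ten toy observations standing in for x0..x9.
observations = ["g", "g", "o", "o", "h", "h", "o", "o", "m", "m"]
path, log_score = viterbi_align(PHONES, observations)
aligned = [PHONES[s] for s in path]
```

Here `path` holds the index of the phone state assigned to each of the ten observations, which is exactly the correspondence that is hidden in the diagram.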
-
Different types of tasks with different difficulties
Speaking mode (isolated words/continuous speech)
Speaking style (read/spontaneous)
Enrollment (speaker-independent/dependent)
Vocabulary (small: < 20 words / large: > 20,000 words)
Language model (finite state/context sensitive)
Perplexity (small: < 10 / large: > 100)
Signal-to-noise ratio (high: > 30 dB / low: < 10 dB)
Transducer (high-quality microphone/telephone)
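Two of the dimensions above are numeric and can be computed directly. The sketch below uses the standard definitions: signal-to-noise ratio in decibels from signal and noise power, and perplexity as 2 raised to the negative average log2 probability the language model assigns to each word. The function names and input values are illustrative assumptions.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels, from average signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)

def perplexity(word_probs):
    """Perplexity of a language model, given the probability it assigned to each word."""
    avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** (-avg_log2)

# A signal 1000 times stronger than the noise sits at the slide's "high" SNR boundary.
high_snr = snr_db(1000.0, 1.0)

# A model that always spreads probability evenly over 10 words (e.g. a digit task)
# has a perplexity equal to that branching factor.
digit_task = perplexity([0.1] * 5)
```

Intuitively, perplexity is the average number of equally likely words the recognizer must choose between at each point, so lower values mean an easier task.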
-
Health care
Military
Air traffic controller
-
Mobile telephony
Voice User interface
Speech to text
-
Speech recognition works best if the microphone is close to the user (e.g. in a phone, or if the user is wearing a microphone).
More distant microphones (e.g. on a table or wall) will tend to increase the number of errors.
Users may speak different languages
Local accents may not be recognized
-
Encouraged by some innovative models, developments in ASR appear to be accelerating. The outlook is optimistic: future applications of automatic speech recognition should contribute substantially to the quality of life of deaf children and adults and of those who share their lives, and should also benefit the public and private sectors of the business community.