ASR PPT


Transcript of ASR PPT

  • 7/29/2019 ASR PPT

    1/13

    V. Satish Kumar

    Y. Anil Kumar

    S. Durga Prasad

  • 7/29/2019 ASR PPT

    2/13

    Automatic speech recognition

  • 7/29/2019 ASR PPT

    3/13

    What is the task?

    What are the main difficulties?

    How is it approached?

    How good is it?

    How much better could it be?


    ASR

  • 7/29/2019 ASR PPT

    4/13

    How do humans do that?

    Articulation produces sound waves, which the ear conveys to the brain for processing.

    Getting a computer to understand spoken language

    By "understand" we might mean:

    React appropriately

    Convert the input speech into another medium, e.g. text

    Several variables impinge on this


  • 7/29/2019 ASR PPT

    5/13

  • 7/29/2019 ASR PPT

    6/13


    Digitization: Converting analogue signal into digital representation

    Signal processing: Separating speech from background noise

    Phonetics: Variability in human speech

    Phonology: Recognizing individual sound distinctions (similar phonemes)

    Lexicology and syntax: Disambiguating homophones; features of continuous speech

    Syntax and pragmatics: Interpreting prosodic features

    Pragmatics: Filtering of performance errors (disfluencies)
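    The digitization step can be illustrated with a minimal sketch. It assumes a 16 kHz sample rate and 16-bit samples, which are common choices for speech but are not stated on the slide; the function name `digitize` and the 440 Hz test tone are hypothetical and used only for illustration.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous-time signal and quantize it to signed integers.

    `signal` is any function mapping time in seconds to an amplitude
    in the range [-1.0, 1.0]. The defaults are assumptions, not values
    taken from the slides.
    """
    max_level = 2 ** (bits - 1) - 1            # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling: discrete points in time
        x = max(-1.0, min(1.0, signal(t)))     # clip to the representable range
        samples.append(round(x * max_level))   # quantization: nearest integer level
    return samples


def tone(t):
    """A stand-in for the analogue input: a 440 Hz sine at half amplitude."""
    return 0.5 * math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), "samples; first five:", pcm[:5])
```

    Sampling fixes when the signal is measured and quantization fixes how finely each measurement is stored; both choices limit what later processing stages can recover.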

  • 7/29/2019 ASR PPT

    7/13

    [Figure: hidden Markov model for the utterance "go home"]

    Phones: g o h o m

    Acoustic observations: x0 x1 x2 x3 x4 x5 x6 x7 x8 x9

    Markov model backbone composed of phones (hidden because we don't know the correspondences)

    Each line represents a probability estimate (more later)
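    The idea of recovering the hidden phone sequence from the observations can be sketched with a toy Viterbi alignment. This is a simplified illustration, not the method used in the slides: the observations are assumed to be noisy phone labels rather than real acoustic feature vectors, and the probabilities `p_match` and `p_stay` as well as the function name `viterbi_align` are invented for the example.

```python
import math


def viterbi_align(phones, observations, p_match=0.8, p_stay=0.5):
    """Align a left-to-right chain of phones to a sequence of observations.

    Each state either repeats (probability p_stay) or advances to the next
    phone (probability 1 - p_stay). An observation equal to the state's
    phone label is emitted with probability p_match; the remaining mass is
    shared by the other labels. All values are toy assumptions.
    """
    symbols = sorted(set(phones))
    p_miss = (1.0 - p_match) / (len(symbols) - 1)

    def log_emit(state, obs):
        return math.log(p_match if phones[state] == obs else p_miss)

    log_stay, log_next = math.log(p_stay), math.log(1.0 - p_stay)
    n_states, n_frames = len(phones), len(observations)
    neg_inf = float("-inf")

    # score[t][s]: best log-probability of any path ending in state s at frame t
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_emit(0, observations[0])   # paths must start in the first phone

    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_next if s > 0 else neg_inf
            best, prev = (stay, s) if stay >= move else (move, s - 1)
            if best > neg_inf:
                score[t][s] = best + log_emit(s, observations[t])
                back[t][s] = prev

    if score[-1][-1] == neg_inf:
        raise ValueError("too few observations to reach the last phone")

    # Trace back from the last phone at the last frame.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


phones = ["g", "o", "h", "o", "m"]
observations = ["g", "g", "o", "o", "o", "h", "o", "o", "m", "m"]  # stand-ins for x0..x9
path = viterbi_align(phones, observations)
print([phones[s] for s in path])
```

    The returned path assigns one hidden state to each observation, which is the correspondence the slide describes as unknown. Working in log-probabilities avoids numerical underflow on longer utterances.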

  • 7/29/2019 ASR PPT

    8/13

    Different types of tasks with different difficulties

    Speaking mode (isolated words/continuous speech)

    Speaking style (read/spontaneous)

    Enrollment (speaker-independent/speaker-dependent)

    Vocabulary (small: < 20 words / large: > 20,000 words)

    Language model (finite state/context sensitive)

    Perplexity (small: < 10 / large: > 100)

    Signal-to-noise ratio (high: > 30 dB / low: < 10 dB)

    Transducer (high-quality microphone/telephone)
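    Two of the quantities above, perplexity and signal-to-noise ratio, have simple standard definitions that a short sketch can make concrete. The function names and the sample values below are hypothetical; the formulas are the usual ones (perplexity as 2 raised to the average negative log2 probability per word, SNR as 10 times the log10 of the power ratio).

```python
import math


def perplexity(word_probs):
    """Perplexity of a language model over a word sequence.

    `word_probs` holds the probability the model assigned to each word
    that actually occurred.
    """
    avg_neg_log2 = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2.0 ** avg_neg_log2


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from two lists of samples."""
    p_signal = sum(x * x for x in signal) / len(signal)   # mean power of the speech
    p_noise = sum(x * x for x in noise) / len(noise)      # mean power of the noise
    return 10.0 * math.log10(p_signal / p_noise)


# A model that gives every word probability 0.1 behaves like a uniform
# choice among ten words at each step.
print(perplexity([0.1] * 5))

# Noise at one tenth of the signal amplitude has one hundredth of its power.
print(snr_db([1.0, -1.0] * 4, [0.1, -0.1] * 4))
```

    Read against the thresholds on the slide, the first example sits at the boundary of the "small" perplexity range, and the second falls between the low and high signal-to-noise categories.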

  • 7/29/2019 ASR PPT

    9/13

    Health care

    Military

    Air traffic controller

  • 7/29/2019 ASR PPT

    10/13

    Mobile telephony

    Voice User interface

    Speech to text

  • 7/29/2019 ASR PPT

    11/13

    Speech recognition works best if the microphone is close to the user (e.g. in a phone, or if the user is wearing a microphone).

    More distant microphones (e.g. on a table or wall) will tend to increase the number of errors.

    Users may speak different languages.

    Local accents may not be recognized.

  • 7/29/2019 ASR PPT

    12/13

    Encouraged by some innovative models, developments in ASR appear to be accelerating. The outlook is optimistic that future applications of automatic speech recognition will contribute substantially to the quality of life among deaf children and adults, and others who share their lives, as well as public and private sectors of the business community who will benefit from this technology.

  • 7/29/2019 ASR PPT

    13/13