Post on 18-Jul-2015
Jerry Dimitriou, Singular Logic
eSpeak TTS Engine: Language Enhancement
28 November 2011 ÆGIS Conference, Brussels, Belgium
Singular Logic ÆGIS Conference, Brussels, Belgium – 28 November 2011 2
What is eSpeak
• Open source, free-to-use TTS engine
  – Formant based
  – Minimal need for resources
  – More than 20 languages already available
    • Not all of them are in a good state
• Advantages
  – Intelligible at high speeds
  – Easier to enhance languages (rule based)
  – Easier to create new sounds (phonemes)
• Disadvantages
  – Sound is not natural (robotic)
How eSpeak works: Text-To-Phoneme
• Step 1: Text to phoneme translation
  – Rule based, with rules contained in a <lang>_rules file
  – Exceptions to the rules in a <lang>_list file
  – Rules translate normal text to a stream of characters called phonemes
  – Phonemes represent a standard sound which is generated:
    • either through formants (vowels and voiced consonants)
    • or by playing samples (unvoiced and fricative consonants)
  – Examples:
    • Normal text → eSpeak phonemes → IPA alphabet
    • Amazing → a#m'eIzIN → ɐmˈeɪzɪŋ
    • Brussels → br'Vs@Lz → bɹˈʌsəlz
    • Disability → d,Is@b'IlI2ti → dˌɪsəbˈɪlɪtɪ
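The relation between eSpeak's phoneme mnemonics and IPA in the examples above can be sketched in a few lines of Python. This is only an illustration: the `MNEMONIC_TO_IPA` table and the `to_ipa` function are hypothetical names, and the table covers just the symbols that occur in the three example words, not eSpeak's full phoneme set.

```python
# Illustrative table only: it covers the symbols used in the three slide
# examples and follows the slide's own IPA transcriptions.
MNEMONIC_TO_IPA = {
    "a#": "ɐ", "eI": "eɪ", "I2": "ɪ", "@L": "əl",   # two-character mnemonics
    "I": "ɪ", "N": "ŋ", "V": "ʌ", "@": "ə", "r": "ɹ", "i": "ɪ",
    "'": "ˈ",   # primary stress
    ",": "ˌ",   # secondary stress
}


def to_ipa(mnemonics: str) -> str:
    """Convert an eSpeak mnemonic string to IPA using the small table above."""
    out = []
    pos = 0
    while pos < len(mnemonics):
        # Try two-character mnemonics before one-character ones.
        for size in (2, 1):
            chunk = mnemonics[pos:pos + size]
            if chunk in MNEMONIC_TO_IPA:
                out.append(MNEMONIC_TO_IPA[chunk])
                pos += len(chunk)
                break
        else:
            # Symbols such as m, z, b, s, d, l, t are written the same way
            # in both notations, so they are copied unchanged.
            out.append(mnemonics[pos])
            pos += 1
    return "".join(out)


for word, phon in [("Amazing", "a#m'eIzIN"),
                   ("Brussels", "br'Vs@Lz"),
                   ("Disability", "d,Is@b'IlI2ti")]:
    print(word, phon, to_ipa(phon))
```

Longer mnemonics are tried first so that, for example, `eI` is read as one diphthong rather than as `e` followed by `I`.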
How eSpeak works: Rules
• Rules format
  – <prefix>) <group of letters> ( <suffix> phonemes
  – a (Cable 'eI
  – a (tion 'eI
  – _r) a (tion a
  – Prefix and suffix
    • Non-capital letters represent themselves
    • Capital letters represent sets of letters
      – C → any consonant
      – A → any vowel
      – _ → start of word in the prefix, end of word in the suffix
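To make the rule format concrete, here is a minimal Python sketch of how such rules could be matched against a word. It is not eSpeak's actual matcher: it supports only the symbols listed above (literal letters, C, A and _), it treats "y" as a vowel by assumption, and it picks the rule with the most matched context as a stand-in for eSpeak's real scoring. All names (`Rule`, `parse_rule`, `rule_applies`, `best_rule`) are hypothetical.

```python
from dataclasses import dataclass

VOWELS = set("aeiouy")  # assumption: "y" counts as a vowel in this sketch


@dataclass
class Rule:
    pre: str       # context before the matched letters (may be empty)
    match: str     # the group of letters the rule translates
    post: str      # context after the matched letters (may be empty)
    phonemes: str  # phoneme mnemonics produced by the rule


def parse_rule(line: str) -> Rule:
    """Parse a line of the form '<prefix>) <letters> (<suffix> <phonemes>'."""
    pre = ""
    if ")" in line:
        pre, line = line.split(")", 1)
    fields = line.split()
    match, rest = fields[0], fields[1:]
    post = ""
    if rest and rest[0].startswith("("):
        post, rest = rest[0][1:], rest[1:]
    return Rule(pre.strip(), match, post, rest[0] if rest else "")


def _symbol_matches(sym: str, ch) -> bool:
    """ch is a lowercase letter, or None when we are past a word boundary."""
    if sym == "_":
        return ch is None
    if ch is None:
        return False
    if sym == "A":
        return ch in VOWELS
    if sym == "C":
        return ch.isalpha() and ch not in VOWELS
    return sym == ch


def rule_applies(rule: Rule, word: str, pos: int) -> bool:
    end = pos + len(rule.match)
    if word[pos:end] != rule.match:
        return False
    # The suffix is read left to right, starting after the matched letters.
    for k, sym in enumerate(rule.post):
        ch = word[end + k] if end + k < len(word) else None
        if not _symbol_matches(sym, ch):
            return False
    # The prefix is read right to left, starting before the matched letters.
    for k, sym in enumerate(reversed(rule.pre)):
        idx = pos - 1 - k
        ch = word[idx] if idx >= 0 else None
        if not _symbol_matches(sym, ch):
            return False
    return True


def best_rule(rules, word: str, pos: int):
    """Return the applicable rule with the most context, or None."""
    candidates = [r for r in rules if rule_applies(r, word.lower(), pos)]
    if not candidates:
        return None
    return max(candidates, key=lambda r: len(r.pre) + len(r.match) + len(r.post))
```

With the three example rules loaded through `parse_rule`, `best_rule(rules, "nation", 1)` selects the `a (tion` rule, while `best_rule(rules, "ration", 1)` prefers the more specific `_r) a (tion` rule.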
How eSpeak works: Exceptions
• Exception format
  – <group of letters or word> phonemes and/or flags
  – _" kwoUts
  – _% p3s'Ent
  – _0 z'i@roU
  – _1 w'0n
  – eg fO@Egz'aamp@L
  – ibm $abbrev
  – Ambidextrous $3
  – from fr0m $u
  – Flags
    • $u, $abbrev, $only, $dot, $pause, etc.
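As a rough illustration of the exception format, the following Python sketch splits an entry into its word, phonemes and flags. It assumes the simple whitespace-separated layout of the examples above, where every field starting with `$` is a flag; `parse_list_entry` is a hypothetical helper, not part of eSpeak.

```python
def parse_list_entry(line: str):
    """Split an exception entry into (word, phonemes, flags)."""
    fields = line.split()
    word = fields[0]
    # Fields starting with "$" are flags; anything else is phoneme text.
    phonemes = " ".join(f for f in fields[1:] if not f.startswith("$"))
    flags = [f for f in fields[1:] if f.startswith("$")]
    return word, phonemes, flags


for entry in ["_0 z'i@roU", "ibm $abbrev", "Ambidextrous $3", "from fr0m $u"]:
    print(parse_list_entry(entry))
```

An entry may carry phonemes only, flags only, or both, which is why the sketch returns an empty phoneme string for entries such as `ibm $abbrev`.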
How eSpeak works: Phoneme-To-Sound
• Step 2: Phoneme to sound
  – Having the list of phonemes, eSpeak generates a sound for each phoneme
  – The previous or next phoneme may alter the phoneme sound
  – Phoneme sound generation may be from a sample file or from formant data
  – Phoneme data are defined in ph_<language> files
    • E.g. ph_english
  – Example of an entry in ph_english (phoneme definition)
      phoneme I
        vowel starttype #i endtype #i
        length 130
        IfNextVowelAppend(;)
        FMT(vowel/ii_2)
      endphoneme
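The structure of a phoneme definition like the one above can be explored with a small Python sketch. It assumes one statement per line between `phoneme <name>` and `endphoneme`, and it recognises only the `length` and `FMT(...)` statements seen in the example; real ph_<language> files contain more constructs than this handles. `parse_phonemes` is a hypothetical helper.

```python
import re

SAMPLE = """\
phoneme I
  vowel starttype #i endtype #i
  length 130
  IfNextVowelAppend(;)
  FMT(vowel/ii_2)
endphoneme
"""


def parse_phonemes(text: str) -> dict:
    """Collect each phoneme's length, formant file and remaining statements."""
    phonemes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("phoneme "):
            current = {"length": None, "formants": None, "other": []}
            phonemes[line.split()[1]] = current
        elif line == "endphoneme":
            current = None
        elif current is not None:
            fmt = re.fullmatch(r"FMT\((.+)\)", line)
            if fmt:
                current["formants"] = fmt.group(1)   # formant data file
            elif line.startswith("length "):
                current["length"] = int(line.split()[1])
            else:
                current["other"].append(line)        # kept verbatim
    return phonemes


print(parse_phonemes(SAMPLE))
```

For the sample entry this yields a length of 130 and the formant file `vowel/ii_2`, with the remaining two statements kept as plain text.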
Editing eSpeak files
• eSpeakEdit program
  – Used to edit, visualize and compile eSpeak data
    • Formant phoneme data
• Workflow for text-to-phoneme
  – Find an error in pronunciation, intonation, etc.
  – Check which rule (or exception) generates the error
  – Edit the rules or the dict file
  – Compile the data
  – Retry
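The check/compile/retry loop can also be driven from a script. The sketch below is one possible way to do it, assuming the `espeak` command-line program is installed; the slides themselves use eSpeakEdit for these steps. It relies on the `-q` (no audio), `-X` (print phonemes plus a trace of the rules used), `-v` (voice) and `--compile` options of the command-line tool; the helper names are hypothetical.

```python
import shutil
import subprocess


def trace_command(text: str, voice: str = "en") -> list:
    """Command that prints the phonemes and the rules used for `text`."""
    return ["espeak", "-q", "-X", "-v", voice, text]


def compile_command(voice: str = "en") -> list:
    """Command that recompiles the rules and exceptions for `voice`.

    It should be run from the directory that holds the <lang>_rules and
    <lang>_list files.
    """
    return ["espeak", f"--compile={voice}"]


def run(cmd: list, cwd=None) -> str:
    """Run a command and return its standard output."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True)
    return result.stdout
```

A typical round would call `run(trace_command("amazing"))` to see which rule fired, edit the source files, call `run(compile_command("en"), cwd=...)` with the directory of the source files, and then run the trace again.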
Editing eSpeak files (2)
• Workflow for phoneme-to-sound
  – There may be cases where there is no proper sound for a specific phoneme (a common problem is the R sound)
    • E.g. it should be shorter or longer, when stressed or unstressed
  – Check all the available sounds that seem similar to the sound you need, using espeakedit
  – If something closer to what you need is found, change or add its definition in the ph_<language> file
  – If not, create a new phoneme using espeakedit, or record a new sound for unvoiced consonants
  – Retry
Editing Demonstration
• Demo of language edit, using espeakedit
Native speakers: How to contribute
• The biggest problem in language editing in eSpeak is ... native speakers.
• One must be a native speaker in order to be able to fix language problems.
• How to contribute
  – Find errors in eSpeak for a certain language and report them
  – Try to fix pronunciation rules by editing rules and exceptions
  – Try to fix phoneme sounds by editing phoneme data
  – Send the changes back to the eSpeak community