Posted on 16-Apr-2017
Chinese Minority Language Support in OpenOffice.org
Institute of Software, Chinese Academy of Sciences
Lead / Tibetan Native-lang Project
Yanmin Jia
yanmin@iscas.ac.cn
Agenda
Status quo
Funding support from the government
Language computing features
Character set and encoding
Support in OpenOffice.org
Demo
Problems & future work
Conclusion
Status quo
China is a multi-lingual and multi-cultural country
29 ethnic minorities have their own languages and scripts
Tibetan, Mongolian, Uighur, Kazak, Khalkhas, Yi, and Tai Le
Population of native speakers
Tibetan: more than 2 million in China
Also spoken in Ladakh, Nepal, Bhutan, and the areas of northern India bordering Tibet
Mongolian: more than 3 million
Inner Mongolia & Outer Mongolia
Uighur: more than 8 million
Sinkiang
Language computing
Microsoft platform
Uniscribe in Vista
Institute of Software, Chinese Academy of Sciences
Red Flag & OpenOffice.org
Funding support from the government
863 Hi-Tech Research and Development Program of China
Linux Operating System and Office Suite for Minority Scripts (2003AA1Z2110)
Knowledge Innovation Project sponsored by Chinese Academy of Sciences
Platform-Independent Tibetan Information Processing System Based on Linux (KGCX2-SW-504)
Electronic Product Development Fund sponsored by the Ministry of Information Industry
Cross-platform Tibetan Office suite
Language computing features
Writing direction & formatting style
Uighur: bidirectional text
Horizontal, right to left
Mongolian
Vertical, top to bottom, with columns running from left to right
Tibetan
Horizontal, left to right
Special line-breaking behavior
Complex Script
Character shaping
The same character takes different shapes depending on the context
Ligature
Certain character sequences are rendered as a single shape
Character positioning
Graphemes & pre-composed characters
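Ligatures are normally implemented by a font's GSUB table. The sketch below is a simplified model, assuming a hypothetical ligature table keyed by sequences of glyph names (the names "ka", "ya", "u" and "ka_ya_u" are placeholders, not names from a real Tibetan font). It always prefers the longest match; a real GSUB ligature lookup instead tries ligatures in the order the font defines.

```python
from typing import Dict, List, Tuple

# Hypothetical ligature table: sequence of glyph names -> single ligature glyph.
LigatureTable = Dict[Tuple[str, ...], str]


def apply_ligatures(glyphs: List[str], table: LigatureTable) -> List[str]:
    """Replace matching glyph sequences with ligature glyphs, longest match first."""
    longest = max((len(seq) for seq in table), default=0)
    out: List[str] = []
    i = 0
    while i < len(glyphs):
        # Try candidate lengths from the longest down to 2 (a ligature needs >= 2 glyphs).
        for n in range(min(longest, len(glyphs) - i), 1, -1):
            candidate = tuple(glyphs[i:i + n])
            if candidate in table:
                out.append(table[candidate])
                i += n
                break
        else:
            # No ligature starts at this glyph: copy it through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


if __name__ == "__main__":
    table = {("ka", "ya"): "ka_ya", ("ka", "ya", "u"): "ka_ya_u"}
    print(apply_ligatures(["ka", "ya", "u", "ka"], table))  # ['ka_ya_u', 'ka']
```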
Language computing features (Cont.)
Character set and encoding
In the 1980s
Each software vendor used its own proprietary character set
Since 1997
Unicode standard
Tibetan 1997 (U+0F00~U+0FFF)
Mongolian 2000 (U+1800~U+18AF)
Uighur uses the Arabic script as its writing system
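Because each script has its own Unicode block, a program can tell which script a character belongs to from its code point alone. A minimal sketch, using the Tibetan and Mongolian ranges above and, as an addition not on the slide, the basic Arabic block U+0600 to U+06FF for Uighur; it checks no other blocks (such as the Arabic presentation forms).

```python
from typing import List, Tuple

# (name, first code point, last code point). Tibetan and Mongolian ranges are
# from the slide; the basic Arabic block is added here for Uighur text.
BLOCKS: List[Tuple[str, int, int]] = [
    ("Tibetan", 0x0F00, 0x0FFF),
    ("Mongolian", 0x1800, 0x18AF),
    ("Arabic", 0x0600, 0x06FF),
]


def script_of(ch: str) -> str:
    """Return the block name for a single character, or "Other"."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return "Other"


if __name__ == "__main__":
    for ch in ["\u0F40", "\u1820", "\u0626", "A"]:
        print(f"U+{ord(ch):04X} -> {script_of(ch)}")
```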
Tibetan pre-composed character set standard
Tibetan coded character set Extension A (GB/T 20542-2006)
Widely used: Tibetan BrdaRten (pre-composed characters)
Tibetan coded character set Extension B
Devanagari transliteration of Tibetan
Non-BMP
Tibetan and Himalayan Digital Library
Jomolhari (Tibetan font)
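Placing characters outside the Basic Multilingual Plane matters because OpenOffice.org stores strings as UTF-16, where such a character occupies two 16-bit code units (a surrogate pair). The sketch below shows that arithmetic. The slide does not say which code points Extension B uses, so U+F0000, from the Plane 15 private use area, is only a placeholder.

```python
from typing import List


def to_utf16_units(cp: int) -> List[int]:
    """Encode one Unicode code point as a list of UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp < 0x10000:
        # BMP characters (e.g. the Tibetan block) fit in one 16-bit unit.
        return [cp]
    # Non-BMP characters are split into a high and a low surrogate.
    v = cp - 0x10000
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


if __name__ == "__main__":
    # U+F0000 is a placeholder from Plane 15 (private use), not a real
    # Extension B code point.
    for cp in (0x0F40, 0xF0000):
        print(f"U+{cp:04X}:", " ".join(f"{u:04X}" for u in to_utf16_units(cp)))
```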
Support in OpenOffice.org
Rendering
Smart fonts: OpenType
GSUB (glyph substitution)
GPOS (glyph positioning)
Complex Layout Engine
Input
an array of Unicode characters in logical order
Output
an array of glyph indices
an array of character indices for the glyphs
an array of glyph positions
VCL
ICU LayoutEngine (Linux)
Uniscribe (Windows)
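The input/output contract of a complex layout engine can be modelled as a small data structure. This is a toy sketch, assuming one glyph per character, a fixed advance width and a hypothetical character-to-glyph map; it applies no GSUB/GPOS features. Its three output lists correspond to the three arrays listed on the slide, which ICU's LayoutEngine exposes through getGlyphs, getCharIndices and getGlyphPositions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class LayoutResult:
    glyphs: List[int]                     # glyph indices, in visual order
    char_indices: List[int]               # logical character index for each glyph
    positions: List[Tuple[float, float]]  # (x, y) pen position of each glyph


def layout_run(text: str, cmap: Dict[str, int], advance: float,
               right_to_left: bool = False) -> LayoutResult:
    """Toy layout: one glyph per character, fixed advance, optional RTL reordering."""
    # Characters arrive in logical order; an RTL run is emitted back to front.
    order = range(len(text) - 1, -1, -1) if right_to_left else range(len(text))
    result = LayoutResult([], [], [])
    x = 0.0
    for i in order:
        result.glyphs.append(cmap.get(text[i], 0))  # glyph 0 is .notdef
        result.char_indices.append(i)
        result.positions.append((x, 0.0))
        x += advance
    return result


if __name__ == "__main__":
    print(layout_run("abc", {"a": 1, "b": 2, "c": 3}, 10.0, right_to_left=True))
```

The char_indices list is what lets an editor map a mouse click on a glyph back to a position in the logical text.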
Support in OpenOffice.org (Cont.)
Rendering
Mongolian support in ICU LayoutEngine
MongolianOpenTypeLayoutEngine
The Mongolian text layout process goes through the following steps:
Subdivide the character string into runs
Further divide each run into clusters
Label each cluster with a feature tag
Apply the feature tag information to each cluster
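The four steps can be sketched as follows. This is a simplified model, not the ICU engine's actual code, assuming that each character is its own cluster, that every letter in U+1820 to U+1877 joins on both sides, and that the tags are the OpenType positional features isol, init, medi and fina. The "uniXXXX.tag" glyph names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

MONGOLIAN_BLOCK = range(0x1800, 0x18B0)  # U+1800..U+18AF


def joins(ch: str) -> bool:
    # Assumption: every letter in U+1820..U+1877 joins on both sides.
    # Real Mongolian shaping also handles FVS, MVS, NNBSP and other cases.
    return 0x1820 <= ord(ch) <= 0x1877


def split_runs(text: str) -> List[str]:
    """Step 1: split the string into Mongolian and non-Mongolian runs."""
    return ["".join(g) for _, g in groupby(text, key=lambda c: ord(c) in MONGOLIAN_BLOCK)]


def tag_run(run: str) -> List[Tuple[str, str]]:
    """Steps 2-3: treat each character as a cluster and label it with a feature tag."""
    tagged = []
    for i, ch in enumerate(run):
        if not joins(ch):
            tagged.append((ch, "none"))
            continue
        before = i > 0 and joins(run[i - 1])
        after = i + 1 < len(run) and joins(run[i + 1])
        tag = {(False, False): "isol", (False, True): "init",
               (True, True): "medi", (True, False): "fina"}[(before, after)]
        tagged.append((ch, tag))
    return tagged


def apply_features(tagged: List[Tuple[str, str]]) -> List[str]:
    """Step 4: choose a glyph per cluster (hypothetical glyph names)."""
    return [f"uni{ord(ch):04X}" + ("" if tag == "none" else "." + tag)
            for ch, tag in tagged]


def layout_mongolian(text: str) -> List[str]:
    glyphs: List[str] = []
    for run in split_runs(text):
        glyphs.extend(apply_features(tag_run(run)))
    return glyphs


if __name__ == "__main__":
    print(layout_mongolian("\u182D\u1820\u1828"))
```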
Support in OpenOffice.org (Cont.)
Rendering
Bypass Uniscribe on Windows
Most minority-language speakers are Windows users
UniscribeLayout is replaced by IcuLayoutEngine
As a result, Mongolian, Uighur, and Tibetan text is rendered correctly
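The slide does not say whether Uniscribe is replaced for all text or only for the minority scripts. The hypothetical dispatch function below assumes the per-script variant; the engine names echo the VCL classes named above, but the function itself is an illustration, not OpenOffice.org code.

```python
# Hypothetical: scripts that bypass Uniscribe on Windows ("Arabic" covers Uighur).
MINORITY_SCRIPTS = frozenset({"Tibetan", "Mongolian", "Arabic"})


def choose_layout_engine(platform: str, script: str) -> str:
    """Return the name of the layout engine to use for a run of text."""
    if platform == "windows" and script not in MINORITY_SCRIPTS:
        # Other scripts keep using Uniscribe on Windows.
        return "UniscribeLayout"
    # ICU LayoutEngine is used on Linux and for the minority scripts on Windows.
    return "IcuLayoutEngine"


if __name__ == "__main__":
    for platform, script in [("windows", "Mongolian"), ("windows", "Latin"), ("linux", "Tibetan")]:
        print(platform, script, "->", choose_layout_engine(platform, script))
```

Routing only the minority scripts leaves the existing Windows rendering untouched for everything else; replacing Uniscribe globally would be simpler but would change rendering for all users.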
Support in OpenOffice.org (Cont.)
Vertical Text Formatting for Mongolian
Map the vertical text frame to a horizontal text frame by rotation
Perform normal horizontal text formatting
Map the text frame back to its original vertical orientation
The mapping between frames of different directions is reversible
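As an illustration only, assume a vertical frame with top-left corner (L, T), width W and height H, in which text runs top to bottom and columns advance left to right. Glyph positions can then be carried into a horizontal frame with the same corner, width H and height W, by exchanging the two offsets from the corner (the glyphs themselves are drawn rotated). Under this assumed model, which is not necessarily the formula used in sw, applying the map twice returns the starting point:

```latex
\begin{aligned}
\varphi(x, y) &= \bigl(L + (y - T),\; T + (x - L)\bigr) \\
\varphi\bigl(\varphi(x, y)\bigr) &= \bigl(L + [T + (x - L)] - T,\; T + [L + (y - T)] - L\bigr) = (x, y)
\end{aligned}
```

Because the map is its own inverse, every position computed during horizontal formatting can be carried back to the vertical frame without loss.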
Support in OpenOffice.org (Cont.)
Vertical text formatting for Mongolian
Three functions in sw (the Writer core module) convert locations between the two layouts and exchange the width and height of embedded frames
SwitchVerticalToHorizontal
SwitchHorizontalToVertical
SwapWidthAndHeight
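A Python sketch of what these three functions do, assuming an axis-exchange model (top-to-bottom text, columns left to right) and hypothetical Rect and Frame types. The snake_case names echo the sw function names, but the bodies are illustrations, not the OpenOffice.org implementation.

```python
from dataclasses import dataclass, replace


@dataclass
class Rect:
    left: int
    top: int
    width: int
    height: int


@dataclass
class Frame:
    area: Rect
    swapped: bool = False  # True while the frame is in its horizontal (formatting) form


def swap_width_and_height(frame: Frame) -> None:
    """Exchange the frame's width and height and record the new state."""
    a = frame.area
    frame.area = replace(a, width=a.height, height=a.width)
    frame.swapped = not frame.swapped


def _exchange_axes(frame: Frame, r: Rect) -> Rect:
    # Swap the offsets from the frame's top-left corner, and swap the rect's size.
    dx = r.left - frame.area.left
    dy = r.top - frame.area.top
    return Rect(frame.area.left + dy, frame.area.top + dx, r.height, r.width)


def switch_vertical_to_horizontal(frame: Frame, r: Rect) -> Rect:
    """Map an embedded rectangle from the vertical layout into the horizontal one."""
    return _exchange_axes(frame, r)


def switch_horizontal_to_vertical(frame: Frame, r: Rect) -> Rect:
    """Map it back; under this model the axis exchange is its own inverse."""
    return _exchange_axes(frame, r)


if __name__ == "__main__":
    frame = Frame(Rect(100, 200, 50, 300))
    embedded = Rect(110, 250, 20, 40)
    horizontal = switch_vertical_to_horizontal(frame, embedded)
    print(horizontal, switch_horizontal_to_vertical(frame, horizontal) == embedded)
```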
Support in OpenOffice.org (Cont.)
Locale
ICU
I18n/l10n
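The slide names ICU for locale support without listing identifiers. For illustration only, the standard BCP 47 tags for the three languages, built from ISO 639-1 language codes, ISO 15924 script codes and the region CN, can be assembled as below; which tags OpenOffice.org actually registered is not stated here. ICU's own locale IDs conventionally join the same subtags with underscores.

```python
from typing import Dict, Tuple

# language name -> (ISO 639-1 language, ISO 15924 script, ISO 3166 region)
LOCALES: Dict[str, Tuple[str, str, str]] = {
    "Tibetan": ("bo", "Tibt", "CN"),
    "Mongolian": ("mn", "Mong", "CN"),
    "Uighur": ("ug", "Arab", "CN"),
}


def bcp47_tag(language: str, with_script: bool = True) -> str:
    """Build a BCP 47 tag such as "mn-Mong-CN" (or "mn-CN" without the script)."""
    lang, script, region = LOCALES[language]
    parts = [lang, script, region] if with_script else [lang, region]
    return "-".join(parts)


def icu_locale_id(language: str) -> str:
    """ICU-style locale ID: the same subtags joined with underscores."""
    return bcp47_tag(language).replace("-", "_")


if __name__ == "__main__":
    for name in LOCALES:
        print(name, bcp47_tag(name), icu_locale_id(name))
```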
Support in OpenOffice.org (Cont.)
GUI translation
Step 1: Add the New Language to the Resource System;
Step 2: Add the New Language to the Build Environment;
Step 3: Add the New Language to the Localization Tools;
Step 4: Extract Strings and Messages from the Source Code;
Step 5: Translate Extracted Strings and Messages to the New Language;
Step 6: Merge Translated Strings and Messages to Source Code;
Step 7: Add the New Language to the Installation Set Project;
Step 8: Add the New Language to the Modules helpcontent and readlicense_oo.
(http://l10n.openoffice.org/adding_language.html)
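Steps 4 to 6 form an extract, translate, merge cycle. The sketch models it with in-memory dictionaries; the string IDs, the "[bo] OK" placeholder translation and the fall-back-to-source behaviour are assumptions for illustration, and the real tools described at the URL work on the project's resource files instead.

```python
from typing import Dict, List

Catalog = Dict[str, str]  # hypothetical: string ID -> UI text


def extract(resources: Dict[str, Catalog], source_lang: str = "en-US") -> Catalog:
    """Step 4 (sketch): collect the source-language strings for translators."""
    return dict(resources[source_lang])


def merge(resources: Dict[str, Catalog], new_lang: str, translations: Catalog,
          source_lang: str = "en-US") -> List[str]:
    """Step 6 (sketch): merge translations; untranslated IDs keep the source text.

    Returns the sorted IDs that still need translating.
    """
    source = resources[source_lang]
    resources[new_lang] = {sid: translations.get(sid, text) for sid, text in source.items()}
    return sorted(sid for sid in source if sid not in translations)


if __name__ == "__main__":
    resources = {"en-US": {"STR_OK": "OK", "STR_CANCEL": "Cancel"}}
    to_translate = extract(resources)  # step 5 happens outside the program
    missing = merge(resources, "bo", {"STR_OK": "[bo] OK"})
    print(missing)
```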
Demo
Demo (Cont.)
Demo (Cont.)
Problems & future work
Problems
It is impossible to profit commercially from developing software for minority languages
Translating into Chinese minority languages is difficult
No standardized glossary of terms
Not enough people are skilled in both programming and a minority language
More funding support is needed
Future work
More features
Transliteration & sorting
Training & application
Money
Work together with OpenOffice.org community
Strong collaboration with software vendors
Conclusion
The world of minority languages is amazing
These languages will be lost if they are not preserved
OpenOffice.org is very valuable for minority languages
OpenOffice.org should pay more attention to minority languages
Software companies are welcome to keep an eye on Chinese minority languages
Sun, Chinese 2000, Novell, and others
More developer documentation for OpenOffice.org
Establish a Chinese minority language federation based on OpenOffice.org
Thanks!