Chinese Minority Language Support in OpenOffice.org


Institute of Software, Chinese Academy of Sciences

Lead / Tibetan Native-lang Project

Yanmin Jia


Agenda

Status quo

Fund support from government

Language computing features

Character set and encoding

Support in OpenOffice.org

Demo

Problem & future work

Conclusion

Status quo

China is a multi-lingual and multi-cultural country

29 minorities have their own languages/scripts

Tibetan, Mongolian, Uighur, Kazak, Khalkhas, Yi, and Tai Le

Population of native speakers

Tibetan: more than 2 million in China

Also spoken in Ladakh, Nepal, Bhutan, and the areas of northern India bordering Tibet

Mongolian: more than 3 million

Inner Mongolia & Outer Mongolia

Uighur: more than 8 million

Xinjiang (Sinkiang)

Language computing

Microsoft platform

Uniscribe in Vista

Institute of Software, Chinese Academy of Sciences

Red Flag & OpenOffice.org

Fund support from government

863 Hi-Tech Research and Development Program of China

Linux Operating System and Office Suite for Minority Scripts (2003AA1Z2110)

Knowledge Innovation Project sponsored by Chinese Academy of Sciences

Platform-Independent Tibetan Information Processing System Based on Linux (KGCX2-SW-504)

Electronic Product Development Fund sponsored by the Ministry of Information Industry

Cross-platform Tibetan Office suite

Language computing features

Writing Direction & formatting style

Uighur: bidirectional text

Right to left horizontally

Mongolian

Top to bottom vertically, with columns running from left to right

Tibetan

From left to right horizontally

Special line-breaking behavior

Complex Script

Character shaping

The same character takes different shapes depending on the context

Ligature

Certain character sequences are rendered as a single shape

Character positioning

Grapheme & pre-composed character
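The special line-breaking behavior listed on this slide can be illustrated for Tibetan, where syllables are separated by the tsheg mark. The Python sketch below assumes a simplified rule: a line may end only right after U+0F0B (intersyllabic tsheg), never after U+0F0C (non-breaking tsheg), and line width is approximated by counting code points. The function names are hypothetical; production engines such as ICU's line break iterator (UAX #14) handle many more cases.

```python
# Simplified Tibetan line breaking (assumption): a line may end only right
# after U+0F0B TSHEG. U+0F0C (non-breaking tsheg) never offers a break.
TSHEG = "\u0F0B"  # TIBETAN MARK INTERSYLLABIC TSHEG


def break_opportunities(text: str) -> list[int]:
    """Offsets where a line may end: just after each breaking tsheg."""
    return [i + 1 for i, ch in enumerate(text) if ch == TSHEG]


def wrap(text: str, max_len: int) -> list[str]:
    """Greedily fill lines with whole syllables.

    Width is approximated by the number of code points; a real engine
    would measure glyph advances. A syllable longer than max_len is put
    on its own line rather than split.
    """
    # Cut the text into segments that each end at a break opportunity.
    segments, start = [], 0
    for end in break_opportunities(text):
        segments.append(text[start:end])
        start = end
    if start < len(text):
        segments.append(text[start:])

    # Pack segments into lines without exceeding max_len where possible.
    lines, current = [], ""
    for seg in segments:
        if current and len(current) + len(seg) > max_len:
            lines.append(current)
            current = seg
        else:
            current += seg
    if current:
        lines.append(current)
    return lines


# The greeting "bkra shis bde legs", written with escapes (4 + 4 + 4 + 5 code points).
text = ("\u0F56\u0F40\u0FB2\u0F0B" "\u0F64\u0F72\u0F66\u0F0B"
        "\u0F56\u0F51\u0F7A\u0F0B" "\u0F63\u0F7A\u0F42\u0F66\u0F0D")
print(len(wrap(text, 8)))  # → 3
```

Breaking only at syllable boundaries keeps each stacked syllable intact, which is the property a Tibetan-aware formatter has to preserve.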

Language computing features (Cont.)

Character set and encoding

In the 1980s

Software providers used their own self-designed character sets

Since 1997

Unicode standard

Tibetan 1997 (U+0F00~U+0FFF)

Mongolian 2000 (U+1800~U+18AF)

Uighur uses the Arabic script as its writing system

Tibetan pre-composed character set standard

Tibetan coded character set Extension A (GB/T 20542-2006)

Commonly used Tibetan BrdaRten (pre-composed characters)

Tibetan coded character set Extension B

Devanagari transliteration of Tibetan

Non-BMP

Tibetan and Himalayan Digital Library

Jomolhari (Tibetan font)
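To make the Unicode ranges on this slide concrete, here is a minimal Python sketch that classifies characters by block. It assumes Uighur text is recognized through the basic Arabic block (U+0600–U+06FF) only and ignores the Arabic Supplement and Presentation Forms blocks; the helper names are hypothetical.

```python
from typing import Optional

# Inclusive Unicode block ranges for the scripts discussed above.
# Uighur is detected via the basic Arabic block only (simplifying assumption).
BLOCKS = {
    "Tibetan": (0x0F00, 0x0FFF),
    "Mongolian": (0x1800, 0x18AF),
    "Arabic (Uighur)": (0x0600, 0x06FF),
}


def block_of(ch: str) -> Optional[str]:
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def script_profile(text: str) -> dict[str, int]:
    """Count characters per block; unlisted characters are counted as 'Other'."""
    counts: dict[str, int] = {}
    for ch in text:
        name = block_of(ch) or "Other"
        counts[name] = counts.get(name, 0) + 1
    return counts


# KA, tsheg, Mongolian letter A, a space and a Latin letter.
print(script_profile("\u0F40\u0F0B\u1820 A"))
# → {'Tibetan': 2, 'Mongolian': 1, 'Other': 2}
```

A check like this is a cheap way to choose a layout path per text run; pre-composed characters from the national extensions would need separate handling.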

Support in OpenOffice.org

Rendering

Smart fonts: OpenType

GSUB (glyph substitution table)

GPOS (glyph positioning table)

Complex Layout Engine

Input

an array of Unicode characters in logical order

Output

an array of glyph indices

an array of character indices for the glyphs

an array of glyph positions

VCL

ICU LayoutEngine (Linux)

Uniscribe (Windows)
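The input/output contract of a complex layout engine (characters in logical order in; glyph indices, character indices, and glyph positions out) can be illustrated with a toy Python sketch. Everything about the "font" is hypothetical (glyph IDs, advances, a single stacking rule), and only substitution and advance-based positioning are modeled; a real engine such as ICU LayoutEngine applies the font's actual GSUB and GPOS tables.

```python
from dataclasses import dataclass

# Hypothetical toy font data, not taken from any real font:
# nominal glyphs for KA, subjoined RA and tsheg, plus one stacked glyph.
CMAP = {"\u0F40": 10, "\u0FB2": 11, "\u0F0B": 12}
LIGATURES = {(10, 11): 20}                     # KA + subjoined RA -> stacked KRA
ADVANCES = {10: 500, 11: 0, 12: 150, 20: 500}  # advance widths in font units


@dataclass
class LayoutResult:
    glyphs: list[int]                 # glyph indices
    char_indices: list[int]           # first source character of each glyph
    positions: list[tuple[int, int]]  # (x, y) pen position of each glyph


def layout(text: str) -> LayoutResult:
    """Toy layout: characters in logical order -> glyphs, char indices, positions."""
    # 1. Map each character to its nominal glyph (a cmap lookup).
    nominal = [CMAP[ch] for ch in text]

    # 2. Substitute two-glyph ligatures left to right (a GSUB-like step).
    #    Before substitution there is one glyph per character, so the
    #    glyph index i is also the source character index.
    glyphs, char_indices = [], []
    i = 0
    while i < len(nominal):
        char_indices.append(i)
        pair = tuple(nominal[i:i + 2])
        if pair in LIGATURES:
            glyphs.append(LIGATURES[pair])
            i += 2
        else:
            glyphs.append(nominal[i])
            i += 1

    # 3. Position glyphs by accumulating advance widths (no GPOS adjustments).
    positions, x = [], 0
    for g in glyphs:
        positions.append((x, 0))
        x += ADVANCES[g]
    return LayoutResult(glyphs, char_indices, positions)


print(layout("\u0F40\u0FB2\u0F0B"))
# → LayoutResult(glyphs=[20, 12], char_indices=[0, 2], positions=[(0, 0), (500, 0)])
```

The character-index array is what lets the application map a glyph back to text positions for cursor movement and selection once glyphs no longer correspond one-to-one with characters.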

Support in OpenOffice.org(Cont.)

Rendering

Mongolian support in ICU LayoutEngine

MongolianOpenTypeLayoutEngine

The Mongolian text layout process goes through the following steps:

Subdivide string of characters into runs

Further divide each run into clusters

Each cluster is labeled with a feature tag

Apply feature tag information to each cluster
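The four steps above can be sketched in Python. This is our own simplified illustration, not the Mongolian layout engine's code: it treats U+1820–U+1877 as joining letters, attaches the free variation selectors U+180B–U+180D to the preceding letter, and lets anything else (spaces, MVS U+180E, NNBSP U+202F, punctuation) end a word. Real Mongolian shaping also depends on joining types, variation selectors, and font lookups, all omitted here; the function names are hypothetical.

```python
from typing import Optional

# Simplifying assumptions: U+1820..U+1877 are joining Mongolian letters and
# U+180B..U+180D (free variation selectors) attach to the preceding letter.
FVS = {"\u180B", "\u180C", "\u180D"}


def is_letter(ch: str) -> bool:
    return 0x1820 <= ord(ch) <= 0x1877


def split_runs(text: str) -> list[tuple[bool, str]]:
    """Step 1: split the string into (is_mongolian_word, substring) runs."""
    runs: list[tuple[bool, str]] = []
    for ch in text:
        mongolian = is_letter(ch) or ch in FVS
        if runs and runs[-1][0] == mongolian:
            runs[-1] = (mongolian, runs[-1][1] + ch)
        else:
            runs.append((mongolian, ch))
    return runs


def split_clusters(run: str) -> list[str]:
    """Step 2: a cluster is one letter plus any variation selectors after it."""
    clusters: list[str] = []
    for ch in run:
        if ch in FVS and clusters:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def tag_clusters(clusters: list[str]) -> list[str]:
    """Step 3: label each cluster with a positional OpenType feature tag."""
    if len(clusters) == 1:
        return ["isol"]
    return ["init"] + ["medi"] * (len(clusters) - 2) + ["fina"]


def layout_mongolian(text: str) -> list[tuple[str, Optional[str]]]:
    """Step 4 (simplified): pair each cluster with the tag a GSUB lookup would use.

    Non-Mongolian runs are passed through with tag None.
    """
    result: list[tuple[str, Optional[str]]] = []
    for mongolian, run in split_runs(text):
        if not mongolian:
            result.append((run, None))
            continue
        clusters = split_clusters(run)
        result.extend(zip(clusters, tag_clusters(clusters)))
    return result


# A four-letter word, a space, and a one-letter word.
print([tag for _, tag in layout_mongolian("\u1824\u1837\u1823\u182D \u1820")])
# → ['init', 'medi', 'medi', 'fina', None, 'isol']
```

In a real engine, step 4 would hand each cluster to the font's `isol`, `init`, `medi`, or `fina` substitution lookups instead of returning the tags.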

Support in OpenOffice.org(Cont.)

Rendering

Bypass Uniscribe on Windows

Most minority-language speakers are Windows users

UniscribeLayout is replaced by IcuLayoutEngine

Mongolian, Uighur, and Tibetan can then be rendered correctly

Support in OpenOffice.org(Cont.)

Vertical Text Formatting for Mongolian

Map the vertical text frame to a horizontal text frame by rotation

Normal horizontal text formatting is performed

The text frame is then mapped back to its original vertical orientation

The mapping between frames of different directions is reversible
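The rotation idea can be written down as explicit coordinate maps. This is our own formulation under stated assumptions, not formulas taken from OpenOffice.org: the vertical frame has left edge L, top edge T, and width W; a point (x, y) in it maps to (x', y') in a horizontal frame that keeps the same top-left corner but exchanges width and height.

```latex
\begin{aligned}
&\text{Columns right to left (CJK, a } 90^\circ \text{ rotation):}
  && x' = L + (y - T), && y' = T + (L + W - x) \\
&\text{Inverse:}
  && x = L + W - (y' - T), && y = T + (x' - L) \\
&\text{Columns left to right (Mongolian):}
  && x' = L + (y - T), && y' = T + (x - L)
\end{aligned}
```

In the Mongolian case the map exchanges the axes without mirroring, so strictly speaking it is a reflection about the frame's diagonal rather than a pure rotation, and it is its own inverse: applying it twice returns (x, y). In both cases no information is lost, which is what allows the formatted text to be mapped back to the vertical frame.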

Support in OpenOffice.org(Cont.)

Vertical text formatting for Mongolian

Three functions in sw (the Writer module) determine the location of embedded frames and exchange their width and height

SwitchVerticalToHorizontal

SwitchHorizontalToVertical

SwapWidthAndHeight
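To illustrate what these three functions do, here is a hypothetical Python analogue. The snake_case names, the `Rect` type, and the `lines_ltr` flag (left-to-right columns for Mongolian, right-to-left for CJK) are our own assumptions; the real functions are C++ methods in sw whose exact signatures and behavior are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    left: int
    top: int
    width: int
    height: int


# Hypothetical analogues of the three sw helpers; not the C++ implementation.
def switch_vertical_to_horizontal(frame: Rect, x: int, y: int,
                                  lines_ltr: bool = True) -> tuple[int, int]:
    """Map a point in the vertical frame into the rotated horizontal frame."""
    along = y - frame.top  # distance along the text line
    across = (x - frame.left) if lines_ltr else (frame.left + frame.width - x)
    return frame.left + along, frame.top + across


def switch_horizontal_to_vertical(frame: Rect, x: int, y: int,
                                  lines_ltr: bool = True) -> tuple[int, int]:
    """Inverse of switch_vertical_to_horizontal for the same vertical frame."""
    along = x - frame.left
    across = y - frame.top
    vx = frame.left + across if lines_ltr else frame.left + frame.width - across
    return vx, frame.top + along


def swap_width_and_height(r: Rect) -> Rect:
    """Exchange a frame's width and height, keeping its top-left corner."""
    return Rect(r.left, r.top, r.height, r.width)


frame = Rect(left=100, top=200, width=50, height=300)
h = switch_vertical_to_horizontal(frame, 110, 260)
print(h, switch_horizontal_to_vertical(frame, *h))  # → (160, 210) (110, 260)
```

Keeping the frame's top-left corner fixed while swapping width and height means only point offsets need converting, which keeps the round trip exact.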

Support in OpenOffice.org(Cont.)

Locale

ICU

i18n/l10n

Support in OpenOffice.org(Cont.)

GUI translation

Step 1: Add the New Language to the Resource System;

Step 2: Add the New Language to the Build Environment;

Step 3: Add the New Language to the Localization Tools;

Step 4: Extract Strings and Messages from the Source Code;

Step 5: Translate Extracted Strings and Messages to the New Language;

Step 6: Merge Translated Strings and Messages into the Source Code;

Step 7: Add the New Language to the Installation Set Project;

Step 8: Add the New Language to the Modules helpcontent and readlicense_oo.

(http://l10n.openoffice.org/adding_language.html)
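Steps 4 to 6 amount to an extract, translate, and merge cycle. The sketch below models it with plain Python dictionaries; the data layout, the `(module, string id)` keys such as `STR_SAVE`, and the function names are hypothetical and do not reproduce OpenOffice.org's localization tools or file formats. In this sketch, untranslated strings fall back to the en-US text so that every label still has a value.

```python
# Hypothetical model of steps 4-6: each key (module, string id) maps to
# {language code: text}. OpenOffice.org's real tools and formats differ.
Resources = dict[tuple[str, str], dict[str, str]]


def extract(resources: Resources,
            source_lang: str = "en-US") -> dict[tuple[str, str], str]:
    """Step 4: collect every source-language string under its key."""
    return {key: texts[source_lang]
            for key, texts in resources.items() if source_lang in texts}


def translate(strings: dict[tuple[str, str], str],
              table: dict[str, str]) -> dict[tuple[str, str], str]:
    """Step 5: look each source string up in a translator-supplied table."""
    return {key: table[text] for key, text in strings.items() if text in table}


def merge(resources: Resources, translated: dict[tuple[str, str], str],
          target_lang: str, source_lang: str = "en-US") -> Resources:
    """Step 6: add the target language, falling back to the source text."""
    merged = {key: dict(texts) for key, texts in resources.items()}  # copy, don't mutate input
    for key, texts in merged.items():
        texts[target_lang] = translated.get(key, texts.get(source_lang, ""))
    return merged


resources = {("mod", "STR_SAVE"): {"en-US": "Save"},
             ("mod", "STR_OPEN"): {"en-US": "Open"}}
table = {"Save": "[bo] Save"}  # placeholder, not a real Tibetan translation
bo = merge(resources, translate(extract(resources), table), "bo")
print(bo[("mod", "STR_OPEN")]["bo"])  # → Open
```

Keeping translation as a separate lookup table also makes it easy to enforce a shared glossary, which the problems listed below identify as missing.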

Demo

Demo (Cont.)

Demo (Cont.)

Problem & future work

Problems

It's impossible to make a profit from developing software for minority languages

Translation is not easy for Chinese minority languages

No uniform glossary

Not enough people master both programming and a minority language

More funding support is needed

Future work

More features

Transliteration & sorting

Training & application

Money

Work together with OpenOffice.org community

Strong collaboration with software providers

Conclusion

Minority languages are an amazing world

The languages will be lost if they aren't saved

OpenOffice.org is very valuable for minority languages

OpenOffice.org should pay more attention to minority languages

Software corporations are welcome to keep an eye on Chinese minority languages

Sun, Chinese 2000, Novell, and others

More developer documentation for OpenOffice.org

Establish Chinese minority language federation based on OpenOffice.org

Thanks!
