Pal gov.tutorial4.session13.arabicontology

Transcript of Pal gov.tutorial4.session13.arabicontology

Page 1: Pal gov.tutorial4.session13.arabicontology

PalGov © 2011

أكاديمية الحكومة الإلكترونية الفلسطينية

The Palestinian eGovernment Academy

www.egovacademy.ps

Tutorial 4: Ontology Engineering & Lexical Semantics

Session 13

ArabicOntology

Dr. Mustafa Jarrar

University of Birzeit

[email protected]

www.jarrar.info

Page 2: Pal gov.tutorial4.session13.arabicontology

About

This tutorial is part of the PalGov project, funded by the TEMPUS IV program of the Commission of the European Communities, grant agreement 511159-TEMPUS-1-2010-1-PS-TEMPUS-JPHES. The project website: www.egovacademy.ps

Project Consortium:

University of Trento, Italy

University of Namur, Belgium

Vrije Universiteit Brussel, Belgium

TrueTrust, UK

Birzeit University, Palestine (Coordinator)

Palestine Polytechnic University, Palestine

Palestine Technical University, Palestine

Université de Savoie, France

Ministry of Local Government, Palestine

Ministry of Telecom and IT, Palestine

Ministry of Interior, Palestine

Coordinator:

Dr. Mustafa Jarrar

Birzeit University, P.O. Box 14, Birzeit, Palestine

Telfax: +972 2 2982935 [email protected]

Page 3: Pal gov.tutorial4.session13.arabicontology

© Copyright Notes

Everyone is encouraged to use this material, or part of it, but should properly cite the project (logo and website), and the author of that part.

No part of this tutorial may be reproduced or modified in any form or by any means, without prior written permission from the project, who have the full copyrights on the material.

Attribution-NonCommercial-ShareAlike

CC-BY-NC-SA

This license lets others remix, tweak, and build upon your work non-commercially, as long as they credit you and license their new creations under the identical terms.

Page 4: Pal gov.tutorial4.session13.arabicontology

Tutorial Map

Topic Time

Session 1_1: The Need for Sharing Semantics 1.5

Session 1_2: What is an ontology 1.5

Session 2: Lab- Build a Population Ontology 3

Session 3: Lab- Build a BankCustomer Ontology 3

Session 4: Lab- Build a BankCustomer Ontology 3

Session 5: Lab- Ontology Tools 3

Session 6_1: Ontology Engineering Challenges 1.5

Session 6_2: Ontology Double Articulation 1.5

Session 7: Lab - Build a Legal-Person Ontology 3

Session 8_1: Ontology Modeling Challenges 1.5

Session 8_2: Stepwise Methodologies 1.5

Session 9: Lab - Build a Legal-Person Ontology 3

Session 10: Zinnar – The Palestinian eGovernment Interoperability Framework 3

Session 11: Lab- Using Zinnar in web services 3

Session 12_1: Lexical Semantics and Multilinguality 1.5

Session 12_2: WordNets 1.5

Session 13: ArabicOntology 3

Session 14: Lab-Using Linguistic Ontologies 3

Session 15: Lab-Using Linguistic Ontologies 3

Intended Learning Objectives

A: Knowledge and Understanding

4a1: Demonstrate knowledge of what an ontology is, how it is built, and what it is used for.

4a2: Demonstrate knowledge of ontology engineering and evaluation.

4a3: Describe the difference between an ontology and a schema, and an ontology and a dictionary.

4a4: Explain the concept of language ontologies, lexical semantics and multilingualism.

B: Intellectual Skills

4b1: Develop quality ontologies.

4b2: Tackle ontology engineering challenges.

4b3: Develop multilingual ontologies.

4b4: Formulate quality glosses.

C: Professional and Practical Skills

4c1: Use ontology tools.

4c2: (Re)use existing language ontologies.

D: General and Transferable Skills

d1: Working in a team.

d2: Presenting and defending ideas.

d3: Use of creativity and innovation in problem solving.

d4: Develop communication skills and logical reasoning abilities.

Page 5: Pal gov.tutorial4.session13.arabicontology

Session ILOs

This session will help students to:

4a4: Explain the concept of language ontologies, lexical semantics and multilingualism.

4b4: Formulate quality glosses.

4b3: Develop multilingual ontologies.

Page 6: Pal gov.tutorial4.session13.arabicontology

Reading

Mustafa Jarrar: Building a Formal Arabic Ontology (Invited Paper). In Proceedings of the Experts Meeting on Arabic Ontologies and Semantic Networks. ALECSO, Arab League. Tunis, July 26-28, 2011. Article: http://www.jarrar.info/publications/J11.pdf.htm Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html

Mustafa Jarrar: Towards the Notion of Gloss, and the Adoption of Linguistic Resources in Formal Ontology Engineering. In Proceedings of the 15th International World Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN: 1595933239. May 2006. http://www.jarrar.info/publications/J06.pdf.htm

Aldo Gangemi, Nicola Guarino, Alessandro Oltramari, Stefano Borgo: Cleaning-up WordNet's Top-Level. In Proc. of the 1st International WordNet Conference (2002). http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=C9962DFEDD793F3F839426B774BC9BAF?doi=10.1.1.11.4064&rep=rep1&type=pdf

Page 7: Pal gov.tutorial4.session13.arabicontology

The Arabic Ontology Project

• A project started in 2010, at Birzeit University, Palestine.

• The ArabicOntology is more than an Arabic WordNet.

• Unlike WordNet, the ArabicOntology is logically and philosophically well-founded, as it follows strict ontological principles, but it can also be used as an Arabic WordNet.

http://sites.birzeit.edu/comp/ArabicOntology

The project is partially funded (seed funding) by Birzeit University (VP Academic Office, Research Committee).

Page 8: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Data Model (Simplified)

• ConceptID (like a synset ID in WordNet): identifies a concept.

• Polysemy and synonymy: as in WordNet, several words (i.e., lexical units) can be used to lexicalize one concept (synonymy), and one word might be used to lexicalize several concepts (polysemy).

Gloss: describes a concept

Concept ID: concept unique reference

Lexical Unit

Semantic Relations
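The simplified data model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `Concept`, its field names, the helper functions, and the sample data (transliterated words, IDs, glosses) are all hypothetical and are not taken from the ArabicOntology implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One concept, identified by a ConceptID (comparable to a WordNet synset ID)."""
    concept_id: int
    gloss: str
    lexical_units: list[str] = field(default_factory=list)
    # Semantic relations point to other concepts by ID, e.g. {"subtype_of": [3]}.
    relations: dict[str, list[int]] = field(default_factory=dict)


def concepts_for_word(concepts: list[Concept], word: str) -> list[int]:
    """Polysemy: IDs of every concept that the given word lexicalizes."""
    return [c.concept_id for c in concepts if word in c.lexical_units]


def words_for_concept(concepts: list[Concept], concept_id: int) -> list[str]:
    """Synonymy: every word that lexicalizes the given concept."""
    for c in concepts:
        if c.concept_id == concept_id:
            return list(c.lexical_units)
    return []


# Hypothetical sample data, invented for illustration only.
concepts = [
    Concept(1, "An arrangement of data in rows and columns.",
            ["jadwal", "masfufa"], {"subtype_of": [3]}),
    Concept(2, "A small natural stream of water.", ["jadwal", "nahr"]),
    Concept(3, "An organized collection of data.", ["tartib"]),
]

print(concepts_for_word(concepts, "jadwal"))  # one word, several concepts
print(words_for_concept(concepts, 1))         # one concept, several words
```

Keeping relations keyed by concept ID rather than by word reflects the point that semantic relations hold between concepts, not between lexical units.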

Page 9: Pal gov.tutorial4.session13.arabicontology

Lexical vs. Semantic Relationships

• Semantic relations are relationships between concepts (not words), e.g., subtype, part-of, etc.

• Lexical relations are relationships between words (not concepts), e.g., synonym-of, root-of, abbreviation-of, etc.

• Ontologies are mainly concerned with semantic relations.

Page 10: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology

• Arabic Ontology: the set of concepts (of all Arabic terms), and the semantic (not lexical) relationships between these concepts.

• To build an Arabic Ontology: identify the set of concepts for every Arabic word (polysemy), and define semantic relations between these concepts.

• The most important relation is the subtype relation, which leads to a tree of concepts.

Page 11: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Subtype Relationships

• Subtype relation: a mathematical relation (subset: A ⊆ B), such that every instance of A must also be an instance of B.

• Inheritance: subtypes inherit all properties of their super types.

• “Hyponymy” in WordNet is close to (but not the same as) the subtype relation.

• “General-Specific” relations, as in thesauri, are not subtype relations.
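The subset reading of the subtype relation and the inheritance of properties can be illustrated with plain Python sets. This is a minimal sketch; the concept names, instances, and properties below are invented for illustration and do not come from the Arabic Ontology.

```python
# Hypothetical extensions: each concept is modelled by the set of its instances.
extension = {
    "Animal": {"rex", "tom", "tweety"},
    "Mammal": {"rex", "tom"},
    "Dog": {"rex"},
}


def is_subtype(a: str, b: str) -> bool:
    """A is a subtype of B if every instance of A is also an instance of B."""
    return extension[a] <= extension[b]


# Properties declared directly on each concept, and the direct supertype links.
own_properties = {"Animal": {"has_age"}, "Mammal": {"has_fur"}, "Dog": {"barks"}}
supertype = {"Dog": "Mammal", "Mammal": "Animal"}


def all_properties(concept: str) -> set:
    """Own properties plus everything inherited along the supertype chain."""
    props = set()
    current = concept
    while current is not None:
        props |= own_properties.get(current, set())
        current = supertype.get(current)
    return props


print(is_subtype("Dog", "Mammal"))
print(sorted(all_properties("Dog")))
```

A thesaurus-style "general-specific" link would not pass this check in general, because it carries no guarantee that the instances of the narrower term are instances of the broader one.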

Page 12: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Subtype Relationships

• It is recommended to use proper subtypes, as they are stricter.

• That is, A and B are never equal; B is always a proper superset of A (A ⊂ B).

• It is recommended to classify concepts based on “rigidity”.

• For example, it is wrong to say that a ‘WorkTable’ is a type of ‘Table’, as being a work table is a non-rigid property.

• As such, subtypes form a tree.
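Whether a set of subtype links actually forms a tree can be checked mechanically. The sketch below is one possible check, assuming links are given as hypothetical (subtype, supertype) pairs; it tests that every concept has at most one direct supertype, that there is exactly one root, and that there are no cycles. The function names and sample concepts are illustrative only.

```python
def is_proper_subtype(a: set, b: set) -> bool:
    """Proper subtype: every instance of A is in B, and A and B are not equal."""
    return a < b


def forms_tree(edges) -> bool:
    """edges: iterable of (subtype, supertype) pairs."""
    parent = {}
    nodes = set()
    for child, sup in edges:
        nodes.update((child, sup))
        if child in parent and parent[child] != sup:
            return False  # two different direct supertypes
        parent[child] = sup
    roots = [n for n in nodes if n not in parent]
    if len(roots) != 1:
        return False  # no root at all, or several disconnected tops
    for start in nodes:
        seen = set()
        node = start
        while node in parent:  # walk upwards until the root is reached
            if node in seen:
                return False  # cycle
            seen.add(node)
            node = parent[node]
    return True


tree_edges = [("Dog", "Mammal"), ("Mammal", "Animal"), ("Bird", "Animal")]
not_a_tree = tree_edges + [("Dog", "Pet"), ("Pet", "Animal")]
print(forms_tree(tree_edges), forms_tree(not_a_tree))
```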

Page 13: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Core (Top Levels)

Arabic Core Ontology: the top levels of the Arabic Ontology, built manually based on the DOLCE and SUMO upper-level ontologies, carefully taking into account the philosophical and historical aspects of the Arabic concepts/terms.

• The 10th level of this core ontology should top all Arabic concepts and levels.

• This allows us to detect any problems in the tree/relations.

• The core ontology governs the correctness and the evolution of the whole Arabic Ontology.

[Figure: the Arabic Core Ontology (10 levels, 550 concepts), the parent meanings of all Arabic words (أمهات المعاني لجميع الكلمات العربية). Top 3 levels shown here, for simplicity.]

Page 14: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Glosses

according to strict ontological guidelines [J06]

A gloss is an auxiliary informal (but controlled) account of the intended meaning of a linguistic term, for the commonsense perception of humans.

A gloss is supposed to render factual knowledge that is critical to understanding a concept, but that is, e.g., implausible, unreasonable, or very difficult to formalize and/or articulate explicitly. It is NOT meant to catalogue general information and comments, as e.g. conventional dictionaries and encyclopedias usually do, or as <rdfs:comment> does.

Page 15: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Gloss Guidelines

What should and what should not be provided in a gloss:

1. Start with the principal/super type of the concept being defined.

E.g. ‘Search engine’: “A computer program that …”, ‘Invoice’: “A business document that …”, ‘University’: “An institution of …”.

2. Write it in the form of propositions, offering the reader inferential knowledge that helps them construct the image of the concept. E.g. compare, for ‘Search engine’:

“A computer program for searching the internet, it can be defined as one of the most useful aspects of the World Wide Web. Some of the major ones are Google, ….”;

“A computer program that enables users to search and retrieve documents or data from a database or from a computer network…”.

3. Focus on distinguishing characteristics and intrinsic properties that differentiate the concept from other concepts.

E.g. compare, for ‘Laptop computer’:

“A computer that is designed to do pretty much anything a desktop computer can do, it runs for a short time (usually two to five hours) on batteries”.

“A portable computer small enough to use in your lap…”.

Page 16: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Gloss Guidelines

4. Use supportive examples:

- To clarify cases that are commonly believed to be false but are true, or that are believed to be true but are false;

- To strengthen and illustrate distinguishing characteristics (e.g. define by examples, counter-examples).

Examples can be types and/or instances of the concept being defined.

5. Be consistent with formal definitions/axioms.

6. Be sufficient, clear, and easy to understand.

WordNet glosses do not follow such ontological guidelines.

Page 17: Pal gov.tutorial4.session13.arabicontology

Arabic Ontology: Gloss Guidelines

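As a gloss starts with the supertype of the concept being defined, try reading the gloss as follows, to verify that what you wrote is correct:

جدول: مصفوفة بيانات مكونة من صفوف وأعمدة. (Table: a matrix of data made up of rows and columns.)

جدول: ترتيب بيانات جنبا إلى جنب على شكل صفوف وأعمدة. (Table: an arrangement of data side by side in the form of rows and columns.)

جدول: تنظيم بيانات بصورة ممنهجة جنبا إلى جنب على شكل صفوف وأعمدة. (Table: a systematic organization of data side by side in the form of rows and columns.)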

Page 18: Pal gov.tutorial4.session13.arabicontology

ArabicOntology Vs WordNet

Unlike WordNet, the Arabic Ontology is:

1. Philosophically well founded:

• Focuses on intrinsic properties;

• All types are rigid;

• The top level is derived from known Top Level Ontologies.

2. Strictly formal:

• Semantic relations are well-defined mathematical relations.

3. Strictly controlled glosses:

• The content and structure of the glosses are strictly based on ontological principles.

Page 19: Pal gov.tutorial4.session13.arabicontology

Methodology and Progress

Page 20: Pal gov.tutorial4.session13.arabicontology

Our Approach to Building the ArabicOntology

Roughly:

Step 1: Mine Arabic concepts/glosses from dictionaries.

Step 2: Automatically map between these Arabic concepts and WordNet concepts, thus inheriting semantic relations from WordNet.

Step 3: Link all concepts with the Arabic Core Ontology.

Step 4: Re-formulate these glosses, according to strict ontological guidelines.

Page 21: Pal gov.tutorial4.session13.arabicontology

Step 1: Mining Arabic Concepts from Dictionaries

• Collect as many glosses/concepts as possible from specialized and general dictionaries.

• Manual extraction from dictionaries, then basic cleaning done automatically.

• 35K glosses ready.

• We have ~100 students typing dictionaries now!

• +100K more glosses (expected this year).
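The slides do not say what the automatic "basic cleaning" consists of. As a rough illustration only, the sketch below assumes it includes Unicode normalization, removal of the tatweel (elongation) character and of diacritics, and whitespace collapsing; the function name `clean_gloss` and this choice of steps are assumptions, not the project's actual pipeline.

```python
import re
import unicodedata

TATWEEL = "\u0640"  # Arabic elongation character, carries no meaning


def clean_gloss(text: str) -> str:
    """Normalize a typed gloss (assumed cleaning steps, for illustration)."""
    # NFKC folds presentation-form ligatures into ordinary letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace(TATWEEL, "")
    # Drop combining marks, which removes Arabic short-vowel diacritics.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Collapse runs of whitespace and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


# A vocalized word followed by irregular spacing (escapes used for clarity).
raw = "\u0628\u064E\u0644\u064E\u062F\u064C   \u0644\u0647\u0627 \n \u062D\u062F\u0648\u062F"
print(clean_gloss(raw))
```

Stripping diacritics is a trade-off: it makes glosses easier to compare, but vocalization sometimes distinguishes meanings, so a real pipeline might keep the original text alongside the cleaned form.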

Page 22: Pal gov.tutorial4.session13.arabicontology

Page 23: Pal gov.tutorial4.session13.arabicontology

Step 1: Mining Arabic Concepts from Dictionaries

• Most Arabic dictionaries are not useful, but some are a good start.

The dictionaries we need should:

Focus on the semantic aspects.

Not mix up multiple meanings.

Give the meanings good structure and quality.

Page 24: Pal gov.tutorial4.session13.arabicontology

Examples (Good & Bad Resources)

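Wiktionary

معجم مصطلح الأصول (Dictionary of Usul Terminology)

المترادف والمتوارد (Synonyms and Related Expressions)

معجم البلدان (Dictionary of Countries)

معجم الحاسبات (Dictionary of Computing)

معجم تعريف مصطلحات القانون الخاص (Dictionary of Private Law Term Definitions)

أقرب الموارد (Aqrab al-Mawarid)

المعجم الإسلامي (The Islamic Dictionary)

معجم الألفاظ المشتركة في اللغة العربية (Dictionary of Homonymous Words in the Arabic Language)

المعجم الوجيز (Al-Mu'jam al-Wajiz, The Concise Dictionary)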

Page 25: Pal gov.tutorial4.session13.arabicontology

Step 2: Map Arabic Concepts to WordNet (Matching Function)

We developed a matching algorithm, such that:

Input: (Arabic gloss, 117K English glosses in WordNet).

Output: (best match, rank).

Accuracy: +90% (being improved).

Example. The Arabic gloss:

بلد لها حدود معروفة وشعب وفيها حكومة ومؤسسات منظمة (A country that has known borders and a people, and that has a government and organized institutions)

is compared against WordNet (English) glosses such as:

The territory occupied by one of the constituent administrative districts of a nation

The way something is with respect to its main attributes

The group of people comprising the government of a sovereign state

A politically organized body of people under a single government

A compilation of the known facts regarding something or someone

….

Best match:

A politically organized body of people under a single government
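The slides do not describe how the matching function works internally. Purely to illustrate the input/output contract (an Arabic gloss in, the best-matching English gloss and a score out), the sketch below swaps in a simple technique: word-by-word translation with a tiny hypothetical lexicon, followed by Jaccard overlap of content words. The lexicon, the stopword list, and the scoring are assumptions and are not the project's algorithm.

```python
import re

STOPWORDS = {"a", "an", "the", "of", "by", "under", "with", "to", "its",
             "is", "one", "and", "or"}


def tokens(text: str) -> set:
    """Lowercased content words of an English gloss."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0


def best_match(arabic_gloss: str, english_glosses: list, lexicon: dict):
    """Return (best English gloss, score); english_glosses must be non-empty."""
    translated = set()
    for word in arabic_gloss.split():
        # Words missing from the hypothetical lexicon are simply skipped.
        translated |= set(lexicon.get(word, []))
    ranked = sorted(((jaccard(translated, tokens(g)), g) for g in english_glosses),
                    reverse=True)
    score, gloss = ranked[0]
    return gloss, score


# Hypothetical mini-lexicon; keys are surface forms, so attached conjunctions
# are included. A real system would need morphological analysis instead.
lexicon = {
    "بلد": ["country", "nation"], "حدود": ["borders"], "وشعب": ["people"],
    "حكومة": ["government"], "ومؤسسات": ["institutions"], "منظمة": ["organized"],
}
wordnet_glosses = [
    "The territory occupied by one of the constituent administrative districts of a nation",
    "The way something is with respect to its main attributes",
    "The group of people comprising the government of a sovereign state",
    "A politically organized body of people under a single government",
    "A compilation of the known facts regarding something or someone",
]
arabic = "بلد لها حدود معروفة وشعب وفيها حكومة ومؤسسات منظمة"
match, score = best_match(arabic, wordnet_glosses, lexicon)
print(match, round(score, 2))
```

On this toy data the highest-scoring gloss is the same one the slide shows as the best match, but that says nothing about accuracy at the scale of 117K glosses.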

Page 26: Pal gov.tutorial4.session13.arabicontology

The Matching Function is used for:

1- Inheriting semantic relations from WordNet, based on the previous mapping.

[Figure: Arabic concepts (A, C, J, H) mapped to WordNet concepts (L, B, Q, R, D)]

2- Detecting redundant concepts within the Arabic Ontology itself, using the same function.

Remark: This is only a good start, as these inherited relations need to be cleaned using the Arabic Top Levels and the OntoClean methodology.
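Inheriting relations through the mapping can be sketched as follows. The idea illustrated here is: if Arabic concept X maps to WordNet concept W1, Arabic concept Y maps to W2, and W2 is a direct hypernym of W1, then (X subtype-of Y) becomes a candidate link. The letter names reuse those of the figure, but which Arabic concept maps to which WordNet concept, and the hypernym links, are invented for illustration.

```python
def inherit_subtypes(mapping: dict, wordnet_hypernyms: dict) -> set:
    """mapping: Arabic concept -> WordNet concept.
    wordnet_hypernyms: WordNet concept -> set of its direct hypernyms.
    Returns candidate (arabic_subtype, arabic_supertype) pairs."""
    by_wordnet = {}
    for arabic, wn in mapping.items():
        by_wordnet.setdefault(wn, set()).add(arabic)

    candidates = set()
    for arabic, wn in mapping.items():
        for hyper in wordnet_hypernyms.get(wn, ()):
            for arabic_super in by_wordnet.get(hyper, ()):
                candidates.add((arabic, arabic_super))
    return candidates


# Hypothetical mapping and hypernym links.
mapping = {"J": "L", "A": "B", "H": "Q", "C": "R"}
wordnet_hypernyms = {"L": {"B"}, "Q": {"R"}, "B": {"D"}}

candidate_links = inherit_subtypes(mapping, wordnet_hypernyms)
print(sorted(candidate_links))
```

Only direct hypernyms are followed here, so a WordNet concept with no Arabic counterpart (D above) breaks the chain. These candidates are exactly the links that still need cleaning against the Arabic top levels.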

Page 27: Pal gov.tutorial4.session13.arabicontology

Step 3: Link Concepts with the Arabic Core Ontology

Each Arabic concept (from the previous steps) is mapped to a concept in the 10th level.

That is, the 10th level of this core ontology should top all Arabic concepts and levels, so as to enable automatic detection of problems in the hierarchy.

[Figure: Arabic concepts A, C, and J linked to the core ontology. Top 3 levels shown here, for simplicity.]

Page 28: Pal gov.tutorial4.session13.arabicontology

Up to this stage

We have many concepts extracted from linguistic resources, but the glosses are not well written.

We have many possible subtype relations between concepts, derived via the mappings to WordNet concepts.

We have a sample of 6000 Arabic concepts mapped to the 10th level in the core ontology.

We need to:

Clean the glosses,

Clean/correct the subtype links.

Page 29: Pal gov.tutorial4.session13.arabicontology

Automatic Detection of Inconsistencies

If (J ⊂ A) and (A ⊂ اصطلاح لغوي [linguistic term]), then it is most likely true that (J ⊂ اصطلاح لغوي), so there is no need to state (J ⊂ اصطلاح لغوي) explicitly.

However, as H and C do not share a supertype, (H ⊂ C) is likely incorrect.

[Figure: Arabic concepts A, C, J, and H linked to the core ontology, with the incorrect links marked X. Top 3 levels shown here, for simplicity.]

Legend:

Subtype links from Arabic concepts to the core ontology (done manually)

Subtype links between Arabic concepts (derived via the mappings to WordNet)

Now we can automatically detect whether the links are correct.
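One possible reading of this check is sketched below: a derived link (X ⊂ Y) is treated as plausible when the core concept that Y was manually linked to is the same as, or a supertype of, the core concept that X was linked to; otherwise it is flagged. The core concept names, the manual links, and the function names are hypothetical ("LinguisticTerm" stands in for اصطلاح لغوي), and the actual detection rule used in the project may differ.

```python
def core_ancestors(concept, core_parent: dict) -> list:
    """The core concept itself plus all its supertypes, assuming a tree."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = core_parent.get(concept)
    return chain


def check_links(derived_links, core_link: dict, core_parent: dict) -> dict:
    """Label each derived (subtype, supertype) link."""
    report = {}
    for sub, sup in derived_links:
        sub_core, sup_core = core_link.get(sub), core_link.get(sup)
        if sub_core is None or sup_core is None:
            report[(sub, sup)] = "unknown"     # one side has no manual link yet
        elif sup_core in core_ancestors(sub_core, core_parent):
            report[(sub, sup)] = "plausible"
        else:
            report[(sub, sup)] = "suspicious"  # the X! case in the figure
    return report


# Hypothetical core ontology fragment (child -> parent) and manual links.
core_parent = {"LinguisticTerm": "Abstract", "PhysicalObject": "Concrete",
               "Abstract": "Thing", "Concrete": "Thing", "Thing": None}
core_link = {"J": "LinguisticTerm", "A": "LinguisticTerm",
             "H": "PhysicalObject", "C": "LinguisticTerm"}

report = check_links([("J", "A"), ("H", "C"), ("J", "Z")], core_link, core_parent)
print(report)
```

A "suspicious" label only points a human reviewer at the link; it could equally mean that one of the manual links to the core ontology is wrong.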

Page 30: Pal gov.tutorial4.session13.arabicontology

Step 4: Re-Formulate Glosses

according to strict ontological guidelines [J06]

Glosses are re-formulated semi-manually, to meet our strict rules.

Gloss cleaning can be done automatically up to a certain point.

During the manual cleaning (i.e., re-formulation) of glosses, mistakes in subtype relations can also be detected.

Page 31: Pal gov.tutorial4.session13.arabicontology

Further Research (ongoing)

Given many Arabic-English, Arabic-French, and Arabic-Italian dictionaries, can we derive an Arabic-Arabic thesaurus? For example:

جدول: مصفوفة، نهر، قائمة، قناة ماء (jadwal "table": matrix, river, list, water channel)

Then group closely related words (maybe using WordNet), as follows:

جدول: مصفوفة، قائمة، نهر، قناة ماء (jadwal: matrix, list; river, water channel)

This will help find possible Arabic synsets, which helps detect possible subtype relations and/or validate the existing relations.
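Since this is ongoing research, the following is only a sketch of one way the idea could work: treat the foreign translations as pivots, relate two Arabic words when they share a translation, and group the related words by the translation they share. Grouping by shared translation is a stand-in for the WordNet-based categorization the slide suggests. The transliterated words and their translations are invented sample data.

```python
from collections import defaultdict


def pivot_thesaurus(bilingual: dict) -> dict:
    """bilingual: Arabic word -> set of translations (from any Arabic-X dictionary).
    Two Arabic words are related when they share at least one translation."""
    by_translation = defaultdict(set)
    for arabic, translations in bilingual.items():
        for t in translations:
            by_translation[t].add(arabic)
    thesaurus = defaultdict(set)
    for words in by_translation.values():
        for w in words:
            thesaurus[w] |= words - {w}
    return dict(thesaurus)


def group_by_sense(word: str, bilingual: dict) -> dict:
    """Group the words related to `word` by the translation they share with it."""
    groups = defaultdict(set)
    for t in bilingual[word]:
        for other, translations in bilingual.items():
            if other != word and t in translations:
                groups[t].add(other)
    return dict(groups)


# Hypothetical dictionary entries (transliterated Arabic -> English).
bilingual = {
    "jadwal": {"table", "stream"},
    "masfufa": {"table", "matrix"},
    "qaima": {"table", "list"},
    "nahr": {"stream", "river"},
    "qanat_ma": {"stream", "canal"},
}

thesaurus = pivot_thesaurus(bilingual)
print(sorted(thesaurus["jadwal"]))
print(group_by_sense("jadwal", bilingual))
```

Each group is a candidate synset at best: sharing one translation is weak evidence, so combining several dictionaries and languages would be needed to filter accidental overlaps.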

Page 32: Pal gov.tutorial4.session13.arabicontology

References

[J11] Mustafa Jarrar: Building a Formal Arabic Ontology (Invited Paper). In Proceedings of the Experts Meeting on Arabic Ontologies and Semantic Networks. ALECSO, Arab League. Tunis, July 26-28, 2011. Article: http://www.jarrar.info/publications/J11.pdf.htm Slides: http://mjarrar.blogspot.com/2011/08/building-formal-arabic-ontology-invited.html

[J06] Mustafa Jarrar: Towards the Notion of Gloss, and the Adoption of Linguistic Resources in Formal Ontology Engineering. In Proceedings of the 15th International World Wide Web Conference (WWW2006). Edinburgh, Scotland. Pages 497-503. ACM Press. ISBN: 1595933239. May 2006. http://www.jarrar.info/publications/J06.pdf.htm

[MBC93] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine Miller: Introduction to WordNet: An On-line Lexical Database. International Journal of Lexicography, Vol. 3, Nr. 4. Pages 235-244. (1990) http://wordnetcode.princeton.edu/5papers.pdf

[GGO02] Aldo Gangemi, Nicola Guarino, Alessandro Oltramari, Stefano Borgo: Cleaning-up WordNet's Top-Level. In Proc. of the 1st International WordNet Conference (2002). http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=C9962DFEDD793F3F839426B774BC9BAF?doi=10.1.1.11.4064&rep=rep1&type=pdf

Roche Christophe, Calberg-Challot Marie (2010): “Synonymy in Terminology: the Contribution of Ontoterminology”, Re-thinking synonymy: semantic sameness and similarity in languages and their description, Helsinki, 2010. http://www.linguistics.fi/synonymy/Synonymy%20Ontoterminology%20Helsinki%202010.pdf

Roche Christophe, Calberg-Challot Marie, Damas Luc, Rouard Philippe (2009): “Ontoterminology: A new paradigm for terminology”. KEOD, Madeira. http://ontology.univ-savoie.fr/condillac/files/docs/articles/Ontoterminology-a-new-paradigm-for-terminology.pdf