ISOcat and RELcat, two cooperating semantic registries

www.isocat.org ISOcat and RELcat: 2 cooperating Semantic Registries. Menzo Windhouwer, The Language Archive – DANS; Ineke Schuurman, KU Leuven, CLARIN-NL – Utrecht University. 17 January 2014, CLIN 24.

description

M. Windhouwer, I. Schuurman. ISOcat and RELcat, two cooperating semantic registries. At the 24th Meeting of Computational Linguistics in the Netherlands (CLIN 24), Leiden, The Netherlands, January 17, 2014.

Transcript of ISOcat and RELcat, two cooperating semantic registries

Page 1: ISOcat and RELcat, two cooperating semantic registries

www.isocat.org

ISOcat and RELcat: 2 cooperating Semantic Registries

Menzo Windhouwer

The Language Archive – DANS

Ineke Schuurman

KU Leuven, CLARIN-NL – Utrecht University

17 January 2014, CLIN 24

Page 2: ISOcat and RELcat, two cooperating semantic registries

Outline

• The need for explicit semantics
  – ISOcat
• Mapping issues
  – Languages, theoretical frameworks
  – Granularity levels
  – RELcat
• CGN case study
• Conclusions and future work

Page 3: ISOcat and RELcat, two cooperating semantic registries

Typological Database Nijmegen

TOP NOTION tds:Noun
GROUPS {
  NOTION tdn:GrammaticalDistinctions
  LABEL "Grammatical distinctions for nouns."
  GROUPS {
    NOTION tdn:AgentNouns
    LABEL "Agent nouns."
    DESCRIPTION "Nouns can function as the agent of a clause."
    LINK TO CONCEPT agentRole
    GROUPS {
      NOTION tdn:v098_plusAffix
      LABEL "Agent nouns formed by verb stem plus affix."
      LINK TO CONCEPTS (agentRole, verbalMorphology, boundAffix)
      DESCRIPTION <p>Agent nouns are formed by a verb stem plus an affix, e.g. English <qv>walk-er</qv>.</p>
      NOTE AUTHOR IS "TDS" TYPE IS "original TDN label" "AGENT NOUNS ARE VERB STEM PLUS AFFIX"
      IS FIELD v098;
      ...

Explicit semantics!

Notes: TDN is not archived in TLA, but curated in TDS, a previous project Menzo worked on, and now archived at DANS; also, this is not a TDN punchcard.

Page 4: ISOcat and RELcat, two cooperating semantic registries

DOBES corpora

Explicit semantics!

Shared semantics!

Page 5: ISOcat and RELcat, two cooperating semantic registries

ISOcat

• An open Data Category/Concept Registry where everyone can
  – find and select data categories/concepts
  – create new data categories/concepts
  – share data categories/concepts

• Each data category/concept has a Persistent Identifier which can be embedded in a resource (schema) to make the intended semantics (more) explicit
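
As a minimal sketch of what embedding a persistent identifier in a schema can look like, the snippet below annotates two elements of a toy XML schema with a `dcr:datcat` attribute and reads the annotations back. The `dcr` namespace URI, the attribute name and the `DC-0001`/`DC-0002` identifiers are illustrative assumptions and placeholders, not values taken from the slides.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces. The dcr namespace and the DC-... identifiers are
# placeholders chosen for illustration only.
XS = "http://www.w3.org/2001/XMLSchema"
DCR = "http://www.isocat.org/ns/dcr"

SCHEMA = f"""
<xs:schema xmlns:xs="{XS}" xmlns:dcr="{DCR}">
  <xs:element name="pos" dcr:datcat="http://www.isocat.org/datcat/DC-0001"/>
  <xs:element name="gender" dcr:datcat="http://www.isocat.org/datcat/DC-0002"/>
  <xs:element name="comment"/>
</xs:schema>
"""

def datcat_annotations(schema_text):
    """Map each annotated element name to its data category identifier."""
    root = ET.fromstring(schema_text)
    found = {}
    for element in root.iter(f"{{{XS}}}element"):
        pid = element.get(f"{{{DCR}}}datcat")
        if pid is not None:  # elements without an annotation are skipped
            found[element.get("name")] = pid
    return found

annotations = datcat_annotations(SCHEMA)
for name, pid in annotations.items():
    print(name, "->", pid)
```

Two resources whose schemas point at the same identifier can then be treated as sharing that concept, whatever local element names they use.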

Page 6: ISOcat and RELcat, two cooperating semantic registries

Mapping issues

• Interesting resources for a specific research question might
  – use very different theoretical frameworks, which might share few or no data categories/concepts
  – use coarser or finer grained data categories/concepts
• How to overcome these differences by mapping data categories/concepts to each other?

Page 7: ISOcat and RELcat, two cooperating semantic registries

Some examples

• definite article (PoS)
  – EN: 1 (-)
  – FR: 2 (masc, fem)
  – NL: 2 (neuter, non-neuter)
  – DE: 3 (masc, fem, neuter)

Dutch ‘non-neuter’, for example, should be related to ‘masc’ and ‘fem’
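
The gender example can be sketched as a handful of relation triples plus a lookup. The value identifiers (`nl:non-neuter`, `fr:masc`, ...) and the choice of `isSuperClassOf` and `almostSameAs` as relation names are assumptions made for illustration; the slides only say that the values should be related.

```python
# Hypothetical value identifiers and relation triples. The relation names
# are assumptions; the slide only states that the values should be related.
RELATIONS = [
    ("nl:non-neuter", "isSuperClassOf", "fr:masc"),
    ("nl:non-neuter", "isSuperClassOf", "fr:fem"),
    ("nl:non-neuter", "isSuperClassOf", "de:masc"),
    ("nl:non-neuter", "isSuperClassOf", "de:fem"),
    ("nl:neuter", "almostSameAs", "de:neuter"),
]

def related(value, relations=RELATIONS):
    """Return every value directly linked to `value`, in either direction."""
    linked = set()
    for subject, _predicate, obj in relations:
        if subject == value:
            linked.add(obj)
        elif obj == value:
            linked.add(subject)
    return linked

print(sorted(related("nl:non-neuter")))
print(sorted(related("de:fem")))
```

A search for German feminine articles could use such a lookup to also consider Dutch resources tagged with ‘non-neuter’.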

Page 8: ISOcat and RELcat, two cooperating semantic registries

Some examples

• Indirect object (syntax)
  – EN: indirect object
  – NL:
    • meewerkend voorwerp (1), or
    • meewerkend voorwerp (2) plus belanghebbend voorwerp
  – All translated as ‘indirect object’

=> 3 definitions of ‘indirect object’; the relations between them need to be shown!

Page 9: ISOcat and RELcat, two cooperating semantic registries

Some examples

• Event (semantics)
  – ISO-TimeML: event and state, where ‘state’ is a type of event
  – Other theories (Kamp & Reyle etc.): eventuality, with two subtypes: ‘event’ and ‘state’

Concepts ‘eventuality’, ‘event’ and ‘state’ are to be related
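
One way to picture relating these concepts is to store the subtype statements of each framework as triples, add cross-framework links, and then walk the graph. In the sketch below the `timeml:` and `kr:` identifiers are hypothetical, and the two `almostSameAs` links are an assumption about how the frameworks might be aligned, not a mapping given in the talk.

```python
# Hypothetical identifiers. The two almostSameAs links are assumptions about
# a possible alignment, not mappings taken from the slides.
TRIPLES = [
    ("timeml:state", "isSubClassOf", "timeml:event"),
    ("kr:event", "isSubClassOf", "kr:eventuality"),
    ("kr:state", "isSubClassOf", "kr:eventuality"),
    ("timeml:event", "almostSameAs", "kr:eventuality"),
    ("timeml:state", "almostSameAs", "kr:state"),
]

def broader_or_equivalent(concept, triples=TRIPLES):
    """Follow isSubClassOf upwards and almostSameAs in both directions."""
    seen = set()
    todo = [concept]
    while todo:
        current = todo.pop()
        for subject, predicate, obj in triples:
            if predicate == "isSubClassOf" and subject == current:
                target = obj
            elif predicate == "almostSameAs" and subject == current:
                target = obj
            elif predicate == "almostSameAs" and obj == current:
                target = subject
            else:
                continue
            if target != concept and target not in seen:
                seen.add(target)
                todo.append(target)
    return seen

print(sorted(broader_or_equivalent("timeml:state")))
print(sorted(broader_or_equivalent("kr:event")))
```

Under these assumed links, a Kamp & Reyle style ‘event’ ends up below the ISO-TimeML ‘event’, which reflects the granularity difference the slide points at.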

Page 10: ISOcat and RELcat, two cooperating semantic registries

ISOcat internal issues

Data categories that are almost the same, apart from type, profile, language, …

Currently we insert a new data category (DC). Note, however, that the original and the new one should be marked as having a same-as relation.

Page 11: ISOcat and RELcat, two cooperating semantic registries

RELcat

• A Relation Registry (under construction) to store
  – (almost) same-as relationships
  – subsumption relationships (isSuperClassOf, isSubClassOf)
  – mereology relationships (isPartOf, hasPart)
  – …
  between data categories/concepts
• The focus is on informal and possibly partial ontologies to be used for resource discovery
• Based on RDF triples
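
The relationship types listed above come in natural pairs. The following sketch stores relations as plain (subject, predicate, object) tuples and adds the inverse statement for each one. Treating the pairs as strict inverses and `sameAs` as symmetric is an assumption made here, and the `ex:` identifiers are hypothetical.

```python
# Assumed inverse pairs for the relationship types named on the slide.
INVERSE = {
    "isSuperClassOf": "isSubClassOf",
    "isSubClassOf": "isSuperClassOf",
    "isPartOf": "hasPart",
    "hasPart": "isPartOf",
    "sameAs": "sameAs",  # treated as symmetric in this sketch
}

def with_inverses(triples):
    """Return the triples plus the inverse statement of each known predicate."""
    closed = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            closed.add((obj, INVERSE[predicate], subject))
    return closed

# Hypothetical example relations.
stored = [
    ("ex:commonNoun", "isSubClassOf", "ex:noun"),
    ("ex:tag", "hasPart", "ex:partOfSpeech"),
]
for triple in sorted(with_inverses(stored)):
    print(triple)
```

Storing only one direction and deriving the other keeps the registry small while still letting a discovery tool navigate in both directions.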

Page 12: ISOcat and RELcat, two cooperating semantic registries

CGN case study

• Atomic building blocks of CGN tags are defined in ISOcat (still private)

• The EBNF schema of a CGN tag is stored in SCHEMAcat

• The subsumption relations in the value domains are stored in RELcat

• (almost) same-as relationships with other data categories/concepts are also stored in RELcat

Page 13: ISOcat and RELcat, two cooperating semantic registries

CGN granularity mappings

• How to deal with (almost) same-as relationships that involve more than one atomic CGN data category/concept?
  – Example: N(SOORT) = Common Noun
• Based on the CGN EBNF this involves the following slots of the /CGN tag/
  – /PoS/ = /N/
  – /NTYPE/ = /SOORT/
• How to express this in RDF?

Page 14: ISOcat and RELcat, two cooperating semantic registries

RELcat RDF mapping

• Data categories/concepts can function as subjects and objects in an RDF triple

• The predicate of an RDF triple is a RELcat relationship type

• Alternative: complex data categories as properties
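
To make the subject/predicate/object roles concrete, here is a rough sketch that writes the N(SOORT) = Common Noun mapping as N-Triples. It follows one possible reading of the talk's diagrams: a node for the specific tag that is a CGN tag, has two parts with fixed values, and is the same as Common Noun. The namespace, the node names (`N_SOORT`, `N_SOORT_pos`, ...) and the exact set of triples are assumptions, not the authors' actual RELcat encoding.

```python
BASE = "http://example.org/relcat-sketch/"  # hypothetical namespace

def uri(name):
    """Wrap a local name as an absolute IRI in N-Triples syntax."""
    return f"<{BASE}{name}>"

TRIPLES = [
    # Schema level: the CGN tag and the value domains of its slots.
    ("CGNtag", "hasPart", "PoS"),
    ("CGNtag", "hasPart", "NTYPE"),
    ("PoS", "hasPotentialValue", "N"),
    ("NTYPE", "hasPotentialValue", "SOORT"),
    # The specific tag N(SOORT), with one part per filled slot.
    ("N_SOORT", "isA", "CGNtag"),
    ("N_SOORT", "hasPart", "N_SOORT_pos"),
    ("N_SOORT", "hasPart", "N_SOORT_ntype"),
    ("N_SOORT_pos", "isA", "PoS"),
    ("N_SOORT_pos", "hasValue", "N"),
    ("N_SOORT_ntype", "isA", "NTYPE"),
    ("N_SOORT_ntype", "hasValue", "SOORT"),
    # The mapping itself.
    ("N_SOORT", "sameAs", "CommonNoun"),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, object) names as N-Triples lines."""
    return "\n".join(f"{uri(s)} {uri(p)} {uri(o)} ." for s, p, o in triples)

print(to_ntriples(TRIPLES))
```

Even in this small sketch a single mapping needs a dozen triples, which illustrates why complex mappings may call for tool or DSL support.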

Page 15: ISOcat and RELcat, two cooperating semantic registries

N(SOORT) = Common Noun

[Diagram. Nodes: Common Noun, CGN tag. Relations: sameAs, isA.]

Page 16: ISOcat and RELcat, two cooperating semantic registries

N(SOORT) = Common Noun

[Diagram. Nodes: Common Noun, CGN tag, PoS, NTYPE, N, SOORT. Relations: sameAs, isA, hasPart (2x), hasPotentialValue (2x). Annotations: "has more parts", "has more potential values" (2x).]

Page 17: ISOcat and RELcat, two cooperating semantic registries

N(SOORT) = Common Noun

[Diagram. Nodes: Common Noun, CGN tag, PoS, NTYPE, N, SOORT. Relations: sameAs, isA (5x), hasPart (4x), hasValue (2x), hasPotentialValue (2x). Annotations: "has more parts", "has more potential values" (2x).]

Page 18: ISOcat and RELcat, two cooperating semantic registries

N(SOORT) = Common Noun

[Diagram. Nodes: Common Noun, CGN tag, PoS, NTYPE, N, SOORT. Relations: sameAs, isA (5x), hasPart (4x), hasValue (2x), hasPotentialValue (2x). Annotations: "has more parts", "has more potential values" (2x).]

Page 19: ISOcat and RELcat, two cooperating semantic registries

Cooperation between ISOcat and RELcat

• ISOcat: value domains of closed data categories
  – RELcat: hasPotentialValue (new relationship type)
• ISOcat: is-a relations between simple data categories
  – RELcat: subsumption relations
• SCHEMAcat: part-of relationships
  – RELcat: mereology relationships

Page 20: ISOcat and RELcat, two cooperating semantic registries

Conclusions and future work

• Simple mappings are easy
• Complex mappings easily get fairly complex
  – UI support?
  – DSL support?
  – Alternative RDF mapping?
• User front-end for RELcat
  – Integration of RELcat and ISOcat?

Page 21: ISOcat and RELcat, two cooperating semantic registries

Other examples

• “JJR” -> “POS=adjective & degree=comparative”
• “Transitive” -> “thetavp=vp120 & synvps=[synNP] & caseAssigner=True”
• “VVIMP” -> “POS=verb & main verb & mood=imperative”
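
Each of these examples maps one tag to a conjunction of atomic features, which can be sketched as a small parser that splits the right-hand side into key/value pairs. The function name and the choice to record a bare feature such as `main verb` as a `True` flag are assumptions of this sketch; values are kept as plain strings.

```python
MAPPINGS = {
    "JJR": "POS=adjective & degree=comparative",
    "Transitive": "thetavp=vp120 & synvps=[synNP] & caseAssigner=True",
    "VVIMP": "POS=verb & main verb & mood=imperative",
}

def parse_features(expression):
    """Split 'a=b & c' into {'a': 'b', 'c': True}; values stay strings."""
    features = {}
    for part in expression.split("&"):
        part = part.strip()
        if not part:
            continue
        if "=" in part:
            key, value = part.split("=", 1)
            features[key.strip()] = value.strip()
        else:
            features[part] = True  # bare feature, recorded as a flag
    return features

for tag, expression in MAPPINGS.items():
    print(tag, "->", parse_features(expression))
```

Once a tag is decomposed like this, each atomic feature can be related to a data category on its own, as in the N(SOORT) case.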