Post on 26-Dec-2015
XML Object Database 1
A Logic Programming Approach to Supporting the Entries of XML Documents in an Object Database
Ching-Long Yeh 葉慶隆
Department of Computer Science and Engineering
Tatung University
Taipei 104, Taiwan
chingyeh@cse.ttu.edu.tw
Introduction
• XML improves upon HTML in
  – capturing the meaning of a document, and
  – extending the tag set.
• At the same time, it also reduces the complexity of SGML.
• It is widely believed that XML will soon become the standard for data exchange on the Web.
Introduction
• When an XML document is stored in a plain file, the lack of indices prevents us from making full use of the meaning (or metadata) it carries.
• Since an XML document maps naturally onto the object-oriented model, a promising solution is to use object database technology to manage access to XML documents.
Introduction
• In this talk, I will present our research on
  – the design and implementation of an XML object database, and
  – an extensible template-based query interface for accessing the XML object database.
The Remainder of the Talk
• An Introduction to XML
• Design and Implementation of an XML Object Database
• An Extensible Template-based Interface
An Introduction to XML
HyperText Markup Language
• HTML is the language used to create hypertext documents on the WWW.
• The text is presented according to a set of predefined tags.
• The tags are defined by a Document Type Definition (DTD) written in SGML.
• In other words, HTML is an application of SGML on the WWW.
Standard Generalized Markup Language
• Central to SGML is the concept that documents have structure, content, and format.
• These three ingredients combine to form a document.
What is Content?
• Content is the actual data within a document.
• For example, the words and illustrations that make up a bicycle assembly manual are its content.
What is Format?
• Format is how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document.
  – Boldface for titles, italics for special terms, and blank lines between sections are examples of document formats.
• People often confuse format with structure.
What is Structure?
Coconut Pudding
12 ounces coconut milk
4 to 6 tablespoons sugar
4 to 6 tablespoons cornstarch
3/4 cup water
Pour coconut milk into saucepan.
Combine sugar and cornstarch; stir in water and blend well.
Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.
[Figure: the recipe above annotated with its structure]
• Recipe
  – Title
  – IngredientList, consisting of Ingredient elements
  – InstructionList, consisting of Step elements
Document Type Definition
• Defining the structures in XML/SGML
  – The structure of a document, that is, its type, is defined by a document type definition, or DTD.
  – The DTD lays out the rules for a document through the use of elements, attributes, and entities.
Document Type Definition
• A DTD looks like:
<!ELEMENT recipe - - (title, ingredientList, instructionList)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT ingredientList - - (ingredient*)>
<!ELEMENT instructionList - - (step*)>
<!ELEMENT ingredient - - (#PCDATA)>
<!ELEMENT step - - (#PCDATA)>
Document Instance
<!DOCTYPE RECIPE PUBLIC "recipe" "recipe">
<RECIPE>
  <TITLE>Coconut Pudding</TITLE>
  <INGREDIENTLIST>
    <INGREDIENT>12 ounces coconut milk</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons sugar</INGREDIENT>
    <INGREDIENT>4 to 6 tablespoons cornstarch</INGREDIENT>
    <INGREDIENT>3/4 cup water</INGREDIENT>
  </INGREDIENTLIST>
  <INSTRUCTIONLIST>
    <STEP>Pour coconut milk into saucepan.</STEP>
    <STEP>Combine sugar and cornstarch; stir in water and blend well.</STEP>
    <STEP>Stir sugar mixture into coconut milk; cook and stir over low heat until thickened.</STEP>
    …
  </INSTRUCTIONLIST>
</RECIPE>
HTML, SGML, XML
• HTML helped establish the Web by providing a universal way to present information.
• However, HTML only addresses the presentation of data.
• Using SGML, users can add structure along with the content of a document.
• However, SGML has proven too heavyweight for the Internet.
Extensible Markup Language
• XML is a simplified dialect of SGML.
• HTML is sufficient for sending web pages that are viewed by human beings.
• XML, however, adds the tags that enable computers to understand, act on, or process the information.
• XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
XML Application Profile
• Electronic commerce
• Electronic data interchange (EDI)
• Fine-grain content publishing
• Internet search engines
• Distributed application design
• etc.
Data Type Requirements of Documents
• HTML
  – One file per page
  – Simple uni-directional linking
• XML
  – Tens, hundreds, or even thousands of objects per page
  – Multiple DTDs
  – Hierarchical structure and rich linking
  – Query and navigation capabilities required
  – Agents and business rules interact with the data
Data Types of Storage
• File system
  – Stores monolithic data
  – A folder system is built on top of it
  – Good at storing multimedia data
Data Types of Storage
• Relational database
  – Tabular in nature
  – Good at storing rows and columns of data, like spreadsheets and data from forms such as invoices
Data Types of Storage
• Object-oriented database
  – Good at managing structured, hierarchical, richly linked information
  – That is exactly what XML is.
  – XML is the object representation of data.
Design and Implementation of an XML Object Database
Basic Idea
• The arrangement of elements in an XML document is governed by the element and attribute-list declarations in its document type definition.
• Creating a DTD is, in a sense, closely related to defining new data types and hierarchical relationships in an object database.
• Thus, to enter an XML document into an object database, we first generate a new schema corresponding to the DTD in the object database, and then fragment the document conforming to that DTD into objects and enter them into the database.
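The fragmentation step can be pictured with a short Python sketch. It is not the authors' Prolog implementation: it assumes the document has already been checked against its DTD, uses the standard-library ElementTree parser, and the `DocObject` class, the `fragment` function, and the sequential object identifiers are hypothetical stand-ins for real database objects.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Optional


@dataclass
class DocObject:
    """One object per element (a hypothetical layout, not the real schema)."""
    oid: int                     # illustrative object identifier
    cls: str                     # class name derived from the element name
    text: str = ""               # character data (#PCDATA), whitespace-stripped
    children: List["DocObject"] = field(default_factory=list)


def class_name(tag: str) -> str:
    # 'ingredientList' -> 'IngredientList', like 'Top' and 'Spec' in the schema slides
    return tag[:1].upper() + tag[1:]


def fragment(elem: ET.Element, ids: Optional[Iterator[int]] = None) -> DocObject:
    """Fragment an element tree into objects, numbered in document order."""
    ids = count(1) if ids is None else ids
    obj = DocObject(next(ids), class_name(elem.tag), (elem.text or "").strip())
    obj.children = [fragment(child, ids) for child in elem]
    return obj


def walk(obj: DocObject) -> Iterator[DocObject]:
    """Visit an object and all of its descendants, parent first."""
    yield obj
    for child in obj.children:
        yield from walk(child)


RECIPE = """<recipe>
  <title>Coconut Pudding</title>
  <ingredientList>
    <ingredient>12 ounces coconut milk</ingredient>
    <ingredient>3/4 cup water</ingredient>
  </ingredientList>
  <instructionList>
    <step>Pour coconut milk into saucepan.</step>
  </instructionList>
</recipe>"""

root = fragment(ET.fromstring(RECIPE))
for o in walk(root):
    print(o.oid, o.cls, repr(o.text))
```

Each element becomes one object linked to its parent, which is the hierarchical shape an object database is good at storing.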
Basic Idea
• Both tasks, creating an object database schema for a DTD and fragmenting XML documents into objects, can be divided into two parts: analysis and generation.
  – For the former, an input DTD is analyzed according to the formation rules specified in the XML recommendation, and schema definitions are produced for the structures found in the DTD.
  – For the latter, XML document instances are analyzed and object definitions are produced for the elements found in them.
Basic Idea
• We employ the definite clause grammar (DCG) in Prolog as a tool to implement the analysis and generation tasks.
• The basic idea is to encode the analysis task in the context-free rule part and the generation task in the action part of the DCG rules.
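To make the split between the rule part and the action part concrete, here is a rough Python emulation of two DCG rules. It is an illustration only, not the authors' code: each nonterminal is a generator that yields every (value, next position) it can reach, so trying alternatives and failing without results behaves like Prolog backtracking. The names `lit`, `name`, `content_spec`, `element_decl`, and `phrase` are hypothetical, tokenization is assumed to be done beforehand, and only the EMPTY, ANY, and (#PCDATA) content specifications are covered.

```python
from typing import Callable, Iterator, List, Tuple

# A nonterminal maps (tokens, position) to every possible (value, next_position).
# Yielding several results, or none, models alternatives and failure, which
# gives Prolog-style backtracking.
Parse = Iterator[Tuple[object, int]]
Rule = Callable[[List[str], int], Parse]


def lit(word: str) -> Rule:
    """Terminal symbol: match one token exactly."""
    def rule(toks: List[str], i: int) -> Parse:
        if i < len(toks) and toks[i] == word:
            yield word, i + 1
    return rule


def name(toks: List[str], i: int) -> Parse:
    if i < len(toks) and toks[i].isidentifier():
        yield toks[i], i + 1


def content_spec(toks: List[str], i: int) -> Parse:
    # contentSpec(C) --> ['EMPTY'], {C = 'EMPTY'} ; ['ANY'], {C = 'ANY'}
    #                  ; ['(', '#PCDATA', ')'], {C = pcdata}.
    for _, j in lit("EMPTY")(toks, i):
        yield "EMPTY", j                      # action part: build the result
    for _, j in lit("ANY")(toks, i):
        yield "ANY", j
    for _, j in lit("(")(toks, i):
        for _, k in lit("#PCDATA")(toks, j):
            for _, m in lit(")")(toks, k):
                yield "pcdata", m


def element_decl(toks: List[str], i: int) -> Parse:
    # elementdecl(contentModel(N, C)) --> ['<!ELEMENT'], name(N), contentSpec(C), ['>'].
    for _, j in lit("<!ELEMENT")(toks, i):
        for n, k in name(toks, j):
            for c, m in content_spec(toks, k):
                for _, end in lit(">")(toks, m):
                    yield ("contentModel", n, c), end   # action part


def phrase(rule: Rule, toks: List[str]):
    """Loosely like Prolog's phrase/2: first parse using all tokens, else None."""
    for value, end in rule(toks, 0):
        if end == len(toks):
            return value
    return None


print(phrase(element_decl, ["<!ELEMENT", "title", "(", "#PCDATA", ")", ">"]))
# ('contentModel', 'title', 'pcdata')
```

The nested loops play the role of the comma in a DCG body, and successive `for` blocks in `content_spec` play the role of `;`. In Prolog both come for free, which is the convenience the approach relies on.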
Structured Document Databases
• Combining structured documents with OODB technology:
  – VERSO project at INRIA
  – News-On-Demand application
  – Document database from GMD-IPSI
• XML document database products:
  – The Poet XML Repository
  – eXcelon, ODI
  – Ardent Software, Inc.
System Architecture
[Figure: system architecture. Components: DTD parser, schema generator, DI parser generator, DI parser and object generator, DB language processor, XML repository, and user interface. Data: DTD, DI (document instance), structure, schema definitions, and object definitions. Rule sets: XML rules, schema generation rules, and parser generation rules.]
DTD Parser
elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
elementdecl(contentModel(N,C)) --> elementPrefix, name(N), contentSpec(C), rightAngle.
contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
contentSpec(C) --> empty, {C = 'EMPTY'} ; any, {C = 'ANY'} ; mixed(C) ; children(C).
Parsing Result
<!ELEMENT top (p,spec,div1)>
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT spec (front,body, back?)*>
<!ELEMENT div1 (head,(p|list1 |note)*, div2*)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT a (#PCDATA)>
<!ELEMENT ul (#PCDATA)>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>
<!ELEMENT em (#PCDATA)>
<!ELEMENT front (#PCDATA)>
<!ELEMENT body (#PCDATA)>
<!ELEMENT back (#PCDATA)>
<!ELEMENT head (#PCDATA)>
<!ELEMENT list1 (#PCDATA)>
<!ELEMENT note (#PCDATA)>
<!ELEMENT div2 (#PCDATA)>
[contentModel(top,seq([p/null,spec/null,div1/null])/null),
 contentModel(p,mixed([pcdata,a,ul,b,i,em])),
 contentModel(spec,seq([front/null,body/null,back/question])/star),
 contentModel(div1,seq([head/null,alt([p/null,list1/null,note/null])/star,div2/star])/null),
 contentModel(name,pcdata), contentModel(a,pcdata), contentModel(ul,pcdata),
 contentModel(b,pcdata), contentModel(i,pcdata), contentModel(em,pcdata),
 contentModel(front,pcdata), contentModel(body,pcdata), contentModel(back,pcdata),
 contentModel(head,pcdata), contentModel(list1,pcdata), contentModel(note,pcdata),
 contentModel(div2,pcdata)]
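The translation from a content specification to the `contentModel` terms above can be sketched as a small recursive-descent parser in Python. This is an illustrative reimplementation, not the authors' DCG. `content_model` and the `OCCURRENCE` table are hypothetical names; mapping `+` to `plus` is an assumption, because the example DTD contains no `+`; and EMPTY/ANY are left out.

```python
import re

# Occurrence indicators as they appear in the parsing result;
# 'plus' for '+' is an assumption (the example DTD has no '+').
OCCURRENCE = {"?": "question", "*": "star", "+": "plus"}
NAME = r"[A-Za-z_][\w.-]*"


def content_model(spec: str) -> str:
    """Translate a DTD content specification into a contentModel term string."""
    toks = re.findall(rf"#PCDATA|{NAME}|[(),|?*+]", spec)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {tok!r}")
        pos += 1
        return tok

    def name():
        tok = take()
        if not re.fullmatch(NAME, tok):
            raise SyntaxError(f"expected an element name, got {tok!r}")
        return tok

    def occurrence():
        return OCCURRENCE[take()] if peek() in OCCURRENCE else "null"

    def particle():
        return group() if peek() == "(" else f"{name()}/{occurrence()}"

    def group():
        # A group is a sequence (',') or a choice ('|'), never a mix of both.
        take("(")
        items = [particle()]
        sep = peek() if peek() in (",", "|") else None
        while sep is not None and peek() == sep:
            take(sep)
            items.append(particle())
        take(")")
        kind = "alt" if sep == "|" else "seq"
        return f"{kind}([{','.join(items)}])/{occurrence()}"

    if toks[:2] == ["(", "#PCDATA"]:          # Mixed: (#PCDATA) or (#PCDATA|a|...)*
        take("(")
        take("#PCDATA")
        names = []
        while peek() == "|":
            take("|")
            names.append(name())
        take(")")
        if names or peek() == "*":
            take("*")                         # mandatory when names are listed
        result = f"mixed([pcdata,{','.join(names)}])" if names else "pcdata"
    else:
        result = group()
    if pos != len(toks):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


print(content_model("(head,(p|list1 |note)*, div2*)"))
# seq([head/null,alt([p/null,list1/null,note/null])/star,div2/star])/null
```

Applied to the declarations above, it reproduces the terms in the parsing result, for example `seq([front/null,body/null,back/question])/star` for `spec`.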
Schema Generation
defineClass 'Top' super: SingleSeq{ instance: 'P' 'p'; 'Spec' 'spec'; 'Div1' 'div1';};
defineClass 'P' super: Mixed{ instance: List<Mixedp> mixedp;};
defineClass Mixedp super: SingleAlt{ instance: String pcdata; 'A' 'a'; 'Ul' 'ul'; 'B' 'b'; 'I' 'i'; 'Em' 'em';};
defineClass 'Spec' super: MultiSeq{ instance: List<Seqspec> seqspec;};
defineClass 'Seqspec' super: SingleSeq{ instance: 'Front' 'front'; 'Body' 'body'; 'Back' 'back';};
defineClass 'Div1' super: SingleSeq{ instance: 'Head' 'head'; List<Alt1> 'alt1'; List<Div2> 'div2';};
defineClass 'Alt1' super: SingleAlt{ instance: 'P' 'p'; 'List1' 'list1'; 'Note' 'note';};
defineClass 'Name' super: Unstructured{ instance: String pcdata;};
...
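For the simpler content models, the schema generator's output can be sketched as a mapping from a content model to `defineClass` text in the format of the example above. This Python sketch is not the authors' generator. The tuple encoding of content models and the `define_class` function are hypothetical; repeated groups such as `Spec` (MultiSeq) and `Div1` (with `List<...>` attributes) are deliberately not covered; and the sketch quotes the helper class name (`'Mixedp'`), which the slide shows unquoted.

```python
from typing import List


def cls(name: str) -> str:
    """Class names capitalise the element name: 'div1' -> 'Div1'."""
    return name[:1].upper() + name[1:]


def define_class(element: str, model: tuple) -> List[str]:
    """Emit ODQL-style defineClass text for a few content-model shapes.

    Hypothetical encoding of the content model:
      ("pcdata",)              -> Unstructured class holding a string
      ("seq", [child, ...])    -> SingleSeq class, each child exactly once
      ("mixed", [child, ...])  -> Mixed class plus a SingleAlt helper class
    """
    kind, *rest = model
    name = cls(element)
    if kind == "pcdata":
        return [f"defineClass '{name}' super: Unstructured{{ instance: String pcdata;}};"]
    fields = " ".join(f"'{cls(c)}' '{c}';" for c in rest[0])
    if kind == "seq":
        return [f"defineClass '{name}' super: SingleSeq{{ instance: {fields}}};"]
    if kind == "mixed":
        helper = "Mixed" + element            # e.g. 'Mixedp' for element 'p'
        return [
            f"defineClass '{name}' super: Mixed{{ instance: List<{helper}> {helper.lower()};}};",
            f"defineClass '{helper}' super: SingleAlt{{ instance: String pcdata; {fields}}};",
        ]
    raise ValueError(f"content model {kind!r} is not covered by this sketch")


for line in define_class("top", ("seq", ["p", "spec", "div1"])):
    print(line)
# defineClass 'Top' super: SingleSeq{ instance: 'P' 'p'; 'Spec' 'spec'; 'Div1' 'div1';};
```

Each content-model shape selects a superclass (Unstructured, SingleSeq, Mixed/SingleAlt), and each child element becomes an attribute typed by its own class.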
DI Parser
top(V) --> stg(top), p(P), spec(Spec), div1(Div1), etg(top).
p(V) --> stg(p), mixedp(Mixedp), etg(p).
mixedp(V) --> (pcdata(Pcdata) ; a(A) ; ul(Ul) ; b(B) ; i(I) ; em(Em) ; {false}), mixedp(_) ; [].
spec(V) --> stg(spec), spec1(Spec), etg(spec).
spec1(V) --> front(Front), body(Body), (back(Back) ; []), spec1(_) ; [].
div1(V) --> stg(div1), head(Head), alt1(Alt1), div21(Div21), etg(div1).
alt1(V) --> (p(P) ; list1(List1) ; note(Note) ; {false}), alt1(_) ; [].
div21(V) --> div2(Div2), div21(_) ; [].
name(V) --> stg(name), pcdata(Pcdata), etg(name).
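The DI parser rules above both check an instance against its DTD and build objects. The checking half can be illustrated with a different, swapped-in technique: a children content model is a regular expression over child element names, so it can be compiled into a Python regex. This is not how the authors' DCG works. `compile_model` and `valid_children` are hypothetical names, and the sketch handles element-only (children) content models, not mixed content.

```python
import re
from typing import List

TOKEN = re.compile(r"[A-Za-z_][\w.-]*|[(),|?*+]")


def compile_model(spec: str) -> "re.Pattern":
    """Compile a DTD children content model into a regex over child names.

    Each child name is matched together with a trailing space, so 'div2'
    can never match a prefix of a longer name such as 'div21'.
    """
    parts = []
    for tok in TOKEN.findall(spec):
        if tok == ",":
            continue                          # sequence = concatenation
        elif tok == "(":
            parts.append("(?:")               # non-capturing group
        elif tok in {"|", ")", "?", "*", "+"}:
            parts.append(tok)                 # same meaning in regex syntax
        else:
            parts.append(f"(?:{re.escape(tok)} )")
    return re.compile("".join(parts))


def valid_children(spec: str, children: List[str]) -> bool:
    """True if the sequence of child element names conforms to the model."""
    text = "".join(c + " " for c in children)
    return compile_model(spec).fullmatch(text) is not None


model = "(head,(p|list1|note)*,div2*)"
print(valid_children(model, ["head", "p", "note", "div2"]))   # True
print(valid_children(model, ["p", "head"]))                   # False
```

A regex only answers yes or no. The DCG formulation also gives each matched child a value to turn into an object, which is why the rules carry variables such as `Head` and `Alt1`.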
DI Parser Generation
Rule_Head --> Start_Tag, Rule_Body, End_Tag, {Semantic Actions}.
for each contentModel(ElementName, ContentStructure) do
    generate the rule head for ElementName;
    generate the start tag for ElementName;
    generate the rule body for ContentStructure;
    generate the end tag for ElementName;
    generate the semantic action;
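For the two simplest cases, the loop above can be sketched in Python as a generator of DCG rule text in the same shape as the DI parser rules shown earlier. This sketch is not the authors' generator. `dcg_rule` and `generate_parser` are hypothetical names, only `pcdata` and plain sequences of required children are handled, and the optional semantic action (for example a `store/3` goal) is a hypothetical placeholder.

```python
from typing import List, Optional, Sequence, Tuple, Union

# "pcdata", or a sequence of child elements that each occur exactly once
Model = Union[str, Sequence[str]]


def var(name: str) -> str:
    """Prolog variables start with an upper-case letter: 'div1' -> 'Div1'."""
    return name[:1].upper() + name[1:]


def dcg_rule(element: str, model: Model, action: Optional[str] = None) -> str:
    """Rule_Head --> Start_Tag, Rule_Body, End_Tag, {Semantic Actions}."""
    if model == "pcdata":
        body = ["pcdata(Pcdata)"]
    elif isinstance(model, (list, tuple)):
        body = [f"{child}({var(child)})" for child in model]
    else:
        raise ValueError(f"content model {model!r} is not covered by this sketch")
    goals = [f"stg({element})", *body, f"etg({element})"]
    if action:
        goals.append("{" + action + "}")      # semantic action, if any
    return f"{element}(V) --> {', '.join(goals)}."


def generate_parser(models: List[Tuple[str, Model]]) -> str:
    """The 'for each contentModel' loop: one rule per element."""
    return "\n".join(dcg_rule(element, model) for element, model in models)


print(generate_parser([("top", ["p", "spec", "div1"]), ("name", "pcdata")]))
# top(V) --> stg(top), p(P), spec(Spec), div1(Div1), etg(top).
# name(V) --> stg(name), pcdata(Pcdata), etg(name).
```

The two printed rules match the `top` and `name` rules of the DI parser. Repetition and choice (`spec1`, `alt1`, `div21`) would need auxiliary recursive rules like those shown there.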
Implementation
• We have built a prototype of the system using LPA Win-Prolog V3.5 on a personal computer.
• It consists of a DTD parser, a schema generator, and a DI parser generator.
• After creating the physical store and class family for XML documents, we can build the database schema for a DTD by executing the ODQL code generated by the DTD schema generator.
An Extensible Query-By-Template Interface for Accessing XML Document Databases
Motivation
• Current WWW search engines return vast numbers of search results.
• Text-based query languages, even with a simple English-like syntax, are inconvenient for users.
• Current user interfaces primarily use form-based queries.
Goal
• The goal is to design a convenient interface that lets users access XML documents without requiring knowledge of the document types.
• The interface will relieve users from typing complex query-language statements.
• The interface should be web-based and platform-independent.
System Architecture
[Figure: architecture of the visual query interface]
Visual Query Facility
• Query By Example (QBE)
  – The interface is composed of tabular skeletons representing tables in the database.
• Query By Forms (QBF)
  – The interface presents a list of searchable fields, each with an entry area that can be used to indicate the search string.
• Query By Template (QBT)
  – The interface displays a template for a representative entry of the database. Users express their queries by entering search keywords in the appropriate regions of the template.
Example of Image-based QBT
Limits of Image-based QBT
• The image template is divided into regions, each of which corresponds to an element in the document structure.
• Associated with each region is a query action.
• Its significant drawback is the lack of flexibility in template creation.
• It is difficult to automate the reconfiguration of the query actions associated with a new template.
• A single interface template for all types of documents is probably not a good idea.
Concept of eXtensible QBT (XQBT)
• The environment provides a template creator which consists of a DTD schema browser and a scene for presentation design.
• The environment aims to automatically configure the query actions associated with the template presentation.
• The design of the template presentation must be tightly coupled with the arrangement of document data stored in the repository.
• Each component in the presentation design must be properly associated with the corresponding node in the object database schema.
Environment for XQBT
Template Creator
• The template creator consists of a DTD schema browser, a scene for the template draft, and a functional area.
• The template creator relies mainly on the DTD schema browser, which corresponds to the database schema.
• The scene is a visual display area where the designer can organize a template draft for certain purpose.
• The content of template draft is exported to a file, which contains the template presentation and additional information.
Template Creator
[Figure: the template creator, with its functional area labeled]
Exported File
• The file contains the template presentation properties associated with each element.
• Each element is annotated with its path in the database schema, so that the template executor can use this information to carry out query actions.
Template Executor
• The template executor loads the exported file and presents the template as originally designed in the template creator.
• The path of each node in the DTD schema browser is used to carry out the query action required by the user.
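The path-driven query action can be sketched in Python. The real system runs against the Jasmine object database through Java Proxies. This sketch stands in in-memory ElementTree documents for the database, uses a hypothetical exported template (`TEMPLATE`) that maps each region to a schema path, and uses made-up sample recipes. `region_matches` and `execute` are illustrative names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical exported template: each region is stored with the path
# of the schema node it was bound to in the template creator.
TEMPLATE: Dict[str, str] = {
    "Title": "recipe/title",
    "Ingredient": "recipe/ingredientList/ingredient",
    "Step": "recipe/instructionList/step",
}


def region_matches(doc: ET.Element, path: str, keyword: str) -> bool:
    """Does any node at `path` in `doc` contain `keyword` (case-insensitive)?"""
    root_tag, _, rest = path.partition("/")
    if doc.tag != root_tag:
        return False
    nodes = doc.findall(rest) if rest else [doc]
    return any(keyword.lower() in (n.text or "").lower() for n in nodes)


def execute(template: Dict[str, str], docs: List[ET.Element],
            region: str, keyword: str) -> List[ET.Element]:
    """Run the query bound to one template region over all documents."""
    path = template[region]
    return [d for d in docs if region_matches(d, path, keyword)]


# Made-up sample documents standing in for the repository.
DOCS = [
    ET.fromstring("<recipe><title>Coconut Pudding</title><ingredientList>"
                  "<ingredient>12 ounces coconut milk</ingredient></ingredientList>"
                  "<instructionList><step>Pour coconut milk into saucepan.</step>"
                  "</instructionList></recipe>"),
    ET.fromstring("<recipe><title>Rice Porridge</title><ingredientList>"
                  "<ingredient>1 cup rice</ingredient></ingredientList>"
                  "<instructionList><step>Simmer the rice.</step>"
                  "</instructionList></recipe>"),
]

hits = execute(TEMPLATE, DOCS, "Ingredient", "coconut")
print([d.findtext("title") for d in hits])   # ['Coconut Pudding']
```

Because the query action is derived from the stored path rather than hand-coded per region, a redesigned template needs only a new exported file, not new query code.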
Comparison between Image-based QBT and XQBT
Image-based QBT
• The template is an image, obtained by taking a photograph or by scanning existing pages.
• The query action associated with each region is hand-coded.
• Whether planar or nested, the template is limited to a region level that is not very deep.
XQBT
• The template is generated for a representative document.
• The associated query actions can be generated automatically for the interface program.
• The designer can change the template to meet the requirements of various region levels.
Implementation
• Java Proxies (Jp) for Jasmine
  – Jp allows developers to build their applications with J-API and take advantage of the Jasmine class libraries.
The interface for our XML document database
[Figure: the query template, with regions labeled ingredient, Ingredient name, and Ingredient step]
Query Formulation
– Such searches are performed by simply entering the search string in the corresponding region of the template.
Query Formulation (cont.)
Query Formulation (cont.)
• Multiple conditions are specified in different regions and combined using logical connectives (such as AND, OR, and NOT).
• The logical expression is derived from its graphical representation using the default operator precedence.
• Users can insert parentheses as necessary in the condition box, as in the QBE interface.
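Combining per-region conditions under default precedence, with optional parentheses, can be sketched as a small recursive-descent evaluator in Python. The slides do not spell out the precedence, so the conventional NOT > AND > OR ordering is an assumption here. The labels C1, C2, … are hypothetical names for the match results of individual regions, and `evaluate` is an illustrative function, not the interface's actual code.

```python
import re
from typing import Dict


def evaluate(expr: str, truth: Dict[str, bool]) -> bool:
    """Evaluate region conditions such as 'C1 AND NOT C2 OR C3'.

    Assumed precedence: NOT > AND > OR; parentheses override it,
    as in a QBE condition box.
    """
    toks = re.findall(r"\(|\)|\w+", expr)
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        if pos >= len(toks):
            raise SyntaxError("unexpected end of expression")
        pos += 1
        return toks[pos - 1]

    def or_expr() -> bool:
        value = and_expr()
        while peek() == "OR":
            take()
            rhs = and_expr()          # parse first, so no tokens are skipped
            value = value or rhs
        return value

    def and_expr() -> bool:
        value = not_expr()
        while peek() == "AND":
            take()
            rhs = not_expr()
            value = value and rhs
        return value

    def not_expr() -> bool:
        tok = take()
        if tok == "NOT":
            return not not_expr()
        if tok == "(":
            value = or_expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return value
        if tok not in truth:
            raise SyntaxError(f"unknown condition {tok!r}")
        return bool(truth[tok])

    result = or_expr()
    if pos != len(toks):
        raise SyntaxError(f"unexpected token {peek()!r}")
    return result


T = {"C1": True, "C2": False, "C3": False}
print(evaluate("C1 OR C2 AND C3", T))     # True:  C1 OR (C2 AND C3)
print(evaluate("(C1 OR C2) AND C3", T))   # False: parentheses override
```

The two calls differ only in parentheses, which is exactly the case where a user needs the condition box to override the default precedence.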
The results of the query formulation
Template Creator
Template Executor
Conclusion
• We employ the DCG in Prolog to translate XML documents into the schema and object definitions of an object database.
• Prolog's backtracking and the CFG formalism of the DCG let us construct the parsers easily, by expressing the rules of the XML specification directly in DCG.
• We need not supply information for choosing which production rule to apply, as a predictive recursive-descent parser would require.
• Similarly, these features make it easy to generate the DCG rules for the DI parser as the output of the parsing process.
Future Work
• Looking for possible applications of the XML database
  – Electronic commerce
  – Intelligent multiagent systems
  – Knowledge management systems
• Performance evaluation
• Efficient query interfaces