Post on 12-Apr-2017
Inferring VersionedSchemas from NoSQLDatabases and its
ApplicationsER’15
Stockholm, October 2015
[{ ”id”: ”90234 af”, ”value”: { ”author”: ”Diego Sevilla Ruiz”,”e-mail”: ”dsevilla@um.es”,”institution”: ”U. of Murcia”}},
{ ”id”: ”a243bb5”, ”value”: { ”author”: ”Severino Feliciano Morales”,”e-mail”: ”severino.feliciano@um.es”,”institution”: ”U. of Murcia”}},
{ ”id”: ”096705d”, ”value”: { ”author”: ”Jesús García Molina”,”e-mail”: ”jmolina@um.es”,”institution”: ”U. of Murcia”}}]
Motivation
NoSQL Databases are Schemaless
Benefits▶ No need to previouslydefine an Schema
▶ Non-uniform data▶ Custom fields▶ Non-uniform types
▶ Easier evolution
Drawbacks▶ Harder to reason aboutthe DB
▶ Static checking is lost▶ Some of the data logic isin the application code(more error prone)
▶ Some utilities needSchema information towork
Schemas for NoSQL Databases
▶ How to alleviate the problems of schemalessdatabases? ⇒ Inferring a Schema
▶ The Schema Model contains information aboutEntities and Relationships
▶ Take into account the different Entity Versions inthe Database
▶ Heterogeneity usually because of slight variations onEntities
▶ We obtain a precise database model▶ The Schema allows us to automate the constructionof tools:
▶ migration, refactoring, visualization, …
Related Work▶ JSON Schema
▶ Object versions and relationships are not considered▶ Apache Spark SQL/Drill: SQL-like schemas
▶ Union of all fields, nullable ⇒ incorrect combinations▶ Over-generalization to String▶ Aggregations and Reference relations not considered
▶ MongoDB-Schema▶ Prototype to infer schemas from MongoDBcollections
▶ Same limitations than Spark SQL▶ JSON Discoverer
▶ A MDE solution to infer domain models from RESTweb services (i.e. JSON documents)
▶ Not database-oriented; Object versions notconsidered
Spark SQL Example
{”name”:”Michael”}{”name”:”Andy”, ”age”:30}{”name”:”Justin”, ”age”:19}{”name”:”Peter”, ”age”:”tiny”}{”name”:”Martina”, ”address”:”home!”}
> people.printSchemaroot|-- address: string (nullable = true)|-- age: string (nullable = true)|-- name: string (nullable = true)
▶ age promoted to string▶ age and address are never part of the same object
{”rows”:[{”content”:{”chapters”:33,”pages”:527
},”authors”:[{”company”:{”country”:”USA”,”name”:”IBM”},
”name”:”Grady Booch”,”_id”:”210”
},{”company”:{”country”:”USA”,”name”:”IBM”
},”name”:”James Rumbaugh”,”_id”:”310”
},{”country”:”USA”,”company”:”Ivar Jacobson Consulting”,”name”:”Ivar Jacobson”,”_id”:”410”
}],”type”:”book”,”year”:2013,”publisher_id”:”345679”,”title”:”The Unified Modeling Language”,”_id”:”1”
},{”discipline”:”software engineering”,”issn”:[”0098 -5589”,”1939 -3520”
],”name”:”IEEE Trans. on Software Engineering”,”type”:”journal”,”_id”:”11”
},
{”name”:”Automated Software Engineering”,”issn”:[”0928 -8910”,”1573 -7535”
],”discipline”:”software engineering”,”type”:”journal”,”_id”:”12”,”number”:10515
},{”city”:”Barcelona”,”name”:”Omega”,”type”:”publisher”,”_id”:”123451”
},{”type”:”publisher”,”city”:”Newton”,”name”:”O’Reilly Media”,”_id”:”928672”
},{”type”:”book”,”author”:{”_id”:”101”,”name”:”Bradley Holt”,”company”:{
”country”:”USA”,”name”:”IBM Cloudant”,
}},”title”:”Writing and Querying MapReduce Views in
CouchDB”,”publisher_id”:”928672”,”_id”:”2”
},{”name”:”Addison -Wesley”,”type”:”publisher”,”_id”:”345679”
},{”type”:”publisher”,”journals”:[”11”,”12”
],”name”:”IEEE Publications”,”_id”:”907863”
}]}
NoSQL Database Model▶ Objects (Entities) and Entity Versions
▶ Attributes▶ Relationships
▶ Aggregation▶ References
{”type”:”publisher”,”city”:”Newton”,”name”:”O’Reilly Media”,”_id”:”928672”
},{”type”:”book”,”author”:{”_id”:”101”,”name”:”Bradley Holt”,”company”:{
”country”:”USA”,”name”:”IBM Cloudant”,
}},”title”:”Writing and Querying MapReduce Views in CouchDB”,”publisher_id”:”928672”,”_id”:”2”
},
Schema & Entity Versions DescriptionEntity Publisher {
Version 1 {
name: String
city: String
}
Version 2 {
name: String
}
Version 3 {
name: String
journal[+]: [Ref]->[Journal] (opposite=False)
}
}
Entity Journal {
Version 1 {
issn: Tuple [String, String]
name: String
discipline: String
}
Version 2 {
issn: Tuple [String, String]
name: String
discipline: String
number: int
}
}
Entity Book {
Version 1 {
title: String
year: int
publisher[1]: [Ref]->[Publisher] (opossite=False)
content[1]: [Aggregate]Content1
author[+]: [Aggregate]Author1
}
Version 2 {
title: String
publisher[1]: [Ref]->[Publisher] (opossite=False)
author[1]: [Aggregate]Author1
}
}
Entity Author {
Version 1 {
name: String
company[1]: [Aggregate]Company
}
Version 2 {
country: String
company: String
name: String
}
}
Entity Company {
Version 1 {
name: String
country: String
}
}
Entity Content {
Version 1 {
chapters: int
pages: int
}
}
(a) (b)
[1..1] company
[1..1] publisher[1..1] content[1..*] authors
[1..*] journals
Solution Design Considerations
▶ We have to process all the objects in the Database⇒Map-Reduce
▶ Natural data processing on NoSQL databases▶ LeverageMDE technologies
▶ Reuse EMF/Ecore tooling to show entity diagrams▶ Automation & Code Generation byMetamodeling &Model Transformations
Proposed MDE Architecture
NoSQLDatabase
MapReduceObjectVersions(JSON)
JSONInjection
JSONModel
JSONMetamodel
SchemaReverseEng
SchemaModel
ApplicationGeneration
SchemaViewer/Data
Validator/MigrationAssistant
Applications SchemaMetamodel
instan
ce
instance
Reverse Engineering Process (i)
▶ Map-Reduce process▶ Map: obtains the Raw Schema for each object▶ Reduce: selects an archetype for each Entity Version▶ Entity Type
▶ Root objects ⇒ “type” field or collection name▶ Aggregated objects ⇒ key of the pair (e.g. “author”)
JSON object Raw Schema
{name:“Omega”, city:“Barcelona”} {name:String, city:String}
{title:“Writing and...”,publisher_id:“928672”,author:{name:“Bradley Holt”,
company:{country:“USA”,name:“IBM Cloudant”} } }
{title:String,publisher_id:String,author:{name:String,
company:{country:String,name:String} } }
Reverse Engineering Process (ii)
▶ Attributes: primitive or tuple▶ Aggregated Entities
▶ Value of the pair is an Object (or array of objects)▶ Entity type inferred from the key
▶ References▶ Heuristics/Conventions▶ Key: <entity_name>_id▶ Value: MongoDB’s DBRef abstraction:{”$ref”: ”<entity_name>”, ”$id”, <id_value>}
▶ Honor cardinalities (arrays)
Example NoSQL Applications
▶ From the DBSchema model, using ModelTransformations and Model-to-Text transformations(Code Generation), we can:
▶ Generate models that Characterize each EntityVersion
▶ That characterization can be used to Visualize theDatabase
▶ And also to generate code to Validate objectsentering the Database
▶ Generate models that allow Database Migration tothe desired Entity Versions
Type Discrimination/CharacterizationMetamodel
function isOfExactTypeBook_2(obj) {if (! (”type” in obj)) {
return false;}if (obj[type] !== ”Book”) {
return false;}if (! (”title” in obj)) {
return false;}if (! (”author” in obj)) {
return false;}if (”publisher” in obj) {
return false;}if (”content” in obj) {
return false;}if (”year” in obj) {
return false;}
return true;}
Generated using a Model-to-Text transformationfrom an instance of theprevious Type Discrimina-tion Metamodel
Entity Versions
Alternate: D3.js Treemap
Type Transformation Metamodel
db.<collection >. update(<query >,<update >,{
multi: true}
)
Obtained by Entity Type Characterization
Generate the correct updateMongoDB statement using $set,$push, etc., maybe via user assis-tance through a DSL.For example, for Journal_1 toJournal_2:
$set: { ”number”: 1 }
Conclusions & Future work
▶ A process for obtaining Conceptual Model Schemasfor NoSQL Databases is shown
▶ The process takes into account the different EntityVersions present in the Database
▶ AMDE process allows us to automate theproduction of several applications from the Schemas
▶ Example applications that allow DatabaseVisualization andMigration are shown
Conclusions & Future work (ii)
▶ Future work includes:▶ Building a NoSQL Database Tool Set (NoSQL DataEngineering)
▶ DSL for Entity Version migration▶ Refining the Schema to allow a richer Type System
▶ Allow value ranges or enumerated sets▶ Infer attribute dependencies (derived attributes,
i.e. the value of an attribute dictates the value ofanother attribute)
▶ etc.