The Art and Science of DDS Data Modelling

PrismTech The Art and Science of DDS Data Modelling Angelo Corsaro, PhD Chief Technology Officer OMG DDS SIG Co-Chair [email protected]

description

The Data Distribution Service (DDS) is a standard for ubiquitous, interoperable, secure, platform-independent, real-time data sharing across network-connected devices. DDS is used today in a wide range of applications, such as power generation, large-scale SCADA, air traffic control and management, smart cities, smart grids, vehicles, medical devices, simulation, aerospace, defense and financial trading. Unlike traditional message-centric technologies, DDS is data-centric: the emphasis is on seamless sharing of user-defined data rather than on message delivery. Data modeling therefore becomes a key step in the design of a distributed system that embraces DDS and data-centricity. This webcast will (1) explain the role and scope of data modeling in DDS, (2) introduce the techniques at the foundation of effective and extensible data models, and (3) summarize the most common DDS data modeling idioms.

Transcript of The Art and Science of DDS Data Modelling

Page 2: The Art and Science of DDS Data Modelling

Copyright PrismTech, 2014

A Recurring Question

• People new to DDS recurrently ask a question: what are the techniques and patterns that we can use to design DDS-based Systems?

• My answer is usually: Start with the powerful tools and techniques provided by relational data modelling and then add some DDS-specific spice

• I’ve come to the conclusion that many people are not very familiar with relational data modelling, or perhaps it has been too long since they studied or reviewed these concepts

• This webcast provides a reasonably thorough introduction to the relational data model

Page 3: The Art and Science of DDS Data Modelling

The Relational Model

Page 7: The Art and Science of DDS Data Modelling

Tuples

• An instance of a relation is a set of tuples (records) in which each tuple has the same number of fields as the relation schema.

• A relation’s instance can be visualised as a table where each tuple is a row and all rows have the same number of fields (columns)

• Notice that rows are all different. This is a requirement of the relational model, as a relation instance is a collection of unique tuples (or rows)

sid   name          age  gpa
1234  Peter Parker  21   4.0
2345  Tony Stark    15   4.0
3456  Bruce Wayne   23   3.5
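The student relation can be mirrored in a few lines of Python. This is only an illustrative sketch: the `Student` type and the choice of a Python `set` to model a relation instance are assumptions made for this example, not something the slides prescribe.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """Relation schema: Student(sid, name, age, gpa)."""
    sid: int
    name: str
    age: int
    gpa: float


# A relation instance is a set of tuples, so duplicate rows cannot exist.
students = {
    Student(1234, "Peter Parker", 21, 4.0),
    Student(2345, "Tony Stark", 15, 4.0),
    Student(3456, "Bruce Wayne", 23, 3.5),
}

# Adding a row that is already present leaves the instance unchanged.
students.add(Student(1234, "Peter Parker", 21, 4.0))

# Every tuple has the same number of fields as the schema.
arity = len(Student._fields)
same_arity = all(len(row) == arity for row in students)
```

Using a set rather than a list is what enforces the "all rows are different" requirement here; a list would happily accept the duplicate row.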

Page 11: The Art and Science of DDS Data Modelling

Quick DDS Intro

Page 14: The Art and Science of DDS Data Modelling

Information Definition

Page 16: The Art and Science of DDS Data Modelling

Topic and Instances

• As explained in the previous slide, a topic defines a class/type of information

• Topics can be defined as Singleton or can have multiple Instances

• Topic Instances are identified by means of the topic key

• A Topic Key is identified by a tuple of attributes -- like in databases

• Remarks:

- A Singleton topic has a single domain-wide instance

- A “regular” Topic can have as many instances as the number of different key values, e.g., if the key is an 8-bit character then the topic can have 256 different instances
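The relationship between keys and instances can be sketched in plain Python. No DDS API is used here: `TempSensor`, `key_of` and the sample values are hypothetical names invented for this illustration, and the dictionary only mimics the idea that each distinct key value identifies one instance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TempSensor:
    """Hypothetical topic type; sensor_id plays the role of the topic key."""
    sensor_id: int      # key attribute
    temperature: float  # non-key attribute


def key_of(sample: TempSensor) -> tuple:
    # The key is a tuple of attributes, as in a database.
    return (sample.sensor_id,)


samples = [
    TempSensor(1, 20.5),
    TempSensor(2, 21.0),
    TempSensor(1, 20.7),  # a newer sample for the instance with key (1,)
]

# Each distinct key value identifies one instance; here the most recent
# sample is kept as that instance's current value.
instances = {}
for s in samples:
    instances[key_of(s)] = s

# An 8-bit key can take 2**8 distinct values, hence at most 256 instances.
max_instances_8bit_key = 2 ** 8
```

Three samples yield two instances, because two of the samples share the same key value.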

Page 21: The Art and Science of DDS Data Modelling

UML Data Modelling

Page 30: The Art and Science of DDS Data Modelling

Subclasses

Three ways of mapping subclassing to the relational model

T1 Subclass relations contain the superclass key and the specialised attributes

T2 Subclass relations contain all attributes

T3 One relation containing all superclass and subclass attributes

T1 A(K, X), B(K, Y), C(K, Z)

T2 A(K, X), B(K, X, Y), C(K, X, Z)

T3 A(K, X, Y, Z)

The best translation may depend on the context, e.g. T3 is good for heavily overlapping subclasses, while T2 is good for disjoint and complete subclasses

[Figure: UML class diagram with superclass A (K: PK, X) and subclasses B (Y) and C (Z)]
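As a rough sketch of the three translations, the snippet below maps a handful of hypothetical objects of classes A, B and C onto T1, T2 and T3. The sample values, the `project` and `of_type` helpers, and the choice to store only plain A objects in T2's A relation are all assumptions made for illustration.

```python
# Hypothetical objects of superclass A (attributes K, X) and its
# subclasses B (adds Y) and C (adds Z).
objects = [
    {"type": "A", "K": 1, "X": "x1"},
    {"type": "B", "K": 2, "X": "x2", "Y": "y2"},
    {"type": "C", "K": 3, "X": "x3", "Z": "z3"},
]


def project(rows, attrs):
    """Keep only the listed attributes; a missing attribute becomes None."""
    return [tuple(r.get(a) for a in attrs) for r in rows]


def of_type(*types):
    """Select the objects whose class is one of the given types."""
    return [o for o in objects if o["type"] in types]


# T1: A(K, X) holds every object; B and C hold the key plus their own attribute.
t1 = {
    "A": project(objects, ["K", "X"]),
    "B": project(of_type("B"), ["K", "Y"]),
    "C": project(of_type("C"), ["K", "Z"]),
}

# T2: each subclass relation carries all attributes, inherited ones included.
# Assumption: A(K, X) then only stores objects that are neither B nor C.
t2 = {
    "A": project(of_type("A"), ["K", "X"]),
    "B": project(of_type("B"), ["K", "X", "Y"]),
    "C": project(of_type("C"), ["K", "X", "Z"]),
}

# T3: one relation A(K, X, Y, Z); attributes that do not apply are None.
t3 = {"A": project(objects, ["K", "X", "Y", "Z"])}
```

The trade-off is visible in the data: T1 needs a join on K to rebuild a full B or C object, T2 repeats X in the subclass relations, and T3 avoids joins at the cost of empty (None) fields.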

Page 33: The Art and Science of DDS Data Modelling

Refinement

Page 34: The Art and Science of DDS Data Modelling

Why Relation Refinement?

• The UML/ER Data Models usually provide a good starting point toward the data model that we’ll actually use in the system

• The relations implied by the UML/ER Data Model often need to be normalised and re-organised to address performance and workload criteria

• The goal of relation refinement is to remove redundancy and/or decompose a relation into smaller relations

• Normal forms provide a way of measuring the amount of redundancy that may be in our data model

Page 35: The Art and Science of DDS Data Modelling

Redundancy

• Redundant Storage: Information may be stored multiple times leading to space, and perhaps time, inefficiencies

• Update Anomalies: If one copy of the redundant information is updated, this may create inconsistencies with the other copies, unless all copies are updated at the same time

• Insertion Anomalies: It may not be possible to store some information, unless some other information is stored as well

• Deletion Anomalies: It may not be possible to delete some information without losing some other information as well
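A small, purely illustrative example may make these anomalies concrete. The relation `Reading(sensor_id, room, floor)` and its values are hypothetical; the point is only that the floor of a room is stored once per sensor in that room.

```python
# Hypothetical relation Reading(sensor_id, room, floor): the floor is
# determined by the room, so it is repeated for every sensor in that room.
readings = [
    {"sensor_id": 1, "room": "R101", "floor": 1},
    {"sensor_id": 2, "room": "R101", "floor": 1},  # redundant copy
    {"sensor_id": 3, "room": "R202", "floor": 2},
]


def floors_by_room(rows):
    """Collect every floor value stored for each room."""
    floors = {}
    for r in rows:
        floors.setdefault(r["room"], set()).add(r["floor"])
    return floors


def inconsistent_rooms(rows):
    """Rooms for which the stored copies disagree."""
    return sorted(room for room, fl in floors_by_room(rows).items() if len(fl) > 1)


before = inconsistent_rooms(readings)

# Update anomaly: only one of the two copies for room R101 is changed.
readings[0]["floor"] = 3
after_update = inconsistent_rooms(readings)

# Deletion anomaly: removing the only sensor in R202 also loses the fact
# that R202 is on floor 2.
remaining = [r for r in readings if r["sensor_id"] != 3]
r202_known = "R202" in floors_by_room(remaining)
```

The insertion anomaly is the mirror image of the last step: with this schema, the floor of a new room cannot be recorded until at least one sensor is placed in it.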

Page 36: The Art and Science of DDS Data Modelling

Decomposition

• Careless decomposition can lead to more problems than benefits, so when decomposing you always want to ensure that:

- You really need to decompose the relation

- You fully understand the implications of the decomposition (lossless join, dependency preservation)

• Normal Forms provide good guidelines for decomposing relations, as they guarantee that certain classes of problems cannot be introduced

• Notice that decomposition can have a performance impact as it may lead to an increase in joins
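The lossless-join property mentioned above can be checked on a concrete instance: project the relation onto the two sub-schemas, join the projections back, and compare with the original. The following is a minimal sketch with hand-written `project` and `natural_join` helpers and hypothetical data; it tests one instance only and does not prove the property for the schema in general.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    seen = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in seen:
            seen.append(p)
    return seen


def natural_join(left, right):
    """Join rows that agree on every attribute the two relations share."""
    joined = []
    for lrow in left:
        for rrow in right:
            common = set(lrow) & set(rrow)
            if all(lrow[a] == rrow[a] for a in common):
                row = {**lrow, **rrow}
                if row not in joined:
                    joined.append(row)
    return joined


def same_rows(a, b):
    """Compare two relations while ignoring row order."""
    return all(r in b for r in a) and all(r in a for r in b)


# Hypothetical instance of R(K, X, Y) in which K is the key.
R = [
    {"K": 1, "X": "a", "Y": "p"},
    {"K": 2, "X": "a", "Y": "q"},
]

# Decomposing on the key is lossless: the join gives back exactly R.
lossless = same_rows(natural_join(project(R, ["K", "X"]), project(R, ["K", "Y"])), R)

# Decomposing on X, which is not a key, produces spurious tuples.
bad_join = natural_join(project(R, ["K", "X"]), project(R, ["X", "Y"]))
lossy = not same_rows(bad_join, R)
```

The second decomposition joins back to four rows instead of two, which is exactly the kind of problem a careless decomposition introduces.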

Page 37: The Art and Science of DDS Data Modelling

Functional Dependencies

• A Functional Dependency (FD) is a kind of Integrity Constraint (IC) that generalises the concept of a key

• Given a relation R along with two nonempty sets of attributes X and Y in R, we say that R satisfies the FD X ⟶ Y if the following holds for every pair of tuples t1 and t2 in R:

if t1.X = t2.X then t1.Y = t2.Y

• In other words, the FD says that if two tuples agree on the attributes in X, they also agree on the attributes in Y

• Notice that a primary key constraint is a special kind of FD
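The definition translates almost directly into a check over a relation instance. The sketch below is one possible way to write it; the function name `satisfies_fd` and the sample relation `R(sid, name, course, grade)` are assumptions made for this example.

```python
def satisfies_fd(rows, X, Y):
    """Return True if the relation instance `rows` satisfies the FD X -> Y."""
    seen = {}
    for t in rows:
        x_val = tuple(t[a] for a in X)
        y_val = tuple(t[a] for a in Y)
        # setdefault stores y_val the first time x_val is seen and
        # returns the previously stored value afterwards.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical instance of a relation R(sid, name, course, grade).
R = [
    {"sid": 1234, "name": "Peter Parker", "course": "DDS101", "grade": "A"},
    {"sid": 1234, "name": "Peter Parker", "course": "DB201", "grade": "B"},
    {"sid": 2345, "name": "Tony Stark", "course": "DDS101", "grade": "A"},
]

sid_determines_name = satisfies_fd(R, ["sid"], ["name"])      # holds here
course_determines_sid = satisfies_fd(R, ["course"], ["sid"])  # violated here
```

Note that an instance can only refute an FD. The fact that `sid -> name` holds for these three tuples does not prove it is a constraint of the schema; that is a statement about every legal instance.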

Page 39: The Art and Science of DDS Data Modelling

Normal Forms

Page 40: The Art and Science of DDS Data Modelling

Normal Forms

• Different Normal Forms (NF) exist that provide guidance on how to decompose relations

• If a relation is in a given normal form then we are guaranteed that certain anomalies, e.g. update anomalies, cannot arise

• The normal forms based on functional dependencies are the first normal form (1NF), second normal form (2NF), third normal form (3NF) and the Boyce-Codd normal form (BCNF)

• Every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF and finally every relation in 2NF is also in 1NF

• The 2NF and 3NF have only historical interest, while the BCNF has important practical applicability

Page 42: The Art and Science of DDS Data Modelling

Boyce-Codd Normal Form (BCNF)

Let R be a relation, X a subset of the attributes of R and a an attribute of R. R is in Boyce-Codd Normal Form (BCNF) if for every FD X ⟶ {a} that holds over R, one of the following is true:

• a ∊ X, that is it is a trivial FD, or

• X is a superkey

Intuitively, in a BCNF relation the only nontrivial dependencies are those in which a key determines some attributes. Each attribute must describe the key, the whole key, and nothing but the key

[Figure: Functional Dependencies in BCNF (key, attr 1, attr 2, attr k)]
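The BCNF condition can be tested mechanically by computing attribute closures. The following is a minimal sketch under stated assumptions: FDs are given as `(lhs, rhs)` pairs of attribute tuples, only the FDs supplied are checked, and the sensor/room schema is a hypothetical example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the FDs that are neither trivial nor have a superkey on the left."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        superkey = closure(lhs, fds) >= set(schema)
        if not trivial and not superkey:
            violations.append((lhs, rhs))
    return violations


# Hypothetical schema: the room determines the floor, but room is not a key.
schema = ("sensor_id", "room", "floor")
fds = [(("sensor_id",), ("room",)), (("room",), ("floor",))]
violations = bcnf_violations(schema, fds)

# Decomposing on the violating FD gives two relations that pass the check.
sensor_ok = bcnf_violations(("sensor_id", "room"), [(("sensor_id",), ("room",))])
room_ok = bcnf_violations(("room", "floor"), [(("room",), ("floor",))])
```

Here `room -> floor` is reported as a violation because `room` alone does not determine `sensor_id`, so it is not a superkey of the original relation.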

Page 49: The Art and Science of DDS Data Modelling

Relational Algebra

Page 55: The Art and Science of DDS Data Modelling

Back to DDS

Page 59: The Art and Science of DDS Data Modelling

Frequency Mix

• Suppose you have a relation R(K, X, Y) where the set of attributes X changes far more frequently than the set of attributes Y (e.g. position vs. velocity)

• In this case you should decompose the relation R into:

R1(K, X), R2(K, Y)

• This will reduce the resource usage in your system, e.g. bandwidth as well as CPU, but may introduce consistency issues. If consistency is essential then coherent updates should be used to atomically update R1 and R2
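A back-of-the-envelope calculation shows where the saving comes from. The sketch below simply counts attribute values written; the attribute counts (`nk`, `nx`, `ny`) and the 100-to-1 update ratio are hypothetical, and it assumes that X and Y never change in the same update. Real bandwidth also depends on encoding and protocol overhead, which are ignored here.

```python
def fields_sent(x_updates, y_updates, decomposed, nk=1, nx=3, ny=3):
    """Count attribute values written for a given number of updates.

    nk, nx and ny are the (hypothetical) numbers of attributes in K, X and Y.
    """
    if decomposed:
        # R1(K, X) is written on X updates, R2(K, Y) on Y updates.
        return x_updates * (nk + nx) + y_updates * (nk + ny)
    # R(K, X, Y) is written in full whenever X or Y changes.
    return (x_updates + y_updates) * (nk + nx + ny)


# Hypothetical workload: X changes 100 times for every change of Y.
single_topic = fields_sent(100, 1, decomposed=False)
two_topics = fields_sent(100, 1, decomposed=True)
```

With these assumed numbers the single relation writes 707 attribute values and the decomposed design 404, since the slowly changing Y attributes are no longer resent with every X update.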

Page 62: The Art and Science of DDS Data Modelling

Summing Up

Page 63: The Art and Science of DDS Data Modelling

Concluding Remarks

• The relational model provides the right set of tools for designing DDS-based systems

• DDS Topics are relations and DDS supports a subset of relational algebra to manipulate these relations (topics)

• The design process is as follows:

- Start modelling your system using the UML Data Modelling subset

- Ensure your model is in BCNF or 4NF, and make sure you understand why any violations are necessary or desirable for your system

- Add QoS to your relations

- Evaluate if further decomposition is required due to QoS mixes — if your data model is properly normalised

Page 64: The Art and Science of DDS Data Modelling

Learn More…

Page 67: The Art and Science of DDS Data Modelling

Extras

Page 68: The Art and Science of DDS Data Modelling

ER Modelling

Page 70: The Art and Science of DDS Data Modelling

Entities, Attributes and Entity Sets

• An entity is an object in the real world that is distinguishable from other objects - e.g. the iPhone, the Samsung Galaxy Note, etc.

• An entity is described through a set of attributes

• An entity set identifies a collection of similar entities - e.g., Mobile Phones

• Each attribute associated with an entity set must identify its domain

• An entity has a primary key and potentially several candidate keys