SLE2015: Distributed ATL

Distributed Model-to-Model Transformation with ATL on MapReduce

Jordi CABOT, ICREA

Universitat Oberta de Catalunya

Amine BENELALLAM, Abel GOMEZ, and Massimo TISI

AtlanMod team (Inria, Mines Nantes, Lina)

The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26 2015, Pittsburgh, USA

Context

Model Transformation

Transformation spec: { S::Square → T1::Triangle, S::Circle → T1::Octagon, ... }

[Figure: the transformation spec consumes the source model (elements 1 to 6) and produces the corresponding target model (elements 1 to 6).]
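A minimal Python sketch of the idea on this slide, assuming a toy encoding in which each model element has only an id and a type: the spec maps source types to target types (the two rules shown on the slide), and every transformed element gets a trace link back to its source. The `Element` class, `SPEC` and `transform` are hypothetical illustrations, not ATL's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    uid: int
    kind: str


# Hypothetical transformation spec: source type -> target type.
SPEC = {"S::Square": "T1::Triangle", "S::Circle": "T1::Octagon"}


def transform(source):
    """Consume a source model, produce a target model and source->target trace links."""
    target, trace = [], {}
    for e in source:
        kind = SPEC.get(e.kind)
        if kind is None:          # no rule matches this element: nothing is produced
            continue
        t = Element(uid=len(target) + 1, kind=kind)
        target.append(t)
        trace[e.uid] = t.uid
    return target, trace


source = [Element(1, "S::Square"), Element(2, "S::Circle"), Element(3, "S::Square")]
target, trace = transform(source)
print([t.kind for t in target])  # ['T1::Triangle', 'T1::Octagon', 'T1::Triangle']
```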

Why Distribute Model Transformations?

Scalability issues in MTs:

● Complex transformations taking hours to run

● Very Large Models (VLMs) not fitting into the memory of a single machine

● Frequent increases in scope between releases

● 900+ meta-classes and thousands of properties

● Models reaching GBs in size

Increasing complexity of data & systems

Distributing Model Transformation

[Figure: in a distributed environment, the transformation spec is executed by several workers, each consuming part of the source model (elements 1 to 6) and producing part of the target model.]

Why not use a GPL?

Using a General Purpose Language (GPL) for distributed MT:

1. Requires familiarity with concurrency theory

○ not common among MDE application developers

2. Introduces a new class of errors w.r.t. sequential programming

○ e.g. linked to task synchronization and shared data access

3. Requires complex analysis for performance optimization


Meet ATL-MR

Case Study: Analysis of Data-Flow in Java Programs (TTC13 [1])

[1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.


int fact(int a) {
    int r = 1;
    while (a > 0) {
        r *= a--;
    }
    return r;
}

[Figure: (a) Java code of fact; (b) its control-flow graph, with cfNext edges between statements and def/use links to the variables a and r; (c) the resulting data-flow graph, with dfNext edges.]

Atlanmod Transformation Language (ATL)

module ControlFlow2DataFlow;
create OUT : DataFlow from IN : ControlFlow;

rule SimpleStatment {
    from
        s : ControlFlow!SimpleStmt (
            not (s.def->isEmpty() and s.use->isEmpty())
        )
    to
        t : DataFlow!SimpleStmt (
            txt <- s.txt,
            dfNext <- s.computeNextDataFlows()
        )
}
[...]

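[Figure annotations on the code above: module, rule, input pattern with its guard, output pattern with its bindings, and the ATL helper called by the dfNext binding.]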

ATL Helper

helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
    self.def->collect(d | self.users(d)
        ->reject(fi | if fi = self then not fi.isInALoop else false endif)
        ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{},
                                             self.definers(d)->excluding(self))))
    ->flatten();

helper def : isDefinedBy(start : ControlFlow!FlowInstr,
                         input : Sequence(ControlFlow!FlowInstr),
                         end : ControlFlow!FlowInstr,
                         visited : Sequence(ControlFlow!FlowInstr),
                         forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
    if input->exists(i | i = end) then
        true
    else
        let newInput : Sequence(ControlFlow!FlowInstr) =
            input->collect(i | i.cfPrev)->flatten()
                 ->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i))
        in
            if newInput->isEmpty() then
                false
            else
                thisModule.isDefinedBy(start, newInput, end,
                                       visited->union(newInput)->asSet()->asSequence(),
                                       forbidden)
            endif
    endif;
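To make the helper's logic easier to follow, here is a rough Python transcription of `computeNextDataFlows` and `isDefinedBy`, run on the control-flow graph of `fact`. The node numbering, the `(text, def, use)` tuples and the cycle-based loop test are assumptions made for this sketch; the `start` parameter, which the ATL helper only passes along, is dropped, and results are returned as sets although the ATL version returns a Sequence. The search walks `cfPrev` edges backwards from each user of a variable and succeeds if it reaches the defining statement without passing through another definer of that variable.

```python
# Control-flow graph of fact(int a); node ids follow source order (hypothetical encoding).
STMTS = {
    0: ("int fact(int a)", {"a"}, set()),          # (text, def, use)
    1: ("int r = 1;", {"r"}, set()),
    2: ("while (a>0)", set(), {"a"}),
    3: ("r *= a--;", {"r", "a"}, {"r", "a"}),
    4: ("return r;", set(), {"r"}),
}
CF_NEXT = {0: [1], 1: [2], 2: [3, 4], 3: [2], 4: []}
CF_PREV = {n: [p for p, succs in CF_NEXT.items() if n in succs] for n in CF_NEXT}


def defs(n):
    return STMTS[n][1]


def uses(n):
    return STMTS[n][2]


def users(d):
    return [n for n in STMTS if d in uses(n)]


def definers(d):
    return [n for n in STMTS if d in defs(n)]


def is_in_a_loop(n):
    """True if n can reach itself again along cfNext edges."""
    seen, stack = set(), list(CF_NEXT[n])
    while stack:
        m = stack.pop()
        if m == n:
            return True
        if m not in seen:
            seen.add(m)
            stack.extend(CF_NEXT[m])
    return False


def is_defined_by(inp, end, visited, forbidden):
    """Backward search over cfPrev: is `end` reachable without crossing `forbidden`?"""
    if end in inp:
        return True
    new_input = [p for i in inp for p in CF_PREV[i]
                 if p not in visited and p not in forbidden]
    if not new_input:
        return False
    return is_defined_by(new_input, end, visited | set(new_input), forbidden)


def compute_next_data_flows(s):
    result = set()
    for d in defs(s):
        others = [x for x in definers(d) if x != s]
        for fi in users(d):
            if fi == s and not is_in_a_loop(fi):
                continue  # a statement only feeds itself when it sits in a loop
            if is_defined_by([fi], s, set(), others):
                result.add(fi)
    return result


for s in STMTS:
    print(STMTS[s][0], "->", sorted(compute_next_data_flows(s)))
```

On this graph the sketch links `int r = 1;` to `r *= a--;` and `return r;`, and links `r *= a--;` to the loop condition, to itself and to `return r;`.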

ATL Execution Semantics: Match phase

[Figure: the match phase on the control-flow model of fact; the method int fact(int a) is matched by rule:Method and each of its four statements by rule:Stmnt.]

ATL Execution Semantics: Apply phase

[Figure (animation steps): in the apply phase, the target elements created by rule:Method and rule:Stmnt (int fact(int a), int r = 1;, while (a>0), r *= a--; and return r;) are filled in one at a time.]
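The two phases can be mimicked in Python as below. This is a simplified sketch under several assumptions: the statements of `fact` are hard-coded, the method node is handled by the same rule as the statements (the slides use a separate rule:Method), and the values returned by `computeNextDataFlows` are taken as given in the hypothetical `NEXT_DATA_FLOWS` table. Match creates one empty target element plus a trace link for each source element that passes the guard; apply then evaluates the `txt` and `dfNext` bindings, using the trace to turn source elements into their target counterparts. Here every statement defines or uses a variable, so the guard filters nothing out.

```python
from dataclasses import dataclass, field


@dataclass
class SourceStmt:
    uid: int
    txt: str
    defs: set
    uses: set


@dataclass
class TargetStmt:
    txt: str = ""
    df_next: list = field(default_factory=list)


# Hypothetical source model: the statements of fact(int a).
SOURCE = [
    SourceStmt(0, "int fact(int a)", {"a"}, set()),
    SourceStmt(1, "int r = 1;", {"r"}, set()),
    SourceStmt(2, "while (a>0)", set(), {"a"}),
    SourceStmt(3, "r *= a--;", {"r", "a"}, {"r", "a"}),
    SourceStmt(4, "return r;", set(), {"r"}),
]
# Assumed results of s.computeNextDataFlows(), as source uids.
NEXT_DATA_FLOWS = {0: [2, 3], 1: [3, 4], 2: [], 3: [2, 3, 4], 4: []}


def guard(s):
    # not (s.def->isEmpty() and s.use->isEmpty())
    return not (not s.defs and not s.uses)


def run():
    # Match phase: an empty target element and a trace link per matched source element.
    trace = {s.uid: TargetStmt() for s in SOURCE if guard(s)}
    # Apply phase: evaluate the bindings; source elements are resolved through the trace
    # (references to unmatched elements are simply dropped in this sketch).
    by_uid = {s.uid: s for s in SOURCE}
    for uid, t in trace.items():
        t.txt = by_uid[uid].txt
        t.df_next = [trace[n] for n in NEXT_DATA_FLOWS[uid] if n in trace]
    return trace


trace = run()
for t in trace.values():
    print(t.txt, "->", [n.txt for n in t.df_next])
```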

MapReduce

[Figure: MapReduce example. Input records Log0 to Log8 are split over map1, map2 and map3, which emit <symbol,1> pairs for the symbols +, * and X. After shuffle/sort, reducers red1 and red2 sum the values per key, producing <+,4>, <*,3> and <X,2>. Map phase followed by Reduce phase.]
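The counting job in the figure can be simulated in a few lines of Python. The input splits are hypothetical, chosen so that each mapper emits the same pairs as in the figure; `partition` is a deterministic stand-in for Hadoop's default hash partitioner, which assigns a key to a reducer by hashing it modulo the number of reducers.

```python
from collections import defaultdict

# Hypothetical input splits, one per mapper; each record holds one symbol.
SPLITS = {
    "map1": ["+", "+", "*"],
    "map2": ["X", "+", "*"],
    "map3": ["X", "*", "+"],
}


def map_fn(record):
    yield (record, 1)


def reduce_fn(key, values):
    return (key, sum(values))


def partition(key, n_reducers):
    # Deterministic stand-in for hash(key) % number_of_reducers.
    return ord(key[0]) % n_reducers


def run_job(splits, n_reducers=2):
    # Map phase: every mapper emits <key, 1> pairs for its split.
    intermediate = [pair for records in splits.values()
                    for r in records for pair in map_fn(r)]
    # Shuffle/sort: route each pair to a reducer and group the values by key.
    buckets = [defaultdict(list) for _ in range(n_reducers)]
    for key, value in intermediate:
        buckets[partition(key, n_reducers)][key].append(value)
    # Reduce phase: each reducer sums the values of its keys, in sorted key order.
    return [[reduce_fn(k, vs) for k, vs in sorted(b.items())] for b in buckets]


print(run_job(SPLITS))  # [[('*', 3), ('X', 2)], [('+', 4)]]
```

With two reducers, one receives the `*` and `X` pairs and the other the `+` pairs, giving the same totals as the figure.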

Why MapReduce for ATL?

● Well-suited for Write Once Read Many (WORM) data

● Two-phased execution model

MapReduce also:

● Supports different types of inputs (XML, DB, Text)

● Handles machine failures, efficient communication, and performance issues

ATL & MapReduce Alignment

Semantics Alignment

● Map: read model subset, local match/apply, create trace properties

● Reduce: read traces, global resolve, save model

[Figure: ATL's match and apply phases aligned with MapReduce's map and reduce phases.]
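The alignment above can be sketched as follows, under simplifying assumptions: a single rule, a source model given as references between element ids, a fixed split over two mappers, and one reducer. Each map task performs local match/apply: it creates target elements, resolves the references whose targets were transformed in the same split, and emits `<rule, trace>` pairs plus the references it could not resolve. The reduce task merges the traces and performs the global resolve. All names are hypothetical; this is not ATL-MR's code.

```python
# Hypothetical source model: element id -> ids of the elements it references.
SOURCE_REFS = {1: [2, 3], 2: [4], 3: [2, 4], 4: []}
# Hypothetical split of the source model over two mappers.
SPLITS = {"map1": [1, 2], "map2": [3, 4]}


def map_task(uids):
    """Local match/apply on one split of the source model."""
    targets, traces, pending = {}, [], []
    for uid in uids:                            # match: one target element per source element
        t_id = f"t{uid}"
        targets[t_id] = []                      # references of the target element
        traces.append(("rule1", (uid, t_id)))   # <rule, trace> pair
    local = dict(link for _, link in traces)    # source id -> target id, this split only
    for uid in uids:                            # apply: resolve what is known locally
        for ref in SOURCE_REFS[uid]:
            if ref in local:
                targets[local[uid]].append(local[ref])
            else:
                pending.append((local[uid], ref))  # left for the global resolve
    return targets, traces, pending


def reduce_task(results):
    """Global resolve: merge all traces, then fix the pending references."""
    model, trace, pending = {}, {}, []
    for targets, traces, todo in results:
        model.update(targets)
        trace.update(link for _, link in traces)
        pending.extend(todo)
    for t_id, ref in pending:
        model[t_id].append(trace[ref])
    return model


model = reduce_task([map_task(uids) for uids in SPLITS.values()])
print(model)  # {'t1': ['t2', 't3'], 't2': ['t4'], 't3': ['t4', 't2'], 't4': []}
```

Whatever the split, the reduce step yields the same set of references as a single, non-distributed run; the split only changes how many of them are left for the global resolve.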

Control-Flow to Data-Flow in MapReduce: Local Match/Apply

[Figure (animation steps): the elements of fact are split between map1 and map2; each mapper matches its own elements (rule:Method, rule:Stmnt), creates the corresponding target elements (int fact(int a), int r = 1;, while (a>0), r *= a--;, return r;) and applies the bindings it can compute locally; the last step adds dfNext links.]

Control-Flow to Data-Flow in MapReduce: Global Resolve

[Figure (animation steps): reducers red1 and red2 take over the partial target models and traces produced by the mappers and resolve the remaining dfNext links.]

Extended Tracing Model

ATL-MR in Action

[Figure: ATL-MR execution on the Hadoop Distributed File System (HDFS). After loading the transformation data, in LMA mode [1] map1 processes objectUID_1 to objectUID_4 and map2 processes objectUID_5 to objectUID_8; each mapper emits <rule, traceUID> pairs (map2 emits <rule2,traceUID5><rule1,traceUID6><rule1,traceUID7><rule2,traceUID8>) and saves traces and partial models. After shuffle/sort, red1 receives the rule1 pairs (<rule1,traceUID1><rule1,traceUID4><rule1,traceUID6><rule1,traceUID7>) and red2 the rule2 pairs; in GR mode [2], the reducers load traces and partial models and save the models.]

[1] LMA: Local Match/Apply

[2] GR: Global Resolve

[3] ATL-MR: https://github.com/atlanmod/ATL_MR

Evaluation

Experiment I: Speed-up Curve

● 5 models extracted from automatically generated Java files:

○ similar size (~1500 LOCs)

○ sequential transformation time ranging from 620 s to 778 s

● Run on an identical set of machines (m1.large) over Amazon Elastic MapReduce (EMR)

○ 10 runs for each number of nodes

○ 280 hours of computation

● Almost linear speed-up up to 8 nodes

○ ~3 times faster on 8 nodes

Experiment II: Size/Speed-Up Correlation

● 5 models extracted from automatically generated Java files:

○ increasing size (13,500 to 105,000 LOCs)

○ sequential transformation time ranging from 319 s to 17,998 s (~5 h)

● Run on a cluster of 12 instances built on top of OpenVC

○ 8 slaves

○ 4 machines orchestrating Hadoop/HBase

● Almost-linear speed-up for large models

○ up to 6X faster on 8 nodes

● Speed-up increases with model size
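The reported speed-ups can be read through the usual definitions S(n) = T_seq / T(n) and E(n) = S(n) / n. The slides do not state which baseline each factor is measured against, so treating the reported factors as S(8) is an assumption of this sketch. Under it, ~3× on 8 nodes corresponds to an efficiency of 0.375 and 6× to 0.75.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speed-up S(n) = T_seq / T(n)."""
    return t_sequential / t_parallel


def efficiency(s: float, nodes: int) -> float:
    """Parallel efficiency E(n) = S(n) / n; 1.0 means perfectly linear speed-up."""
    return s / nodes


# Reported factors on 8 nodes: ~3x (Experiment I), up to 6x (Experiment II).
for label, s in [("Experiment I", 3.0), ("Experiment II", 6.0)]:
    print(f"{label}: efficiency on 8 nodes = {efficiency(s, 8):.3f}")
```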

Challenges

Challenges in Distributing Model Transformations

Fact I: Models might be densely interconnected and unbalanced

Rule applications might not have the same complexity

Unable to guarantee a balanced workload: the MapReduce default scheduler is not enough

Fact II: Persistence backends are not suited for R/W concurrency

Unable to parallelize the reduce phase

NeoEMF: an Extensible Persistence Backend

● Lazy loading and unloading

○ enabling transformation of big models

● Distributed storage and access

○ permitting the parallelization of the reduce phase

● Compliant with MapReduce

● Fail-safe (no data loss)
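Lazy loading and unloading can be pictured with the generic sketch below: model features are read from a backend only when first accessed, and the least recently used ones are dropped once a memory budget is exceeded. A plain dictionary stands in for the backend and `LazyModel` is a hypothetical class; NeoEMF's actual caching strategies and backend API are not reproduced here.

```python
from collections import OrderedDict


class LazyModel:
    """Loads element features on demand and keeps at most `capacity` in memory."""

    def __init__(self, backend, capacity):
        self.backend = backend      # stand-in store: (uid, feature) -> value
        self.capacity = capacity
        self.cache = OrderedDict()  # loaded features, least recently used first
        self.loads = 0              # number of backend reads, for illustration

    def get(self, uid, feature):
        key = (uid, feature)
        if key in self.cache:
            self.cache.move_to_end(key)       # mark as recently used
            return self.cache[key]
        value = self.backend[key]             # lazy load from the backend
        self.loads += 1
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # unload the least recently used feature
        return value


backend = {(i, "txt"): f"stmt{i}" for i in range(1000)}
model = LazyModel(backend, capacity=2)
for uid in (1, 2, 1, 3):
    model.get(uid, "txt")
print(model.loads, list(model.cache))  # 3 [(1, 'txt'), (3, 'txt')]
```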

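[Figure: NeoEMF architecture: client code and model-based tools use EMF through the Model Access API; the ModelManager, PersistenceManager and CachingStrategy components sit behind the Persistence API; the Backend API gives access to the PersistenceBackend implementations NeoEMF/Map, NeoEMF/Graph and NeoEMF/HBase, built on MapDB, GraphDB and HBase/ZooKeeper respectively.]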

[1] NeoEMF: http://www.neoemf.com


Future Work

1. Optimization of load balancing

○ efficient distribution of the input model over map workers

2. Parallelization of the Global Resolve phase and the transformation of Very Large Models

○ integrating ATL-MR with NeoEMF/HBase

Conclusion

● We align Rule-based Model Transformation with the MapReduce execution model

○ We introduce an execution semantics of ATL on top of MapReduce

○ We show experimentally that our solution scales well

● For ATL users: Keep the same syntax and embrace the Cloud

● For MapReduce users: Model Transformation as yet another high-level language for MapReduce

Check us out on GitHub

https://github.com/atlanmod/ATL_MR

Questions
