SLE2015: Distributed ATL
Distributed Model-to-Model Transformation with ATL on MapReduce
Jordi CABOT
ICREA, Universitat Oberta de Catalunya
Amine BENELALLAM, Abel GOMEZ, and Massimo TISI
AtlanMod team (Inria, Mines Nantes, Lina)
The 8th ACM SIGPLAN International Conference on Software Language Engineering (co-located with SPLASH), Oct 26 2015, Pittsburgh, USA
Model Transformation
Transformation spec { S::Square → T1::Triangle, S::Circle → T1::Octagon, .... }
[Figure: the transformation spec consumes the source models and produces the target model]
Scalability issues in MTs
Complex transformations taking hours to run
Very Large Models (VLMs) that do not fit into the memory of a single machine
Increasing complexity of data & systems
● Frequent increase in scope between releases
● +900 Meta-Classes & thousands of properties
● Models grow up to gigabytes (GBs)
Distributing Model Transformation
[Figure: in a distributed environment, several workers each consume the transformation spec and a part of the source model, and together produce the target model]
Why not use a GPL?
Using a General Purpose Language (GPL) for distributed MT:
1. Requires familiarity with concurrency theory
○ not common among MDE application developers
2. New class of errors w.r.t. sequential programming
○ e.g. linked to task synchronization and shared data access
3. Complex analysis for performance optimization
Case Study: Analysis of Data-Flow in Java Programs (TTC13 [1])
[1] T. Horn. The TTC 2013 Flowgraphs Case. arXiv preprint, arXiv:1312.0341, 2013.
(a) Java code
int fact(int a) {
    int r = 1;
    while (a > 0) {
        r *= a--;
    }
    return r;
}
(b) Control-Flow, (c) Data-Flow
[Figure: control-flow and data-flow graphs of fact. Nodes are the method signature and the four statements above, plus the variables a and r; edges are labeled def, use, and cfNext/dfNext]
AtlanMod Transformation Language (ATL)
module ControlFlow2DataFlow;
create OUT : DataFlow from IN : ControlFlow;
rule SimpleStatment {
  from
    s : ControlFlow!SimpleStmt (not (s.def->isEmpty() and s.use->isEmpty()))
  to
    t : DataFlow!SimpleStmt (
      txt <- s.txt,
      dfNext <- s.computeNextDataFlows()
    )
}
[...]
[Figure annotations on the listing: module, rule, input pattern, guard, output pattern, binding, ATL helper]
ATL Helper
helper context ControlFlow!FlowInstr def : computeNextDataFlows() : Sequence(ControlFlow!FlowInstr) =
  self.def
    ->collect(d | self.users(d)
      ->reject(fi | if fi = self then not fi.isInALoop else false endif)
      ->select(fi | thisModule.isDefinedBy(fi, Sequence{fi}, self, Sequence{}, self.definers(d)->excluding(self))))
    ->flatten();

helper def : isDefinedBy(start : ControlFlow!FlowInstr, input : Sequence(ControlFlow!FlowInstr), end : ControlFlow!FlowInstr, visited : Sequence(ControlFlow!FlowInstr), forbidden : Sequence(ControlFlow!FlowInstr)) : Boolean =
  if input->exists(i | i = end) then
    true
  else
    let newInput : Sequence(ControlFlow!FlowInstr) =
      input->collect(i | i.cfPrev)->flatten()
        ->reject(i | visited->exists(v | v = i) or forbidden->exists(f | f = i))
    in
      if newInput->isEmpty() then
        false
      else
        thisModule.isDefinedBy(start, newInput, end, visited->union(newInput)->asSet()->asSequence(), forbidden)
      endif
  endif;
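To make the logic of the two helpers above easier to follow, here is a minimal Python transliteration run on the fact example. It is a sketch, not the ATL implementation: the Instr class, its field names, and the hand-written def/use sets, cf_prev links and in_loop flags are all assumptions made for illustration. An instruction fi is a data-flow successor of self for a variable d if fi uses d and, walking backwards along cfPrev from fi, self is reached without crossing another definer of d.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # identity-based equality, since the graph is cyclic
class Instr:
    txt: str
    defs: list
    uses: list
    in_loop: bool = False  # stands in for isInALoop (set by hand here)
    cf_prev: list = field(default_factory=list)

def users(program, var):
    return [i for i in program if var in i.uses]

def definers(program, var):
    return [i for i in program if var in i.defs]

def is_defined_by(frontier, end, forbidden):
    """Backward search along cf_prev; True if `end` is reached without
    crossing a forbidden instruction (another definer of the variable)."""
    visited = []
    while True:
        if end in frontier:
            return True
        new = []
        for i in frontier:
            for p in i.cf_prev:
                if p not in visited and p not in forbidden and p not in new:
                    new.append(p)
        if not new:
            return False
        visited += new
        frontier = new

def compute_next_data_flows(instr, program):
    result = []
    for var in instr.defs:
        others = [d for d in definers(program, var) if d is not instr]
        for fi in users(program, var):
            if fi is instr and not fi.in_loop:
                continue  # an instruction feeds itself only inside a loop
            if is_defined_by([fi], instr, others):
                result.append(fi)
    return result

# Hand-encoded control-flow model of fact (assumed encoding).
method = Instr("int fact(int a)", ["a"], [])
init = Instr("int r = 1;", ["r"], [])
loop = Instr("while (a>0)", [], ["a"], in_loop=True)
body = Instr("r *= a--;", ["r", "a"], ["r", "a"], in_loop=True)
ret = Instr("return r;", [], ["r"])
init.cf_prev = [method]
loop.cf_prev = [init, body]
body.cf_prev = [loop]
ret.cf_prev = [loop]
program = [method, init, loop, body, ret]

for instr in program:
    successors = [i.txt for i in compute_next_data_flows(instr, program)]
    print(instr.txt, "->", successors)
```

As in the ATL version, the result is a flattened sequence, so an instruction that both defines and uses several variables can list the same successor more than once.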
ATL Execution Semantics: Match phase
[Figure: the elements of the control-flow model of fact are matched by rules: rule:Method for int fact(int a) and rule:Stmnt for each of the four statements]
ATL Execution Semantics: Apply phase
[Figure, four animation steps: the matched rules (rule:Method, rule:Stmnt) produce the target elements, first int fact(int a), then int r = 1; while (a>0); r *= a--; return r;]
MapReduce
[Figure: MapReduce counting example. The records Log0 to Log8 are divided into three splits (SPLIT1 to SPLIT3), one per mapper (map1 to map3). Map phase: each mapper emits <key,1> pairs for the keys +, * and X. Shuffle/sort groups the pairs by key. Reduce phase: red1 and red2 sum the values per key, giving <+,4>, <*,3> and <X,2>]
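The counting example of this slide can be mimicked in plain Python, without Hadoop. This is a minimal single-process sketch: the record contents are hypothetical (the slide only shows the emitted pairs), and the function names are chosen for illustration.

```python
from collections import defaultdict

def map_record(record):
    """Map: emit a <key,1> pair for every symbol of interest in a record."""
    for symbol in record:
        if symbol in "+*X":
            yield symbol, 1

def shuffle_sort(pairs):
    """Shuffle/sort: group all values by key, keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_counts(key, values):
    """Reduce: sum the values of one key."""
    return key, sum(values)

def run(splits):
    mapped = [pair
              for split in splits      # one split per mapper
              for record in split
              for pair in map_record(record)]
    grouped = shuffle_sort(mapped)
    return dict(reduce_counts(k, vs) for k, vs in grouped.items())

# Hypothetical records, chosen so each mapper emits the pairs shown on the slide.
splits = [["+", "+", "*"],   # map1
          ["X", "+", "*"],   # map2
          ["X", "*", "+"]]   # map3
counts = run(splits)
print(counts)
```

In a real cluster the three stages run on different machines, and the framework takes care of moving the grouped pairs to the reducers.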
Why MapReduce for ATL?
● Well-suited for Write Once Read Many (WORM) data
● Two-phased execution model
MapReduce also:
● Supports different types of inputs (XML, DB, Text)
● Handles machine failures, efficient communication, and performance issues
Semantics Alignment
Map: read model subset, local match/apply, create trace properties
Reduce: read traces, global resolve, save model
[Figure: the ATL match and apply phases aligned with the MapReduce map and reduce phases]
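A rough Python sketch of this alignment follows. It is not the ATL-MR code: the dictionary-based model encoding, the RULES table, the "next" field (source-side references, assumed to be already computed) and both function names are assumptions for illustration. Each mapper matches and applies rules on its own subset and leaves cross-element references pending; the reducer reads all traces and resolves the pending references to target elements.

```python
# Hypothetical rule table: source element type -> rule that matches it.
RULES = {"Method": "rule:Method", "SimpleStmt": "rule:Stmnt"}

def local_match_apply(split):
    """Map side: match/apply on one model subset, record traces."""
    traces = []
    for src in split:
        rule = RULES.get(src["type"])
        if rule is None:
            continue  # no rule matches this element
        target = {"id": "t_" + src["id"], "txt": src["txt"], "dfNext": []}
        traces.append({
            "rule": rule,
            "source": src["id"],
            "target": target,
            # The referenced elements may live on another mapper, so the
            # binding is kept as source ids until the resolve step.
            "pending_dfNext": list(src["next"]),
        })
    return traces

def global_resolve(traces):
    """Reduce side: replace pending source ids by target ids."""
    target_of = {t["source"]: t["target"]["id"] for t in traces}
    for t in traces:
        t["target"]["dfNext"] = [target_of[s]
                                 for s in t["pending_dfNext"]
                                 if s in target_of]
    return [t["target"] for t in traces]

# Hand-encoded source model for fact (assumed encoding).
source = [
    {"id": "m", "type": "Method", "txt": "int fact(int a)", "next": ["s2", "s3"]},
    {"id": "s1", "type": "SimpleStmt", "txt": "int r = 1;", "next": ["s3", "s4"]},
    {"id": "s2", "type": "SimpleStmt", "txt": "while (a>0)", "next": []},
    {"id": "s3", "type": "SimpleStmt", "txt": "r *= a--;", "next": ["s2", "s3", "s4"]},
    {"id": "s4", "type": "SimpleStmt", "txt": "return r;", "next": []},
]

# Two mappers, each with its own subset of the model.
traces = local_match_apply(source[:2]) + local_match_apply(source[2:])
model = global_resolve(traces)
for element in model:
    print(element["txt"], "->", element["dfNext"])
```

The point of the split is that the first function never needs elements outside its subset, while the second needs every trace but no rule logic.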
Control-Flow to Data-Flow in MapReduce: Local Match/Apply
[Figure, five animation steps: the control-flow model of fact is split between two mappers, map1 and map2. Each mapper matches its own elements (rule:Method, rule:Stmnt) and creates the corresponding data-flow elements (int fact(int a); int r = 1; while (a>0); r *= a--; return r;). The last steps mark the dfNext links]
Control-Flow to Data-Flow in MapReduce: Global Resolve
[Figure, three animation steps: the reducers red1 and red2 resolve the dfNext links between the data-flow elements created by the mappers]
ATL-MR [3] in Action
[Figure: ATL-MR running on the Hadoop Distributed File System (HDFS)]
LMA mode [1]: load transformation data
map1 reads objectUID_1 to objectUID_4 and emits <rule1,traceUID1> <rule2,traceUID2> <rule2,traceUID3> <rule1,traceUID4>
map2 reads objectUID_5 to objectUID_8 and emits <rule2,traceUID5> <rule1,traceUID6> <rule1,traceUID7> <rule2,traceUID8>
save traces and partial models
shuffle/sort:
<rule1,traceUID1> <rule1,traceUID4> <rule1,traceUID6> <rule1,traceUID7>
<rule2,traceUID2> <rule2,traceUID3> <rule2,traceUID5> <rule2,traceUID8>
GR mode [2]: red1 and red2 load traces and partial models, then save models
[1] LMA: Local Match/Apply
[2] GR: Global Resolve
[3] ATL-MR: https://github.com/atlanmod/ATL_MR
Experiment I: Speed-up Curve
● 5 models extracted from automatically generated Java files:
○ similar size (~1500 LOCs)
○ sequential transformation ranges from 620s to 778s
● Run on an identical set of machines (m1.large) over Amazon Elastic MapReduce (EMR)
○ 10 times for each number of nodes
○ 280 hours of computation
● Almost linear speed-up up to 8 nodes
○ ~3 times faster on 8 nodes
Experiment II: Size/Speed-Up Correlation
● 5 models extracted from automatically generated Java files:
○ increasing size (13,500 to 105,000 LOCs)
○ sequential transformation ranges from 319s to 17,998s (~5h)
● Run on a cluster of 12 instances built on top of OpenVZ
○ 8 slaves
○ 4 machines orchestrating Hadoop/HBase
● Almost-linear speed-up for large models
○ Up to 6X faster on 8 nodes
● Speed-up increases with model size
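For reading the two experiments side by side, speed-up is the sequential time divided by the distributed time, and parallel efficiency is the speed-up divided by the number of nodes. The short sketch below only applies these standard definitions to the factors quoted on the slides; the function names are illustrative.

```python
def speedup(t_sequential, t_distributed):
    """Speed-up: how many times faster the distributed run is."""
    return t_sequential / t_distributed

def efficiency(speedup_factor, nodes):
    """Parallel efficiency: fraction of the ideal (linear) speed-up."""
    return speedup_factor / nodes

# Factors quoted on the slides, both on 8 nodes.
eff_small_models = efficiency(3, 8)   # Experiment I: ~3X
eff_large_models = efficiency(6, 8)   # Experiment II: up to 6X
print(eff_small_models, eff_large_models)
```

With these figures the efficiency goes from about 0.375 for the small models to about 0.75 for the largest speed-up reported, which matches the observation that the speed-up increases with model size.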
Challenges In Distributing Model Transformation
Fact I: Models might be densely interconnected & unbalanced
○ Rule applications might not have the same complexity
○ Unable to guarantee a balanced workload; the default MapReduce scheduler is not enough
Fact II: Persistence backends are not suited for R/W concurrency
○ Unable to parallelize the reduce phase
NeoEMF: an Extensible Persistence Backend
● Lazy loading and unloading
○ enabling transformation of big models
● Distributed storage and access
○ permitting the parallelization of the reduce phase
● Compliant with MapReduce
● Fail-safe (no data loss)
[Figure: NeoEMF architecture. Components: Client Code, Model-based Tools, EMF, ModelManager, PersistenceManager, PersistenceBackend, CachingStrategy. APIs: Model Access API, Persistence API, Backend API. Backends: NeoEMF/Map (MapDB), NeoEMF/Graph (GraphDB), NeoEMF/HBase (HBase, ZooKeeper)]
[1] NeoEMF: http://www.neoemf.com
Future Work
1. Optimization of load balancing
○ efficient distribution of the input model over map workers
2. Parallelization of the Global Resolve phase and the transformation of Very Large Models
○ integrating ATL-MR with NeoEMF/HBase
Conclusion
● We align Rule-based Model Transformation with the MapReduce execution model
○ We introduce an execution semantics of ATL on top of MapReduce
○ We experimentally show the good scalability of our solution
● For ATL users: Keep the same syntax and embrace the Cloud
● For MapReduce users: Model Transformation as yet another high-level language for MapReduce