2016/10/8
Asakusa Framework and Scala
ScalaSummit 2016
2
1. Scala, AsakusaFW
2. Scala, Spark, AsakusaFW
3. Scala, Spark, AsakusaFW
4. AsakusaFW
5.
3
Twitter ID: @hishidama
http://hiroba.dqx.jp/sc/character/1091135261820/
AsakusaFW, DQ10 (Dragon Quest X)
4
2004 Scala
2006 Apache Hadoop, Java6
2010/2 Hadoop, HBase
2010 Spark OSS
2010 Scala
2011/3 Asakusa Framework
2011/7 Spark
2012/2 SIer
2012/8 DQ10
2014/2 Apache Spark
2014/3 Java8
5
Hadoop
1.
2.
• Hadoop
http://techblog.yahoo.co.jp/architecture/hadoop/ 6635 53470
• 8 33
3. Hadoop
• Twitter
6
HBase
1. Hadoop NoSQL HBase
RDB HBase
HBase Hadoop
7
NoSQL
NoSQL: Not Only SQL
SQL RDB DB DB
CAP theorem: C (consistency), A (availability), P (partition tolerance)
CA NoSQL CP NoSQL CA
8
Scala
1. HBase Scala
Better Java Scala
import
2. Scala Scala Seq
orz
9
Asakusa Framework
1. Hadoop
Hadoop: MapReduce, Hive, Pig, Cascading, Huahin Framework, AsakusaFW, AZAREA Cluster
AsakusaFW
AsakusaFW
10
Spark
1. Hadoop
Spark Scala
Mesos
Spark Hadoop
11
Scala, Spark, AsakusaFW
12
Scala
13
Apache Hadoop (1/3)
• HDFS
• MapReduce
• YARN
1.
2. MapReduce jar
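The MapReduce model listed above can be illustrated without a cluster. The sketch below is a minimal in-memory word count written with plain Scala collections and nothing from the Hadoop API; `MapReduceSketch`, `mapPhase`, `shuffle`, and `reducePhase` are hypothetical names for the three stages (emit key/value pairs, group them by key, aggregate each group).

```scala
object MapReduceSketch {
  // Map phase: emit (word, 1) for every word of every input line
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines.flatMap(_.split("\\s+").filter(_.nonEmpty)).map(word => (word, 1))

  // Shuffle phase: group the intermediate pairs by key
  def shuffle(pairs: Seq[(String, Int)]): Map[String, Seq[Int]] =
    pairs.groupBy(_._1).map { case (k, kvs) => k -> kvs.map(_._2) }

  // Reduce phase: aggregate the values collected for each key
  def reducePhase(groups: Map[String, Seq[Int]]): Map[String, Int] =
    groups.map { case (k, vs) => k -> vs.sum }

  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(shuffle(mapPhase(lines)))

  def main(args: Array[String]): Unit =
    println(wordCount(Seq("a b a", "b c")))
}
```

In Hadoop the same stages run in parallel across the cluster and the shuffle moves data between nodes; the sketch only mirrors the data flow.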
14
Apache Hadoop (2/3)
[Diagram: DB, app, Hadoop]
15
Apache Hadoop (3/3)
Hadoop
• Hadoop
Hadoop
• MapReduce
16
Apache Spark
• RDD Scala
• HDFS
AMPLab, Databricks
• https://databricks.com/spark/about
• Apache Spark
17
Asakusa Framework
• Java DSL
• Hadoop, Spark, M3BP
• http://www.asakusafw.com/
18
Scala, Spark, AsakusaFW
19
Scala
val operator = new MyOperator
val s0: Stream[Data] = ???  // input
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
val out1 = s2.toSeq
20
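Scala: MyOperator
// Assumed record type (not defined on the slides)
case class Data(value: Int) { def getValue(): Int = value }
class MyOperator {
  // predicate for filter: keep records whose value is even
  def f(data: Data): Boolean =
    data.getValue() % 2 == 0
  // function for map: return a new record whose value is incremented by 1
  def m(data: Data): Data =
    Data(data.getValue() + 1)
}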
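Putting the pipeline and the MyOperator class together gives a runnable program. This sketch assumes `Data` is a simple case class with a `getValue()` accessor (the slides do not define it) and uses `Seq` instead of `Stream`, which is deprecated since Scala 2.13 in favor of `LazyList`. `PipelineExample` and `run` are hypothetical names.

```scala
object PipelineExample {
  // Assumed record type: the slides use Data but do not define it
  final case class Data(value: Int) {
    def getValue(): Int = value
  }

  class MyOperator {
    // Keep records whose value is even
    def f(data: Data): Boolean = data.getValue() % 2 == 0
    // Produce a new record with the value incremented by 1
    def m(data: Data): Data = Data(data.getValue() + 1)
  }

  def run(input: Seq[Data]): Seq[Data] = {
    val operator = new MyOperator
    val s0 = input                 // source node
    val s1 = s0.filter(operator.f) // filter node
    val s2 = s1.map(operator.m)    // map node
    s2.toSeq                       // sink node (out1)
  }

  def main(args: Array[String]): Unit =
    println(run((1 to 6).map(Data(_))))
}
```

With input values 1 to 6, the even values 2, 4 and 6 pass the filter and are incremented to 3, 5 and 7.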
21
Scala
DAG
22
DAG (1/2)
• ER
23
DAG (2/2)
Directed Acyclic Graph
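A DAG can be stored as an adjacency map and processed in dependency order, which any engine must do before it runs the nodes. The sketch below applies Kahn's topological-sort algorithm to the s0 → filter → map → out1 pipeline from the Scala example. The representation and the `topologicalOrder` function are illustrative assumptions, not how Spark or Asakusa Framework store their plans. A graph containing a cycle has no topological order, so the function returns `None` for it.

```scala
object DagExample {
  // Returns the nodes in an order where every edge points forward,
  // or None if the graph contains a cycle (i.e. it is not a DAG).
  def topologicalOrder(edges: Map[String, List[String]]): Option[List[String]] = {
    val nodes = (edges.keySet ++ edges.values.flatten).toList.sorted
    // in-degree = number of incoming edges not yet processed
    var inDegree: Map[String, Int] = nodes.map(n => n -> 0).toMap
    for (targets <- edges.values; t <- targets)
      inDegree = inDegree.updated(t, inDegree(t) + 1)

    var ready = nodes.filter(n => inDegree(n) == 0) // nodes with no pending inputs
    var order = List.empty[String]
    while (ready.nonEmpty) {
      val n = ready.head
      ready = ready.tail
      order = order :+ n
      for (t <- edges.getOrElse(n, Nil)) {
        inDegree = inDegree.updated(t, inDegree(t) - 1)
        if (inDegree(t) == 0) ready = ready :+ t
      }
    }
    // Nodes that never reached in-degree 0 lie on a cycle
    if (order.size == nodes.size) Some(order) else None
  }

  def main(args: Array[String]): Unit = {
    val pipeline = Map("s0" -> List("filter"), "filter" -> List("map"), "map" -> List("out1"))
    println(topologicalOrder(pipeline))
  }
}
```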
24
Scala
[DAG: s0 → filter (f) → map (m) → out1]
25
Apache Spark
val sc = new SparkContext()
val operator = new MyOperator
val s0: RDD[Data] = ???  // created with sc.… (the call is cut off on the slide)
val s1 = s0.filter(operator.f)
val s2 = s1.map(operator.m)
s2.saveAsTextFile(out1)
MyOperator: the same Scala class as in the Scala version
[DAG: s0 → filter (f) → map (m) → out1]
26
Asakusa Framework
In s0;    // flow input
Out out1; // flow output
MyOperatorFactory operator = new MyOperatorFactory();
Source s1 = operator.f(s0).out;
Source s2 = operator.m(s1).out;
out1.add(s2);
[DAG: s0 → @Branch (f) → @Update (m) → out1]
27
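Asakusa Framework: MyOperatorFactory
import com.asakusafw.vocabulary.operator.Branch;
import com.asakusafw.vocabulary.operator.Update;
// Data is a DMDL-generated data model class (its import is not shown)
public abstract class MyOperator {
    /** Output ports of the branch operator: OUT -> "out", MISSED -> "missed". */
    public enum Filter { OUT, MISSED }
    // Route records with an even value to "out", all others to "missed"
    @Branch
    public Filter f(Data data) {
        return (data.getValue() % 2 == 0) ? Filter.OUT : Filter.MISSED;
    }
    // Increment the value of each record in place
    @Update
    public void m(Data data) {
        data.setValue(data.getValue() + 1);
    }
}
MyOperatorFactory is generated from this class by the Asakusa Operator DSL compiler.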
28
DAG
[DAG (Scala, Spark): s0 → filter (f) → map (m) → out1]
[DAG (AsakusaFW): s0 → @Branch (f) → @Update (m) → out1]
29
1: union, join, zip
[Diagram: s0, s1 → out1]
30
1: union
Java8 Stream API
Stream out = Stream.concat(Stream.concat(s0, s1), s2);
Scala, Spark
val out = s0 ++ s1 ++ s2
AsakusaFW
Source out = core.confluent(s0, s1, s2);
s0: 1,abc / 2,def
s1: 1,foo / 3,bar
out: 1,abc / 2,def / 1,foo / 3,bar
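The union example above can be reproduced with plain Scala lists. Representing each record as a `(key, value)` tuple is an assumption made for the sketch, and `UnionExample` is a hypothetical name.

```scala
object UnionExample {
  // Example inputs from the slide, as (key, value) tuples
  val s0 = List((1, "abc"), (2, "def"))
  val s1 = List((1, "foo"), (3, "bar"))

  // union concatenates the inputs; keys are neither matched nor deduplicated
  def union: List[(Int, String)] = s0 ++ s1

  def main(args: Array[String]): Unit = println(union)
}
```

The output keeps every record of both inputs, including both records with key 1.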
31
1: join
Java8 Stream API
Scala
Spark
val out = s0.join(s1)
AsakusaFW
Source out = operator.join(s0, s1).joined; // @MasterJoin
s0: 1,abc / 2,def
s1: 1,foo / 3,bar
out: 1,abc,foo
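An inner join on the key, as in the example data, can be sketched on Scala lists. The result type `(K, (V, W))` mirrors that of Spark's `join` on pair RDDs; the `join` function itself is a hypothetical helper, not part of the Scala standard library.

```scala
object JoinExample {
  val s0 = List((1, "abc"), (2, "def"))
  val s1 = List((1, "foo"), (3, "bar"))

  // Inner join: only keys present on both sides produce output
  def join[K, V, W](left: List[(K, V)], right: List[(K, W)]): List[(K, (V, W))] = {
    val rightByKey: Map[K, List[(K, W)]] = right.groupBy(_._1) // index the right side by key
    for {
      (k, v) <- left
      (_, w) <- rightByKey.getOrElse(k, Nil)
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = println(join(s0, s1))
}
```

Only key 1 exists in both inputs, so the result is `List((1, ("abc", "foo")))`; the records with keys 2 and 3 are dropped.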
32
1: cogroup
Java8 Stream API
Scala
Spark
val out = s0.cogroup(s1)
AsakusaFW
Source out = operator.group(s0, s1).out; // @CoGroup
s0: 1,abc / 2,def
s1: 1,foo / 3,bar
out: 2,def,null / 1,abc,foo / 3,null,bar
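A cogroup keeps every key from either input and collects each side's values separately. The sketch below models that with Scala lists. The slide's `null` for a missing side corresponds to an empty list here, much as Spark's `cogroup` returns `(K, (Iterable[V], Iterable[W]))`. The `cogroup` function is a hypothetical helper.

```scala
object CoGroupExample {
  val s0 = List((1, "abc"), (2, "def"))
  val s1 = List((1, "foo"), (3, "bar"))

  // Every key from either side appears once, with the (possibly empty)
  // lists of values contributed by each side
  def cogroup[K, V, W](left: List[(K, V)], right: List[(K, W)]): Map[K, (List[V], List[W])] = {
    val keys = (left.map(_._1) ++ right.map(_._1)).distinct
    keys.map { k =>
      (k, (left.filter(_._1 == k).map(_._2), right.filter(_._1 == k).map(_._2)))
    }.toMap
  }

  def main(args: Array[String]): Unit = println(cogroup(s0, s1))
}
```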
33
1: zip
zip
Java8 Stream API
Scala, Spark
val out = s0.zip(s1)
AsakusaFW
s0: 1,abc / 2,def
s1: 1,foo / 3,bar
out (zip): 1,abc,1,foo / 2,def,3,bar
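Unlike join, zip pairs records by position rather than by key, which is why (2,def) is paired with (3,bar). Scala's `List#zip` shows this directly. It silently drops the surplus elements of the longer list, whereas Spark's `RDD.zip` requires both RDDs to have the same number of partitions and the same number of elements in each partition. `ZipExample` is a hypothetical name.

```scala
object ZipExample {
  val s0 = List((1, "abc"), (2, "def"))
  val s1 = List((1, "foo"), (3, "bar"))

  // Pair records by position; keys play no role
  def zipped: List[((Int, String), (Int, String))] = s0.zip(s1)

  def main(args: Array[String]): Unit = println(zipped)
}
```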
34
2: duplicate
[Diagram: s0 → 1 → out1, s0 → 2 → out2]
35
2: duplicate
duplicate
Java8 Stream API
Scala TraversableOnce Spark
Spark
val out1 = s0.map(operator.m1)
val out2 = s0.map(operator.m2)
AsakusaFW
Source out1 = operator.m1(s0).out;
Source out2 = operator.m2(s0).out;
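Whether one input can feed two operators depends on whether it can be traversed more than once. A `List` can be mapped twice directly. A single-pass `Iterator` (a `TraversableOnce` in Scala 2.12, an `IterableOnce` in 2.13) must first be split with `Iterator#duplicate`. In the sketch, `m1` and `m2` are hypothetical stand-ins for `operator.m1` and `operator.m2`.

```scala
object DuplicateExample {
  def m1(x: Int): Int = x * 10 // hypothetical operator 1
  def m2(x: Int): Int = x + 1  // hypothetical operator 2

  // A List can be traversed any number of times, so it feeds both operators directly
  def fromList(s0: List[Int]): (List[Int], List[Int]) =
    (s0.map(m1), s0.map(m2))

  // An Iterator is single-pass; duplicate yields two independent iterators
  def fromIterator(s0: Iterator[Int]): (List[Int], List[Int]) = {
    val (a, b) = s0.duplicate
    (a.map(m1).toList, b.map(m2).toList)
  }

  def main(args: Array[String]): Unit = {
    println(fromList(List(1, 2, 3)))
    println(fromIterator(Iterator(1, 2, 3)))
  }
}
```

`duplicate` buffers the elements that one copy has consumed but the other has not, so the second traversal costs memory.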
36
3: branch
[Diagram: s0 → out1, out2]
37
3: branch
Java8 Stream API
Scala, Spark
filter
AsakusaFW
// @Branch
Branch result = operator.branch(s0);
Source out1 = result.out1;
Source out2 = result.out2;
Source out3 = result.out3;
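Branching can be written in Scala as one `filter` per output, as listed above for Scala and Spark. It can also be written by classifying each record once, which is closer to an Asakusa `@Branch` operator returning one enum constant per record. The sketch models the output ports as a sealed trait. `Port`, `route`, and the routing rule are hypothetical, and this is plain Scala, not the Asakusa API.

```scala
object BranchExample {
  // Hypothetical output ports, analogous to the enum constants of an @Branch method
  sealed trait Port
  case object Out1 extends Port
  case object Out2 extends Port
  case object Out3 extends Port

  // Hypothetical routing rule: every value goes to exactly one port
  def route(x: Int): Port =
    if (x < 0) Out1 else if (x == 0) Out2 else Out3

  // Approach 1: one filter per output (the input is scanned once per output)
  def byFilter(s0: List[Int]): (List[Int], List[Int], List[Int]) =
    (s0.filter(route(_) == Out1), s0.filter(route(_) == Out2), s0.filter(route(_) == Out3))

  // Approach 2: classify each record once and group by port
  def byGroup(s0: List[Int]): Map[Port, List[Int]] =
    s0.groupBy(x => route(x))

  def main(args: Array[String]): Unit = {
    println(byFilter(List(-2, 0, 3, -1, 5)))
    println(byGroup(List(-2, 0, 3, -1, 5)))
  }
}
```

The `groupBy` version scans the input once, but it omits ports that receive no records, whereas the filter version always returns all three outputs.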
38
Asakusa DAG
[Diagram: DAG of an Asakusa application with operators such as @Convert, @CoGroup, @Summarize and @MJoinUpdate]
1 252
39
Java8 Stream API
List Stream Stream
Scala
.par
Spark Scala Streaming
AsakusaFW: Hadoop, Spark, M3BP
Hadoop, Spark
40
AsakusaFW
41
Asakusa Framework
1. Hadoop
Hadoop
2.
3. Spark M3BP Scala Java .NET Java
42
M3 for Batch Processing (M3BP)
https://github.com/fixstars/m3bp OS
Spark
CPU GB
43
1
1
2010 Hadoop: http://shiumachi.hatenablog.com/entry/20100703/1278133318 CPU 8–16, 16–32GB, 4–24TB
http://www.atmarkit.co.jp/ait/articles/1608/22/news027.html CPU 20, 256GB, 36TB
100GB
44
Asakusa Framework
AsakusaFW jar Hadoop MapReduce
Spark M3BP
45
Asakusa on MapReduce | Asakusa on Spark | Asakusa on M3BP
javac | javac | javac, CMake, gcc/g++
MapReduce java | Spark ASM class scalac | C++, SEGV
46
Asakusa
Data | Asakusa on MapReduce | Asakusa on Spark | Asakusa on M3BP
1.2GB 561, 60MB 69 | 110 | 85 | 8
29kB 900, 280B 1 | 15 | 60 | 3
11GB 21700, 940MB 783 | 380 | 700 | 260
74MB 53, 81GB 1084 | 3400 | 2030 | 400 (256 270GB)
76GB 2420, 153MB 89 | 670 | 360 | 92
47
CPU
Hadoop (MapR), Spark: 13 128 750GB
M3BP: 1 88 512GB
• M3BP Hadoop Spark M3BP
• M3BP 122 88 1.1 1.2
48
Asakusa Framework
M3BP
AsakusaFW
49
(1/2) Hadoop 1 HDD HDD
50
(2/2)
SSD 2020 100TB
CPU 100 Asakusa on M3BP
RSA CPU 110TB
MRAM
51
Asakusa DSL
1. AsakusaFW Asakusa DSL Java
DSL
2. DSL Scala
3. Asakusa DSL Scala
AsakusaFW SIer
SIer Java Asakusa Scala DSL
Asakusa Scala DSL: https://atnd.org/events/13174
52
53
Apache Spark
Scala Streaming
Hadoop MapReduce Asakusa Framework
Scala SIer Asakusa Scala DSL
DQ10 DQ10 ver3.4 2016/10/6