Ctrip Spark Algorithm Platform and Its Applications
Speaker previously at the eBay China R&D Center ......
Spark
2016-12-10
199930000 2003129 2.5 200120 5000 10020152000
105
eBay0
Hadoop, Hive, HBase, Spark, Storm, Business
BU, Spark, Spark
4
Pipeline
Sample Case
Data from Netflix
1.
2. / 73
3.
4.
1.
2.
Transformer: 1 DataFrame in, 1 DataFrame out
ModelTrainer: 1 DataFrame in, 1 Model out
ModelTransformer: 1 Model and 1 DataFrame in, 1 DataFrame out
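The three module types named above (Transformer, ModelTrainer, ModelTransformer) can be illustrated with a minimal sketch. This is not the platform's code: a "DataFrame" is modeled here as a plain list of row dictionaries rather than a Spark DataFrame, and `MeanModel`, the `price` column, and `run_example` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List

# Stand-in for a Spark DataFrame: a list of row dictionaries (assumption).
Row = Dict[str, float]
DataFrame = List[Row]


class Transformer:
    """1 DataFrame in, 1 DataFrame out."""

    def __init__(self, fn: Callable[[Row], Row]):
        self.fn = fn

    def transform(self, df: DataFrame) -> DataFrame:
        # Apply the row function to a copy of every row.
        return [self.fn(dict(row)) for row in df]


class MeanModel:
    """Hypothetical model: remembers the mean of one column."""

    def __init__(self, column: str, mean: float):
        self.column = column
        self.mean = mean


class ModelTrainer:
    """1 DataFrame in, 1 Model out."""

    def __init__(self, column: str):
        self.column = column

    def fit(self, df: DataFrame) -> MeanModel:
        values = [row[self.column] for row in df]
        return MeanModel(self.column, sum(values) / len(values))


class ModelTransformer:
    """1 Model and 1 DataFrame in, 1 DataFrame out."""

    def transform(self, model: MeanModel, df: DataFrame) -> DataFrame:
        out = []
        for row in df:
            new_row = dict(row)
            # Add a column derived from the fitted model.
            new_row["centered"] = row[model.column] - model.mean
            out.append(new_row)
        return out


def run_example() -> DataFrame:
    raw = [{"price": 100.0}, {"price": 200.0}, {"price": 300.0}]
    scaled = Transformer(lambda r: {"price": r["price"] / 100.0}).transform(raw)
    model = ModelTrainer("price").fit(scaled)
    return ModelTransformer().transform(model, scaled)
```

Keeping the trained model as a separate object, instead of hiding it inside the trainer, is what lets a pipeline reuse one model across several downstream modules.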
-
- Scala, Python
-
DataFrame, SQL, SparkSQL
-
Pandas, matplotlib
Spark
* spark.ml Transformer, Estimator
HDFS
Jar, maven, API
1.
2. DataFrame
3. Load Transformer / Model
4. Transform / predict
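The steps above (build a DataFrame, load a persisted Transformer or Model, then transform to predict) can be sketched in miniature. This is only an illustration under stated assumptions: a temporary local directory stands in for HDFS, JSON stands in for the real model format, and `LinearModel`, its `x` column, and `serve_once` are hypothetical names that do not come from the slides.

```python
import json
import os
import tempfile
from typing import Dict, List

Row = Dict[str, float]


class LinearModel:
    """Hypothetical model: prediction = weight * x + bias."""

    def __init__(self, weight: float, bias: float):
        self.weight = weight
        self.bias = bias

    def save(self, path: str) -> None:
        # Persist the fitted parameters (local file standing in for HDFS).
        with open(path, "w", encoding="utf-8") as f:
            json.dump({"weight": self.weight, "bias": self.bias}, f)

    @classmethod
    def load(cls, path: str) -> "LinearModel":
        with open(path, encoding="utf-8") as f:
            params = json.load(f)
        return cls(params["weight"], params["bias"])

    def transform(self, rows: List[Row]) -> List[Row]:
        # Return copies of the rows with a prediction column appended.
        return [dict(r, prediction=self.weight * r["x"] + self.bias) for r in rows]


def serve_once() -> List[Row]:
    with tempfile.TemporaryDirectory() as model_dir:
        path = os.path.join(model_dir, "model.json")
        LinearModel(weight=2.0, bias=1.0).save(path)  # done once, at training time
        rows = [{"x": 1.0}, {"x": 3.0}]               # build the input rows
        model = LinearModel.load(path)                # load the persisted model
        return model.transform(rows)                  # transform / predict
```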
[Sequence diagram: participants YARN Cluster, WebServer, Zeppelin ThriftServer; calls: start, createSparkContext, start, run (Spark-Repl, SparkIMain interpret)]
2
pipeline, Zeppelin ThriftServer (Lazy): WebServer, Marathon (Mesos) Restful API, Zeppelin ThriftServer
Zeppelin ThriftServer, YARN, Spark; WebServer, ZeppelinClient, Pipeline, Module
Zeppelin Note, Paragraph, Zeppelin ThriftServer
Zeppelin ThriftServer, Spark-Repl (Spark), scala, YARN, Spark
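A lazy launch through Marathon's REST API, as mentioned above, might look roughly like the following sketch. It builds (but does not send) a `POST /v2/apps` request. The Marathon URL, app id scheme, Docker image name, resource sizes, and the per-user cache are all assumptions made for illustration; the slides do not give these details.

```python
import json
import urllib.request
from typing import Dict


def build_marathon_request(marathon_url: str, user: str) -> urllib.request.Request:
    """Build a Marathon 'create app' request for one server instance."""
    app = {
        "id": f"/zeppelin-thrift-server/{user}",  # hypothetical app id scheme
        "cpus": 1,
        "mem": 2048,
        "instances": 1,
        "container": {
            "type": "DOCKER",
            # Hypothetical image name.
            "docker": {"image": "example/zeppelin-thrift-server:latest"},
        },
    }
    return urllib.request.Request(
        url=f"{marathon_url}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


class LazyLauncher:
    """Creates the launch request only on first use for each user."""

    def __init__(self, marathon_url: str):
        self.marathon_url = marathon_url
        self.requests: Dict[str, urllib.request.Request] = {}

    def ensure_server(self, user: str) -> urllib.request.Request:
        if user not in self.requests:
            self.requests[user] = build_marathon_request(self.marathon_url, user)
        return self.requests[user]
```

Passing the returned request to `urllib.request.urlopen` would submit it to a real Marathon instance; the sketch stops short of that so it can run without a cluster.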
- Docker
[Diagram labels: Zeppelin ThriftServer, Marathon, Mesos, Host Machine, HM2, Hive/Spark/Hadoop Env, Container1, Container2, WebServer, API]
- Zeppelin
[Diagram labels: WebServer, ZeppelinClient, Zeppelin ThriftServer, RemoteInterpreterServer, Notebook, ThriftServer, SparkInterpreter, SparkSQL Interpreter, SparkIMain, PySparkInterpreter]
CMLStudio pipeline and module; Zeppelin Note and paragraph
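One way to read the line above is that a pipeline corresponds to a Zeppelin Note and each module to a paragraph. Assuming that reading, the conversion could be sketched as below. The module dictionary keys (`name`, `interpreter`, `code`) and the function name are hypothetical; the output follows the name/paragraphs shape that Zeppelin notes use, with the interpreter chosen by a leading `%name` line in each paragraph.

```python
from typing import Dict, List


def pipeline_to_note(pipeline_name: str, modules: List[Dict[str, str]]) -> Dict[str, object]:
    """Turn an ordered list of modules into a note-like dictionary."""
    paragraphs = []
    for module in modules:
        # The first line of a paragraph names the interpreter to run it with.
        text = f"%{module['interpreter']}\n{module['code']}"
        paragraphs.append({"title": module["name"], "text": text})
    return {"name": pipeline_name, "paragraphs": paragraphs}


example_note = pipeline_to_note(
    "demo-pipeline",
    [
        {"name": "load", "interpreter": "spark", "code": 'val df = spark.table("t")'},
        {"name": "plot", "interpreter": "pyspark", "code": "print(1)"},
    ],
)
```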
Docker, Zeppelin, Mesos
Marathon
Zeppelin, Spark
2
Example1
Example2
Example3
1. Python, Input DataFrame
2. Spark, Zeppelin, Zeppelin ThriftServer, NO_OP
3. XGBoost
https://github.com/dmlc/xgboost/issues/1276 nWorkers, Hang
https://github.com/dmlc/xgboost/issues/1284
-
-
- 1
30%
5
PC
APP, PC
5
PC
APP, PC
- 2
XGBoost
CrossValidation, Spark 2.0
Spark Module
WE ARE HIRING
THANKS