Ctrip Spark Algorithm Platform and Its Applications


  • Ctrip Spark Algorithm Platform and Its Applications

    2016-12-10

  • 199930000 2003129 2.5 200120 5000 10020152000

  • 105

    Previously at the eBay China R&D Center

    Follows developments in big data systems: Hadoop, HIVE, HBASE, Spark, Storm, Business

  • BUSparkSpark

  • 4

    Pipeline

  • Sample Case

    Data from Netflix

    1.

    2. / 73

    3.

    4.

  • 1.

  • 2.

  • Transformer: 1 DataFrame in, 1 DataFrame out

    ModelTrainer: 1 DataFrame in, 1 Model out

    ModelTransformer: 1 Model and 1 DataFrame in, 1 DataFrame out
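    The three module contracts above can be sketched as follows. This is a minimal illustration, not the platform's code: a list of dicts stands in for a Spark DataFrame so the sketch runs without Spark, and `MeanModel`, the `centered` column, and the method names `transform` and `fit` are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Stand-in for a Spark DataFrame so the sketch runs without Spark (assumption).
Row = Dict[str, float]
Frame = List[Row]


class Transformer:
    """1 DataFrame in, 1 DataFrame out."""

    def __init__(self, fn: Callable[[Row], Row]):
        self.fn = fn

    def transform(self, frame: Frame) -> Frame:
        # Copy each row so the input frame is left untouched.
        return [self.fn(dict(row)) for row in frame]


@dataclass
class MeanModel:
    """Hypothetical model: remembers the mean of one column."""
    column: str
    mean: float


class ModelTrainer:
    """1 DataFrame in, 1 Model out."""

    def __init__(self, column: str):
        self.column = column

    def fit(self, frame: Frame) -> MeanModel:
        values = [row[self.column] for row in frame]
        return MeanModel(self.column, sum(values) / len(values))


class ModelTransformer:
    """1 Model and 1 DataFrame in, 1 DataFrame out."""

    def transform(self, model: MeanModel, frame: Frame) -> Frame:
        return [{**row, "centered": row[model.column] - model.mean} for row in frame]


raw = [{"x": 1.0}, {"x": 3.0}]
scaled = Transformer(lambda r: {**r, "x": r["x"] * 2}).transform(raw)
model = ModelTrainer("x").fit(scaled)
scored = ModelTransformer().transform(model, scaled)
```

    Keeping the trained model as a separate input to ModelTransformer means the same model can be applied to any later DataFrame, which is what makes offline reuse of a trained module possible.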

  • -

  • - Scala, Python

  • -

    DataFrame, SQL, SparkSQL

  • -

    Pandas, matplotlib

  • Spark

    * spark.ml Transformer, Estimator

  • HDFS

  • Jar, maven, API

    1.

    2. DataFrame

    3. Load Transformer / Model

    4. Transform / predict

  • YARN Cluster

    WebServer

    ZeppelinThriftServer

    start

    create SparkContext

    start

    run Spark-Repl (SparkIMain interpret)

  • 2

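    Pipeline execution starts a ZeppelinThriftServer (lazy): the WebServer calls the Marathon (Mesos) RESTful API to launch a ZeppelinThriftServer.

    The ZeppelinThriftServer starts a Spark application on YARN. The WebServer, through ZeppelinClient, converts the Pipeline and its Modules

    into a Zeppelin Note and Paragraphs and sends them to the ZeppelinThriftServer.

    The ZeppelinThriftServer, through Spark-Repl (Spark), compiles the scala code and submits it to the Spark application resident on the YARN cluster.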
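    The lazy launch through Marathon might look roughly like the sketch below. It only builds the JSON body that Marathon's `POST /v2/apps` endpoint accepts and leaves the HTTP call to an injected function. The per-user keying, the app id scheme, the image name, and the resource values are all assumptions made for illustration; the slides do not give them.

```python
import json
from typing import Callable, Dict


def marathon_app(user: str) -> dict:
    """Build a Marathon app definition (the JSON body for POST /v2/apps)."""
    return {
        "id": f"/zeppelin-thrift-server/{user}",  # hypothetical naming scheme
        "instances": 1,
        "cpus": 1.0,   # placeholder resources
        "mem": 2048,
        "container": {
            "type": "DOCKER",
            # Hypothetical image name.
            "docker": {"image": "example/zeppelin-thrift-server:latest"},
        },
    }


class LazyServers:
    """Launch one server per user on first use, then reuse it."""

    def __init__(self, submit: Callable[[dict], None]):
        self._submit = submit  # e.g. a function that POSTs the body to Marathon
        self._running: Dict[str, str] = {}

    def get(self, user: str) -> str:
        if user not in self._running:
            app = marathon_app(user)
            self._submit(app)
            self._running[user] = app["id"]
        return self._running[user]


submitted = []
servers = LazyServers(submitted.append)  # record requests instead of sending them
first = servers.get("alice")
second = servers.get("alice")            # already running: no second launch
body = json.dumps(submitted[0])
```

    Injecting `submit` keeps the sketch free of network access; a real WebServer would replace it with an HTTP client call against the Marathon endpoint.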

  • - Docker

    ZeppelinThriftServer

    Marathon

    Mesos

    Host Machine

    HM2

    Hive/Spark/Hadoop Env

    Container1 Container2

    WebServer

    API

    API

  • - Zeppelin

    WebServer

    ZeppelinClient

    ZeppelinThriftServer

    RemoteInterpreterServer

    Notebook

    ThriftServer

    SparkInterpreter

    SparkSQL Interpreter

    SparkIMain

    PySparkInterpreter

    A CML Studio pipeline corresponds to a zeppelin Note, and a module to a paragraph
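    The pipeline-to-Note correspondence can be sketched as a plain data conversion. The `Pipeline` and `Module` classes and the shape of the returned dict are assumptions for illustration, not the platform's ZeppelinClient types; the only Zeppelin convention relied on is that a paragraph's text starts with an interpreter directive such as `%spark`, `%sql`, or `%pyspark`.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Module:
    name: str
    code: str
    interpreter: str = "spark"  # e.g. "spark", "sql", "pyspark"


@dataclass
class Pipeline:
    name: str
    modules: List[Module]


def to_note(pipeline: Pipeline) -> dict:
    """Map a pipeline to a note and each module to one paragraph."""
    return {
        "name": pipeline.name,
        "paragraphs": [
            # The leading %directive selects the interpreter for the paragraph.
            {"title": m.name, "text": f"%{m.interpreter}\n{m.code}"}
            for m in pipeline.modules
        ],
    }


pipeline = Pipeline(
    "demo",
    [
        Module("load", 'val df = sqlContext.table("t")'),
        Module("count", "select count(*) from t", "sql"),
    ],
)
note = to_note(pipeline)
```

    Because paragraphs keep the module order, running the note top to bottom executes the pipeline's modules in sequence inside one interpreter session.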

  • Docker Zeppelin Mesos

    Marathon

    Zeppelin Spark

    2

  • Example1

    Example2

    Example3

  • 1. Python Input Dataframe

    2. Spark Zeppelin ZeppelinThriftServer NO_OP

    3. XGBoost

    https://github.com/dmlc/xgboost/issues/1276

    nWorkers Hang

    https://github.com/dmlc/xgboost/issues/1284

  • -

  • -

  • - 1

    30%

    5

    PC

    APPPC

    5

    PC

    APPPC

  • - 2

    XGBoost

  • CrossValidation, Spark 2.0

    Spark Module

  • WE ARE HIRING


  • THANKS