[PyCon 2014 APAC] How to integrate python into a scala stack to build realtime predictive models

Post on 10-May-2015


How to Integrate Python into a Scala Stack to Build Realtime Predictive Models

Jerry Chou

Lead Research Engineer

jerry@fliptop.com

Stories Beforehand

• Product pivoted
  • Data search => data analysis
  • Build on top of existing infrastructure (hosted on AWS & Azure)

• Need tools for scientific computation
  • Mahout (Java)
  • Weka (Java)
  • Scikit-learn (Python)

Agenda

• Requirements and high level concepts

• Tools for calling Python from Scala

• Decision making


High Level Concept - Before

[Diagram: existing business logic (in both Scala & Java) connected to modeling logic (in Python) running on Node 1, Node 2, …, Node N]

Requirements

• APIs to exploit Python’s modeling power
  • Train, predict, model info query, etc.

• Scalability
  • On-demand Python serving nodes
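To make the API requirement concrete, here is a minimal sketch of what a train / predict / model-info surface could look like on the Python side. The class name `ModelService`, its method names, and the toy nearest-centroid model are all assumptions made for illustration; the slides do not show the actual API, and a real node would presumably wrap a scikit-learn estimator instead.

```python
class ModelService:
    """Hypothetical API surface: train, predict, and model info query."""

    def __init__(self):
        self._centroids = {}  # label -> mean feature vector
        self._n_samples = 0

    def train(self, rows, labels):
        """Fit a toy nearest-centroid model (a stand-in for a real estimator)."""
        sums, counts = {}, {}
        for row, label in zip(rows, labels):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, value in enumerate(row):
                acc[i] += value
            counts[label] = counts.get(label, 0) + 1
        self._centroids = {
            label: [total / counts[label] for total in acc]
            for label, acc in sums.items()
        }
        self._n_samples = len(labels)

    def predict(self, row):
        """Return the label whose centroid is closest to the given row."""
        if not self._centroids:
            raise RuntimeError("train() must be called before predict()")

        def squared_distance(label):
            centroid = self._centroids[label]
            return sum((x - c) ** 2 for x, c in zip(row, centroid))

        return min(self._centroids, key=squared_distance)

    def info(self):
        """Answer a model info query with basic metadata."""
        return {"n_samples": self._n_samples, "labels": sorted(self._centroids)}
```

Each serving node could expose these three calls over whichever transport is chosen later in the talk, so the Scala side never needs to know which Python library does the modeling.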


Tools for Scala-Python Integration

• Reimplementation of Python
  • Jython (JPython)

• Communication through JNI
  • Jepp

• Communication through IPC
  • Thrift

• Communication through REST API calls
  • Bottle

Jython (JPython)

• Re-Implementation of Python in Java

• Compiles to Java bytecode
  • Either on demand or statically

• Can import and use any Java class

Jython

[Diagram: Scala code and Python code (on Jython) running inside a single JVM]

Jython

• Lacks support for many scientific computing extensions
  • NumPy, SciPy, etc.

• JyNI to the rescue?
  • Not yet ready, even for NumPy
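One way to see this limitation from inside a script is to check which interpreter is running and whether a given module can be imported. The sketch below is only an illustration: the helper names `running_on_jython` and `can_import` are hypothetical, and it relies on Jython reporting a `sys.platform` value that starts with `java`.

```python
import sys


def running_on_jython():
    """Jython reports a sys.platform string that starts with 'java'."""
    return sys.platform.startswith("java")


def can_import(module_name):
    """Return True if the named module can be imported by this interpreter."""
    try:
        __import__(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Under Jython the C-extension check is expected to fail;
    # under CPython it depends on whether NumPy is installed.
    print("Jython:", running_on_jython())
    print("NumPy importable:", can_import("numpy"))
```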

Terrible. Redo everything.

Communication through JNI

• Jepp (Java Embedded Python)
  • Embeds CPython in Java
  • Runs Python code in CPython
  • Leverages both JNI and the Python/C API for integration

Jepp

[Diagram: Scala code in the JVM reaches Python code in the Python interpreter through JNI (Jepp)]

Jepp

TestJepp.scala

    import jep.Jep

    object TestJepp extends App {
      val jep = new Jep()
      // Load the Python helpers into the embedded CPython interpreter
      jep.runScript("python_util.py")
      // Box the Scala Ints so they can be passed as Java Objects
      val a = (2).asInstanceOf[AnyRef]
      val b = (3).asInstanceOf[AnyRef]
      val sumByPython = jep.invoke("python_add", a, b)
    }

python_util.py

    def python_add(a, b):
        return a + b

Communication through IPC

• Thrift
  • Developed & open sourced by Facebook
  • IDL-based (Interface Definition Language)
  • Generates server/client code in specified languages
  • Takes care of protocol and transport layer details
  • Comes with generators for Java, Python, C++, etc.

• No Scala generator
  • Scrooge to the rescue!

Thrift – IDL

TestThrift.thrift

    namespace java python_service_test
    namespace py python_service_test

    service PythonAddService {
      i32 pythonAdd(1: i32 a, 2: i32 b),
    }

Generating the code

    $ thrift --gen java --gen py TestThrift.thrift

Thrift – Python Server

PythonAddServer.py

    import sys
    sys.path.append("gen-py")  # default output directory of `thrift --gen py`

    from thrift.transport import TSocket, TTransport
    from thrift.protocol import TBinaryProtocol
    from thrift.server import TServer

    from python_service_test import PythonAddService

    class ExampleHandler(PythonAddService.Iface):
        def pythonAdd(self, a, b):
            return a + b

    handler = ExampleHandler()
    # The processor comes from the generated service module
    processor = PythonAddService.Processor(handler)
    transport = TSocket.TServerSocket(port=9090)
    tfactory = TTransport.TBufferedTransportFactory()
    pfactory = TBinaryProtocol.TBinaryProtocolFactory()

    server = TServer.TThreadedServer(processor, transport, tfactory, pfactory)
    server.serve()

PythonAddService.py (generated)

    class Iface:
        def pythonAdd(self, a, b):
            pass

Thrift – Scala Client

PythonAddClient.scala

    import org.apache.thrift.protocol.{TBinaryProtocol, TProtocol}
    import org.apache.thrift.transport.{TSocket, TTransport}
    import python_service_test.PythonAddService

    object PythonAddClient extends App {
      val transport: TTransport = new TSocket("localhost", 9090)
      val protocol: TProtocol = new TBinaryProtocol(transport)
      val client = new PythonAddService.Client(protocol)

      transport.open()
      // The method name matches the IDL definition (pythonAdd)
      val sumByPython = client.pythonAdd(3, 5)
      println("3 + 5 = " + sumByPython)
      transport.close()
    }

Thrift

[Diagram: Scala code in the JVM communicates through Thrift with Python code running in separate Python interpreters]

Auto balancing, built-in encryption

Oh, not bad.

REST API Architecture

[Diagram: Scala code in the JVM calls multiple Bottle servers, each wrapping Python code]

Auto balancer? Encoding?
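The REST option can be sketched with a tiny HTTP service that exposes the same add operation used in the Jepp and Thrift examples. To keep the sketch dependency-free it uses the standard library's `http.server` in place of Bottle, so it is not the talk's actual server code; the `/add` route, the JSON field names, and the helper names are assumptions. It also shows why encoding becomes the application's job here: both sides must agree on a JSON payload by hand.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def python_add(a, b):
    return a + b


class AddHandler(BaseHTTPRequestHandler):
    """Handles POST /add with a JSON body such as {"a": 3, "b": 5}."""

    def do_POST(self):
        if self.path != "/add":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length).decode("utf-8"))
        result = {"sum": python_add(payload["a"], payload["b"])}
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the sketch quiet; a real service would log requests


def start_server(port=0):
    """Start the service in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), AddHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_add(port, a, b):
    """Client side of the call, standing in for the Scala caller."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request("POST", "/add", body=json.dumps({"a": a, "b": b}),
                     headers={"Content-Type": "application/json"})
        response = conn.getresponse()
        return json.loads(response.read().decode("utf-8"))["sum"]
    finally:
        conn.close()
```

Because the contract is plain HTTP and JSON, the Scala side can use any HTTP client, which is the language independence noted in the summary.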

Thrift vs. REST

                      Thrift   REST   Does it matter?
Load Balancer           ✔             No (AWS & Azure)
Encode / Decode         ✔             No (we’re already doing it)
Low Learning Curve               ✔    Maybe
No Dependency                    ✔    Yes

Fliptop’s Architecture

[Diagram: Scala code in the JVM reaches multiple Bottle servers, each wrapping Python code, through a load balancer]

5 Python servers, ~4,500 requests/sec
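As a rough reading of those numbers, the sketch below works out the per-server load and illustrates the idea of spreading requests across identical Python nodes. It assumes the load is spread evenly and uses a round-robin policy purely for illustration; the slides do not say which policy the load balancer applies, and the backend names are hypothetical.

```python
from itertools import cycle


def per_server_rate(total_requests_per_sec, server_count):
    """Average load per server, assuming an even spread across servers."""
    return total_requests_per_sec / server_count


class RoundRobinBalancer:
    """Hands out backends in a fixed rotation (an assumed policy)."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def next_backend(self):
        return next(self._rotation)


if __name__ == "__main__":
    # 4,500 requests/sec over 5 servers is 900 requests/sec each on average.
    print(per_server_rate(4500, 5))
    balancer = RoundRobinBalancer(["python-node-1", "python-node-2"])
    print([balancer.next_backend() for _ in range(4)])
```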

Summary

• Jython
  • (✓) Tight integration with Scala/Java
  • (✗) Lacks support for C extensions (JyNI might help in the future)

• Jepp
  • (✓) Access to high-quality Python extensions at CPython speed
  • (✗) Two runtime environments

• Thrift, REST
  • (✓) Language-independent development
  • (✗) Bigger communication overhead

Thank You


Other tools

• JyNI (Jython Native Interface)
  • A compatibility layer that enables Jython to use native CPython extensions like NumPy or SciPy
  • Binary compatible with existing builds

• Cython
  • A compiler, written in Python, that translates Python code to C

• JNA (Java Native Access)
  • JNI-based wrapper giving Java programs access to native shared libraries

• JPE (Java-Python Extension)
  • JNI-based wrapper integrating Java and standard Python
  • Last updated: 2013-03-22