Python in Web 2.0 Websites - QCon Beijing 2010

A talk given at QCon Beijing 2010 by 洪强宁 (Hong Qiangning).

Title slide image: http://www.flickr.com/photos/arnolouise/2986467632/

About Me

• Python programmer

• Started learning Python in 2002

• Has worked entirely in Python since 2004

• http://www.douban.com/people/hongqn/

• [email protected]

• http://twitter.com/hongqn

Python

• Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs. (via http://python.org/)

Languages in Douban (豆瓣)

• Python: 58%

• C: 27%

• JavaScript: 12%

• C++: 3%

• Other (Pyrex/R/Erlang/Go/Shell): 1%

Why Python?

Easy to learn

• Hello World: 1 minute

• A small utility script: 1 afternoon

• A practical program: 1 week

• Building a Douban: 3 months

Rapid development

Counting lines of code per language (by file extension): 13 lines

    import os
    from collections import defaultdict

    d = defaultdict(int)

    for dirpath, dirnames, filenames in os.walk('.'):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            ext = os.path.splitext(filename)[1]
            d[ext] += len(list(open(path)))

    for ext, n_lines in d.items():
        print ext, n_lines

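Easy to collaborate

• Mandatory indentation keeps code structure clear and readable

• Pythonic idioms discourage strongly personal coding styles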

Easy to deploy

• Going live in three steps:

1. svn ci

2. svn up

3. restart

Widely applicable

• Web applications

• Offline computation

• Operations scripts

• Data analysis

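Rich resources

• Batteries included: 200+ modules in the standard library

• PyPI: 9613 packages currently

• Networking / databases / desktop / games / scientific computing / security / text processing / ...

• Easily extensible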

More importantly, Lao Zhao (老赵) recommends Python too

Just kidding :-p

Examples

Web Server

• python -m SimpleHTTPServer

web.py

    import web

    urls = (
        '/(.*)', 'hello'
    )
    app = web.application(urls, globals())

    class hello:
        def GET(self, name):
            if not name:
                name = 'World'
            return 'Hello, ' + name + '!'

    if __name__ == "__main__":
        app.run()

http://webpy.org/

Flask

    from flask import Flask
    app = Flask(__name__)

    @app.route("/<name>")
    def hello(name):
        if not name:
            name = 'World'
        return 'Hello, ' + name + '!'

    if __name__ == "__main__":
        app.run()

http://flask.pocoo.org/

Why so many Python web frameworks?

• Because you can write your own framework in 3 hours and a total of 60 lines of Python code.

• http://bitworking.org/news/Why_so_many_Python_web_frameworks

doctest

    def cube(x):
        """
        >>> cube(10)
        1000
        """
        return x * x * x

    def _test():
        import doctest
        doctest.testmod()

    if __name__ == "__main__":
        _test()

nose

http://somethingaboutorange.com/mrl/projects/nose/

    from cube import cube

    def test_cube():
        result = cube(10)
        assert result == 1000

numpy

    >>> from numpy import *
    >>> A = arange(4).reshape(2, 2)
    >>> A
    array([[0, 1],
           [2, 3]])
    >>> dot(A, A.T)
    array([[ 1,  3],
           [ 3, 13]])

http://numpy.scipy.org/

ipython

    $ ipython -pylab
    In [1]: X = frange(0, 10, 0.1)
    In [2]: Y = [sin(x) for x in X]
    In [3]: plot(X, Y)

http://numpy.scipy.org/

virtualenv

Creates a clean, isolated Python environment.

    $ python go-pylons.py --no-site-packages mydevenv
    $ cd mydevenv
    $ source bin/activate
    (mydevenv)$ paster create -t new9 helloworld

http://virtualenv.openplans.org/

Pyrex/Cython

    cdef extern from "math.h":
        double sin(double)

    cdef double f(double x):
        return sin(x*x)

Philosophy: Pythonic

>>> import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

The slides show each line next to its Chinese translation by 赖勇浩 (Lai Yonghao): http://bit.ly/pyzencn

Simple is better than complex

Java:

    class HelloWorld {
        public static void main(String args[]) {
            System.out.println("Hello World!");
        }
    }

Python:

    print "Hello World!"

Readability counts

• Mandatory block indentation; no {} and no end

• No cryptic characters (except "@" for decorators)

    if limit is not None and len(ids) > limit:
        ids = random.sample(ids, limit)

TOOWTDI

• There (should be) Only One Way To Do It.

• vs. Perlish TIMTOWTDI (There Is More Than One Way To Do It)

The slides step through three ways to double every element of a list, ending with the list comprehension:

    a = [1, 2, 3, 4, 5]

    b = []
    for i in range(len(a)):
        b.append(a[i]*2)

    b = []
    for x in a:
        b.append(x*2)

    b = [x*2 for x in a]

A picture is worth a thousand words

Python: http://www.flickr.com/photos/nicksieger/281055485/

C: http://www.flickr.com/photos/nicksieger/281055530/

Just look at the pictures, no comment

Ruby: http://www.flickr.com/photos/nicksieger/280661836/

Java: http://www.flickr.com/photos/nicksieger/280662707/

Using Python's language features to simplify development

Case 0

• Keep the default configuration in svn; override it as needed in developer and production environments

• The configuration needs compound data structures (such as lists)

• Multiple configuration files, merged automatically at deploy time?

• Write a parser for a custom configuration file format?

config.py

    MEMCACHED_ADDR = ['localhost:11211']

    from local_config import *

local_config.py

    MEMCACHED_ADDR = [
        'frodo:11211',
        'sam:11211',
        'pippin:11211',
        'merry:11211',
    ]

If the override file's name does not end in .py, exec can be used instead.
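The exec variant mentioned on the slide can be sketched roughly as follows. This is a Python 3 illustration, not code from the talk: the helper name `load_config` and the file name `local_config.conf` are assumptions made for the example.

```python
import os
import tempfile

DEFAULTS = {"MEMCACHED_ADDR": ["localhost:11211"]}


def load_config(override_path, defaults=DEFAULTS):
    """Return the defaults, updated by any names the override file assigns."""
    config = dict(defaults)
    if os.path.exists(override_path):
        with open(override_path) as f:
            source = f.read()
        namespace = {}
        # exec runs arbitrary code, so only use it on files you trust.
        exec(compile(source, override_path, "exec"), namespace)
        # Skip names such as __builtins__ that exec adds to the namespace.
        config.update(
            (k, v) for k, v in namespace.items() if not k.startswith("_")
        )
    return config


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "local_config.conf")
        with open(path, "w") as f:
            f.write("MEMCACHED_ADDR = ['frodo:11211', 'sam:11211']\n")
        print(load_config(path)["MEMCACHED_ADDR"])
```

Because the override file is ordinary Python, lists and dicts need no special syntax, which is the point of the slide: no separate config parser has to be written.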

Case 1

• Some pages may only be accessed by users who hold a certain permission

Before:

    class GroupUI(object):
        def new_topic(self, request):
            if self.group.can_post(request.user):
                return new_topic_ui(self.group)
            else:
                request.response.set_status(403, "Forbidden")
                return error_403_ui(msg="Only group members can post")

        def join(self, request):
            if self.group.can_join(request.user):
                ...

    class Group(object):
        def can_post(self, user):
            return self.has_member(user)

        def can_join(self, user):
            return not self.has_banned(user)

After:

    class GroupUI(object):
        @check_permission('post', msg="Only group members can post")
        def new_topic(self, request):
            return new_topic_ui(self.group)

        @check_permission('join', msg="Cannot join this group")
        def join(self, request):
            ...

    class Group(object):
        def can_post(self, user):
            return self.has_member(user)

        def can_join(self, user):
            return not self.has_banned(user)

decorator

    def print_before_exec(func):
        def _(*args, **kwargs):
            print "decorated"
            return func(*args, **kwargs)
        return _

    @print_before_exec
    def double(x):
        print x*2

    double(10)

Output:

    decorated
    20
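One detail the slide does not cover: a plain wrapper hides the decorated function's name and docstring. A minimal Python 3 sketch of the same decorator using the standard library's `functools.wraps`, offered as an optional refinement rather than as what the talk used:

```python
import functools


def print_before_exec(func):
    @functools.wraps(func)  # copy __name__, __doc__, etc. onto the wrapper
    def wrapper(*args, **kwargs):
        print("decorated")
        return func(*args, **kwargs)
    return wrapper


@print_before_exec
def double(x):
    """Return x doubled."""
    return x * 2


print(double(10))        # prints "decorated", then 20
print(double.__name__)   # double
```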

    class check_permission(object):
        def __init__(self, action, msg=None):
            self.action = action
            self.msg = msg

        def __call__(self, func):
            def _(ui, req, *args, **kwargs):
                # Look up can_<action> on the object that owns the permission
                f = getattr(ui.perm_obj, 'can_' + self.action)
                if f(req.user):
                    # Pass the request through; the wrapped method expects it
                    return func(ui, req, *args, **kwargs)
                raise BadPermission(ui.perm_obj, self.action, self.msg)
            return _
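To see the permission decorator work end to end, here is a self-contained Python 3 sketch. `BadPermission`, the `members` set, and the request stub built from `SimpleNamespace` are stand-ins invented for this example; the slides do not show how those pieces are defined in the real code base.

```python
import functools
from types import SimpleNamespace


class BadPermission(Exception):
    """Hypothetical exception; a framework hook would turn it into a 403 page."""

    def __init__(self, obj, action, msg=None):
        super().__init__(msg or "permission denied: " + action)
        self.obj, self.action, self.msg = obj, action, msg


class check_permission:
    def __init__(self, action, msg=None):
        self.action = action
        self.msg = msg

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(ui, req, *args, **kwargs):
            can = getattr(ui.perm_obj, "can_" + self.action)
            if can(req.user):
                return func(ui, req, *args, **kwargs)
            raise BadPermission(ui.perm_obj, self.action, self.msg)
        return wrapper


class Group:
    def __init__(self, members):
        self.members = set(members)

    def can_post(self, user):
        return user in self.members


class GroupUI:
    def __init__(self, group):
        self.group = group
        self.perm_obj = group  # the object whose can_* methods are consulted

    @check_permission("post", msg="Only group members can post")
    def new_topic(self, request):
        return "new topic form for " + request.user


ui = GroupUI(Group(["alice"]))
print(ui.new_topic(SimpleNamespace(user="alice")))
try:
    ui.new_topic(SimpleNamespace(user="mallory"))
except BadPermission as e:
    print("403:", e.msg)
```

Raising an exception instead of returning an error page keeps the view methods free of permission logic; one handler higher up can render every 403.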

Case 2

• Call functions asynchronously through a message queue

Before:

    def send_notification_mail(email, subject, body):
        msg = MSG_SEND_MAIL + '\0' + email + '\0' + subject + '\0' + body
        mq.put(msg)

    def async_worker():
        msg = mq.get()
        msg = msg.split('\0')
        cmd = msg[0]
        if cmd == MSG_SEND_MAIL:
            email, subject, body = msg[1:]
            fromaddr = '[email protected]'
            email_body = make_email_body(fromaddr, email, subject, body)
            smtp = smtplib.SMTP('mail')
            smtp.sendmail(fromaddr, email, email_body)
        elif cmd == MSG_xxxx:
            ...
        elif cmd == MSG_yyyy:
            ...

After:

    @async
    def send_notification_mail(email, subject, body):
        fromaddr = '[email protected]'
        email_body = make_email_body(fromaddr, email, subject, body)
        smtp = smtplib.SMTP('mail')
        smtp.sendmail(fromaddr, email, email_body)

    def async(func):
        mod = sys.modules[func.__module__]
        # Keep the original function reachable under a new module-level name
        fname = 'origin_' + func.__name__
        mod.__dict__[fname] = func
        def _(*a, **kw):
            # Enqueue the call instead of running it
            body = cPickle.dumps((mod.__name__, fname, a, kw))
            mq.put(body)
        return _

    def async_worker():
        modname, fname, a, kw = cPickle.loads(mq.get())
        __import__(modname)
        mod = sys.modules[modname]
        mod.__dict__[fname](*a, **kw)
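The same idea can be tried in a single process with a Python 3 sketch. Two deliberate differences from the slide: the decorator is called `run_async` because `async` is a reserved word in Python 3, and the original functions are kept in an explicit `_registry` dict rather than being stored back into the module namespace. `queue.Queue` stands in for the real message queue.

```python
import pickle
import queue

mq = queue.Queue()   # stand-in for a real message queue
_registry = {}       # task name -> original function


def run_async(func):
    """Replace func with a stub that enqueues the call instead of running it."""
    name = func.__qualname__
    _registry[name] = func

    def enqueue(*args, **kwargs):
        mq.put(pickle.dumps((name, args, kwargs)))
    return enqueue


def async_worker():
    """Take one message off the queue and run the function it names."""
    name, args, kwargs = pickle.loads(mq.get())
    return _registry[name](*args, **kwargs)


sent = []


@run_async
def send_notification_mail(email, subject, body):
    sent.append((email, subject, body))  # a real task would talk to SMTP


send_notification_mail("user@example.com", "Hi", "Welcome!")
print(sent)       # [] : nothing has run yet
async_worker()
print(sent)       # [('user@example.com', 'Hi', 'Welcome!')]
```

Only the function's name and arguments travel through the queue, so the worker must be able to import or register the same functions as the producer.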

Case 3

• Cache the results of function calls (SQL, complex computation, etc.)

Before:

    def get_latest_review_id():
        review_id = mc.get('latest_review_id')
        if review_id is None:
            review_id = exc_sql("select max(id) from review")
            mc.set('latest_review_id', review_id)
        return review_id

After:

    @cache('latest_review_id')
    def get_latest_review_id():
        return exc_sql("select max(id) from review")

    def cache(key):
        def deco(func):
            def _(*args, **kwargs):
                r = mc.get(key)
                if r is None:
                    r = func(*args, **kwargs)
                    mc.set(key, r)
                return r
            return _
        return deco

What if the cache key has to be generated dynamically?

    def get_review(id):
        key = 'review:%s' % id
        review = mc.get(key)
        if review is None:  # cache miss
            id, author_id, text = exc_sql("select id, author_id, text from review where id=%s", id)
            review = Review(id, author_id, text)
            mc.set(key, review)
        return review

How should the decorator be written when the cache key is dynamic?

    @cache('review:{id}')
    def get_review(id):
        id, author_id, text = exc_sql("select id, author_id, text from review where id=%s", id)
        return Review(id, author_id, text)

    def cache(key_pattern, expire=0):
        def deco(f):
            arg_names, varargs, varkw, defaults = inspect.getargspec(f)
            if varargs or varkw:
                raise Exception("not support varargs")
            gen_key = gen_key_factory(key_pattern, arg_names, defaults)

            def _(*a, **kw):
                key = gen_key(*a, **kw)
                r = mc.get(key)
                if r is None:
                    r = f(*a, **kw)
                    mc.set(key, r, expire)
                return r
            return _
        return deco

inspect.getargspec

    >>> import inspect
    >>> def f(a, b=1, c=2):
    ...     pass
    ...
    >>> inspect.getargspec(f)
    ArgSpec(args=['a', 'b', 'c'], varargs=None, keywords=None, defaults=(1, 2))
    >>>
    >>> def f(a, b=1, c=2, *args, **kwargs):
    ...     pass
    ...
    >>> inspect.getargspec(f)
    ArgSpec(args=['a', 'b', 'c'], varargs='args', keywords='kwargs', defaults=(1, 2))

gen_key_factory is not shown on the slides. Hints for writing it:

• str.format in python 2.6: '{id}'.format(id=1) => '1'

• dict(zip(['a', 'b', 'c'], [1, 2, 3])) => {'a': 1, 'b': 2, 'c': 3}
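The slides leave `gen_key_factory` as an exercise. One possible implementation, following the two hints (str.format plus dict(zip(...))), is sketched below in Python 3. It is a guess at the intended shape, not the speaker's code: `FakeCache` is a made-up in-memory stand-in for the memcached client `mc`, and `inspect.getfullargspec` replaces `getargspec`, which newer Python versions no longer provide.

```python
import functools
import inspect


def gen_key_factory(key_pattern, arg_names, defaults):
    """Build a function mapping call arguments to a formatted cache key."""
    defaults = defaults or ()
    # Defaults line up with the last len(defaults) argument names.
    default_map = dict(zip(arg_names[len(arg_names) - len(defaults):], defaults))

    def gen_key(*a, **kw):
        values = dict(default_map)
        values.update(zip(arg_names, a))   # positional arguments by name
        values.update(kw)                  # keyword arguments override
        return key_pattern.format(**values)
    return gen_key


class FakeCache:
    """In-memory stand-in for a memcached client (get/set only)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, expire=0):
        self.data[key] = value


mc = FakeCache()


def cache(key_pattern, expire=0):
    def deco(f):
        spec = inspect.getfullargspec(f)
        if spec.varargs or spec.varkw:
            raise TypeError("varargs are not supported")
        gen_key = gen_key_factory(key_pattern, spec.args, spec.defaults)

        @functools.wraps(f)
        def wrapper(*a, **kw):
            key = gen_key(*a, **kw)
            r = mc.get(key)
            if r is None:
                r = f(*a, **kw)
                mc.set(key, r, expire)
            return r
        return wrapper
    return deco


calls = []


@cache("review:{id}")
def get_review(id):
    calls.append(id)
    return {"id": id, "text": "review body"}


get_review(42)
get_review(id=42)
print(calls)            # [42] : the second call was served from the cache
print(sorted(mc.data))  # ['review:42']
```

As in the slide version, a function whose legitimate result is None is never cached, because None is also what a cache miss returns.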

Case 4

• A feed reader shows entries from several feeds at once, merged and sorted by entry_id.

    class Feed(object):
        def get_entries(self, limit=10):
            ids = exc_sqls("select id from entry where feed_id=%s order by id desc limit %s", (self.id, limit))
            return [Entry.get(id) for id in ids]

    class FeedCollection(object):
        def get_entries(self, limit=10):
            mixed_entries = []
            for feed in self.feeds:
                entries = feed.get_entries(limit=limit)
                mixed_entries += entries
            mixed_entries.sort(key=lambda e: e.id, reverse=True)
            return mixed_entries[:limit]

Rows queried from the database = len(self.feeds) * limit

Wasted Entry.get calls = (len(self.feeds) - 1) * limit

iterator and generator

    def fib():
        x, y = 1, 1
        while True:
            yield x
            x, y = y, x+y

    def odd(seq):
        return (n for n in seq if n%2)

    def less_than(seq, upper_limit):
        for number in seq:
            if number >= upper_limit:
                break
            yield number

    print sum(odd(less_than(fib(), 4000000)))
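For readers on Python 3, the same lazy pipeline can be written with the standard library's `itertools.takewhile` in place of the hand-written `less_than`. This is a small sketch; the variable name `total` is introduced here for illustration.

```python
from itertools import takewhile


def fib():
    """Yield the Fibonacci numbers 1, 1, 2, 3, 5, ... without end."""
    x, y = 1, 1
    while True:
        yield x
        x, y = y, x + y


def odd(seq):
    """Lazily keep only the odd numbers of seq."""
    return (n for n in seq if n % 2)


# Nothing is computed until sum() starts pulling values through the chain,
# and takewhile stops the infinite generator at the first value >= 4000000.
total = sum(odd(takewhile(lambda n: n < 4000000, fib())))
print(total)
```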

itertools

• count([n]) --> n, n+1, n+2, ...

• cycle(p) --> p0, p1, ... plast, p0, p1, ...

• repeat(elem [,n]) --> elem, elem, elem, ... endless or up to n times

• izip(p, q, ...) --> (p[0], q[0]), (p[1], q[1]), ...

• islice(seq, [start,] stop [, step]) --> elements from seq[start:stop:step]

• ... and more ...

    import sys
    from itertools import islice

    class Feed(object):
        def iter_entries(self):
            start_id = sys.maxint
            while True:
                # Fetch the next page of 5 ids older than the last one seen
                entry_ids = exc_sqls("select id from entry where feed_id=%s and id<%s order by id desc limit 5", (self.id, start_id))
                if not entry_ids:
                    break
                for entry_id in entry_ids:
                    yield Entry.get(entry_id)
                start_id = entry_ids[-1]

    class FeedCollection(object):
        def iter_entries(self):
            # imerge lazily merges the sorted iterators; its definition is not shown on the slides
            return imerge(*[feed.iter_entries() for feed in self.feeds])

        def get_entries(self, limit=10):
            return list(islice(self.iter_entries(), limit))

Rows queried from the database = len(self.feeds) * 5 ~ len(self.feeds) * 5 + limit - 5

Wasted Entry.get calls = 0 ~ len(self.feeds) - 1
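`imerge` is used but never defined on the slides. In Python 3 the standard library's `heapq.merge` with `reverse=True` can play that role. The sketch below makes that substitution and simulates the paged SQL query with an in-memory list; the `Feed` constructor, `_query`, and the `queries` counter are inventions for the demo, and entries are represented by their bare ids.

```python
import heapq
from itertools import islice

PAGE_SIZE = 5


class Feed:
    def __init__(self, entry_ids):
        self.entry_ids = sorted(entry_ids, reverse=True)
        self.queries = 0  # counts simulated database round trips

    def _query(self, before_id):
        """Simulate: select id ... where id < before_id order by id desc limit 5."""
        self.queries += 1
        return [i for i in self.entry_ids if i < before_id][:PAGE_SIZE]

    def iter_entries(self):
        start_id = float("inf")
        while True:
            page = self._query(start_id)
            if not page:
                break
            yield from page
            start_id = page[-1]


def imerge(*iterables):
    """Lazily merge iterables that are each sorted in descending order."""
    return heapq.merge(*iterables, reverse=True)


class FeedCollection:
    def __init__(self, feeds):
        self.feeds = feeds

    def get_entries(self, limit=10):
        merged = imerge(*[feed.iter_entries() for feed in self.feeds])
        return list(islice(merged, limit))


feeds = [Feed(range(1, 100, 2)), Feed(range(2, 100, 2))]
print(FeedCollection(feeds).get_entries(limit=10))
print([feed.queries for feed in feeds])
```

Because both the merge and the per-feed iterators are lazy, each feed is only paged as far as the merged result actually needs, which is where the savings quoted on the slide come from.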

Decorators and generators are powerful tools for simplifying code

Case 5

• Reduce deserialization time for immutable objects

    class User(object):
        def __init__(self, id, username, screen_name, sig):
            self.id = id
            self.username = username
            self.screen_name = screen_name
            self.sig = sig

    user = User('1002211', 'hongqn', 'hongqn', "巴巴布、巴巴布巴布巴布!")

cPickle vs. marshal (timeit)

    $ python -m timeit -s '
    > from user import user
    > from cPickle import dumps, loads
    > s = dumps(user, 2)' \
    > 'loads(s)'
    100000 loops, best of 3: 6.6 usec per loop

    $ python -m timeit -s '
    > from user import user
    > from marshal import dumps, loads
    > d = (user.id, user.username, user.screen_name, user.sig)
    > s = dumps(d, 2)' 'loads(s)'
    1000000 loops, best of 3: 0.9 usec per loop

About 7x faster (6.6 usec vs. 0.9 usec per load)

    $ python -c '
    > import cPickle, marshal
    > from user import user
    > print "pickle:", len(cPickle.dumps(user, 2))
    > print "marshal:", len(marshal.dumps((user.id, \
    >         user.username, user.screen_name, user.sig), 2))'
    pickle: 129
    marshal: 74

43% smaller (129 bytes vs. 74 bytes)

namedtuple

from collections import namedtuple

User = namedtuple('User', 'id username screen_name sig')

user = User('1002211', 'hongqn', 'hongqn', sig="巴巴布、巴巴布巴布巴布!")

user.username --> 'hongqn'
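A namedtuple pairs naturally with marshal because it is already a tuple. A minimal Python 3 sketch of the round trip follows; the placeholder signature text and the variable names `s` and `restored` are chosen for this example only.

```python
import marshal
from collections import namedtuple

User = namedtuple("User", "id username screen_name sig")

user = User("1002211", "hongqn", "hongqn", sig="hello")

# marshal only serializes exact built-in types, so convert the namedtuple
# to a plain tuple before dumping and rebuild it after loading.
s = marshal.dumps(tuple(user))
restored = User._make(marshal.loads(s))

print(restored.username)
print(restored == user)
```

Note that the marshal format is specific to the Python version that wrote it, so this approach suits caches that can be flushed on upgrade rather than long-term storage.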

__metaclass__

    class User(tuple):
        __metaclass__ = NamedTupleMetaClass
        __attrs__ = ['id', 'username', 'screen_name', 'sig']

    user = User('1002211', 'hongqn', 'hongqn', sig="巴巴布、巴巴布巴布巴布!")

    s = marshal.dumps(user.__marshal__())
    User.__load_marshal__(marshal.loads(s))

The metaclass behind it:

    from operator import itemgetter

    class NamedTupleMetaClass(type):
        def __new__(mcs, name, bases, dict):
            assert bases == (tuple,)
            attrs = dict['__attrs__']
            # One read-only property per attribute, backed by the tuple index
            for i, a in enumerate(attrs):
                dict[a] = property(itemgetter(i))
            dict['__slots__'] = ()
            dict['__marshal__'] = tuple
            dict['__load_marshal__'] = classmethod(tuple.__new__)
            dict['__getnewargs__'] = lambda self: tuple(self)
            # Generate a __new__ whose signature names each attribute
            argtxt = repr(tuple(attrs)).replace("'", "")[1:-1]
            template = """def newfunc(cls, %(argtxt)s):
        return tuple.__new__(cls, (%(argtxt)s))""" % locals()
            namespace = {}
            exec template in namespace
            dict['__new__'] = namespace['newfunc']
            return type.__new__(mcs, name, bases, dict)

Warning!

Case 6

• Simplify how request.get_environ(key) is written

• e.g. request.get_environ('REMOTE_ADDR') --> request.remote_addr

descriptor

• An object that has a __get__, __set__, or __delete__ method

    class Descriptor(object):
        def __get__(self, instance, owner):
            return 'descriptor'

    class Owner(object):
        attr = Descriptor()

    owner = Owner()
    owner.attr --> 'descriptor'

Commonly used descriptors

• classmethod

• staticmethod

• property

    class C(object):
        def get_x(self):
            return self._x
        def set_x(self, x):
            self._x = x
        x = property(get_x, set_x)

Applying a descriptor, together with locals(), to the request class:

    class environ_getter(object):
        def __init__(self, key, default=None):
            self.key = key
            self.default = default

        def __get__(self, obj, objtype):
            if obj is None:
                return self
            return obj.get_environ(self.key, self.default)

    class HTTPRequest(quixote.http_request.HTTPRequest):
        # Inside a class body, locals() is the namespace the class is built from
        for key in ['HTTP_REFERER', 'REMOTE_ADDR', 'SERVER_NAME', 'REQUEST_URI', 'HTTP_HOST']:
            locals()[key.lower()] = environ_getter(key)
        del key
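Writing into locals() inside a class body depends on interpreter behavior, so a more conservative variant attaches the descriptors with `setattr` after the class is created. The Python 3 sketch below does that; the minimal `HTTPRequest` here is a hypothetical stand-in for the Quixote request class, with just enough behavior to run the example.

```python
class environ_getter:
    """Descriptor that reads one key from the request's environ."""

    def __init__(self, key, default=None):
        self.key = key
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:          # accessed on the class, not an instance
            return self
        return obj.get_environ(self.key, self.default)


class HTTPRequest:
    """Hypothetical stand-in for the framework's request class."""

    def __init__(self, environ):
        self.environ = environ

    def get_environ(self, key, default=None):
        return self.environ.get(key, default)


# Attach one descriptor per CGI variable, e.g. REMOTE_ADDR -> remote_addr.
for _key in ["HTTP_REFERER", "REMOTE_ADDR", "SERVER_NAME", "REQUEST_URI", "HTTP_HOST"]:
    setattr(HTTPRequest, _key.lower(), environ_getter(_key))

request = HTTPRequest({"REMOTE_ADDR": "127.0.0.1"})
print(request.remote_addr)   # 127.0.0.1
print(request.http_host)     # None
```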

Case 7

• Make urllib.urlopen go through a SOCKS proxy automatically to get past the firewall

Monkey Patch

    import httplib

    orig_connect = httplib.HTTPConnection.connect

    def _patched_connect(self):
        if HOSTS_BLOCKED.match(self.host):
            return _connect_via_socks_proxy(self)
        else:
            return orig_connect(self)

    def _connect_via_socks_proxy(self):
        ...

    httplib.HTTPConnection.connect = _patched_connect

Things to watch out for when using Python

• Pythonic!

• Avoid gotchas http://www.ferg.org/projects/python_gotchas.html

• Unicode / Character Encoding

• GIL (Global Interpreter Lock)

• Garbage Collection

Development environment

• Editor: Vim / Emacs / Ulipad

• Version control: subversion / mercurial / git

• Wiki / issue tracking / code browsing: Trac

• Continuous integration: Bitten

Python Implementations

• CPython http://www.python.org/

• Unladen Swallow http://code.google.com/p/unladen-swallow/

• Stackless Python http://www.stackless.com/

• IronPython http://ironpython.net/

• Jython http://www.jython.org/

• PyPy http://pypy.org/

Thanks to the country, thanks to everyone

Q & A