PyCon 2015 Crawler Tutorial Explain Encoding

Encoding b'\xe6\x96\x87\xe5\xad\x97\xe7\xb7\xa8\xe7\xa2\xbc'

Transcript of PyCon 2015 Crawler Tutorial Explain Encoding

Page 1: PyCon 2015 Crawler Tutorial Explain Encoding

Encoding b'\xe6\x96\x87\xe5\xad\x97\xe7\xb7\xa8\xe7\xa2\xbc'

Page 2: PyCon 2015 Crawler Tutorial Explain Encoding

First, let's think back...

Page 3: PyCon 2015 Crawler Tutorial Explain Encoding

First, let's think back...

Back in your student days, you must have wondered how to cheat with the classmate sitting next to you...

Page 4: PyCon 2015 Crawler Tutorial Explain Encoding

Morse Code

A: ・__

B: __・・・

C: __・__・

D: __・・

Page 5: PyCon 2015 Crawler Tutorial Explain Encoding

Morse Code

A: ・__

B: __・・・

C: __・__・

D: __・・

beep beeeep

beeeep beep beep beep

beeeep beep beeeep beep

beeeep beep beep

Page 6: PyCon 2015 Crawler Tutorial Explain Encoding

Morse Code

A: ・__

B: __・・・

C: __・__・

D: __・・

left right

right left left left

right left right left

right left left

Page 7: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode

Page 8: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode

Page 9: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode

Encode

left right

Page 10: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode

Encode

left right, left right

Transport

Page 11: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode

Encode

left right, left right

Transport

Decode

Page 12: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode

Writing: Text (PYCON) -> Encode -> Bytes, 8 bits each (50 59 43 4F 4E)

Reading: Bytes (50 59 43 4F 4E) -> Decode -> Text (PYCON)
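The writing and reading directions on this slide can be tried out in Python. This is only a minimal sketch; it assumes ASCII as the encoding, which the slide does not name but which matches the byte values shown.

```python
# Writing: text -> bytes (encode)
text = "PYCON"
data = text.encode("ascii")  # assumption: ASCII, one byte per letter
hex_view = " ".join(f"{b:02X}" for b in data)
print(hex_view)  # 50 59 43 4F 4E

# Reading: bytes -> text (decode)
received = bytes.fromhex("50 59 43 4F 4E")
decoded = received.decode("ascii")
print(decoded)  # PYCON
```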

Page 13: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode in the web

Page 14: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode in the web

HTML Documents

Encode

Bytes

Server

Page 15: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode in the web

HTML Documents

Encode

Bytes

Transport

Server Internet

Page 16: PyCon 2015 Crawler Tutorial Explain Encoding

Encode / Decode in the web

HTML Documents

Encode

Bytes

Transport

Server Internet

Decode

Client

Page 17: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Back to the earlier example: cheating with Morse code

Page 18: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

What if we replace the Morse code for ABCD with the Morse code for 1234?

Page 19: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

A: ・— — — —

B: ・・— — —

C: ・・・— —

D: ・・・・—

left right right right right

left left right right right

left left left right right

left left left left right

Page 20: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Let's call the first scheme Morse ABCD and the second scheme Morse 1234

Page 21: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

What if the person kicking the chair encodes with Morse ABCD

and the person whose chair is being kicked decodes with Morse 1234?
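A small sketch of what goes wrong in that situation. The two dictionaries and the `encode` / `decode` helpers are hypothetical names made up for illustration; "L" stands for a left kick (dot) and "R" for a right kick (dash), following the tables on the earlier slides.

```python
# Hypothetical code tables: "L" = left kick (dot), "R" = right kick (dash).
MORSE_ABCD = {"A": "LR", "B": "RLLL", "C": "RLRL", "D": "RLL"}
MORSE_1234 = {"A": "LRRRR", "B": "LLRRR", "C": "LLLRR", "D": "LLLLR"}

def encode(answer, table):
    """Turn an answer letter into a sequence of kicks."""
    return table[answer]

def decode(kicks, table):
    """Turn kicks back into a letter, or None if the table has no match."""
    reverse = {code: letter for letter, code in table.items()}
    return reverse.get(kicks)

kicks = encode("A", MORSE_ABCD)           # sender uses Morse ABCD
same_table = decode(kicks, MORSE_ABCD)    # receiver uses the same table
wrong_table = decode(kicks, MORSE_1234)   # receiver uses the other table
print(kicks, same_table, wrong_table)     # LR A None
```

With the same table the answer survives the trip; with the other table the kicks mean nothing to the receiver.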

Page 22: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Page 23: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Page 24: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Encode

left right

Morse ABCD

Page 25: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Encode

left right, left right

Transport

Morse ABCD

Page 26: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Encode

left right, left right

Transport

Decode

Morse ABCD Morse 1234

Page 27: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Page 28: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

HTML Documents

Encode

Bytes

Server

Big5

Page 29: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

HTML Documents

Encode

Bytes

Transport

Server Internet

Big5

Page 30: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

HTML Documents

Encode

Bytes

Transport

Server Internet

DecodeClient

Big5

UTF-8

Page 31: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

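UTF-8: the most widely used encoding today, and one of the ways to implement Unicode (known in Chinese as 萬國碼)

ASCII: the most widely used encoding in the early days, now largely superseded by Unicode

Big5: ASCII does not support Chinese, so early Chinese-language websites in Taiwan used Big5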
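To make the difference between the three concrete, here is a minimal sketch that encodes the same four characters with each of them. The sample text 文字編碼 ("character encoding") is what the bytes on the title slide decode to as UTF-8; the codec names are the ones Python's standard library uses.

```python
text = "文字編碼"  # "character encoding"

utf8_bytes = text.encode("utf-8")  # 3 bytes per character here
big5_bytes = text.encode("big5")   # 2 bytes per character here
print(len(utf8_bytes), len(big5_bytes))  # 12 8

# ASCII has no Chinese characters, so encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False
```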

Page 32: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

Whatever method you encode with is the method you must decode with!

Page 33: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

That's the only way to cheat successfully!
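The same rule in Python, as a rough sketch: bytes produced with Big5 come back intact only when decoded with Big5. `errors="replace"` is used for the wrong guess so that the mismatch shows up as garbled text rather than possibly stopping with an exception.

```python
text = "文字編碼"
sent = text.encode("big5")  # the "server" encodes with Big5

ok = sent.decode("big5")                          # same encoding: works
garbled = sent.decode("utf-8", errors="replace")  # wrong encoding: garbage

print(ok == text)       # True
print(garbled == text)  # False
```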

Page 34: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

In Chrome, you can use View > Character Encoding (檢視 > 字元編碼) to reset the encoding used to display a web page

Page 35: PyCon 2015 Crawler Tutorial Explain Encoding

Character Encoding

http://www.angelfire.com/ok/leekawo/hacker.htm

Make this web page display as readable Chinese!

Page 36: PyCon 2015 Crawler Tutorial Explain Encoding

Encoding in Python

Practice encode / decode techniques in Python

Page 37: PyCon 2015 Crawler Tutorial Explain Encoding

Encoding in Python

Python strings use Unicode as their standard

bytes are exactly what you get when you encode a string!
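A short sketch of that relationship, assuming Python 3 (where `str` is Unicode text and `bytes` is a separate type). The bytes literal is the one from the title slide.

```python
text = "文字編碼"            # str: Unicode text
data = text.encode("utf-8")  # bytes: the result of encoding the str

print(type(text).__name__, type(data).__name__)  # str bytes
print(len(text), len(data))                      # 4 12

# The title slide shows exactly these bytes.
title_bytes = b'\xe6\x96\x87\xe5\xad\x97\xe7\xb7\xa8\xe7\xa2\xbc'
print(data == title_bytes)                # True
print(title_bytes.decode("utf-8"))        # 文字編碼
```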