In the beginning was the Word...


Page 1: In the beginning was the  Word...

In the beginning was the Word...

Information Theory: taught in alternate years in Japanese and English. This year the course will be taught in Japanese, but the slides are in English.

Video-recorded English classes are available in the Lecture Archives 2011.

This slide set can be found at http://apal.naist.jp/~kaji/lecture/
Test questions are given in both Japanese and English.

1

Page 2: In the beginning was the  Word...

Information Theory

Information Theory (情報理論) was founded by C. E. Shannon in 1948. It focuses on the mathematical theory of communication and has had an essential impact on today's digital technology:

wired/wireless communication and broadcasting, CD/DVD/HDD, data compression, cryptography, linguistics, bioinformatics, games, ...

In this class, we learn the basic subjects of information theory (half undergraduate level + half graduate-school level).

2

Claude E. Shannon (1916-2001)

Page 3: In the beginning was the  Word...

class plan

This class consists of four chapters (+ this introduction):
chapter 0: the summary and the schedule of this course (today)
chapter 1: measuring information
chapter 2: compact representation of information
chapter 3: coding for noisy communication
chapter 4: cryptography

3

Page 4: In the beginning was the  Word...

what’s the problem?

To understand our problem, let us date back to the 1940s...
Teletype (電信) was widely used for communication.
Morse code: dots ( ∙ ) and dashes ( − )

4

They already had “digital communication”.

10111000111000000010101010001110111011100011101110001

dot = 1 unit long, dash = 3 units long
1 unit of silence between marks
3 units of silence between letters, etc.
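As a quick illustration of these timing rules, here is a minimal Python sketch (not part of the slides) that turns text into the 1/0 unit pattern; assuming the conventional 7-unit silence between words, encoding "AT HOME" reproduces the bit string shown above.

```python
# A minimal sketch (not from the slides): encode text as a Morse timing
# pattern using the unit rules above.  Assumes the conventional 7-unit
# silence between words; only the letters needed for the demo are listed.
MORSE = {'A': '.-', 'T': '-', 'H': '....', 'O': '---', 'M': '--', 'E': '.'}

def encode(text: str) -> str:
    words = []
    for word in text.split():
        letters = []
        for ch in word:
            marks = ['1' if s == '.' else '111' for s in MORSE[ch]]
            letters.append('0'.join(marks))        # 1-unit gap inside a letter
        words.append('000'.join(letters))          # 3-unit gap between letters
    return '0000000'.join(words)                   # 7-unit gap between words

print(encode('AT HOME'))
# -> 10111000111000000010101010001110111011100011101110001
```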

Page 5: In the beginning was the  Word...

machinery for information processing

No computers yet, but there were “machines”...

5

They could do something complicated.
However, the transmission/recording of messages was...

inefficient ... messages should be as short as possible
unreliable ... messages are often disturbed by noise

The efficiency and the reliability were the two major problems.

Teletype model 14-KTR, 1940 (http://www.baudot.net/teletype/M14.htm)

Enigma machine (http://enigma.wikispaces.com/)

Page 6: In the beginning was the  Word...

the model of communication

A communication system can be modeled as follows:

6

C. E. Shannon, "A Mathematical Theory of Communication," The Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.

(block diagram: the transmitter side is an encoder, modulator, codec, etc.; the middle is a channel, storage medium, etc.)

Page 7: In the beginning was the  Word...

what is the “efficiency”?

A communication is efficient if the size of B is small,
subject to A = D or A ≈ D,
with or without noise (B ≠ C or B = C).

7

(A, B, C, D mark points in the block diagram: A = original message, B = channel input, C = channel output, D = decoded message)

Page 8: In the beginning was the  Word...

problem one: efficiency

Example: You need to record the weather of Tokyo every day.
weather = {sunny, cloudy, rainy}
You can use "0" and "1", but you cannot use blank spaces.

8

weather:    sunny  cloudy  rainy
codeword:   00     01      10

a 2-bit record every day → 200 bits for 100 days
0100011000 ... Can we shorten the representation?

Page 9: In the beginning was the  Word...

better code?

Code B gives a shorter representation than code A.
Can we decode code B correctly?

Yes, as long as the sequence is processed from the beginning.

Is there a code which is more compact than code B?
No, and yes (→ next slide).

9

weather:   sunny  cloudy  rainy
code A:    00     01      10
code B:    00     01      1

code A ... 0100011000
code B ... 010001100
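Code B can be decoded from the beginning because no codeword is the prefix of another; here is a minimal decoding sketch (not from the slides), using the code B table above.

```python
# Minimal sketch: decode a prefix code by scanning from the beginning.
# code B from the table above: sunny = 00, cloudy = 01, rainy = 1
CODE_B = {'00': 'sunny', '01': 'cloudy', '1': 'rainy'}

def decode(bits: str, table: dict) -> list:
    events, buf = [], ''
    for b in bits:
        buf += b
        if buf in table:               # no codeword is a prefix of another,
            events.append(table[buf])  # so the first match is the right one
            buf = ''
    return events

print(decode('010001100', CODE_B))
# -> ['cloudy', 'sunny', 'cloudy', 'rainy', 'sunny']
```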

Page 10: In the beginning was the  Word...

think average

Sometimes, events are not equally likely...

10

with the code A: 2.0 bit / event (always)
with the code B: 2×0.5 + 2×0.3 + 1×0.2 = 1.8 bit / event on average
with the code C: 1×0.5 + 2×0.3 + 2×0.2 = 1.5 bit / event on average

weather:      sunny  cloudy  rainy
probability:  0.5    0.3     0.2
code A:       00     01      10
code B:       00     01      1
code C:       1      01      00
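The averages above are just the codeword lengths weighted by the event probabilities; a small sketch (not from the slides) reproducing the three numbers:

```python
# Minimal sketch: expected codeword length = sum of p(event) * len(codeword).
probs = {'sunny': 0.5, 'cloudy': 0.3, 'rainy': 0.2}
codes = {
    'A': {'sunny': '00', 'cloudy': '01', 'rainy': '10'},
    'B': {'sunny': '00', 'cloudy': '01', 'rainy': '1'},
    'C': {'sunny': '1',  'cloudy': '01', 'rainy': '00'},
}

for name, code in codes.items():
    avg = sum(p * len(code[w]) for w, p in probs.items())
    print(f'code {name}: {avg:.1f} bit / event')
# -> code A: 2.0, code B: 1.8, code C: 1.5
```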

Page 11: In the beginning was the  Word...

the best code?

Can we represent information with 0.00000000001 bits per event? ... Probably not.

It is likely that there is a "limit" which we cannot get over.
Shannon investigated this limit mathematically.

→ For this event set, we need 1.485 bits or more per event.

11

weather:      sunny  cloudy  rainy
probability:  0.5    0.3     0.2

This is the amount of information which must be carried by the code.
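The 1.485-bit limit coincides with the entropy of this source, -Σ p log2 p (the quantity chapter 1 develops); a quick numerical check (not from the slides):

```python
# Minimal check: the limit quoted on the slide, computed as
# -sum of p * log2(p) over the three weather events.
from math import log2

probs = [0.5, 0.3, 0.2]
limit = -sum(p * log2(p) for p in probs)
print(f'{limit:.3f} bit / event')   # -> 1.485
```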

Page 12: In the beginning was the  Word...

class plan in April

chapter 0: the summary and the schedule of this course

chapter 1: measuring information
We establish a mathematical means to measure information in a quantitative manner.

chapter 2: compact representation of information
We learn several coding techniques which give compact representation of information.

chapter 3: coding for noisy communication
chapter 4: cryptography

12

Page 13: In the beginning was the  Word...

what is the “reliability”?

A communication is reliable if A = D or A ≈ D.
The existence of noise is essential (B ≠ C).
How small can we make the size of B?

13

(A, B, C, D mark the same points as before: A = original message, B = channel input, C = channel output, D = decoded message)

Page 14: In the beginning was the  Word...

problem two: reliability

Communication is not always reliable:
transmitted information ≠ received information

14

Errors of this kind are unavoidable in real communication.

In usual conversation, we sometimes use phonetic codes.

(example: "ABC" is transmitted, but may be received correctly as "ABC" or corrupted into "ADC")

ABC → Alpha, Bravo, Charlie
(Japanese phonetic code: あさひの「あ」 "a as in asahi", いろはの「い」 "i as in iroha")

Page 15: In the beginning was the  Word...

phonetic code

A phonetic code adds redundant information.
The redundant part helps correct possible errors.

→ use this mechanism on 0-1 data, and we can correct errors!

15

(in "Alpha", the initial "A" is the real information; the rest, "lpha", is redundant (冗長な) information for correcting possible errors)

Page 16: In the beginning was the  Word...

redundancy

Q. Can we add "redundancy" to binary data?
A. Yes, use parity bits.

A parity bit is a binary digit which is added to make the number of 1's in the data even.

00101 → 001010 (two 1's → two 1's)
11010 → 110101 (three 1's → four 1's)

One parity bit may tell you that an odd number of errors occurred, but not more than that.
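A tiny sketch (not from the slides) of the even-parity rule and the check it enables:

```python
# Minimal sketch: append one bit so the total number of 1's is even,
# and check whether a received word still has even parity.
def add_parity(bits: str) -> str:
    return bits + ('1' if bits.count('1') % 2 else '0')

def parity_ok(word: str) -> bool:
    return word.count('1') % 2 == 0

print(add_parity('00101'))    # -> 001010
print(add_parity('11010'))    # -> 110101
print(parity_ok('001010'))    # True: even parity
print(parity_ok('011010'))    # False: an odd number of bits were flipped
```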

16

Page 17: In the beginning was the  Word...

to correct error(s)

basic idea: use several parity bits to correct errors

Example: add five parity bits to four bits of data (a0, a1, a2, a3).

17

This code corrects a one-bit error, but it is too straightforward.

(figure: the four data bits a0, a1, a2, a3 arranged in a 2×2 array; p0, p1 and q0, q1 are the parities of the rows and columns, and r is the overall parity)

codeword = (a0, a1, a2, a3, p0, p1, q0, q1, r)
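Assuming the arrangement suggested by the figure (p0, p1 as row parities, q0, q1 as column parities, and r as the overall parity), here is a sketch of how a single flipped data bit can be located and corrected; this is only an illustration of the idea, not necessarily the exact construction used later in the course.

```python
# Sketch of the rectangular (2-D parity) code suggested by the figure:
#   a0 a1 | p0      p0, p1: row parities
#   a2 a3 | p1      q0, q1: column parities
#   ------+---      r: parity of the whole array
#   q0 q1 | r
# The exact arrangement is an assumption based on the figure above.

def encode(a):                      # a = (a0, a1, a2, a3), bits 0/1
    a0, a1, a2, a3 = a
    p0, p1 = a0 ^ a1, a2 ^ a3       # row parities
    q0, q1 = a0 ^ a2, a1 ^ a3       # column parities
    r = a0 ^ a1 ^ a2 ^ a3           # overall parity
    return [a0, a1, a2, a3, p0, p1, q0, q1, r]

def correct(word):
    """Correct a single flipped data bit in a received 9-bit word (in place)."""
    a0, a1, a2, a3, p0, p1, q0, q1, r = word
    row = [a0 ^ a1 ^ p0, a2 ^ a3 ^ p1]     # which row parity fails?
    col = [a0 ^ a2 ^ q0, a1 ^ a3 ^ q1]     # which column parity fails?
    if 1 in row and 1 in col:              # a data bit is wrong
        word[2 * row.index(1) + col.index(1)] ^= 1
    # (a failing row or column alone means a parity bit itself was flipped)
    return word

codeword = encode((1, 0, 1, 1))
received = codeword.copy()
received[2] ^= 1                      # flip one bit (a2)
print(correct(received) == codeword)  # -> True
```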

Page 18: In the beginning was the  Word...

class plan in May

chapter 0: the summary and the schedule of this course
chapter 1: measuring information
chapter 2: compact representation of information

chapter 3: coding for noisy communication
We study practical coding techniques for finding and correcting errors.

chapter 4: cryptography
We review techniques for protecting information from intensive attacks.

18

Page 19: In the beginning was the  Word...

schedule

19

(class calendar: April: Tue 10, 17, 24 and Thu 12, 19, 26; May: Tue 01, 08, 15, 22, 29 and Thu 03, 10, 17, 24, 31; June: 04 (Mon), 05; dates marked × are cancellations)

test: questions are given in both English and Japanese
report (quiz): will be assigned by the end of April

statistics in 2011: A ... 51 / B ... 20 / C ... 18 / did not pass ... 13

Page 20: In the beginning was the  Word...

chapter 1: measuring information

20

Page 21: In the beginning was the  Word...

motivation

“To tell plenty of things, we need more words.” ... maybe true, but can you prove this statement?

We will need to...
1. measure information quantitatively (定量的に測る)
2. observe the relation between the amount of information and its representation.

Chapter 1 focuses on the first step above.

21

Page 22: In the beginning was the  Word...

the uncertainty (不確実さ )

Information tells what has happened at the information source.
Before you receive information, there is much uncertainty.
After you receive information, the uncertainty becomes small.

the difference of uncertainty indicates the amount of information
FIRST, we need to measure the uncertainty of an information source.

22

(figure: much uncertainty before, small uncertainty after; this difference indicates the amount of information)

Page 23: In the beginning was the  Word...

the definition of uncertainty

The uncertainty is defined according to the statistics (統計量),
BUT, we do not have enough time today...

In the rest of today's talk, we study two typical information sources:
memoryless & stationary information source
Markov information source

23

Page 24: In the beginning was the  Word...

assumption

In this class, we assume that...
an information source produces one symbol per unit time (discrete-time information source)
the set of possible symbols is finite and countable (有限可算) (digital information source)

Note however that, in the real world, there are continuous-time and/or analogue information sources.

cf. sampling & quantization

24

Page 25: In the beginning was the  Word...

Preliminary (準備 )

Assume a discrete-time digital information source S:
M = {a1, ..., ak} ... the set of symbols of S (S is said to be a k-ary information source)
Xt ... the symbol which S produces at time t
The sequence X1, ..., Xn is called a message produced by S.

Example: S = a fair die

25

(if the message is a sequence of rolled faces, then X1, X2, ... are the faces rolled at times 1, 2, ...)

Page 26: In the beginning was the  Word...

memoryless & stationary information source

A memoryless & stationary information source satisfies...
memoryless condition: "a symbol is chosen independently from past symbols"
stationary condition: "the probability distribution is time invariant" (the same for any t)

26

(figure: several trials of the source, e.g. trial 1: a j c g e a ..., trial 2: g a j k f h ..., trial 3: w a s d a s ...; at each time position 1, 2, 3, ..., the symbols obey the same probability distribution)

memoryless = 無記憶, stationary = 定常

Page 27: In the beginning was the  Word...

memoryless & stationary information source

Examples of memoryless & stationary information sources: the "dice" example, coin toss, ...

information sources with memory: English text, wireless communication (burst noise), ...

non-stationary information sources: weather (P(snow) is large in winter), ... and more?

27

Page 28: In the beginning was the  Word...

Markov information source

A Markov information source is a simple model of an information source with memory.
The choice of the next symbol depends on at most m previous symbols (m-th order Markov source).

28

Andrey Markov (1856-1922)

P_{Xt | X1 ... Xt–1}(at | a1 ... at–1) = P_{Xt | Xt–m ... Xt–1}(at | at–m ... at–1)

m = 0: memoryless source
m = 1: simple Markov source
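To make the definition concrete, here is a small sketch (not from the slides) of simulating an m-th order Markov source; the state is simply the tuple of the last m symbols, and the transition table used here is a made-up 2nd-order binary example.

```python
import random

# Sketch (not from the slides): simulating an m-th order Markov source.
# The state is the tuple of the last m symbols; the table below is a
# made-up 2nd-order binary example giving P(next symbol = 1 | last two).
m = 2
P1 = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 0.9}

def generate(n, seed=0):
    random.seed(seed)
    past = [0] * m                       # arbitrary initial state
    out = []
    for _ in range(n):
        x = 1 if random.random() < P1[tuple(past[-m:])] else 0
        out.append(x)
        past.append(x)
    return out

print(''.join(map(str, generate(40))))
```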

Page 29: In the beginning was the  Word...

Example of (simple) Markov source

S ... memoryless & stationary source with P(0) = q, P(1) = 1 – q

29

(figure: the source S feeds a 1-bit register R holding the previous output; together they produce Xt)

if Xt–1 = 0, then R = 0:
S = 0 → Xt = 0 ... P_{Xt|Xt–1}(0 | 0) = q
S = 1 → Xt = 1 ... P_{Xt|Xt–1}(1 | 0) = 1 – q

if Xt–1 = 1, then R = 1:
S = 0 → Xt = 1 ... P_{Xt|Xt–1}(1 | 1) = q
S = 1 → Xt = 0 ... P_{Xt|Xt–1}(0 | 1) = 1 – q
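The four cases above amount to Xt = St XOR Xt–1 with P(St = 0) = q; a short simulation (a sketch, not from the slides, with an arbitrary q) estimates the conditional probabilities and matches the table.

```python
import random
from collections import Counter

# Sketch (not from the slides): the cases above amount to
# Xt = St XOR Xt-1, with P(St = 0) = q.  Estimate P(Xt | Xt-1) by simulation.
q = 0.7
random.seed(1)

x_prev, counts = 0, Counter()
for _ in range(100_000):
    s = 0 if random.random() < q else 1
    x = s ^ x_prev
    counts[(x_prev, x)] += 1
    x_prev = x

for prev in (0, 1):
    total = counts[(prev, 0)] + counts[(prev, 1)]
    print(f'P(Xt=0|Xt-1={prev}) ~ {counts[(prev, 0)] / total:.2f}',
          f'P(Xt=1|Xt-1={prev}) ~ {counts[(prev, 1)] / total:.2f}')
# expected: P(0|0) = q = 0.7, P(1|0) = 0.3, P(1|1) = 0.7, P(0|1) = 0.3
```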

Page 30: In the beginning was the  Word...

(state diagram: in state 0, output 0 with probability q and stay in state 0, or output 1 with probability 1–q and move to state 1; in state 1, output 1 with probability q and stay, or output 0 with probability 1–q and move to state 0; edges are labeled symbol / probability)

Markov source as a finite state machine

m-th order k-ary Markov source:
The next symbol depends on the previous m symbols.
The model has one of k^m internal states.
The state changes when a new symbol is generated.

finite state machine

30

(the same source S with 1-bit register R, viewed as a state machine; each transition is labeled with the generated symbol and its probability)

Page 31: In the beginning was the  Word...

31

two important properties

irreducible (既約) Markov source: we can move from any state to any other state.

(figure: a three-state example A, B, C which is NOT irreducible)

aperiodic (非周期的) Markov source: the source shows no periodic behavior (a stricter discussion is needed...).

(figure: a two-state example A, B which is NOT aperiodic)

irreducible + aperiodic = regular

Page 32: In the beginning was the  Word...

32

example of the regular Markov source

(state diagram: from state A, output 0 with probability 0.9 and stay in A, or output 1 with probability 0.1 and move to B; from state B, output 0 with probability 0.8 and move to A, or output 1 with probability 0.2 and stay in B)

The state probabilities converge (収束する) to the same values: the stationary probabilities.

starting from state A:
time   P(state=A)   P(state=B)
1      1.0          0.0
2      0.9          0.1
3      0.89         0.11
4      0.889        0.111
...    ...          ...

starting from state B:
time   P(state=A)   P(state=B)
1      0.0          1.0
2      0.8          0.2
3      0.88         0.12
4      0.888        0.112
...    ...          ...
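These two tables can be reproduced by iterating the transition probabilities read off the state diagram; a minimal sketch (not from the slides):

```python
# Sketch (not from the slides): iterate P(state) for the chain above,
# where P(A <- A) = 0.9 and P(A <- B) = 0.8.
def iterate(pA, pB, steps=4):
    for t in range(1, steps + 1):
        print(f'time {t}:  P(A) = {pA:.4g}   P(B) = {pB:.4g}')
        pA, pB = 0.9 * pA + 0.8 * pB, 0.1 * pA + 0.2 * pB

iterate(1.0, 0.0)   # starting from state A: 1.0, 0.9, 0.89, 0.889, ...
iterate(0.0, 1.0)   # starting from state B: 0.0, 0.8, 0.88, 0.888, ...
```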

Page 33: In the beginning was the  Word...

33

computation of the stationary probabilities

α_t : P(state = A) at time t
β_t : P(state = B) at time t

α_{t+1} = 0.9 α_t + 0.8 β_t
β_{t+1} = 0.1 α_t + 0.2 β_t
α_{t+1} + β_{t+1} = 1

(same state diagram as before: A: 0/0.9, 1/0.1; B: 0/0.8, 1/0.2)

If α_t and β_t converge to α and β, respectively, then we can put α_{t+1} = α_t = α and β_{t+1} = β_t = β:

α = 0.9 α + 0.8 β
β = 0.1 α + 0.2 β
α + β = 1

⇒ α = 8/9, β = 1/9
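The same fixed point can also be checked by solving the two equations directly; a minimal sketch (not from the slides), using exact fractions:

```python
from fractions import Fraction

# Sketch (not from the slides): solve the fixed-point equations exactly.
#   a = 0.9*a + 0.8*b   and   a + b = 1
# Substituting b = 1 - a gives (1 - 0.9 + 0.8) * a = 0.8.
p, r = Fraction(9, 10), Fraction(8, 10)      # P(A <- A) = 0.9, P(A <- B) = 0.8
a = r / (1 - p + r)
b = 1 - a
print(a, b)                                  # -> 8/9 1/9
```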

Page 34: In the beginning was the  Word...

34

Markov source as a stationary source

After enough time has elapsed... a regular Markov source can be regarded as a stationary source.

(same state diagram as before: A: 0/0.9, 1/0.1; B: 0/0.8, 1/0.2)

α = 8/9, β = 1/9

0 will be produced with probability P(0) = 0.9 α + 0.8 β = 0.889
1 will be produced with probability P(1) = 0.1 α + 0.2 β = 0.111

Page 35: In the beginning was the  Word...

35

summary of today’s class

overview of this course: motivation, four chapters

typical information sources: memoryless & stationary source, Markov source

Page 36: In the beginning was the  Word...

36

exercise

Determine the stationary probabilities.
Compute the probability that 010 is produced.

(state diagram with states A, B, C; transitions are labeled symbol/probability: 0/0.4, 0/0.5, 1/0.6, 0/0.8, 1/0.5, 1/0.2)

This is to check your understanding. This is not a report assignment.