Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院...
-
Upload
ira-allison -
Category
Documents
-
view
272 -
download
0
Transcript of Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院...
![Page 1: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/1.jpg)
Introduction to Distributed Systems
http://net.pku.edu.cn/~course/cs402 闫宏飞
北京大学信息科学技术学院7/22/2010
![Page 2: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/2.jpg)
Introduction to Distributed System
Parallelization & Synchronization
![Page 3: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/3.jpg)
Parallelization Idea (2)
w1 w2 w3
thread thread thread
Spawn worker threads:
![Page 4: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/4.jpg)
Parallelization Idea (3)
thread thread thread
Workers process data:
w1 w2 w3
![Page 5: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/5.jpg)
Parallelization Idea (4)
results
Report results
thread thread threadw1 w2 w3
![Page 6: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/6.jpg)
Process synchronization refers to the coordination of simultaneous threads
or processes to complete a task in order to get correct runtime order and
avoid unexpected race conditions.
Process synchronization refers to the coordination of simultaneous threads
or processes to complete a task in order to get correct runtime order and
avoid unexpected race conditions.
![Page 7: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/7.jpg)
And if you thought I was joking…
![Page 8: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/8.jpg)
What is Wrong With This?
Thread 1:void foo() { x++; y = x;}
Thread 2:void bar() { y++; x++;}
If the initial state is y = 0, x = 6, what happens after these threads finish running?
![Page 9: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/9.jpg)
Multithreaded = Unpredictability
When we run a multithreaded program, we don’t know what order threads run in, nor do we know when they will interrupt one another.
Thread 1:void foo() { eax = mem[x]; inc eax; mem[x] = eax; ebx = mem[x]; mem[y] = ebx;}
Thread 2:void bar() { eax = mem[y]; inc eax; mem[y] = eax; eax = mem[x]; inc eax; mem[x] = eax;}
Many things that look like “one step” operations actually take several steps under the hood:
![Page 10: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/10.jpg)
Multithreaded = Unpredictability
This applies to more than just integers:
Pulling work units from a queue Reporting work back to master unit Telling another thread that it can begin the
“next phase” of processing
… All require synchronization!
![Page 11: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/11.jpg)
Synchronization Primitives
Semaphore BarriersLock
Thread 1:void foo() { sem.lock(); x++; y = x; sem.unlock();}
Thread 2:void bar() { sem.lock(); y++; x++; sem.unlock();}
![Page 12: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/12.jpg)
Too Much Synchronization? Deadlock
Thread A:Thread A:semaphore1.lock();semaphore2.lock();/* use data guarded by semaphores */semaphore1.unlock(); semaphore2.unlock();
Thread B:semaphore2.lock();semaphore1.lock();/* use data guarded by semaphores */semaphore1.unlock(); semaphore2.unlock();
(Image: RPI CSCI.4210 Operating Systems notes)
![Page 13: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/13.jpg)
And if you thought I was joking…
![Page 14: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/14.jpg)
The Moral: Be Careful!
Synchronization is hard Need to consider all possible shared state Must keep locks organized and use them
consistently and correctly Knowing there are bugs may be tricky;
fixing them can be even worse! Keeping shared state to a minimum
reduces total system complexity
![Page 15: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/15.jpg)
Introduction to Distributed System
Fundamentals of Networking
![Page 16: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/16.jpg)
TCP
UDP
IPROUTER
SWITCH
HTTP
FTP
PORTSOCKET
Protocol
Firewall
Gateway
![Page 17: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/17.jpg)
What makes this work?
Underneath the socket layer are several more protocols
Most important are TCP and IP (which are used hand-in-hand so often, they’re often spoken of as one protocol: TCP/IP)
Your dataTCP header
IP header
![Page 18: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/18.jpg)
TCP/IP协议发明人 Cerf 及 Kahn 获 2004'图灵奖
Vinton G. Cerf Robert E. Kahn
![Page 19: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/19.jpg)
Why is This Necessary?
Not actually tube-like “underneath the hood” Unlike phone system (circuit switched), the
packet switched Internet uses many routes at once
you www.google.com
![Page 20: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/20.jpg)
Networking Issues
If a party to a socket disconnects, how much data did they receive?
Did they crash? Or did a machine in the middle?
Can someone in the middle intercept/modify our data?
Traffic congestion makes switch/router topology important for efficient throughput
![Page 21: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/21.jpg)
Introduction to Distributed System
Distributed Systems
![Page 22: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/22.jpg)
Outline
Models of computation Connecting distributed modules Failure & reliability
![Page 23: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/23.jpg)
Models of Computation
Instructions
Single (SI) Multiple (MI)
Da
ta
Mu
ltip
le (
MD
)
SISDsingle-
threaded process
MISDpipeline
architecture
SIMDvector
processing
MIMDmulti-
threaded processes
Sin
gle
(S
D)
Flynn’s Taxonomy
![Page 24: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/24.jpg)
SISD
DD DD DD DD DD DD DD
Processor
Instructions
![Page 25: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/25.jpg)
SIMD
D0D0
Processor
Instructions
D0D0D0
D0 D0D0 D0
D0 D0D0
D1D1
D2D2
D3D3
D4D4
……
DnDn
D1D1
D2D2
D3D3
D4D4
……
DnDn
D1D1
D2D2
D3D3
D4D4
……
DnDn
D1D1
D2D2
D3D3
D4D4
……
DnDn
D1D1
D2D2
D3D3
D4D4
……
DnDn
D1D1
D2D2
D3D3
D4D4
……
DnDn
D1D1
D2D2
D3D3
D4D4
……
DnDn
D0D0
![Page 26: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/26.jpg)
MIMD
DD DD DD DD DD DD DD
Processor
Instructions
DD DD DD DD DD DD DD
Processor
Instructions
![Page 27: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/27.jpg)
Memory(Instructions and Data)
Memory(Instructions and Data)
ProcessorProcessor
Instructions Data
Interface to external world
![Page 28: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/28.jpg)
Memory(Instructions and Data)
Memory(Instructions and Data)
Processor
Processor
Instructions Data
Interface to external world
Processor
Processor
InstructionsData
Interface to external world
Processor
Processor
Instructions Data
Interface to external world
Processor
Processor
InstructionsData
Interface to external world
![Page 29: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/29.jpg)
Memory(Instructions and Data)
Memory(Instructions and Data)
Processor
Processor
Instructions Data
Interface to external world
Processor
Processor
InstructionsData
Interface to external world
Processor
Processor
Instructions Data
Interface to external world
Processor
Processor
InstructionsData
Interface to external world
Memory(Instructions and Data)
Memory(Instructions and Data)
Memory(Instructions and Data)
Memory(Instructions and Data)
Memory(Instructions and Data)
Memory(Instructions and Data)
Network
![Page 30: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/30.jpg)
Memory(Instructions and Data)
Memory(Instructions and Data)
Processor
Processor
Instructions Data
Interface to external world
InstructionsData
Network
Processor
Processor
Memory(Instructions and Data)
Memory(Instructions and Data)
Processor
Processor
Instructions Data
Interface to external world
InstructionsData
Processor
Processor
Memory(Instructions and Data)
Memory(Instructions and Data)
Processor
Processor
Instructions Data
Interface to external world
InstructionsData
Processor
Processor
Memory(Instructions and Data)
Memory(Instructions and Data)
Processor
Processor
Instructions Data
Interface to external world
InstructionsData
Processor
Processor
![Page 31: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/31.jpg)
Outline
Models of computation Connecting distributed modules Failure & reliability
![Page 32: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/32.jpg)
System Organization
Having one big memory would make it a huge bottleneck Eliminates all of the parallelism
![Page 33: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/33.jpg)
CTA: Memory is Distributed
CPU
Local cache
Local RAM
DI
CPU
Local cache
Local RAM
CPU
Local cache
Local RAM
CPU
Local cache
Local RAM
interface interface interface interface
Interconnect network
DI DI DI
![Page 34: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/34.jpg)
Interconnect Networks
Bottleneck in the CTA is transferring values from one local memory to another
Interconnect network design very important; several options are available
Design constraint: How to minimize interconnect network usage?
![Page 35: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/35.jpg)
“Massively parallel architectures” start rising in prominence
Message Passing Interface (MPI) and other libraries developed
Bandwidth was a big problem For external interconnect networks in particular
A Brief History… 1985-95
![Page 36: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/36.jpg)
A Brief History… 1995-Today
Cluster/grid architecture increasingly dominant
Special node machines eschewed in favor of COTS technologies
Web-wide cluster software Companies like Google take this to the
extreme (10,000 node clusters)
![Page 37: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/37.jpg)
More About Interconnects
Several types of interconnect possible Bus Crossbar Torus Tree
![Page 38: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/38.jpg)
Interconnect Bus
P
Common bus
P P P P
•Simplest possible layout
•Not realistically practical
•Too much contention
•Little better than “one big memory”
![Page 39: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/39.jpg)
Crossbar
P P P P P
Crossbarcircuit
•All processors have “input” and “output” lines
•Crossbar connects any input to any output
•Allows for very low contention, but lots of wires, complexity
•Will not scale to many nodes
![Page 40: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/40.jpg)
Toroidal networks
P P P P
P P P P
P P P P
P P P P
A 1-dimensional torus
A 2-dimensional torus
Nodes are connected to their logical neighbors
Node-node transfer may include intermediaries
Reasonable trade-off for space/scalability
![Page 41: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/41.jpg)
Tree
P P P
S
P P P
S
S
• Switch nodes transfer data “up” or “down” the tree
• Hierarchical design keeps “short” transfers fast, incremental cost to longer transfers
• Aggregate bandwidth demands often very large at top
• Most natural layout for most cluster networks today
![Page 42: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/42.jpg)
Outline
Models of computation Connecting distributed modules Failure & reliability
![Page 43: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/43.jpg)
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable” -- Leslie Lamport’s research
contributions have laid the foundations of the theory of distributed systems.
logical clockByzantine failuresPaxos for consensus….
![Page 44: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/44.jpg)
Reliability Demands
Support partial failure Total system must support graceful decline in
application performance rather than a full halt Data Recoverability
If components fail, their workload must be picked up by still-functioning units
Individual Recoverability Nodes that fail and restart must be able to
rejoin the group activity without a full group restart
![Page 45: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/45.jpg)
Reliability Demands
Consistency Concurrent operations or partial internal failures
should not cause externally visible nondeterminism
Scalability Adding increased load to a system should not
cause outright failure, but a graceful decline Increasing resources should support a
proportional increase in load capacity Security
The entire system should be impervious to unauthorized access
Requires considering many more attack vectors than single-machine systems
![Page 46: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/46.jpg)
Ken Arnold, CORBA designer:
“Failure is the defining difference between distributed and local programming”
![Page 47: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/47.jpg)
Component Failure
Individual nodes simply stop
![Page 48: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/48.jpg)
Data Failure
Packets omitted by overtaxed router Or dropped by full receive-buffer in kernel Corrupt data retrieved from disk or net
![Page 49: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/49.jpg)
Network Failure
External & internal links can die Some can be routed around in ring or mesh
topology Star topology may cause individual nodes to
appear to halt Tree topology may cause “split” Messages may be sent multiple times or not at
all or in corrupted form…
![Page 50: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/50.jpg)
Timing Failure
Temporal properties may be violated Lack of “heartbeat” message may be interpreted
as component halt Clock skew between nodes may confuse
version-aware data readers
![Page 51: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/51.jpg)
Byzantine Failure
Difficult-to-reason-about circumstances arise Commands sent to foreign node are not
confirmed: What can we reason about the state of the system?
![Page 52: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/52.jpg)
Malicious Failure
Malicious (or maybe naïve) operator injects invalid or harmful commands into system
![Page 53: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/53.jpg)
Correlated Failures
Multiple CPUs/hard drives from same manufacturer lot may fail together
Power outage at one data center may cause demand overload at failover center
![Page 54: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/54.jpg)
Preparing for Failure
Distributed systems must be robust to these failure conditions
But there are lots of pitfalls…
![Page 55: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/55.jpg)
The Eight Design Fallacies
The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator. Transport cost is zero. The network is homogeneous.
-- Peter Deutsch and James Gosling, Sun Microsystems
![Page 56: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/56.jpg)
Dealing With Component Failure
Use heartbeats to monitor component availability
“Buddy” or “Parent” node is aware of desired computation and can restart it elsewhere if needed
Individual storage nodes should not be the sole owner of data Pitfall: How do you keep replicas consistent?
![Page 57: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/57.jpg)
Dealing With Data Failure
Data should be check-summed and verified at several points Never trust another machine to do your data
validation! Sequence identifiers can be used to ensure
commands, packets are not lost
![Page 58: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/58.jpg)
Dealing With Network Failure
Have well-defined split policy Networks should routinely self-discover topology Well-defined arbitration/leader election
protocols determine authoritative components Inactive components should gracefully clean
up and wait for network rejoin
![Page 59: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/59.jpg)
Dealing With Other Failures
Individual application-specific problems can be difficult to envision
Make as few assumptions about foreign machines as possible
Design for security at each step
![Page 60: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/60.jpg)
TPS: Definition
A system that handles transactions coming from several sources concurrently
Transactions are “events that generate and modify data stored in an information system for later retrieval”*
To be considered a transaction processing system the computer must pass the ACID test.
* http://en.wikipedia.org/wiki/Transaction_Processing_System
![Page 61: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/61.jpg)
Key Features of TPS: ACID
“ ACID” is the acronym for the features a TPS must support:
Atomicity – A set of changes must all succeed or all fail
Consistency – Changes to data must leave the data in a valid state when the full change set is applied
Isolation – The effects of a transaction must not be visible until the entire transaction is complete
Durability – After a transaction has been committed successfully, the state change must be permanent.
![Page 62: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/62.jpg)
Atomicity & Durability
What happens if we write half of a transaction to disk and the power goes out?
![Page 63: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/63.jpg)
Logging: The Undo Buffer
1. Database writes to log the current values of all cells it is going to overwrite
2. Database overwrites cells with new values3. Database marks log entry as committed
If db crashes during (2), we use the log to roll back the tables to prior state
![Page 64: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/64.jpg)
Consistency: Data Types
Data entered in databases have rigorous data types associated with them, and explicit ranges
Does not protect against all errors (entering a date in the past is still a valid date, etc), but eliminates tedious programmer concerns
![Page 65: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/65.jpg)
Consistency: Foreign Keys
Purchase_idPurchaser_nameItem_purchased FOREIGN
Item_idItem_nameCost
Database designers declare that fields are indices into the keys of another table
Database ensures that target key exists before allowing value in source field
![Page 66: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/66.jpg)
Isolation
Using mutual-exclusion locks, we can prevent other processes from reading data we are in the process of writing
When a database is prepared to commit a set of changes, it locks any records it is going to update before making the changes
![Page 67: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/67.jpg)
Faulty Locking
Lock (A)
Unlock (A)
Lock (B)
Unlock (B)
Unlock (A)
Lock (A)
Write to table A
Read from A
Write to table B
time
Locking alone does not ensure isolation!
Changes to table A are visible before changes to table B – this is not an isolated transaction
![Page 68: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/68.jpg)
Two-Phase Locking
After a transaction has released any locks, it may not acquire any new locks
Effect: The lock set owned by a transaction has a “growing” phase and a “shrinking” phase
Lock (A)
Unlock (A)
Lock (B)
Unlock (B)
Unlock (A)
Lock (A)
Write to table A
Read from A
Write to table B
time
![Page 69: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/69.jpg)
Relationship to Distributed Comp
At the heart of a TPS is usually a large database server
Several distributed clients may connect to this server at points in time
Database may be spread across multiple servers, but must still maintain ACID
![Page 70: Introduction to Distributed Systems course/cs402 闫宏飞 北京大学信息科学技术学院 7/22/2010.](https://reader033.fdocument.pub/reader033/viewer/2022061418/5697c01d1a28abf838cd0b66/html5/thumbnails/70.jpg)
Conclusions
Parallel systems evolved toward current distributed systems usage
Hard to avoid failure Determine what is reasonable to plan for Keep protocols as simple as possible Be mindful of common pitfalls