Cloud Computing Fundamentals: Distributed Time and Clocks

Dr. Wen Shiting (文世挺), Ningbo Institute of Technology, Zhejiang University (SL605-2)

wensht@nit.zju.edu.cn

15058033236

2014.2.21

Physical Time

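What are timestamps used for in a distributed system?
  Measuring system performance accurately
  Guaranteeing that data is “up-to-date” and correct
  Ordering the events of concurrently executing programs (temporal and logical ordering)
  Synchronizing the sender and the receiver of a message
  Coordinating joint activities
  Serializing concurrent access to shared objects
  ...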

Physical time

Solar time
  1 second = 1 day / 86,400
  Problem: days are of different lengths (due to tidal friction, etc.)
  Mean solar second: averaged over many days

Greenwich Mean Time (GMT)
  The mean solar time at the Royal Observatory in Greenwich, London
  Greenwich is located at longitude 0, the line that divides east and west

Coordinated Universal Time (UTC)

International Atomic Time (TAI)
  1 second = 9,192,631,770 state transitions of the Cesium-133 atom
  TAI is the number of Cesium-133 transitions since midnight on Jan 1, 1958, divided by 9,192,631,770
  Accuracy: better than 1 second in six million years
  Problem: atomic clocks do not keep in step with solar time

Coordinated Universal Time (UTC)
  Based on atomic time (TAI) and introduced on 1 Jan 1972
  A leap second is occasionally inserted or deleted to keep UTC in step with solar time, once the discrepancy between solar time and atomic time grows to about 800 ms

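Computer Clocks

A quartz oscillator drives the CMOS clock circuit
  When the computer is powered off, a battery keeps the CMOS clock circuit running
  The clock circuit has two main parts: a counter and a register
  Each oscillation of the quartz crystal decrements the counter by 1; when the counter reaches 0, an interrupt is generated and the counter is reloaded from the register. This repeats indefinitely.

The operating system (OS) catches the interrupt signals and maintains the computer clock
  e.g., 60 or 100 interrupts make up 1 second
  Programmable Interrupt Controller

[Figure: clock circuit showing the register and the counter]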
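The counter/register mechanism can be sketched in a few lines. This is only a toy model of the description above: the class name, the 6,000 Hz oscillator, and the register value of 100 are hypothetical values chosen so that one simulated second produces 60 interrupts.

```python
class ClockCircuit:
    """Toy model: one interrupt every `register` oscillations."""

    def __init__(self, register: int):
        self.register = register   # reload value
        self.counter = register    # counts down on every oscillation
        self.interrupts = 0        # interrupts seen by the OS so far

    def oscillate(self) -> None:
        """One oscillation of the quartz crystal."""
        self.counter -= 1
        if self.counter == 0:
            self.interrupts += 1           # raise the clock interrupt
            self.counter = self.register   # reload from the register


# Hypothetical numbers: a 6,000 Hz crystal with the register set to 100.
circuit = ClockCircuit(register=100)
for _ in range(6_000):      # one second's worth of oscillations
    circuit.oscillate()
ticks_per_second = circuit.interrupts
```

Changing the register value changes the interrupt rate, which is the knob the OS later uses for gradual clock correction.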

Clock drift and clock skew

Clock drift
  Clocks tick at different rates
  Ordinary quartz clocks drift by about 1 second in 11-12 days (a drift rate of 10^-6 seconds/second)
  High-precision quartz clocks have a drift rate of about 10^-7 or 10^-8 seconds/second
  Drift creates an ever-widening gap in perceived time

Clock skew (offset)
  The difference between two clocks at one point in time

[Figure: a perfect clock compared with drift on a slow computer clock and drift on a fast computer clock]
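The drift figures above follow from simple arithmetic. A small sketch (the function names are illustrative, and a constant drift rate is assumed):

```python
SECONDS_PER_DAY = 86_400


def skew_after(drift_rate: float, elapsed_seconds: float) -> float:
    """Error (s) accumulated after elapsed_seconds at a constant drift rate (s/s)."""
    return drift_rate * elapsed_seconds


def days_until_error(drift_rate: float, error_seconds: float = 1.0) -> float:
    """Days until a clock with the given drift rate is off by error_seconds."""
    return error_seconds / drift_rate / SECONDS_PER_DAY


ordinary = days_until_error(1e-6)        # roughly 11.6 days
high_precision = days_until_error(1e-8)  # roughly 1157 days
```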

Dealing with drift

It is not a good idea to set a clock backward
  The illusion of time moving backwards can confuse message ordering and software development environments

Go for gradual clock correction
  If fast: make the clock run slower until it synchronizes
  If slow: make the clock run faster until it synchronizes

Linear compensating function

The operating system can adjust the frequency of the clock interrupts
  e.g., if the system generates an interrupt every 17 ms but the clock is too slow, generate an interrupt every (e.g.) 15 ms instead

The adjustment changes the slope of system time: a linear compensating function
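One way to picture the linear compensating function: each interrupt still advances the software clock by one nominal tick, but the real-time interval between interrupts is shortened or lengthened until the offset is absorbed. The sketch below assumes that model; the function name and the 15-second correction period are hypothetical.

```python
def compensated_interval(tick: float, offset: float, period: float) -> float:
    """Interrupt interval (s) that absorbs `offset` over `period` real seconds.

    tick   -- nominal interval; also the clock advance per interrupt (s)
    offset -- how far the clock is behind (positive) or ahead (negative), in s
    period -- real time over which the correction is spread (s)
    """
    if period + offset <= 0:
        raise ValueError("period too short: the clock would have to stop or run backward")
    # The clock must advance period + offset seconds in `period` real seconds.
    return tick * period / (period + offset)


# A clock that is 2 s behind, corrected over 15 s: 17 ms becomes 15 ms.
slow_fix = compensated_interval(0.017, offset=2.0, period=15.0)
# A clock that is 1 s ahead, corrected over 10 s: the interval gets longer.
fast_fix = compensated_interval(0.017, offset=-1.0, period=10.0)
```

Because the interval stays positive, system time keeps increasing throughout the correction; only its slope changes.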

Resynchronization

After the synchronization period is reached
  Resynchronize periodically, or
  Resynchronize when the skew goes beyond a threshold

Adjusting the time actively: the UNIX adjtime system call
  int adjtime(const struct timeval *delta, struct timeval *olddelta)
  Adjusts the system's notion of the current time, advancing or retarding it, by the amount of time specified in the struct timeval pointed to by delta
  olddelta is an output parameter; it returns the time left uncorrected since the last call of adjtime

Getting UTC from top sources

Install a GPS receiver on every computer
  Error: ± 1 ms of UTC
Attach a WWV (http://tf.nist.gov) radio receiver
  Receives the time broadcasts from Boulder or DC
  Error: ± 3 ms of UTC (depending on distance)
Install a GOES receiver (Geostationary Operational Environmental Satellites, http://www.goes.noaa.gov/)
  Error: ± 0.1 ms of UTC

Not practical for every machine
  Cost, size, convenience, environment

Getting UTC for client computers

The client computer synchronizes its time with a time server
  The server has a more accurate clock, or is connected to a UTC time source
  Also called external clock synchronization

Synchronizing clocks by using RPC

Simplest synchronization technique
  Make an RPC to obtain the time from the server
  Set the local clock to the server time
  Does not account for network delay

[Figure: the client asks the server “What’s the time?” and the server replies “10:25:18”]

Cristian’s algorithm

Compensate for network delays (assuming they are symmetric)
  The client sends a request at T0
  The server replies with its current clock value Tserver
  The client receives the response at T1
  The client sets its clock to:

  $T_{client} = T_{server} + \frac{T_1 - T_0}{2}$

Cristian’s algorithm: example

Send request at 5:08:15.100 (T0)
Receive response at 5:08:15.900 (T1)
Response contains 5:09:25.300 (Tserver)

Round-trip time is T1 - T0
  5:08:15.900 - 5:08:15.100 = 800 ms

Best guess: the timestamp was generated 400 ms ago
Set the local time to Tserver + round-trip time / 2
  5:09:25.300 + 400 ms = 5:09:25.700

Accuracy: ± round-trip time / 2

[Figure: timeline with the client sending at T0 and receiving at T1, and the server stamping Tserver in between]

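Cristian’s algorithm: error bound

Tmin: minimum message travel time
  The reply was stamped no earlier than Tmin after T0 and no later than Tmin before T1
  Accuracy: ± ((T1 - T0)/2 - Tmin)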
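The example and the error bound can be checked with a short sketch. The function name is illustrative, the date is arbitrary, and the delays are assumed symmetric as on the slides; with Tmin left at zero the accuracy reduces to half the round-trip time.

```python
from datetime import datetime, timedelta


def cristian(t0: datetime, t1: datetime, t_server: datetime,
             t_min: timedelta = timedelta(0)):
    """Return (new client time, accuracy) for one request/reply exchange."""
    round_trip = t1 - t0
    new_time = t_server + round_trip / 2   # assume the reply took half the round trip
    accuracy = round_trip / 2 - t_min      # +/- bound on the estimate
    return new_time, accuracy


# The numbers from the example; the date itself is arbitrary.
t0 = datetime(2014, 2, 21, 5, 8, 15, 100_000)
t1 = datetime(2014, 2, 21, 5, 8, 15, 900_000)
t_server = datetime(2014, 2, 21, 5, 9, 25, 300_000)

new_time, accuracy = cristian(t0, t1, t_server)
```

Passing a known minimum travel time, e.g. `t_min=timedelta(milliseconds=100)`, tightens the bound without changing the estimate.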

Problems with Cristian’s algorithm
  The server might fail
  Subject to malicious interference

Berkeley Algorithm

Proposed by Gusella & Zatti, 1989, and implemented in the BSD version of UNIX
Aim: synchronize the clocks of a group of machines as closely as possible (also called internal synchronization)
Assumes no machine has an accurate time source (i.e., no differentiation of client and server)
Obtains an average from the participating computers
Synchronizes all clocks to the average

Berkeley Algorithm

One machine is elected (or designated) as the master; the others are slaves:

1. The master polls all slaves periodically, asking for their time
   Cristian’s algorithm can be used to obtain more accurate clock values from the other machines by accounting for network latency

2. When the results are collected, compute the average
   Including the master’s time

3. Send each slave the offset by which its clock needs to be adjusted
   Sending an “offset” instead of a “timestamp” avoids problems with network delays

Berkeley Algorithm

The algorithm has provisions for ignoring readings from clocks whose skew is too large
  Compute a fault-tolerant average
Any slave can take over as master if the master fails

Berkeley Algorithm: example

[Figure: the polled machines report 3:25, 2:50, and 9:10; one of the offsets sent back is +0:15]

3. Send offset to each client
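A minimal sketch of the averaging step, using the three readings from the example. Two things here are assumptions rather than part of the slides: the master's own clock is taken to read 3:00, and a reading counts as "too large a skew" when it is more than 60 minutes from the master's clock.

```python
from statistics import mean


def berkeley_offsets(master: float, slaves: dict, max_skew: float) -> dict:
    """Offsets to send to every machine; the master appears under 'master'.

    Readings further than max_skew from the master's clock are left out of
    the average (a simple fault-tolerant average), but still get an offset.
    """
    readings = {"master": master, **slaves}
    trusted = [t for t in readings.values() if abs(t - master) <= max_skew]
    average = mean(trusted)
    return {name: average - t for name, t in readings.items()}


def hm(hours: int, minutes: int) -> int:
    """Clock reading in minutes since midnight."""
    return hours * 60 + minutes


offsets = berkeley_offsets(
    master=hm(3, 0),                                        # assumed master reading
    slaves={"A": hm(3, 25), "B": hm(2, 50), "C": hm(9, 10)},
    max_skew=60,                                            # assumed threshold, in minutes
)
# offsets are in minutes: the 2:50 machine is told to advance by 15
```

Under these assumptions the 9:10 reading is excluded, the average is 3:05, and the machine at 2:50 receives +0:15, matching the offset visible in the example.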

Network Time Protocol (NTP)

NTP is a very widely used Internet time protocol, and it is highly accurate (RFC 1305, http://tf.nist.gov/service/its.htm).
The operating system needs NTP software installed. The client software periodically obtains time updates from one or more servers (and averages them).
A time server listens for NTP on port 123 and responds over UDP/IP with an NTP data packet, which carries 64-bit timestamps in UTC seconds since Jan 1, 1900, with a resolution of about 200 picoseconds.
Many NTP clients for PCs get the time from a single server only (no averaging). Such a client is called an SNTP client (Simple Network Time Protocol, RFC 2030), SNTP being a simplified version of NTP.
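The 64-bit timestamp splits into 32 bits of whole seconds since Jan 1, 1900 and 32 bits of binary fraction, which is where the resolution of roughly 200 ps (2^-32 s) comes from. The sketch below converts between that format and Unix time; the function names are illustrative and non-negative Unix times are assumed.

```python
NTP_UNIX_DELTA = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(ntp_timestamp: int) -> float:
    """Convert a 64-bit NTP timestamp to seconds since the Unix epoch."""
    seconds = ntp_timestamp >> 32            # upper 32 bits: whole seconds
    fraction = ntp_timestamp & 0xFFFFFFFF    # lower 32 bits: fraction of a second
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def unix_to_ntp(unix_time: float) -> int:
    """Convert non-negative Unix time to a 64-bit NTP timestamp."""
    whole = int(unix_time)
    fraction = int((unix_time - whole) * 2**32)
    return ((whole + NTP_UNIX_DELTA) << 32) | fraction


resolution = 2**-32                 # about 2.3e-10 s
half_past_epoch = unix_to_ntp(0.5)  # fraction field is exactly 2**31
```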

NTP synchronization subnet

Stratum 1: machines connected directly to an accurate time source.
Stratum 2: machines that synchronize with stratum-1 machines.
...

NTP goals

Enable clients across the Internet to be synchronized accurately to UTC (despite message delays)
  Use statistical techniques to filter data and improve the quality of results
Provide a reliable time synchronization service
  Survive lengthy losses of connectivity
  Redundant paths
  Redundant servers
Enable clients to synchronize frequently
  Adjustment of clocks by using offset (for symmetric mode)
Provide protection against interference
  Authenticate the source of data

NTP synchronization modes

Multicast (for quick LANs, low accuracy)
  The server periodically multicasts its time to its clients in the subnet
Remote procedure call (medium accuracy)
  The server responds to client requests with its actual timestamp
  Like Cristian’s algorithm
Symmetric mode (high accuracy)
  Used to synchronize between time servers (peer to peer)

All messages are delivered unreliably with UDP

Symmetric mode

Server A sends message m at $T_{i-3}$; server B receives it at $T_{i-2}$ and sends the reply m’ at $T_{i-1}$; A receives the reply at $T_i$.

The delay between the arrival of a request (at server B) and the dispatch of the reply is NOT negligible.

Delay = total transmission time of the two messages:
  $d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})$

Offset to apply to clock A (clock B minus clock A):
  $o_i = \frac{(T_{i-2} - T_{i-3}) + (T_{i-1} - T_i)}{2}$

Set clock A to: $T_i + o_i$

Accuracy bound: $d_i / 2$

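Symmetric mode (another expression)

Delay = total transmission time of the two messages:
  $d_i = (T_i - T_{i-3}) - (T_{i-1} - T_{i-2})$

Clock A should set its time to the best estimate of B’s time at $T_i$:
  $T_{i-1} + d_i/2$, which is the same as $T_i + o_i$

[Figure: message m travels from server A (sent at $T_{i-3}$) to server B, and the reply m’ travels back to server A (received at $T_i$)]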

Symmetric NTP example

Server A: $T_{i-3} = 1100$, $T_i = 1200$
Server B: $T_{i-2} = 800$, $T_{i-1} = 850$

Offset: $o_i = ((800 - 1100) + (850 - 1200))/2 = -325$
Set clock A to: $T_i + o_i = 1200 - 325 = 875$

Note: server A needs to adjust its current clock (1200 ms) by gradually slowing its pace until the -325 ms has been compensated.
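The example can be reproduced with a few lines. The function and parameter names are illustrative; the four arguments are the timestamps in the order they occur (A sends, B receives, B replies, A receives).

```python
def ntp_offset_delay(t_i3: float, t_i2: float, t_i1: float, t_i: float):
    """Return (offset, delay) for one symmetric-mode exchange.

    t_i3 -- A sends m         (A's clock)
    t_i2 -- B receives m      (B's clock)
    t_i1 -- B sends m'        (B's clock)
    t_i  -- A receives m'     (A's clock)
    """
    delay = (t_i - t_i3) - (t_i1 - t_i2)         # time spent on the network only
    offset = ((t_i2 - t_i3) + (t_i1 - t_i)) / 2  # correction to add to A's clock
    return offset, delay


offset, delay = ntp_offset_delay(1100, 800, 850, 1200)
target = 1200 + offset   # same value as 850 + delay / 2
```

With these numbers the delay is 50 ms, so the estimate is accurate to within ± 25 ms.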

Improving accuracy

Data filtering from a single source
  Retain the most recent pairs <oi, di>
  Filter dispersion: choose the oj corresponding to the smallest dj
Peer selection: synchronize with lower-stratum servers
  Lower stratum numbers, lower synchronization dispersion
The stratum of a server changes dynamically, depending on which server it synchronizes with
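The single-source filter can be sketched as a sliding window of recent pairs from which the offset with the smallest delay is picked. The class name is hypothetical, and the default window of 8 pairs is an assumption; the slides only say that several recent pairs are retained.

```python
from collections import deque


class OffsetFilter:
    """Keep the most recent <offset, delay> pairs and pick the best offset."""

    def __init__(self, window: int = 8):      # window size is an assumption
        self.pairs = deque(maxlen=window)     # oldest pair drops out automatically

    def add(self, offset: float, delay: float) -> None:
        self.pairs.append((offset, delay))

    def best_offset(self) -> float:
        """Offset measured with the smallest delay, i.e. the tightest bound."""
        if not self.pairs:
            raise ValueError("no measurements yet")
        offset, _delay = min(self.pairs, key=lambda pair: pair[1])
        return offset


f = OffsetFilter(window=3)
for o, d in [(-325, 50), (-310, 20), (-400, 90), (-330, 35)]:
    f.add(o, d)
chosen = f.best_offset()   # the (-325, 50) pair has already left the window
```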

Simple Network Time Protocol (SNTP)

RFC 2030
Targeted at machines that have no need of a full NTP implementation, particularly machines at the end of the synchronization subnet (client nodes)

SNTP operates in one of the following modes:
  Unicast mode: the client sends a request to a designated server
  Multicast mode: the server periodically broadcasts/multicasts its time to the subnet and does not serve any requests from clients
  Anycast mode: the client broadcasts/multicasts a request to the local subnet and takes the first response for time synchronization

Logical Time

Motivation for using logical time

Physical clocks cannot be synchronized perfectly in distributed systems. [Lamport 1978]

The main function of computer clocks is to order events
  If two processes don’t interact, there is no need to sync their clocks.
  This observation leads to “causality”

Causality

Order events with the happened-before (→) relation
  a → b
    a could have affected the outcome of b
  a || b
    a and b take place in different processes that don’t exchange data
    Their relative ordering does not matter (they are concurrent)

Definition of happened-before

Definition of the “→” relationship:

1. If a and b take place in the same process and a comes before b, then a → b

2. If a and b take place in different processes, a is a “send” and b is the corresponding “receive”, then a → b

3. Transitive: if a → b and b → c, then a → c

Partial ordering: unordered events are concurrent

Logical Clocks

A logical clock is a monotonically increasing software counter.
  It need not relate to a physical clock.
  Corrections to a clock must be made by adding, not subtracting

Rule for assigning “time” values to events
  if a → b then clock(a) < clock(b)

Event counting example

Three processes: P0, P1, P2, with events a, b, c, …
A local event counter in each process.
Processes occasionally communicate with each other, which is where inconsistencies occur.

Bad ordering: e → h, f → k

Lamport’s algorithm, 1978

Each process Pi has a logical clock Li. Clock synchronization algorithm:

1. Li is initialized to 0;

2. Update Li:
   LC1: Li is incremented by 1 for each new event that happens in Pi
   LC2: when Pi sends message m, it attaches t = Li to m
   LC3: when Pj receives (m, t) it sets Lj := max{Lj, t}, and then applies LC1 to increment Lj for the event receive(m)
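Rules LC1 to LC3 translate almost line for line into code. A minimal sketch, with an illustrative class name and with sending treated as an event of its own (so LC1 is applied before the timestamp is attached):

```python
class LamportClock:
    """Logical clock of one process, following rules LC1-LC3."""

    def __init__(self):
        self.time = 0                  # step 1: initialized to 0

    def tick(self) -> int:
        """LC1: a new local event."""
        self.time += 1
        return self.time

    def send(self) -> int:
        """LC2: the send is an event; return the timestamp t to attach."""
        return self.tick()

    def receive(self, t: int) -> int:
        """LC3: take the max of both clocks, then count the receive event."""
        self.time = max(self.time, t)
        return self.tick()


p0, p1 = LamportClock(), LamportClock()
a = p0.tick()        # local event in P0
t = p0.send()        # P0 sends m with timestamp t
b = p1.receive(t)    # P1 receives m; its clock jumps past t
```

The clock only ever moves forward, which matches the rule that corrections are made by adding, never subtracting.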

Problem: identical timestamps

Concurrent events (e.g., a, g) may have the same timestamp

Make timestamps unique

Append the process ID (or system ID) to the clock value after the decimal point:
  e.g., if P1 and P2 both have L1 = L2 = 40, make L1 = 40.1 and L2 = 40.2
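In code, the same idea is often expressed as a (clock value, process ID) pair compared lexicographically instead of a decimal number; this is a swapped-in representation, not the notation on the slide. It avoids the ambiguity of the decimal form once there are ten or more processes (40.1 and 40.10 are the same number).

```python
def unique_timestamp(clock_value: int, process_id: int) -> tuple:
    """Totally ordered timestamp: compare clock values first, then process IDs."""
    return (clock_value, process_id)


ts_p1 = unique_timestamp(40, 1)    # written as 40.1 on the slide
ts_p2 = unique_timestamp(40, 2)    # written as 40.2 on the slide
ts_p10 = unique_timestamp(40, 10)  # stays distinct from process 1

ordered = sorted([ts_p10, ts_p2, ts_p1, unique_timestamp(39, 7)])
```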

Problem: detecting causal relations

If a → b, then L(a) < L(b); however:
  If L(a) < L(b), we cannot conclude that a → b
  e.g., L(g) < L(c), but g || c
  So Lamport timestamps alone cannot tell causally related events from concurrent ones.

Solution: use vector clocks

Vector of Timestamps

Suppose there is a group of people, and each needs to keep track of the events that happened to the others.

Requirement: given two events, you need to tell whether they are sequential or concurrent.

Solution: keep a vector of timestamps, one element for each member.

[Figure: timelines labeled with vector timestamps, e.g. (3,0,0), and an unknown (?,?,?) to be filled in]

Vector clocks

Each process Pi keeps a clock Vi, which is a vector of N integers

Initialization: for 1 ≤ i ≤ N and 1 ≤ k ≤ N, Vi[k] := 0

Update Vi:
  VC1: when there is a new event in Pi, it sets Vi[i] := Vi[i] + 1
  VC2: when Pi sends a message m, it attaches t = Vi to m
  VC3: when Pj receives (m, t), for 1 ≤ k ≤ N it sets Vj[k] := max{Vj[k], t[k]}, then applies VC1 to increment Vj[j] for the event receive(m, t)

Note: Vi[j] is a timestamp indicating that Pi knows of all events that happened in Pj up to this time.
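Rules VC1 to VC3 can be sketched in the same style as the Lamport clock. The class name is illustrative, process IDs are 0-based here (the slides count from 1), and a send is treated as an event of its own.

```python
class VectorClock:
    """Vector clock of process `pid` in a group of n processes."""

    def __init__(self, n: int, pid: int):
        self.pid = pid
        self.v = [0] * n                 # initialization: all components 0

    def tick(self) -> tuple:
        """VC1: a new local event."""
        self.v[self.pid] += 1
        return tuple(self.v)

    def send(self) -> tuple:
        """VC2: count the send event and return the vector to attach."""
        return self.tick()

    def receive(self, t: tuple) -> tuple:
        """VC3: component-wise max, then count the receive event."""
        self.v = [max(mine, theirs) for mine, theirs in zip(self.v, t)]
        return self.tick()


p0, p1, p2 = (VectorClock(3, pid) for pid in range(3))
e1 = p0.tick()        # (1, 0, 0)
t = p0.send()         # (2, 0, 0) travels with the message
e2 = p1.receive(t)    # P1 now knows about both events in P0
e3 = p2.tick()        # P2 has heard from nobody
```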

Vector timestamps: example

Detecting “→” or “||” events by time vectors

Define
  V = V’ iff V[i] = V’[i] for i = 1, …, N
  V ≤ V’ iff V[i] ≤ V’[i] for i = 1, …, N
  V < V’ iff V ≤ V’ and V ≠ V’

V(e): the timestamp vector of an event e

For any two events a and b,
  a → b iff V(a) < V(b), a ≠ b
  a || b iff neither V(a) ≤ V(b) nor V(b) ≤ V(a)
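These definitions map directly onto three small predicates. The function names are illustrative; vectors are plain tuples of equal length.

```python
def leq(v: tuple, w: tuple) -> bool:
    """V <= V': every component of v is at most the matching component of w."""
    return all(a <= b for a, b in zip(v, w))


def happened_before(v: tuple, w: tuple) -> bool:
    """V < V': v <= w and the vectors differ."""
    return leq(v, w) and tuple(v) != tuple(w)


def concurrent(v: tuple, w: tuple) -> bool:
    """Neither v <= w nor w <= v."""
    return not leq(v, w) and not leq(w, v)


causal = happened_before((1, 0, 0), (1, 2, 0))   # first event precedes second
parallel = concurrent((1, 0, 1), (1, 2, 0))      # neither vector dominates
```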

Detecting “→” or “||” events: an example

Summary on vector timestamps
  No need to synchronize physical clocks
  Can order causally related events
  Can identify concurrent events (but cannot order them)

An application of timestamp vectors: causally ordered multicast

Multicast: a sender sends a message to a group of receivers.
  Every message must be received by all group members.

Causally ordered multicast: if m1 → m2, m1 must be received before m2 by all receivers.

[Figure: three process timelines starting at (0,0,0), labeled with the vector timestamps (1,0,0), (1,1,0), (1,2,0), (2,2,0), (1,0,1), and (1,2,2)]

Algorithm of causally ordered multicast

Each group member keeps a timestamp vector of n components (n group members), all initialized to 0.

1. When Pi multicasts a message m, it increments the i-th component of its time vector Vi and attaches Vi to m.

2. When Pj with Vj receives (m, Vi) from Pi:

   if (Vj[k] ≥ Vi[k] for all k, k ≠ i), then
      Vj[i] := Vi[i];   // Vi[i] is always greater than Vj[i]
      Vj[j] := Vj[j] + 1;
      Deliver m;
   otherwise
      Delay m until the “if” condition is met.

[Figure: three process timelines starting at (0,0,0), labeled with the vector timestamps (1,0,0), (1,1,0), (1,2,0), (2,2,0), (1,0,1), (1,2,2), (3,2,0), and (3,3,0)]
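The two steps above can be simulated with a small class. This is a sketch of the rule as stated on the slide, reading the delivery test as Vj[k] ≥ Vi[k] for all k ≠ i; the class and method names are hypothetical, process IDs are 0-based, and the network is replaced by direct method calls.

```python
class Member:
    """One group member in a causally ordered multicast group of n processes."""

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.v = [0] * n       # timestamp vector, all zeros initially
        self.delayed = []      # packets whose causal predecessors are missing
        self.delivered = []    # messages handed to the application, in order

    def multicast(self, message: str) -> tuple:
        """Step 1: increment own component and attach the vector."""
        self.v[self.pid] += 1
        return (self.pid, tuple(self.v), message)

    def _deliverable(self, sender: int, vec: tuple) -> bool:
        # Everything the sender had seen from others must already be seen here.
        return all(self.v[k] >= vec[k] for k in range(len(vec)) if k != sender)

    def receive(self, packet: tuple) -> None:
        """Step 2: deliver if the condition holds, otherwise delay."""
        self.delayed.append(packet)
        progress = True
        while progress:                      # a delivery may unblock delayed packets
            progress = False
            for p in list(self.delayed):
                sender, vec, message = p
                if self._deliverable(sender, vec):
                    self.v[sender] = vec[sender]
                    self.v[self.pid] += 1
                    self.delivered.append(message)
                    self.delayed.remove(p)
                    progress = True


p0, p1, p2 = (Member(pid, 3) for pid in range(3))
m1 = p0.multicast("m1")    # stamped (1,0,0)
p1.receive(m1)             # P1 becomes (1,1,0)
m2 = p1.multicast("m2")    # stamped (1,2,0), causally after m1
p2.receive(m2)             # arrives first at P2: delayed
p2.receive(m1)             # m1 is delivered, which releases m2
p0.receive(m2)             # P0 becomes (2,2,0)
```

The vectors reached in this run, (1,0,1) and then (1,2,2) at P2 and (2,2,0) at P0, are the ones that appear in the figure.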

Causal order preserved

If m1 → m2, m1 is received by (delivered to) all recipients before m2.

If m1 || m2, m1 and m2 can be received in arbitrary order by recipients.

Totally ordered multicast: for the case of m1 || m2, m1 and m2 must be received in the same order by all recipients (i.e., either all receive m1 before m2, or all receive m2 before m1).

[Figure: three process timelines starting at (0,0,0), labeled with the vector timestamps (1,0,0), (1,1,0), (1,2,0), (2,2,0), (1,0,1), (1,2,2), (1,3,0), (3,2,0), (3,4,0), and (3,3,4)]

Thank you!
