Chapter 11 Grid Concurrency Control


Page 1: Chapter 11 Grid Concurrency Control

Chapter 11: Grid Concurrency Control

11.1 A Grid Database Environment
11.2 An Example
11.3 Grid Concurrency Control (GCC)
11.4 Correctness of GCC
11.5 Features of GCC Protocol
11.6 Summary
11.7 Bibliographical Notes
11.8 Exercises

Page 2: Chapter 11 Grid Concurrency Control

Grid Concurrency Control

A concurrency control protocol helps maintain the consistency of data in a database

Concurrency control protocols address the ‘C’ (consistency) and ‘I’ (isolation) of the ACID properties

Serializability is the most widely accepted correctness criterion

Different database architectures need different concurrency control protocols, i.e. the protocol for a centralized DBMS differs from that of a distributed DBMS

D. Taniar, C.H.C. Leung, W. Rahayu, S. Goel: High-Performance Parallel Database Processing and Grid Databases, John Wiley & Sons, 2008

Page 3: Chapter 11 Grid Concurrency Control

11.1 A Grid Database Environment

Data is geographically distributed in the Grid environment. The figure shows how a database typically operates in the Grid architecture

[Figure: transactions T1 and T2 are submitted at DB1; Grid middleware connects DB1, DB2 and DB3; sub-transactions ST12 and ST22 run at DB2, and ST13 and ST23 run at DB3]

Legend: T1 = transaction 1; T2 = transaction 2; STij = sub-transaction of transaction i at site j

A distributed Grid database with three sites, DB1, DB2 and DB3 (connected via Grid middleware), is shown

Transactions can be submitted at any site and may need to access data from all the sites

The originator (coordinator) is the site where a transaction is submitted

Transactions T1 and T2 are submitted to DB1, and they need to access data from DB2 and DB3 as well

Transaction and site identifiers are suffixed, e.g. T1 has sub-transactions ST12 and ST13, and T2 has sub-transactions ST22 and ST23


Page 4: Chapter 11 Grid Concurrency Control

11.1 A Grid Database Environment (Cont’d)

Data access must be synchronized to maintain the correctness of data

Global lock tables, global logs, etc. cannot be implemented in the Grid environment

Different database sites may implement different concurrency control protocols, e.g. one site may use locking whereas another may use an optimistic concurrency control protocol

This situation is unavoidable in the Grid architecture because the database sites are heterogeneous


Page 5: Chapter 11 Grid Concurrency Control

11.2 An Example

The following example shows that traditional concurrency control protocols, used unchanged in the Grid environment, can corrupt data

Example: consider four data objects stored in two databases, DB2 and DB3:
DB2 = O1 and O2
DB3 = O3 and O4

Two transactions are submitted to database DB1, as shown below:
T1 = r1(O1) r1(O2) w1(O3) w1(O1) C1
T2 = r2(O1) r2(O3) w2(O4) w2(O1) C2

The transactions are submitted to the Grid middleware, and the metadata service forms the required sub-transactions as follows:

Sub-transactions of T1:
ST12 = r12(O1) r12(O2) w12(O1) C12 (11.1)
ST13 = w13(O3) C13 (11.2)

Sub-transactions of T2:
ST22 = r22(O1) w22(O1) C22 (11.3)
ST23 = r23(O3) w23(O4) C23 (11.4)

Page 6: Chapter 11 Grid Concurrency Control

11.2 An Example (Cont’d)

The sub-transactions are submitted to their respective sites, i.e. ST12 and ST22 are submitted to DB2, and ST13 and ST23 are submitted to DB3

All database sites are autonomous, so schedules (histories) are created independently. Say DB2 creates the following history:
H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22 (11.5)

and DB3 creates the following history:
H3 = r23(O3) w23(O4) C23 w13(O3) C13 (11.6)

From equation 11.5, the serialization order is T1 before T2; from equation 11.6, the serialization order is T2 before T1

Each of H2 and H3 is correct in isolation, but when the two histories are combined, the serializability graph contains the cycle T1 → T2 → T1

A traditional distributed database handles this situation with a global management layer, which is not possible in Grid databases. The Grid Concurrency Control protocol is discussed next
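
The cycle between H2 and H3 can be checked mechanically. The sketch below is an illustration only and is not part of the GCC protocol: it encodes the two histories as lists of (transaction, operation, object) tuples, derives conflict edges for each site, and tests the combined graph for a cycle. The tuple encoding and the helper names conflict_edges and has_cycle are assumptions made for this example; commit operations are left out because they do not conflict.

```python
from itertools import combinations

def conflict_edges(history):
    """Return edges (Ti, Tj): an operation of Ti conflicts with a later one of Tj."""
    edges = set()
    # combinations() preserves list order, so the first element is always earlier.
    for (t1, op1, obj1), (t2, op2, obj2) in combinations(history, 2):
        if t1 != t2 and obj1 == obj2 and "w" in (op1, op2):
            edges.add((t1, t2))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph given by edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph[node])
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in graph)

# H2 (11.5) at DB2 and H3 (11.6) at DB3, without commits.
H2 = [("T1", "r", "O1"), ("T1", "r", "O2"), ("T1", "w", "O1"),
      ("T2", "r", "O1"), ("T2", "w", "O1")]
H3 = [("T2", "r", "O3"), ("T2", "w", "O4"), ("T1", "w", "O3")]

print(has_cycle(conflict_edges(H2)))                       # each site alone
print(has_cycle(conflict_edges(H3)))
print(has_cycle(conflict_edges(H2) | conflict_edges(H3)))  # combined view
```

Each site's graph is acyclic on its own; only the union, which no single autonomous site can see, contains the cycle.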


Page 7: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC)

The above example is the motivation for GCC: although individual sites generate serializable schedules, the transactions may be ordered incorrectly from a global point of view

Functions required by GCC:

DB_Accessed(T): takes a global transaction as its argument and returns the set of databases where sub-transactions of the global transaction are submitted

Split_Trans(T): takes a global transaction as its argument and returns a set of sub-transactions

Active_Trans(DB): takes a database as its argument and returns the set of global transactions that have any sub-transaction running in that database

Cardinality(Any Set): takes any set, e.g. a set of databases or a set of sub-transactions, and returns the number of elements in the set

Append_TS(Subtransaction): takes a sub-transaction as its argument and attaches a unique timestamp to it. Sub-transactions of the same global transaction have the same timestamp value
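
One possible shape for these helper functions is sketched below. The slides give only the informal descriptions above, so everything concrete here is an assumption: the LOCATION catalogue standing in for the metadata service, the SubTransaction record, a counter used as the timestamp source, and append_ts taking all sub-transactions of one global transaction at once so that they share a timestamp.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Assumed metadata catalogue: which site stores each object (Section 11.2).
LOCATION = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}

@dataclass
class SubTransaction:
    txn: str                          # global transaction id, e.g. "T1"
    site: str                         # database where it runs
    ops: list                         # [(operation, object), ...]
    timestamp: Optional[int] = None

def db_accessed(ops):
    """DB_Accessed(T): set of databases touched by the global transaction."""
    return {LOCATION[obj] for _, obj in ops}

def split_trans(txn, ops):
    """Split_Trans(T): one sub-transaction per database, keeping operation order."""
    by_site = {}
    for op, obj in ops:
        by_site.setdefault(LOCATION[obj], []).append((op, obj))
    return [SubTransaction(txn, site, site_ops) for site, site_ops in by_site.items()]

def active_trans(db, running):
    """Active_Trans(DB): global transactions with a sub-transaction running at db."""
    return {st.txn for st in running if st.site == db}

def cardinality(any_set):
    """Cardinality(Any Set): number of elements in the set."""
    return len(any_set)

_clock = count(1)  # hypothetical timestamp source

def append_ts(subs):
    """Append_TS: give all sub-transactions of one global transaction one timestamp."""
    ts = next(_clock)
    for st in subs:
        st.timestamp = ts
    return subs

# T1 from Section 11.2, without its commit.
T1 = [("r", "O1"), ("r", "O2"), ("w", "O3"), ("w", "O1")]
subs = append_ts(split_trans("T1", T1))
print(cardinality(db_accessed(T1)), [(st.site, st.timestamp) for st in subs])
```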


Page 8: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Grid Serializability Theorem

Traditional conflict serializability is not sufficient to ensure consistency in the Grid database environment

A Grid serializability theorem is needed to ensure the correctness of data

Global transactions can be classified into two categories:
Global transactions with only one sub-transaction
Global transactions with more than one sub-transaction

Total order is defined as below:


Page 9: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

In traditional serializability theory, a serial history is considered correct. On the same grounds, a Grid-serial history is considered correct in the Grid architecture

Grid-serial history is defined as below:

Condition (1) of Definition 11.2 is very strict and does not allow interleaving of operations

Hence a more practical notion, the Grid-serializable history, is used


Page 10: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Grid serializable history:

Grid serializability is analysed by the grid serializability graph

If the graph is acyclic the history is Grid serializable


Page 11: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Grid Serializability graph is defined as below:


Page 12: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Condition (1) considers local transactions in the Grid serializability graph

Condition (2) considers only those global transactions that have more than one sub-transaction

Condition (3) shows the arc between conflicting transactions

The Grid serializability graph is stored at local sites, as there is no global management layer

The following types of conflict are possible:
Conflict between global transactions (global-global conflict)
Conflict between a global transaction and a local transaction (global-local conflict)
Conflict between local transactions (local-local conflict)

An acyclic Grid serializability graph is used to resolve global-local conflicts

Total order is used to resolve global-global conflicts


Page 13: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Based on the Grid serializability graph and the total order, the Grid serializability theorem is stated as follows:


Page 14: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Example of Grid serializability graph:

In addition to the global transactions of the earlier example, consider the following local transactions (LT12 is read as local transaction 1 at database site DB2):
LT12 = lr12(O1) lw12(O2) lC12
LT13 = lw13(O3) lC13

Now consider the following modified histories:
H2 = lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12
H3 = r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13
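
The conflict arcs that each site can derive from its own modified history can be computed directly from the notation above. The sketch below is a hedged illustration: it uses the ordinary conflict rule (same object, different transactions, at least one write) rather than the formal graph definition, which is not reproduced in this transcript. The regular expression and the names parse and local_graph are assumptions made for this example.

```python
import re

# Matches r12(O1), w13(O3), lr12(O1), lw13(O3); commit tokens (C12, lC12) do not match.
OP = re.compile(r"(l?)([rw])(\d)(\d)\((\w+)\)")

def parse(history):
    """Turn a history string into (transaction name, operation, object) tuples."""
    ops = []
    for local, kind, i, j, obj in OP.findall(history):
        name = ("LT" if local else "ST") + i + j
        ops.append((name, kind, obj))
    return ops

def local_graph(history):
    """Arcs (A, B): an operation of A conflicts with a later operation of B."""
    ops = parse(history)
    edges = set()
    for a in range(len(ops)):
        for b in range(a + 1, len(ops)):
            (t1, k1, o1), (t2, k2, o2) = ops[a], ops[b]
            if t1 != t2 and o1 == o2 and "w" in (k1, k2):
                edges.add((t1, t2))
    return edges

H2 = "lr12(O1) r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) lw12(O2) C22 lC12"
H3 = "r23(O3) w23(O4) lw13(O3) C23 w13(O3) C13 lC13"

print(sorted(local_graph(H2)))
print(sorted(local_graph(H3)))
```

Under this rule the arcs at DB2 include both LT12 → ST12 and ST12 → LT12, a global-local conflict that is visible to the local scheduler without any global information.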


Page 15: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

The following figure shows the Grid serializability graphs at sites DB2 and DB3

[Figure: Grid serializability graph at site DB2 with nodes ST12, ST22 and LT12, and at site DB3 with nodes ST13, ST23 and LT13]

The three possible types of conflict are discussed below:

Global-global conflict: At site DB2, ST12 precedes ST22 (i.e. T1 precedes T2), and at site DB3, ST23 precedes ST13 (i.e. T2 precedes T1). Thus a cycle is formed across different sites, and it may be impossible to identify the cycle without a global management layer. The total order used in Grid serializability avoids the formation of cycles across distributed sites

Global-local conflict: Can be identified and resolved by the local DBMS, e.g. between ST12 and LT12 in DB2

Local-local conflict: Can be identified and resolved by the local DBMS, as in a traditional DBMS


Page 16: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Grid Concurrency Control Protocol

The protocol has two phases: submission and termination

The site where a transaction is submitted is called the originator

The Split_Trans(T) function generates the sub-transactions of a global transaction

Sub-transactions are then submitted to the participating sites

A timestamp, unique per global transaction, is attached to each sub-transaction before it is submitted

Sub-transactions at local databases are executed in total order

A local scheduler does not distinguish between a local transaction and a sub-transaction of a global transaction

A global transaction with only one sub-transaction does not need to follow the total order, as it cannot conflict with other global transactions at more than one site


Page 17: Chapter 11 Grid Concurrency Control

GCC (Cont’d)

Submission phase of GCC


Page 18: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Step-1) Check whether data from multiple sites needs to be accessed
If only data at the originator is required, the transaction is treated as a local transaction
If more than one database needs to be accessed, the transaction is submitted to the metadata services, and the Split_Trans(T) function is used to create sub-transactions

Step-2) Global transactions are added to a set that stores all the currently executing global transactions. The set is named Active_Trans

Step-3) The middleware appends a timestamp to every sub-transaction before submitting it to its database

Step-4) If more than one active global transaction accesses more than one database at the same time, the sub-transactions are executed in total order (according to their timestamps)

Step-5) When all sub-transactions of a global transaction have finished execution, the global transaction is removed from the Active_Trans set (details in the termination phase of GCC)

Note: Active_Trans is the set of currently active global transactions, whereas Active_Trans(DB) is a function that takes a database site as its argument and returns the active transactions executing in that database
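
The submission steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm from the slide (which is not reproduced in this transcript): LOCATION stands in for the metadata service, a counter stands in for the timestamp service, active_trans is a plain dictionary, and each site queue is simply kept sorted by timestamp instead of testing the step-4 condition explicitly. The function name submit and the return values are hypothetical.

```python
from itertools import count

# Assumed metadata catalogue and per-site queues.
LOCATION = {"O1": "DB2", "O2": "DB2", "O3": "DB3", "O4": "DB3"}
site_queues = {"DB2": [], "DB3": []}
active_trans = {}          # Active_Trans: global txn -> sites still executing
_clock = count(1)          # hypothetical timestamp service

def submit(txn, ops, originator):
    # Step 1: split by site; data only at the originator means a local transaction.
    by_site = {}
    for op, obj in ops:
        by_site.setdefault(LOCATION[obj], []).append((op, obj))
    if set(by_site) == {originator}:
        return "local"
    # Step 2: record the global transaction as active.
    active_trans[txn] = set(by_site)
    # Step 3: one timestamp shared by all sub-transactions of this transaction.
    ts = next(_clock)
    # Step 4 (simplified): keep every site queue in timestamp order.
    for site, site_ops in by_site.items():
        site_queues[site].append((ts, txn, site_ops))
        site_queues[site].sort(key=lambda entry: entry[0])
    return "global"

T1 = [("r", "O1"), ("r", "O2"), ("w", "O3"), ("w", "O1")]
T2 = [("r", "O1"), ("r", "O3"), ("w", "O4"), ("w", "O1")]
print(submit("T1", T1, "DB1"), submit("T2", T2, "DB1"))
print([txn for _, txn, _ in site_queues["DB3"]])
```

Step 5, the removal from Active_Trans, belongs to the termination phase and is not modelled here.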


Page 19: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Termination phase of GCC

A global transaction remains active as long as at least one of its sub-transactions is executing

The steps of termination are as follows:

When a sub-transaction finishes execution, the originator is informed

The Active Transactions, Conflicting Active Transactions and Databases Accessed by Global Transaction sets are updated accordingly

Check whether the completed sub-transaction is the last sub-transaction of the global transaction

If it is not the last, the sub-transactions waiting in the queue cannot be scheduled

If it is the last sub-transaction of the global transaction, other conflicting sub-transactions can be scheduled. Sub-transactions from the queue then follow the normal submission steps
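
A minimal sketch of the termination check follows. It assumes a dictionary that maps each active global transaction to the sites where a sub-transaction is still running, and a single list of waiting transactions that are all released together; both are simplifications made for illustration, and the function name on_sub_transaction_finished is hypothetical.

```python
# Assumed state: T1 is running at two sites, and T2 is queued behind it.
active_trans = {"T1": {"DB2", "DB3"}}
waiting = ["T2"]

def on_sub_transaction_finished(txn, site):
    """Called at the originator when the sub-transaction of txn at site completes."""
    pending = active_trans[txn]
    pending.discard(site)            # update the bookkeeping sets
    if pending:
        return []                    # not the last one: queued work stays blocked
    del active_trans[txn]            # last one: the global transaction is done
    released = list(waiting)         # conflicting work may now be scheduled
    waiting.clear()
    return released

first = on_sub_transaction_finished("T1", "DB2")
still_active = "T1" in active_trans
second = on_sub_transaction_finished("T1", "DB3")
print(first, still_active, second)
```

The released transactions would then go through the normal submission steps again.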


Page 20: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Termination phase of GCC


Page 21: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Revisiting the example of Section 11.2

Say transaction T1’s timestamp is 1 and T2’s timestamp is 2

History H2, produced by site DB2, is a serial history (equation 11.5) with T1 preceding T2

GCC will not schedule transactions as in H3 (equation 11.6) because of step-4) of the submission phase. It always follows the total order based on timestamps, so sub-transactions of T1 are always scheduled before sub-transactions of T2. GCC will generate histories H2 and H3 as follows:
H2 = r12(O1) r12(O2) w12(O1) C12 r22(O1) w22(O1) C22 (same as (11.5))
H3 = w13(O3) C13 r23(O3) w23(O4) C23 (execution order corrected by the GCC protocol)

Thus both schedules order the transactions in total order, with T1 preceding T2
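
The corrected H3 can be reproduced by ordering the sub-transactions at DB3 by timestamp. The sketch below assumes the simplest schedule that respects the total order, a serial one; the function name schedule_in_total_order and the dictionary layout are hypothetical.

```python
# Timestamps from the example: T1 = 1, T2 = 2.
TIMESTAMP = {"T1": 1, "T2": 2}

# Sub-transactions at DB3, listed in the order in which H3 (11.6) ran them.
SUBS_AT_DB3 = {
    "T2": ["r23(O3)", "w23(O4)", "C23"],
    "T1": ["w13(O3)", "C13"],
}

def schedule_in_total_order(subs, timestamp):
    """Run each sub-transaction to completion, lowest timestamp first."""
    history = []
    for txn in sorted(subs, key=timestamp.get):
        history.extend(subs[txn])
    return " ".join(history)

H3 = schedule_in_total_order(SUBS_AT_DB3, TIMESTAMP)
print("H3 =", H3)
```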


Page 22: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

Comparison with traditional concurrency control protocols

[Figure: Operations of a general centralised locking protocol (e.g. centralised two-phase locking) in a homogeneous distributed DBMS. Parties: coordinator site (typically where the transaction is submitted); central site managing global information (e.g. global lock table); all participating sites (1, 2, …, n). Messages: lock request, lock granted, operation command, operation decision, release lock request]


Page 23: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

[Figure: Operations of a general distributed locking protocol (e.g. decentralised two-phase locking) in a homogeneous distributed DBMS. Parties: coordinator site (typically where the transaction is submitted); all participating sites (1, 2, …, n); participant’s image of global information. Messages: operation command embedded with lock request, operation, end of operation, release lock request]


Page 24: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

[Figure: Operations of a general Multi-DBMS protocol. Parties: originator site (where the transaction is submitted); multidatabase management system (global management layer); all participants (1, 2, …, n). Messages: operation request embedded with global information, talk to participant depending on its local protocol, check with multi-DBMS layer if required, MDBS reply, final decision, forward final decision to the originator]


Page 25: Chapter 11 Grid Concurrency Control

11.3 Grid Concurrency Control (GCC) (Cont’d)

[Figure: Operations of the GCC protocol. Parties: originator site (where the transaction is submitted); Grid middleware services (metadata and timestamp services for this purpose); all participants (1, 2, …, n). Messages: operation request, forward operation request to participants, final decision, forward final decision to the originator]


Page 26: Chapter 11 Grid Concurrency Control

11.4 Correctness of GCC Protocol

A Grid-serializable schedule is considered correct in the Grid environment

A concurrency control protocol conforming to Theorem 11.1 is Grid-serializable and thus correct

Proposition 11.1: All local transactions and global sub-transactions submitted to any local scheduler are scheduled in serializable order.

Proposition 11.2: Any two global transactions that each have more than one sub-transaction actively executing at the same time must follow the total order.

Based on Propositions 11.1 and 11.2, the following theorem can be proved:

Theorem 11.2: Every schedule produced by the GCC protocol is Grid-serializable.


Page 27: Chapter 11 Grid Concurrency Control

11.5 Features of GCC Protocol

Concurrency control in a heterogeneous environment: GCC does not use a global lock table or similar global structures, and hence can work in an autonomous, heterogeneous environment

Reduced load on the originator site: as GCC does not use a centralized scheduling scheme, originator sites carry less load

Fewer messages in the inter-network: communication between the originator and the other participating sites is reduced

However, because there is no global management layer, some valid interleavings are not possible, so the resulting schedules may be stricter than necessary


Page 28: Chapter 11 Grid Concurrency Control

11.6 Summary

A global management layer cannot be used in the Grid environment

The GCC protocol maintains the correctness of data in the Grid environment

The GCC protocol can work in a heterogeneous environment

Optimizing the scheduling process may be hard

The focus of the chapter was maintaining the consistency of data in Grid databases


Page 29: Chapter 11 Grid Concurrency Control

Continue to Chapter 12…