Grid Access to Databases

Page 1: Grid Access to Databases

SAN DIEGO SUPERCOMPUTER CENTER

NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE

Grid Database Access

Vladimir Veytser, [email protected]

NPACI Summer Institute

Page 2: Grid Access to Databases

Grid Database Access

• Ability to access a database through the Grid Security Infrastructure (GSI).

• GSI is based on public-key encryption, X.509 certificates, and the SSL communication protocol; it is implemented by the Globus Toolkit.

• Our goal is to integrate existing databases into Globus/GSI-based Grids. From the user’s perspective, a database should be just another Grid resource.

• Integration should not degrade performance.

Page 3: Grid Access to Databases

Grid Database Access Methods

• We see two methods for accessing databases on the Grid:

– Through Grid Middleware (e.g. WebServices, SRB, SpitFire)

– Direct access, i.e. you can open a connection directly to a database as long as you have a “Grid” certificate.

• The access method will depend on the type of application and whether or not a particular method is supported by a platform.

Page 4: Grid Access to Databases

Presentation Outline

• Currently available technology

• Industry efforts

• Our wish list

• Q&A

• A small lab using DB2 and Globus

Page 5: Grid Access to Databases

Current Database technology

• Currently databases do not support direct Globus authentication.

• An alternative: Grid-enabled middleware services

– SRB: a client can GSI-authenticate itself to a server

– SpitFire: a web service in front of the database to which a client can GSI-authenticate

– OGSA-DAIS: data access and integration service. This spec is based on OGSA WebServices.

– Customized GRAM (part of Globus) jobmanager which can convert an RSL string into “native” SQL.
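
The last option, a jobmanager that turns an RSL string into SQL, can be illustrated with a minimal Python sketch. The attribute names `table` and `columns` are hypothetical and are not part of any standard RSL schema; the parser only handles flat `&(name=value)` conjunctions, and a real jobmanager would need far more validation than the identifier check shown here.

```python
import re

# Matches one flat RSL relation such as (table=staff) or (columns="name, dept").
RELATION = re.compile(r'\(\s*(\w+)\s*=\s*(?:"([^"]*)"|([^()"]*))\)')
# Only plain SQL identifiers are accepted, to keep the sketch injection-safe.
IDENTIFIER = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')


def parse_rsl(rsl):
    """Parse a flat conjunction like &(a=1)(b="x y") into a dict."""
    rsl = rsl.strip()
    if not rsl.startswith("&"):
        raise ValueError("only simple '&' conjunctions are handled in this sketch")
    attrs = {}
    for match in RELATION.finditer(rsl):
        name = match.group(1)
        quoted, bare = match.group(2), match.group(3)
        attrs[name.lower()] = quoted if quoted is not None else bare.strip()
    return attrs


def rsl_to_sql(rsl):
    """Build a SELECT statement from the hypothetical 'table' and 'columns' attributes."""
    attrs = parse_rsl(rsl)
    table = attrs["table"]
    columns = [c.strip() for c in attrs.get("columns", "*").split(",")]
    for ident in [table] + [c for c in columns if c != "*"]:
        if not IDENTIFIER.match(ident):
            raise ValueError(f"unsafe identifier: {ident!r}")
    return f"SELECT {', '.join(columns)} FROM {table}"
```

For example, `rsl_to_sql('&(table=staff)(columns="name, dept")')` yields a plain `SELECT` over the two named columns, while a table value containing anything other than an identifier is rejected.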

Page 6: Grid Access to Databases

Technology in Detail: SRB and SpitFire

• SRB: Storage Resource Broker

– Client GSI-authenticates to a server.

– Server runs a query using the native DB2 client.

– Results are returned to the client.

• SpitFire (http://edg-wp2.web.cern.ch/edg-wp2/spitfire/)

– Designed to give quick and easy access to (meta)data where the access patterns are simple

– Front end: Grid Web Service

– Back end: JDBC to the DBMS
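
The pattern both services share (the client authenticates to a server, the server runs the query with its own database connection, and the rows come back to the client) can be sketched in a few lines of Python. This is not SRB or SpitFire code. The class name `QueryBroker`, the subject names, and the sample table are hypothetical; `sqlite3` stands in for the DB2 or JDBC back end, and the subject string is passed as an argument where a real service would take it from the GSI handshake.

```python
import sqlite3

ALICE = "/O=Grid/OU=Example/CN=Alice"  # hypothetical certificate subject


class QueryBroker:
    """Stand-in for a middleware server sitting between Grid clients and a DBMS."""

    def __init__(self, connection, allowed_subjects):
        self._conn = connection            # the server's own database connection
        self._allowed = set(allowed_subjects)

    def run_query(self, subject, sql, params=()):
        # Step 1: the client must be a known (already authenticated) subject.
        if subject not in self._allowed:
            raise PermissionError(f"subject not authorized: {subject}")
        # Step 2: the server runs the query; step 3: rows go back to the client.
        return self._conn.execute(sql, params).fetchall()


def make_demo_broker():
    """Create an in-memory database with sample rows and a broker in front of it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, dept INTEGER)")
    conn.executemany(
        "INSERT INTO staff VALUES (?, ?)",
        [("Sanders", 20), ("Pernal", 20), ("Marenghi", 38)],
    )
    return QueryBroker(conn, [ALICE])
```

The client never holds database credentials or a database client library in this arrangement, which is the main attraction of the middleware route.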

Page 7: Grid Access to Databases

Technology in Detail: OGSA-DAIS

• Working group in the Global Grid Forum, based in Edinburgh, UK.

• Defines OGSI-compliant web services.

– Data Resource Manager: provides a handle to an actual resource manager and exposes its capabilities/features.

– Data Resource: provides a handle to an actual data source (file system, db) and exposes its object types (DBSchema, storedProcedure, userDefinedTypes, triggers, etc.).

Page 8: Grid Access to Databases

OGSA-DAIS cont.

• OGSA-DAIS services, cont.:

– Data Activity Session (DAS): provides the context for data request operations. Created dynamically by the Data Resource (DR).

– DataSet (DS): populated by the DAS. The client receives either a handle or the data value. The handle can be:

• Synchronous: not returned until DS is created and populated

• Asynchronous: returned as soon as DS is created but before it is populated
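
The difference between the two kinds of handle can be illustrated with a small Python sketch. It is not OGSA-DAIS code: `DataSetHandle` and `request_dataset` are hypothetical names, and a thread stands in for the service that populates the data set.

```python
import threading


class DataSetHandle:
    """Handle to a data set that may or may not be populated yet."""

    def __init__(self):
        self._ready = threading.Event()
        self._rows = None

    def populate(self, rows):
        self._rows = list(rows)
        self._ready.set()

    def is_populated(self):
        return self._ready.is_set()

    def rows(self, timeout=None):
        # Blocks until the data set has been populated (or the timeout expires).
        if not self._ready.wait(timeout):
            raise TimeoutError("data set not populated yet")
        return self._rows


def request_dataset(produce_rows, synchronous):
    """Return a handle either after population (sync) or right away (async)."""
    handle = DataSetHandle()
    if synchronous:
        # Synchronous: the caller gets the handle only once the data is in place.
        handle.populate(produce_rows())
        return handle
    # Asynchronous: population continues in the background after the handle is returned.
    worker = threading.Thread(
        target=lambda: handle.populate(produce_rows()), daemon=True
    )
    worker.start()
    return handle
```

With an asynchronous handle the client can do other work and call `rows()` only when it needs the data; with a synchronous handle the wait happens inside the request itself.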

Page 9: Grid Access to Databases

Access via “middle man”

• Pros

– Exists today and does not require changes to the existing databases

– Does not require you to have database clients

– Makes it easier for a user by automating many of the details (transfer, staging, etc.)

– Allows for DB roles (SpitFire: base, admin, info)

– Dual functionality: also a good place to store metadata (schemas, stored procedures, etc.)

– Works well in cluster environments where you submit batch jobs

Page 10: Grid Access to Databases

Access via “middle man” cont.

• Cons

– Can affect performance

– Admins want more flexibility and a familiar interface

– Stores database passwords

– SpitFire: designed for simple access patterns

– OGSA-DAI: too heavyweight for simple data access (4 services)

– Hard to do auditing

– Another thing to maintain

Page 11: Grid Access to Databases

Industry efforts: Oracle

• Oracle 9i

– Does not have direct GSI authentication

– Oracle tools can be invoked using the Globus Resource Allocation Manager (GRAM). They claim to have a toolkit that can do this (OGDK), but I could not find it.

• Oracle 10i

– Will have support for GSI authentication and other Globus services.

– Should be in beta very soon

Page 12: Grid Access to Databases

Industry efforts: DB2

• DB2 V 8.1

– Users can either write a custom jobmanager or specify the necessary parameters through the RSL string (this method will be used in today’s lab).

– Emerging Technologies Toolkit supports Grid WebServices. For more information see: http://www.alphaworks.ibm.com/tech/ettk

• DB2 V 8.2

– Will be out next year

– Unlike Oracle, will have an interface into which you can plug in your own security model.

Page 13: Grid Access to Databases

Industry efforts: MySQL

• MySQL version 4.1

– Closest to having GSI authentication

– Supports SSL (via the OpenSSL library)

– It should be possible to modify OpenSSL to support Globus certificates

– This has already been done for OpenSSH: GSISSH (NCSA)

– Efforts are currently under way at SDSC and Brookhaven National Lab

Page 14: Grid Access to Databases

Our Goals

• GSI authentication pushed into the database

• Role-based authentication

– Admin: power users who can insert/drop/delete.

– Power: read privileges, plus write access to temp tables/views.

– Info: users with read-only privileges.

• Most common.

• Many-to-one mapping (many database users to a single database).
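
One way to read the role scheme above is as two lookups: a certificate subject maps to a role, and a role maps to a set of privileges and to one shared database account. The Python sketch below follows that reading. The subject names, the privilege labels, and the `grid_<role>` account names are hypothetical, and giving Admin read access in addition to insert/drop/delete is an assumption the slide does not state.

```python
# Privileges per role. The labels are illustrative, not taken from a real DBMS.
ROLE_PRIVILEGES = {
    "admin": {"select", "insert", "delete", "drop"},  # read access is assumed
    "power": {"select", "write_temp"},
    "info": {"select"},
}

# Hypothetical Grid certificate subjects and their roles.
SUBJECT_ROLES = {
    "/O=Grid/OU=Example/CN=Alice": "admin",
    "/O=Grid/OU=Example/CN=Bob": "power",
    "/O=Grid/OU=Example/CN=Carol": "info",
    "/O=Grid/OU=Example/CN=Dave": "info",
}


def role_for(subject):
    """Look up the role for a certificate subject; unknown subjects get no access."""
    try:
        return SUBJECT_ROLES[subject]
    except KeyError:
        raise PermissionError(f"no role assigned to {subject}") from None


def database_account(subject):
    """Many-to-one mapping: every subject with the same role shares one account."""
    return f"grid_{role_for(subject)}"


def is_allowed(subject, action):
    """Check whether the subject's role grants the requested action."""
    return action in ROLE_PRIVILEGES[role_for(subject)]
```

Here Carol and Dave both resolve to the same `grid_info` account, so the database only needs one account per role rather than one per Grid user.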

Page 15: Grid Access to Databases

Globus Jobs at TeraGrid SDSC

• User submits a Globus job (e.g. with globus-job-submit) from a workstation to tg-login.sdsc.teragrid.org

• Jobmanager at tg-login converts the RSL string into a PBS job

• PBS schedules the job on our DTF cluster (dual, 128-node Itanium2 cluster)

• For more info: http://teragrid.org/docs/user-guide.htm

• We want jobs that require DB2 access to work the same way.

– Requires DB2 client installation, which is not there yet.

Page 16: Grid Access to Databases

LAB Outline:

Goal: As close to the real world as possible

• Log in to BlueHorizon (NPACI grid)

• Get a Grid certificate

• Create a proxy certificate

• Submit a Globus DB2 job to the ctf19 login/compute node (the only node with DB2 clients)

Page 17: Grid Access to Databases

LAB cont.

• I will give out the LAB instructions during class.

• A copy of the LAB instructions can be found at:

– http://www.sdsc.edu/~veytser/db2globuslab.html

Page 18: Grid Access to Databases

Thank You

• Questions?