Grid Access to Databases
SAN DIEGO SUPERCOMPUTER CENTER
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE
Grid Database Access
Vladimir Veytser
NPACI Summer Institute
Grid Database Access
• Ability to access a database through the Grid Security Infrastructure (GSI).
• GSI is based on public key encryption, X.509 certificates, and the SSL communication protocol; it is implemented by the Globus Toolkit.
• Our goal is to integrate existing databases into Globus/GSI-based Grids. From the user’s perspective, a database should be just another Grid resource.
• The integration should not degrade performance.
Grid Database Access Methods
• We see two methods for accessing databases on the Grid:
– Through Grid Middleware (e.g. WebServices, SRB, SpitFire)
– Direct access, i.e. you can open a connection directly into a database as long as you have a “Grid” certificate.
• The access method will depend on the type of application and whether or not a particular method is supported by a platform.
Presentation Outline
• Currently available technology
• Industry efforts
• Our wish list
• Q&A
• A small lab using DB2 and Globus
Current Database technology
• Currently databases do not support direct Globus authentication.
• An alternative: Grid-enabled middleware services
– SRB: a client can GSI-authenticate itself to a server.
– SpitFire: a web service in front of the database to which a client can GSI-authenticate.
– OGSA-DAIS: data access and integration service. This spec is based on OGSA WebServices.
– Customized GRAM (part of Globus) jobmanager that can convert an RSL string into “native” SQL.
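To make the last option concrete, here is a minimal sketch of how a customized jobmanager might pull a query out of an RSL string. It is an illustration only, not the Globus implementation: the attribute names `database` and `sql` are hypothetical (they are not standard GRAM attributes), and the parser handles only a flat `&(name=value)` conjunction without embedded double quotes.

```python
import re

# One "(name=value)" relation; the value is either double-quoted or bare.
_RELATION = re.compile(r'\(\s*(\w+)\s*=\s*(?:"([^"]*)"|([^()"]*?))\s*\)')


def parse_rsl(rsl):
    """Parse a flat RSL conjunction such as '&(a=1)(b="x y")' into a dict."""
    text = rsl.strip()
    if not text.startswith("&"):
        raise ValueError("this sketch only handles simple '&' conjunctions")
    attrs = {}
    for name, quoted, bare in _RELATION.findall(text[1:]):
        attrs[name.lower()] = quoted or bare
    return attrs


def rsl_to_query(rsl):
    """Return (database, sql) taken from the hypothetical RSL attributes."""
    attrs = parse_rsl(rsl)
    try:
        return attrs["database"], attrs["sql"]
    except KeyError as missing:
        raise ValueError(f"RSL is missing attribute {missing}") from None
```

For example, `rsl_to_query('&(database=sample)(sql="SELECT name FROM staff")')` yields the database name and the SQL text, which the jobmanager could then hand to the native database client.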
Technology in Detail: SRB and SpitFire
• SRB: Storage Resource Broker
– Client GSI-authenticates to a server.
– Server runs a query using the native DB2 client.
– Results are returned to the client.
• SpitFire (http://edg-wp2.web.cern.ch/edg-wp2/spitfire/)
– Designed to give quick and easy access to (meta)data where the access patterns are simple
– Front end: Grid Web Service
– Back end: JDBC to the DBMS
Technology in Detail: OGSA-DAIS
• Working group in the Global Grid Forum, based in Edinburgh, UK.
• Defines OGSI-compliant web services:
– Data Resource Manager: provides a handle to an actual resource manager and exposes its capabilities/features.
– Data Resource: provides a handle to an actual data source (file system, db) and exposes its object types (DBSchema, storedProcedure, userDefinedTypes, triggers, etc.).
OGSA-DAIS cont.
• OGSA-DAIS services cont.:
– Data Activity Session (DAS): provides the context for data request operations. Created dynamically by the Data Resource (DR).
– DataSet (DS): populated by the DAS. The client receives a handle or a data value. The handle can be:
• Synchronous: not returned until the DS is created and populated
• Asynchronous: returned as soon as the DS is created but before it is populated
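The difference between the two handle types can be sketched with Python's standard `concurrent.futures` module. This is an analogy under stated assumptions, not the OGSA-DAIS interface: the class and method names are hypothetical, and `populate` stands in for whatever actually fills the data set.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def populate(rows):
    """Stand-in for the session filling a data set from a query result."""
    return list(rows)


class DataActivitySession:
    """Hypothetical session that hands out data sets in two ways."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)

    def request_sync(self, rows):
        # Synchronous: nothing is returned until the data set is populated.
        return populate(rows)

    def request_async(self, rows) -> Future:
        # Asynchronous: the handle comes back immediately; population
        # continues in the background and is awaited via handle.result().
        return self._pool.submit(populate, rows)

    def close(self):
        self._pool.shutdown(wait=True)
```

A caller that needs the data right away would use `request_sync`; one that can overlap other work with the query would keep the handle from `request_async` and call `result()` on it later.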
Access via “middle man”
• Pros
– Exists today and does not require changes to the existing databases
– Does not require you to have database clients
– Makes it easier for a user by automating many of the details (transfer, staging, etc.)
– Allows for DB roles (SpitFire: base, admin, info)
– Dual functionality: also a good place to store metadata (schemas, stored procedures, etc.)
– Works well in cluster environments where you submit batch jobs
Access via “middle man” cont.
• Cons
– Can affect performance
– Admins want more flexibility and a familiar interface
– Stores database passwords
– SpitFire: designed for simple access patterns
– OGSA-DAI: too heavyweight for simple data access (4 services)
– Hard to do auditing
– Another thing to maintain
Industry efforts: Oracle
• Oracle 9i
– Does not have direct GSI authentication
– Oracle tools can be invoked using the Globus Resource Allocation Manager (GRAM). They claim to have a toolkit that can do this (OGDK), but I could not find it.
• Oracle 10i
– Will have support for GSI authentication and other Globus services.
– Should be in beta very soon
Industry efforts: DB2
• DB2 V 8.1
– Users can either write a custom jobmanager or specify the necessary parameters through the RSL string (this method will be used in today’s lab).
– Emerging Technologies Toolkit supports Grid WebServices. For more information see: http://www.alphaworks.ibm.com/tech/ettk
• DB2 V 8.2
– Will be out next year
– Unlike Oracle, will have an interface into which you can plug in your security model.
Industry efforts: MySQL
• MySQL version 4.1
– Closest to having GSI authentication
– They support SSL (via the OpenSSL library)
– It should be possible to modify OpenSSL to support Globus certificates
– OpenSSH has already done it: GSISSH (NCSA)
– Efforts are currently under way at SDSC and Brookhaven National Lab
Our Goals
• GSI authentication pushed into the database
• Role-based authentication
– Admin: power users who can insert/drop/delete.
– Power: read privileges and write to temp tables/views.
– Info: users with read-only privileges (the most common role).
• Many-to-one mapping (many database users to a single database).
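The role scheme above could be sketched as a lookup from a certificate subject to a role and from a role to a set of privileges. Everything named here is an assumption made for illustration: the subject strings, the privilege labels, the choice to give Admin read access as well, and the decision to deny unknown subjects rather than default them to Info.

```python
# Privileges per role, following the three roles on the slide.
# Assumption: Admin can also read; the slide lists only insert/drop/delete.
ROLE_PRIVILEGES = {
    "admin": {"select", "insert", "delete", "drop"},
    "power": {"select", "write_temp"},
    "info": {"select"},
}

# Hypothetical certificate subjects mapped onto the three roles.
GRID_USER_ROLES = {
    "/O=Grid/OU=Example/CN=Alice Admin": "admin",
    "/O=Grid/OU=Example/CN=Bob Builder": "power",
    "/O=Grid/OU=Example/CN=Carol Reader": "info",
    "/O=Grid/OU=Example/CN=Dave Reader": "info",
}


def role_for(subject):
    """Return the role for a certificate subject, or None if unknown."""
    return GRID_USER_ROLES.get(subject)


def is_allowed(subject, action):
    """True if the subject's role grants the requested action."""
    role = role_for(subject)
    return role is not None and action in ROLE_PRIVILEGES[role]
```

With this table, `is_allowed("/O=Grid/OU=Example/CN=Carol Reader", "insert")` is false while the same call for Alice is true. Several subjects share one role, which keeps the number of accounts the database has to know about small.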
Globus Jobs at TeraGrid SDSC
• User submits a Globus job (e.g. globus-job-submit) from their workstation to tg-login.sdsc.teragrid.org
• Jobmanager at tg-login converts the RSL string into a PBS job
• PBS schedules the job on our DTF cluster (dual, 128-node Itanium2 cluster)
• For more info: http://teragrid.org/docs/user-guide.htm
• We want jobs that require DB2 access to work the same way.
– Requires DB2 client installation; not there yet.
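The RSL-to-PBS step can be pictured roughly as follows. This is a simplified sketch, not the jobmanager that runs on tg-login: it takes already-parsed RSL attributes as a dict, assumes `maxWallTime` is given in minutes, and maps `count` straight onto PBS nodes, which a real jobmanager would handle with more care. The queue name and executable path in the usage note are made up.

```python
def rsl_to_pbs(job):
    """Render parsed RSL attributes as a minimal PBS batch script."""
    lines = ["#!/bin/sh"]
    if "queue" in job:
        lines.append(f"#PBS -q {job['queue']}")
    # Simplification: treat the RSL process count as a node count.
    lines.append(f"#PBS -l nodes={int(job.get('count', 1))}")
    if "maxWallTime" in job:
        minutes = int(job["maxWallTime"])
        lines.append(
            f"#PBS -l walltime={minutes // 60:02d}:{minutes % 60:02d}:00"
        )
    # The command line is the executable followed by its arguments.
    command = " ".join([job["executable"], *job.get("arguments", [])])
    lines.append(command)
    return "\n".join(lines) + "\n"
```

Calling `rsl_to_pbs({"executable": "/bin/hostname", "count": 2, "maxWallTime": 90, "queue": "dque"})` produces a script with a queue line, a two-node request, a 01:30:00 walltime, and the command as its last line. A DB2 job would follow the same path once the DB2 client is available on the compute nodes.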
LAB Outline:
Goal: As close to the real world as possible
• Log in to BlueHorizon (NPACI grid)
• Get a Grid certificate
• Create a proxy certificate
• Submit a Globus DB2 job to the ctf19 login/compute node (the only node with DB2 clients)
LAB cont.
• I will give out lab instructions during class.
• A copy of the lab instructions can be found at:
– http://www.sdsc.edu/~veytser/db2globuslab.html
Thank You
• Questions?