Towards A Grid File System Based On A Large-Scale BLOB Management Service

Viet-Trung Tran (1), Gabriel Antoniu (2), Bogdan Nicolae (3), Luc Bougé (1), Osamu Tatebe (4)
(1) ENS Cachan - Brittany, France; (2) INRIA Centre Rennes - Bretagne-Atlantique, France; (3) University of Rennes 1, France; (4) University of Tsukuba, Japan

description

This paper addresses the problem of building a grid file system for applications that need to manipulate huge data, distributed and concurrently accessed at a very large scale. In this paper we explore how this goal could be reached through a cooperation between the Gfarm grid file system and BlobSeer, a distributed object management system specifically designed for huge data management under heavy concurrency. The resulting BLOB-based grid file system exhibits scalable file access performance in scenarios where huge files are subject to massive, concurrent, fine-grain accesses. This is demonstrated through preliminary experiments of our prototype, conducted on the Grid'5000 testbed.

Transcript of Towards A Grid File System Based On A Large-Scale BLOB Management Service

Page 1: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Towards a Grid File System Based on a Large-Scale BLOB Management Service

Viet-Trung Tran (1), Gabriel Antoniu (2), Bogdan Nicolae (3), Luc Bougé (1), Osamu Tatebe (4)

(1) ENS Cachan - Brittany, France; (2) INRIA Centre Rennes - Bretagne-Atlantique, France; (3) University of Rennes 1, France; (4) University of Tsukuba, Japan

Page 2: Towards A Grid File System Based On A Large-Scale BLOB Management Service

New Challenges for Large-scale Data Storage

Scalable storage management for new-generation, data-oriented high-performance applications
- Massive, unstructured data objects (Terabytes)
- Many data objects (10³)
- High concurrency (10³ concurrent clients)
- Fine-grain access (Megabytes)
- Large-scale distributed platform: large clusters, grids, clouds, desktop grids

Applications: distributed, with high throughput under concurrency
- E-science
- Data-centric applications
- Storage for cloud services
- Map-Reduce-based data mining
- Checkpointing on desktop grids

Page 3: Towards A Grid File System Based On A Large-Scale BLOB Management Service

BlobSeer: a BLOB-based Approach

Developed by the KerData team at INRIA Rennes
- Recently created from the PARIS project-team

Generic data-management platform for huge, unstructured data
- Huge data (TB)
- Highly concurrent, fine-grain access (MB): read/write/append (R/W/A)
- Prototype available

Key design features
- Decentralized metadata management
- Beyond MVCC: multiversioning exposed to the user
- Lock-free write access through versioning
- Write-once pages

A back-end for higher-level, sophisticated data management systems
- Short term: highly scalable distributed file systems
- Middle term: storage for cloud services
- Long term: extremely large distributed databases

http://blobseer.gforge.inria.fr/

Page 4: Towards A Grid File System Based On A Large-Scale BLOB Management Service

BlobSeer: Design

Each blob is fragmented into equally-sized “pages”
- Allows huge data amounts to be distributed all over the peers
- Avoids contention for simultaneous accesses to disjoint parts of the data block
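The page fragmentation described above can be illustrated with a small sketch. It assumes a fixed page size and computes which page indices a byte range touches. The names `page_range`, `pages_for_range` and `ranges_share_page` are hypothetical helpers written for this illustration and are not part of the BlobSeer API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type, not part of the BlobSeer API: the half-open
 * range [first, end) of page indices touched by a byte range. */
struct page_range {
    uint64_t first; /* index of the first page touched */
    uint64_t end;   /* one past the index of the last page touched */
};

/* Map the byte range [offset, offset + length) of a blob onto page
 * indices, assuming equally sized pages of page_size bytes. */
struct page_range pages_for_range(uint64_t offset, uint64_t length,
                                  uint64_t page_size)
{
    struct page_range r;
    assert(page_size > 0);
    r.first = offset / page_size;
    if (length == 0)
        r.end = r.first; /* an empty range touches no page */
    else
        r.end = (offset + length - 1) / page_size + 1;
    return r;
}

/* Two accesses can only contend for a page if their page ranges overlap. */
int ranges_share_page(struct page_range a, struct page_range b)
{
    uint64_t lo = a.first > b.first ? a.first : b.first;
    uint64_t hi = a.end < b.end ? a.end : b.end;
    return lo < hi;
}
```

With 8 MB pages, two clients that each access a different 1 GB slice of the same blob map to disjoint page ranges, which is why such accesses do not need to contend with each other.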

Metadata: locate pages that make up a given blob
- Fine-grained and distributed
- Efficiently managed through a DHT

Versioning
- Update/append: generate new pages rather than overwrite
- Metadata is extended to incorporate the update
- Both the old and the new version of the blob are accessible

http://blobseer.gforge.inria.fr
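The versioning principle on this slide (new pages are generated rather than overwritten, and old versions stay readable) can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: it keeps one flat page table per version in a fixed-size array, whereas the slides state that BlobSeer distributes its metadata through a DHT. All names (`toy_blob`, `toy_blob_write`, the `MAX_*` limits) are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_PAGES    16 /* illustrative limits, not BlobSeer parameters */
#define MAX_VERSIONS 8

/* One version of a blob: for each page slot, the id of the stored page. */
struct blob_version {
    int page_id[MAX_PAGES];
    int npages;
};

/* A toy blob: a list of immutable versions plus a page id counter. */
struct toy_blob {
    struct blob_version version[MAX_VERSIONS];
    int nversions;
    int next_page_id;
};

void toy_blob_init(struct toy_blob *b)
{
    memset(b, 0, sizeof *b);
    b->nversions = 1;    /* version 0 is the empty blob */
    b->next_page_id = 1; /* page id 0 means "no page" */
}

/* Write `count` pages starting at page slot `first`. Pages of earlier
 * versions are never modified: the new version copies the previous page
 * table and points the written slots at freshly allocated pages.
 * Returns the new version number. */
int toy_blob_write(struct toy_blob *b, int first, int count)
{
    assert(b->nversions < MAX_VERSIONS);
    assert(first >= 0 && count > 0 && first + count <= MAX_PAGES);

    const struct blob_version *prev = &b->version[b->nversions - 1];
    struct blob_version *next = &b->version[b->nversions];

    assert(first <= prev->npages); /* overwrite or append, no holes */
    *next = *prev;
    for (int i = first; i < first + count; i++)
        next->page_id[i] = b->next_page_id++;
    if (first + count > next->npages)
        next->npages = first + count;
    return b->nversions++;
}
```

After an append of four pages followed by an update of two of them, version 1 still refers to its original pages, while version 2 shares the untouched pages with version 1 and refers to new pages only for the updated slots.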

Page 5: Towards A Grid File System Based On A Large-Scale BLOB Management Service

BlobSeer: Architecture

[Figure: architecture diagram with clients, providers, metadata providers, provider manager and version manager]

Clients
- Perform fine-grain blob accesses

Providers
- Store the pages of the blob

Provider manager
- Monitors the providers
- Favors data load balancing

Metadata providers
- Store information about page location

Version manager
- Ensures concurrency control

Page 6: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Version Management in BlobSeer

Beyond Multiversion Concurrency Control

Page 7: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Background: Object-based File Systems

From block-based to object-based file systems
- Block management at the file server
- Object management is delegated to object-based storage devices (OSDs) or storage servers

Advantages
- Scalability: offload ~90% of workload from the metadata server
- OSDs and storage servers are more autonomous, self-managed and easier to share

Examples of object-based file systems
- Lustre (Schwan, 2003)
- Ceph (Weil et al., 2006)
- GoogleFS (Ghemawat et al., 2003)
- XtreemFS (Hupfeld et al., 2008)

Page 8: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Towards a BLOB-based File System

Goal: Build a BLOB-based file system, able to cope with huge data and heavy access concurrency in a large-scale distributed environment

Hierarchical approach
- High-level file system metadata management: the Gfarm grid file system
- Low-level object management: the BlobSeer BLOB management system

[Figure labels: BlobSeer, Gfarm]

Page 9: Towards A Grid File System Based On A Large-Scale BLOB Management Service

The Gfarm Grid File System

The Gfarm file system [University of Tsukuba, Japan]
- A distributed file system designed for working at the Grid scale
- Files can be shared among all nodes and clients
- Applications can access files using the same path regardless of the file location

Main components
- Gfarm's metadata server
- File system nodes
- Gfarm clients

Legend: gfmd: Gfarm management daemon; gfsd: Gfarm storage daemon

Page 10: Towards A Grid File System Based On A Large-Scale BLOB Management Service

The Gfarm Grid File System [2]

Advanced features
- User management
- Authentication and single sign-on based on Grid Security Infrastructure (GSI)
- POSIX file system API (gfarm2fs) and Gfarm API

Limitations
- No file striping, thus file size is limited
- No access concurrency
- No versioning capability

Page 11: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Why combine Gfarm and BlobSeer?

BlobSeer
- Strengths: access concurrency, fine-grain access, versioning
- Weakness: lack of a POSIX file system interface

Gfarm
- Strengths: POSIX interface, user management, GSI support
- Weaknesses: file sizes are limited, not suitable for concurrent access, no versioning

Gfarm/BlobSeer
- Access concurrency, huge file sizes, fine-grain access, versioning
- POSIX interface, user management, GSI support

General idea: Gfarm handles file metadata, BlobSeer handles file data

Page 12: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Coupling Gfarm and BlobSeer [1]

The first approach
- Each file system node (gfsd) connects to BlobSeer to store/get Gfarm file data
- It must manage the mapping from Gfarm files to BLOBs
- It always acts as an intermediary for data transfer
- Bottleneck at the file system node

[Figure: Gfarm and BlobSeer, with interaction steps numbered 1 to 4]


Page 14: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Coupling Gfarm and BlobSeer [2]

Second approach
- The gfsd maps Gfarm files to BLOBs, and responds to the client's request with the BLOB ID
- Then, the client directly accesses data in BlobSeer

[Figure: Gfarm, with interaction steps numbered 1 to 5]

Page 15: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Gfarm/BlobSeer: Design Considerations

The whole system is dimensioned for a multi-site Grid
- One BlobSeer instance at each site

The client can dynamically switch between two well-defined access modes
- Remote access mode (red lines): used if the client cannot directly access BlobSeer
- BlobSeer direct access mode (blue lines): used if the client can directly access BlobSeer
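The choice between the two access modes can be sketched as a simple decision on the client side. This is a minimal illustration, not the prototype's code: the enum and function names are hypothetical, and the transfer count only reflects the description that in remote access mode the gfsd sits between the client and BlobSeer.

```c
#include <assert.h>

/* Hypothetical names for the two access modes described above. */
enum access_mode {
    REMOTE_ACCESS,         /* data flows client <-> gfsd <-> BlobSeer */
    BLOBSEER_DIRECT_ACCESS /* data flows client <-> BlobSeer */
};

/* Pick direct access whenever the client can reach BlobSeer itself,
 * and fall back to going through the gfsd otherwise. */
enum access_mode choose_access_mode(int can_reach_blobseer)
{
    return can_reach_blobseer ? BLOBSEER_DIRECT_ACCESS : REMOTE_ACCESS;
}

/* Number of network transfers that carry the file data for one request:
 * two when the gfsd relays the data, one when the client goes direct. */
int data_transfers(enum access_mode mode)
{
    assert(mode == REMOTE_ACCESS || mode == BLOBSEER_DIRECT_ACCESS);
    return mode == REMOTE_ACCESS ? 2 : 1;
}
```

The extra relay through the gfsd in remote access mode is the step at which all concurrent clients converge on a single daemon, which matches the bottleneck named for the first coupling approach.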

Page 16: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Discussion

The integrated system can support huge file sizes, since a Gfarm file is now transparently striped over BlobSeer data providers.

Remote access mode
- Only required modifying the source code of the gfsd daemon
- Could not resolve the current bottleneck problem on the gfsd daemon

BlobSeer direct access mode
- Supports concurrent accesses
- Was more complicated to implement: a new access mode had to be introduced into Gfarm

Page 17: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Experimental Evaluation on Grid'5000 [1]

Access throughput with no concurrency
- Gfarm benchmark measures throughput when writing/reading
- Configuration: 1 gfmd, 1 gfsd, 1 client, 9 data providers, 8 MB page size
- BlobSeer direct access mode provides a higher throughput

[Figure: two throughput plots, "Writing" and "Reading"]

Page 18: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Experimental Evaluation on Grid'5000 [2]

Access throughput under concurrency

Configuration
- 1 gfmd
- 1 gfsd
- 24 data providers
- Each client accesses 1 GB of a 10 GB file
- Page size 8 MB

Gfarm serializes concurrent accesses

Page 19: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Experimental Evaluation on Grid'5000 [3]

Access throughput under heavy concurrency

Configuration (deployed on 157 nodes of the Rennes site)
- 1 gfmd
- 1 gfsd
- Each client accesses 1 GB of a 64 GB file
- Page size 8 MB
- Up to 64 concurrent clients
- 64 data providers
- 24 metadata providers
- 1 version manager
- 1 page manager
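To get a feel for the scale of this configuration, the page counts can be worked out from the figures above. This is only a back-of-the-envelope calculation: it assumes binary units (1 GB = 1024 MB) and, for the last line, a perfectly even spread of pages over the data providers, which the slides do not state.

```latex
\begin{align*}
\text{pages per client} &= \frac{1\,\text{GB}}{8\,\text{MB}} = \frac{1024}{8} = 128 \\
\text{pages in the file} &= \frac{64\,\text{GB}}{8\,\text{MB}} = 64 \times 128 = 8192 \\
\text{pages per data provider (if evenly spread)} &= \frac{8192}{64} = 128
\end{align*}
```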

Page 20: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Introducing Versioning in Gfarm/BlobSeer

Versioning is the capability of accessing data of a specified file version
- Not only to roll back data when desired, but also to access different file versions within the same computation
- Favors efficient access concurrency

Approach
- Delegate versioning management to BlobSeer
- A Gfarm file is mapped to a single BLOB
- A file version is mapped to the corresponding version of the BLOB

Page 21: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Difficulties

Gfarm does not support versioning

Some issues we had to deal with:
- New extended API for clients
- Coordination between clients and gfsd daemons via new RPC calls
- Modification of the inner data structures of Gfarm in order to handle file versions

Versioning is not a standard feature of POSIX file systems

Page 22: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Versioning interface

Versioning capability was fully implemented

At Gfarm API level
- gfs_get_current_version(GFS_File gf, size_t *nversion)
- gfs_get_latest_version(GFS_File gf, size_t *nversion)
- gfs_set_version(GFS_File gf, size_t version)
- gfs_pio_vread(size_t nversion, GFS_File gf, void *buffer, int size, int *np)

At POSIX file system level
- Defined some ioctl commands

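fd = open(argv[1], O_RDWR);
/* write at offset 0, then ask which version is now the latest */
np = pwrite(fd, buffer_w, BUFFER_SIZE, 0);
ioctl(fd, BLOB_GET_LATEST_VERSION, &nversion);
/* select that version as the active one and read from it */
ioctl(fd, BLOB_SET_ACTIVE_VERSION, &nversion);
np = pread(fd, buffer_r, BUFFER_SIZE, 0);
/* query the currently active version */
ioctl(fd, BLOB_GET_ACTIVE_VERSION, &nversion);
close(fd);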

Page 23: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Conclusion

The experimental results suggest that our prototype successfully exploits the combined advantages of Gfarm and BlobSeer.
- A file system API
- Concurrent accesses
- Huge file sizes (tested up to a 64 GB file)
- Versioning

Future work
- Experiment with the system on a more complex topology, such as several sites of Grid'5000, and over a WAN network
- Ensure consistency semantics, since Gfarm does not yet maintain cache coherence between buffers on clients
- Compare our prototype to other Grid file systems

Page 24: Towards A Grid File System Based On A Large-Scale BLOB Management Service

Application scenarios
- Grid and cloud storage for data-mining applications with massive data
- Distributed storage for large-scale Petascale computing applications
- Storage for desktop grid applications with high write-throughput requirements
- Storage support for extremely large databases