Towards A Grid File System Based On A Large-Scale BLOB Management Service
1
Towards a Grid File System Based on a Large-Scale BLOB Management Service
Viet-Trung Tran¹, Gabriel Antoniu², Bogdan Nicolae³, Luc Bougé¹, Osamu Tatebe⁴
¹ ENS Cachan-Brittany, France
² INRIA Centre Rennes-Bretagne-Atlantique, France
³ University of Rennes 1, France
⁴ University of Tsukuba, Japan
2
New Challenges for Large-scale Data Storage
Scalable storage management for new-generation, data-oriented high-performance applications
  Massive, unstructured data objects (Terabytes)
  Many data objects (10³)
  High concurrency (10³ concurrent clients)
  Fine-grain access (Megabytes)
  Large-scale distributed platform: large clusters, grids, clouds, desktop grids
Applications: distributed, with high throughput under concurrency
  E-science
  Data-centric applications
  Storage for cloud services
  Map-Reduce-based data mining
  Checkpointing on desktop grids
3
BlobSeer: a BLOB-based Approach
Developed by the KerData team at INRIA Rennes
  Recently created from the PARIS project-team
Generic data-management platform for huge, unstructured data
  Huge data (TB)
  Highly concurrent, fine-grain access (MB): R/W/A
  Prototype available
Key design features
  Decentralized metadata management
  Beyond MVCC: multiversioning exposed to the user
  Lock-free write access through versioning
  Write-once pages
A back-end for higher-level, sophisticated data management systems
  Short term: highly scalable distributed file systems
  Middle term: storage for cloud services
  Long term: extremely large distributed databases
http://blobseer.gforge.inria.fr/
4
BlobSeer: Design
Each blob is fragmented into equally-sized “pages”
  Allows huge data amounts to be distributed all over the peers
  Avoids contention for simultaneous accesses to disjoint parts of the data block
Metadata: locate pages that make up a given blob
  Fine-grained and distributed
  Efficiently managed through a DHT
Versioning
  Update/append: generate new pages rather than overwrite
  Metadata is extended to incorporate the update
  Both the old and the new version of the blob are accessible
http://blobseer.gforge.inria.fr
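The page-based layout on this slide can be illustrated with a little arithmetic: a byte range maps to a contiguous range of fixed-size pages, and two accesses can only contend if those page ranges intersect. The following is a minimal sketch under that assumption. The type `page_range` and the functions `pages_for_access` and `accesses_overlap` are hypothetical names chosen for illustration and are not part of BlobSeer's API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: inclusive range of page indices touched by one access. */
typedef struct {
    uint64_t first_page;
    uint64_t last_page;
} page_range;

/* Map the byte range [offset, offset + size) onto pages of page_size bytes. */
page_range pages_for_access(uint64_t offset, uint64_t size, uint64_t page_size)
{
    assert(page_size > 0 && size > 0);
    page_range r;
    r.first_page = offset / page_size;
    r.last_page = (offset + size - 1) / page_size;
    return r;
}

/* Two accesses touch a common page only if their page ranges intersect. */
int accesses_overlap(page_range a, page_range b)
{
    return a.first_page <= b.last_page && b.first_page <= a.last_page;
}
```

With an 8 MB page size, for example, an access to the first 8 MB and an access starting at byte offset 8 MB fall on different pages, so they can be served by different providers.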
5
BlobSeer: Architecture
[Figure: architecture diagram showing clients, providers, metadata providers, the provider manager, and the version manager]
Clients
  Perform fine-grain blob accesses
Providers
  Store the pages of the blob
Provider manager
  Monitors the providers
  Favors data load balancing
Metadata providers
  Store information about page location
Version manager
  Ensures concurrency control
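The slide says only that the provider manager favors data load balancing; it does not say how. One simple policy that fits this description is to send each new page to the provider that currently stores the fewest pages. The sketch below assumes that policy. The functions `pick_least_loaded` and `place_pages` and the per-provider page counters are illustrative assumptions, not BlobSeer's actual strategy.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED policy: choose the provider currently storing the fewest pages.
   Ties go to the lowest provider index. */
size_t pick_least_loaded(const size_t *pages_stored, size_t n_providers)
{
    assert(pages_stored != NULL && n_providers > 0);
    size_t best = 0;
    for (size_t i = 1; i < n_providers; i++) {
        if (pages_stored[i] < pages_stored[best])
            best = i;
    }
    return best;
}

/* Place n_pages new pages one at a time, updating the load counters.
   placement[i] receives the provider index chosen for page i. */
void place_pages(size_t *pages_stored, size_t n_providers,
                 size_t *placement, size_t n_pages)
{
    assert(placement != NULL);
    for (size_t i = 0; i < n_pages; i++) {
        size_t p = pick_least_loaded(pages_stored, n_providers);
        placement[i] = p;
        pages_stored[p]++;
    }
}
```

Starting from unequal loads, repeated placement evens out the counters before it begins to cycle over the providers.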
6
Version Management in BlobSeer
Beyond Multiversion Concurrency Control
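The versioning idea stated earlier (updates generate new pages rather than overwrite, and both old and new versions stay accessible) can be modelled in a few lines. The toy model below is a single-process sketch, not BlobSeer's design: it keeps one flat page table per version in memory, whereas the slides describe metadata that is distributed and managed through a DHT. All names (`toy_blob`, `toy_blob_write_page`, and so on) and the size limits are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_VERSIONS 8   /* illustrative limits, not BlobSeer's */
#define TOY_MAX_PAGES    4
#define TOY_MAX_STORED   32
#define TOY_PAGE_SIZE    4

typedef struct {
    char store[TOY_MAX_STORED][TOY_PAGE_SIZE];   /* write-once page store */
    int  n_stored;
    int  table[TOY_MAX_VERSIONS][TOY_MAX_PAGES]; /* per-version page table; -1 marks a hole */
    int  n_versions;
} toy_blob;

/* Version 0 is the empty blob: every page is a hole. */
void toy_blob_init(toy_blob *b)
{
    memset(b, 0, sizeof *b);
    for (int p = 0; p < TOY_MAX_PAGES; p++)
        b->table[0][p] = -1;
    b->n_versions = 1;
}

/* Write one page. The data goes into a fresh slot of the store, and a new
   version is created whose table differs from the previous one in that
   page only. Returns the new version number. */
int toy_blob_write_page(toy_blob *b, int page, const char *data)
{
    assert(page >= 0 && page < TOY_MAX_PAGES);
    assert(b->n_versions < TOY_MAX_VERSIONS && b->n_stored < TOY_MAX_STORED);
    int v = b->n_versions;
    memcpy(b->table[v], b->table[v - 1], sizeof b->table[v]);
    memcpy(b->store[b->n_stored], data, TOY_PAGE_SIZE);
    b->table[v][page] = b->n_stored++;
    b->n_versions++;
    return v;
}

/* Read one page as seen by a given version. Returns NULL for a hole. */
const char *toy_blob_read_page(const toy_blob *b, int version, int page)
{
    assert(version >= 0 && version < b->n_versions);
    assert(page >= 0 && page < TOY_MAX_PAGES);
    int idx = b->table[version][page];
    return idx < 0 ? NULL : b->store[idx];
}
```

Because a stored page is never modified after it is written, a reader working on an old version needs no lock against a concurrent writer that is producing a new one.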
7
Background: Object-based File Systems
From block-based to object-based file systems
  Block management at the file server
  Object management is delegated to object-based storage devices (OSDs) or storage servers
Advantages
  Scalability: offload ~90% of workload from the metadata server
  OSDs and storage servers are more autonomous, self-managed and easier to share
Examples of object-based file systems
  Lustre (Schwan, 2003)
  Ceph (Weil et al., 2006)
  GoogleFS (Ghemawat et al., 2003)
  XtreemFS (Hupfeld et al., 2008)
8
Towards a BLOB-based File System
Goal: build a BLOB-based file system able to cope with huge data and heavy access concurrency in a large-scale distributed environment
Hierarchical approach
  High-level file system metadata management: the Gfarm grid file system
  Low-level object management: the BlobSeer BLOB management system
[Figure: diagram of the BlobSeer and Gfarm layers]
9
The Gfarm Grid File System
The Gfarm file system [University of Tsukuba, Japan]
  A distributed file system designed for working at the Grid scale
  Files can be shared among all nodes and clients
  Applications can access files using the same path regardless of the file location
Main components
  Gfarm's metadata server
  File system nodes
  Gfarm clients
[Figure legend: gfmd = Gfarm management daemon; gfsd = Gfarm storage daemon]
10
The Gfarm Grid File System [2]
Advanced features
  User management
  Authentication and single sign-on based on Grid Security Infrastructure (GSI)
  POSIX file system API (gfarm2fs) and Gfarm API
Limitations
  No file striping, thus file size is limited
  No access concurrency
  No versioning capability
11
Why combine Gfarm and BlobSeer?
BlobSeer
  Strengths: access concurrency, fine-grain access, versioning
  Weakness: lack of a POSIX file system interface
Gfarm
  Strengths: POSIX interface, user management, GSI support
  Weaknesses: file sizes are limited, not suitable for concurrent access, no versioning
Gfarm/BlobSeer
  Access concurrency, huge file sizes, fine-grain access, versioning
  POSIX interface, user management, GSI support
General idea: Gfarm handles file metadata, BlobSeer handles file data
12
Coupling Gfarm and BlobSeer [1]
The first approach
  Each file system node (gfsd) connects to BlobSeer to store/get Gfarm file data
  It must manage the mapping from Gfarm files to BLOBs
  It always acts as an intermediary for data transfer
  Bottleneck at the file system node
[Figure: interaction diagram between Gfarm and BlobSeer, steps 1 to 4]
14
Coupling Gfarm and BlobSeer [2]
Second approach
  The gfsd maps Gfarm files to BLOBs, and responds to the client's request with the BLOB ID
  Then, the client directly accesses data in BlobSeer
[Figure: interaction diagram with Gfarm, steps 1 to 5]
15
Gfarm/BlobSeer: Design Considerations
The whole system is dimensioned for a multi-site Grid
  One BlobSeer instance at each site
The client can dynamically switch between these two well-defined access modes
  Remote access mode (red lines): if the client cannot directly access BlobSeer
  BlobSeer direct access mode (blue lines): the clients can directly access BlobSeer
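The choice between the two access modes can be written as a one-line decision. The sketch below assumes the client already knows whether it can reach the BlobSeer instance at its site; how the prototype detects this is not described in the slides. The names `access_mode`, `site_view`, `choose_access_mode` and `data_hops` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    ACCESS_REMOTE,          /* file data is relayed by the gfsd */
    ACCESS_BLOBSEER_DIRECT  /* client exchanges file data with BlobSeer itself */
} access_mode;

/* Hypothetical per-client view of one site. */
typedef struct {
    bool blobseer_reachable; /* ASSUMED to be known by the client */
} site_view;

/* Prefer direct access and fall back to remote access through the gfsd. */
access_mode choose_access_mode(const site_view *site)
{
    assert(site != NULL);
    return site->blobseer_reachable ? ACCESS_BLOBSEER_DIRECT : ACCESS_REMOTE;
}

/* Network hops crossed by file data: client -> gfsd -> BlobSeer in remote
   mode, client -> BlobSeer in direct mode. */
int data_hops(access_mode mode)
{
    return mode == ACCESS_REMOTE ? 2 : 1;
}
```

The extra hop in remote mode is where the gfsd becomes an intermediary for every data transfer.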
16
Discussion
The integrated system can support huge file sizes, since a Gfarm file is now transparently striped over BlobSeer data providers.
Remote access mode
  Only required modifying the source code of the gfsd daemon
  Could not resolve the current bottleneck problem on the gfsd daemon
BlobSeer direct access mode
  Supports concurrent accesses
  Was more complicated to implement: a new access mode had to be introduced into Gfarm
17
Experimental Evaluation on Grid'5000 [1]
Access throughput with no concurrency
  Gfarm benchmark measures throughput when writing/reading
  Configuration: 1 gfmd, 1 gfsd, 1 client, 9 data providers, 8 MB page size
  BlobSeer direct access mode provides a higher throughput
[Figure: two throughput plots, "Writing" and "Reading"]
18
Experimental Evaluation on Grid'5000 [2]
Access throughput under concurrency
Configuration
  1 gfmd
  1 gfsd
  24 data providers
  Each client accesses 1 GB of a 10 GB file
  Page size 8 MB
Gfarm sequentializes concurrent accesses
19
Experimental Evaluation on Grid'5000 [3]
Access throughput under heavy concurrency
Configuration (deployed on 157 nodes of the Rennes site)
  1 gfmd
  1 gfsd
  Each client accesses 1 GB of a 64 GB file
  Page size 8 MB
  Up to 64 concurrent clients
  64 data providers
  24 metadata providers
  1 version manager
  1 page manager
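To get a feel for the scale of this configuration, the page counts can be worked out directly. The sketch below makes two assumptions that the slides do not state: GB and MB are read as binary units (2^30 and 2^20 bytes), and client i works on the i-th consecutive 1 GB chunk of the file. The helpers `pages_needed` and `client_offset` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)  /* ASSUMED: "MB" read as 2^20 bytes */
#define GIB (1024ULL * MIB)      /* ASSUMED: "GB" read as 2^30 bytes */

/* Number of pages needed to cover `bytes` bytes, rounding up. */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    assert(page_size > 0);
    return (bytes + page_size - 1) / page_size;
}

/* ASSUMED layout: client i works on the i-th consecutive chunk of the file. */
uint64_t client_offset(uint64_t client_index, uint64_t chunk_bytes)
{
    return client_index * chunk_bytes;
}
```

Under these assumptions each client touches 128 pages of 8 MB, the 64 GB file spans 8192 pages, and the chunk of the last of the 64 clients ends exactly at the end of the file.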
20
Introducing Versioning in Gfarm/BlobSeer
Versioning is the capability of accessing data of a specified file version
  Not only to roll back data when desired, but also to access different file versions within the same computation
  Favors efficient access concurrency
Approach
  Delegate versioning management to BlobSeer
  A Gfarm file is mapped to a single BLOB
  A file version is mapped to the corresponding version of the BLOB
21
Difficulties
Gfarm does not support versioning
  Some issues we had to deal with:
  New extended API for clients
  Coordination between clients and gfsd daemons via new RPC calls
  Modification of the inner data structures of Gfarm in order to handle file versions
Versioning is not a standard feature of POSIX file systems
22
Versioning interface
Versioning capability was fully implemented
At Gfarm API level
  gfs_get_current_version(GFS_File gf, size_t *nversion)
  gfs_get_latest_version(GFS_File gf, size_t *nversion)
  gfs_set_version(GFS_File gf, size_t version)
  gfs_pio_vread(size_t nversion, GFS_File gf, void *buffer, int size, int *np)
At POSIX file system level
  Defined some ioctl commands
fd = open(argv[1], O_RDWR);
np = pwrite(fd, buffer_w, BUFFER_SIZE, 0);
ioctl(fd, BLOB_GET_LATEST_VERSION, &nversion);
ioctl(fd, BLOB_SET_ACTIVE_VERSION, &nversion);
np = pread(fd, buffer_r, BUFFER_SIZE, 0);
ioctl(fd, BLOB_GET_ACTIVE_VERSION, &nversion);
close(fd);
23
Conclusion
The experimental results suggest that our prototype exploits the combined advantages of Gfarm and BlobSeer well.
  A file system API
  Concurrent accesses
  Huge file sizes (tested up to a 64 GB file)
  Versioning
Future work
  Experiment with the system on a more complex topology, such as several sites of Grid'5000, and over a WAN
  Ensure consistency semantics, since Gfarm does not yet maintain cache coherence between buffers on clients
  Compare our prototype to other Grid file systems
24
Application scenarios
Grid and cloud storage for data-mining applications with massive data
Distributed storage for large-scale Petascale computing applications
Storage for desktop grid applications with high write-throughput requirements
Storage support for extremely large databases