16 Science DMZ Dart - GlobusWorld · PDF fileScience DMZ Design Pattern (Abstract) 10GE GE...

Post on 31-Jan-2018

218 views 1 download

Transcript of 16 Science DMZ Dart - GlobusWorld · PDF fileScience DMZ Design Pattern (Abstract) 10GE GE...

DataPortals

EliDart,NetworkEngineerESnetScienceEngagementLawrenceBerkeleyNationalLaboratory

GlobusWorld

Chicago,IL

April11,2017

Overview

4/15/172

• ScienceDMZandDataPortals

• ThisassumesyoualreadyhaveaScienceDMZ– Ifyoudon’thaveone,wecanchatabouthowyoumightbuildone– Ifitwouldbehelpful,Icantalktoyoursystemsandnetworkingfolks– Orcheckoutthefasterdataknowledgebase:

• http://fasterdata.es.net/science-dmz/

ScienceDMZDesignPattern(Abstract)

10GE

10GE

10GE

10GE

10G

Border Router

WAN

Science DMZSwitch/Router

Enterprise Border Router/Firewall

Site / CampusLAN

High performanceData Transfer Node

with high-speed storage

Per-service security policy control points

Clean, High-bandwidth

WAN path

Site / Campus access to Science

DMZ resources

perfSONAR

perfSONAR

perfSONAR

3 – ESnet Science Engagement (engage@es.net) - 4/15/17 ©2015,EnergySciencesNetwork

HPCCenterDataPath

©2014,EnergySciencesNetwork4 – ESnet Science Engagement (engage@es.net) - 4/15/17

Routed

Border Router

WAN

Core Switch/Router

Firewall

Offices

perfSONAR

perfSONAR

perfSONAR

Supercomputer

Parallel Filesystem

Front endswitch

Data Transfer Nodes

Front endswitch

High Latency WAN Path

Low Latency LAN Path

NextSteps– BuildingOnTheScienceDMZ

• Enhancedcyberinfrastructuresubstratenowexists– Wideareanetworks(ESnet,GEANT,Internet2,Regionals)– ScienceDMZsconnectedtothosenetworks– DTNsintheScienceDMZs

• Whatdoesthescientistsee?– Scientistseesascienceapplication

• Datatransfer• Dataportal• Dataanalysis

– ScienceapplicationsaretheuserinterfacetonetworksandDMZs

• Large-scaledata-intensivesciencerequiresthatwebuildlargerstructuresontopofthosecomponents

4/15/175

ScienceDataPortals

• Largerepositoriesofscientificdata– Climatedata– Skysurveys(astronomy,cosmology)– Manyothers– Datasearch,browsing,access

• Manyscientificdataportalsweredesigned15+yearsago– Single-web-serverdesign– Databrowse/search,dataaccess,userawarenessallinasinglesystem– Allthedatagoesthroughtheportalserver

• Inmanycasesbydesign• E.g.embargobeforepublication(enforceaccesscontrol)

4/15/176

LegacyPortalDesign

10GE

Border Router

WAN

Firewall

Enterprise

perfSONAR

perfSONAR

Filesystem(data store)

10GE

Portal Server

Browsing pathQuery pathData path

Portal server applications:· web server· search· database· authentication· data service

4/15/177

• Verydifficulttoimproveperformancewithoutarchitecturalchange– Softwarecomponentsalltangledtogether

– DifficulttoputthewholeportalinaScienceDMZbecauseofsecurity

– EvenifyoucouldputitinaDMZ,manycomponentsaren’tscalable

• Whatdoesarchitecturalchangemean?

ExampleofArchitecturalChange– CDN

• Let’slookatwhatContentDeliveryNetworksdidforwebapplications

• CDNsareawell-deployeddesignpattern(Netflix,etc)• WhatdoesaCDNdo?

– Storestaticcontentinaseparatelocationfromdynamiccontent• Complexityisn’tinthestaticcontent– it’sintheapplicationdynamics• Webapplicationsarecomplex,full-featured,andslow• Dataserviceforstaticcontentissimplebycomparison

– Separationofapplicationanddataserviceallowseachtobeoptimized

4/15/178

ClassicalWebServerModel

4/15/179

• Webbrowserfetchespagesfromwebserver– Allcontentstoredonthewebserver– Webapplicationsrunonthewebserver– Webserversendsdatatoclientbrowseroverthenetwork

• Perceivedclientperformancechangeswithnetworkconditions– Severalproblemsinthegeneralcase– Latencyincreasestimetopagerender– Packetloss+latencycauseproblemsforlargestaticobjects

HostingProvider

TransitNetwork

Residential BroadbandWEB

Long Distance / High Latency

Web Server

Browser

Solution:PlaceLargeStaticObjectsNearClient

HostingProvider

TransitNetwork

Residential BroadbandWEB

Long Distance / High Latency

CDN

DATA

Short Distance / Low Latency

Web Server

CDN Data Server

Browser

4/15/1710

• CDNprovidesstaticcontent“close”toclient

• Webserverstillmanagescomplexbehavior

• Latencygoesdown– Timetopagerendergoesdown– Staticcontentperformancegoesup

• Loadonwebservergoesdown(noneedtoservestaticcontent)

• Significantwinforwebapplicationperformance

ClientSimplySeesIncreasedPerformance

4/15/1711

• Clientdoesn’tseetheCDNasaseparatething– Webcontentisallstillviewedinabrowser

• Browserfetcheswhatthepagetellsittofetch• Differentcontentcomesfromdifferentplaces• Userdoesn’tknow/care

• CDNsprovideanarchitecturalsolutiontoaperformanceproblem– Notbrute-force– Worksmarter,notharder

The‘NetWEB

Browser

Web Server

Rich, Slow

DATA

CDN Data Server

Simple,Fast

The‘NetWEB

Browser

Web Server

ArchitecturalExaminationofDataPortals

• Commondataportalfunctions(mostportalshavethese)– Search/query/discovery– Datadownloadmethodfordataaccess– GUIforbrowsingbyhumans– APIformachineaccess– ideallyincorporatessearch/query+download

• Performancepainisprimarilyinthedatahandlingpiece– Rapidincreaseindatascaleeclipsedlegacysoftwarestackcapabilities– Portalserversoftenstuckinenterprisenetwork

• Canwe“disassemble”theportalandputthepiecesbacktogetherbetter?– UseScienceDMZasaplatformforthedatapiece– AvoidplacingcomplexsoftwareintheScienceDMZ

4/15/1712

LegacyPortalDesign

10GE

Border Router

WAN

Firewall

Enterprise

perfSONAR

perfSONAR

Filesystem(data store)

10GE

Portal Server

Browsing pathQuery pathData path

Portal server applications:· web server· search· database· authentication· data service

4/15/1713

Next-GenerationPortalLeveragesScienceDMZ

10GE10GE

10GE

10GE

Border Router

WAN

Science DMZSwitch/Router

Firewall

Enterprise

perfSONAR

perfSONAR

10GE

10GE

10GE10GE

DTN

DTN

API DTNs(data access governed

by portal)

DTN

DTN

perfSONAR

Filesystem (data store)

10GE

Portal Server

Browsing pathQuery path

Portal server applications:· web server· search· database· authentication

Data Path

Data Transfer Path

Portal Query/Browse Path

4/15/1714

PutTheDataOnDedicatedInfrastructure

• Wehaveseparatedthedatahandlingfromtheportallogic• Portalisstillitsnormalself,butenhanced

– PortalGUI,database,search,etc.allfunctionastheydidbefore– QueryreturnspointerstodataobjectsintheScienceDMZ– Portalisnowfreedfromtiestothedataservers(runitonAmazonifyouwant!)

• Datahandlingisseparate,andscalable– High-performanceDTNsintheScienceDMZ– Scaleasmuchasyouneedtowithoutmodifyingtheportalsoftware

• Outsourcedatahandlingtocomputingcenters– Computingcentersaresetupforlarge-scaledata– Letthemhandlethelarge-scaledata,andlettheportaldotheorchestrationofdataplacement

4/15/1715

ScalabilityExample– PetascaleDTNProject

10.0 Gbps

17.6 Gbps

14.8 Gbps

19.3 Gbps

17.4 Gbps 17.0 Gbps

32.4 Gbps

25.3 Gbps

18.3 Gbps

16.3 Gbps

24.1 Gbps

24.0 Gbps

DTN

DTN

DTN

DTN

alcf#dtn_miraALCF

nersc#dtnNERSC

olcf#dtn_atlasOLCF

ncsa#BlueWatersNCSA

Data set: L380Files: 19260Directories: 211Other files: 0Total bytes: 4442781786482 (4.4T bytes)Smallest file: 0 bytes (0 bytes)Largest file: 11313896248 bytes (11G bytes)Size distribution:

1 - 10 bytes: 7 files10 - 100 bytes: 1 files100 - 1K bytes: 59 files1K - 10K bytes: 3170 files10K - 100K bytes: 1560 files100K - 1M bytes: 2817 files1M - 10M bytes: 3901 files10M - 100M bytes: 3800 files100M - 1G bytes: 2295 files1G - 10G bytes: 1647 files10G - 100G bytes: 3 files

March 2017L380 Data Set

4/15/1716

LinksandLists

– ESnetfasterdataknowledgebase• http://fasterdata.es.net/

– ScienceDMZpaper• http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf

– ScienceDMZemaillist• Sendmailtosympa@lists.lbl.gov withsubject"subscribeesnet-sciencedmz”

– perfSONAR• http://fasterdata.es.net/performance-testing/perfsonar/• http://www.perfsonar.net

– Globus• https://www.globus.org/

17 – ESnet Science Engagement (engage@es.net) - 4/15/17 ©2015,EnergySciencesNetwork

Thanks!

EliDartdart@es.netEnergySciencesNetwork(ESnet)LawrenceBerkeleyNationalLaboratory

http://fasterdata.es.net/

http://my.es.net/

http://www.es.net/

ExtraSlides

4/15/1719

DTNClusterDetail

10GE10GE

10GE10GE

10GE

10GE

Border Router

WAN

Science DMZSwitch/Router

Firewall

Enterprise

perfSONAR

perfSONAR

10GE10GE

10GE

10GE

10GE10GE

DTN

DTN

Filesystem

HEAD

“Sealed” DTNs(Globus only, no

shell access)

ClusterHead/Login

Nodes

DTN

DTN

Cluster compute nodes

HEAD

perfSONAR

Configure as DTN Cluster

4/15/1720

DTNClusterDesign

• ConfigureallfourDTNsasasingleGlobusendpoint– Globushasdocsonhowtodothis– https://support.globus.org/entries/71011547-How-do-I-add-multiple-I-O-nodes-to-a-Globus-endpoint-

• Recentoptionsforincreasedperformance– Useadditionalparallelconnections– DistributetransfersacrossmultipleDTNs(GlobusI/ONodes)– Critical– onlydothiswhenallDTNsintheendpointmountthesamesharedfilesystem

• UsetheGlobusCLIcommandendpoint-modify – Usethe--network-useoption– Adjustsconcurrencyandparallelism– Moreinfoatglobus.org (http://dev.globus.org/cli/reference/endpoint-modify/)

4/15/1721