16 Science DMZ Dart - GlobusWorld · PDF fileScience DMZ Design Pattern (Abstract) 10GE GE...

Data Portals Eli Dart, Network Engineer ESnet Science Engagement Lawrence Berkeley National Laboratory GlobusWorld Chicago, IL April 11, 2017


Data Portals

Eli Dart, Network Engineer
ESnet Science Engagement
Lawrence Berkeley National Laboratory

GlobusWorld
Chicago, IL
April 11, 2017


Overview


• Science DMZ and Data Portals

• This assumes you already have a Science DMZ
  – If you don’t have one, we can chat about how you might build one
  – If it would be helpful, I can talk to your systems and networking folks
  – Or check out the fasterdata knowledge base:
    • http://fasterdata.es.net/science-dmz/


Science DMZ Design Pattern (Abstract)

[Diagram. Components: WAN; Border Router; Science DMZ Switch/Router; Enterprise Border Router/Firewall; Site / Campus LAN; High performance Data Transfer Node with high-speed storage; three perfSONAR hosts. Links are labeled 10GE (one is labeled 10G). Annotations: Clean, High-bandwidth WAN path; Site / Campus access to Science DMZ resources; Per-service security policy control points.]

3 – ESnet Science Engagement ([email protected]) - 4/15/17 © 2015, Energy Sciences Network


HPC Center Data Path

[Diagram. Components: WAN; Border Router; Core Switch/Router; Firewall; Offices; Supercomputer; Parallel Filesystem; Data Transfer Nodes; two front end switches; three perfSONAR hosts. Annotations: Routed; High Latency WAN Path; Low Latency LAN Path.]


Next Steps – Building On The Science DMZ

• Enhanced cyberinfrastructure substrate now exists
  – Wide area networks (ESnet, GEANT, Internet2, Regionals)
  – Science DMZs connected to those networks
  – DTNs in the Science DMZs

• What does the scientist see?
  – Scientist sees a science application
    • Data transfer
    • Data portal
    • Data analysis
  – Science applications are the user interface to networks and DMZs

• Large-scale data-intensive science requires that we build larger structures on top of those components


Science Data Portals

• Large repositories of scientific data
  – Climate data
  – Sky surveys (astronomy, cosmology)
  – Many others
  – Data search, browsing, access

• Many scientific data portals were designed 15+ years ago
  – Single-web-server design
  – Data browse/search, data access, user awareness all in a single system
  – All the data goes through the portal server
    • In many cases by design
    • E.g. embargo before publication (enforce access control)


Legacy Portal Design

[Diagram. Components: WAN; Border Router; Firewall; Enterprise; Portal Server; Filesystem (data store); two perfSONAR hosts. Links are labeled 10GE. Legend: Browsing path; Query path; Data path. Portal server applications: web server, search, database, authentication, data service.]

• Very difficult to improve performance without architectural change
  – Software components all tangled together
  – Difficult to put the whole portal in a Science DMZ because of security
  – Even if you could put it in a DMZ, many components aren’t scalable

• What does architectural change mean?


Example of Architectural Change – CDN

• Let’s look at what Content Delivery Networks did for web applications

• CDNs are a well-deployed design pattern (Netflix, etc.)

• What does a CDN do?
  – Store static content in a separate location from dynamic content
    • Complexity isn’t in the static content – it’s in the application dynamics
    • Web applications are complex, full-featured, and slow
    • Data service for static content is simple by comparison
  – Separation of application and data service allows each to be optimized


Classical Web Server Model

• Web browser fetches pages from web server
  – All content stored on the web server
  – Web applications run on the web server
  – Web server sends data to client browser over the network

• Perceived client performance changes with network conditions
  – Several problems in the general case
  – Latency increases time to page render
  – Packet loss + latency cause problems for large static objects

[Diagram. Components: Web Server at a Hosting Provider; Transit Network; Residential Broadband; Browser. The WEB content path between Web Server and Browser is annotated Long Distance / High Latency.]
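The combined effect of packet loss and latency on large static objects can be made concrete with the widely cited Mathis et al. approximation for single-stream TCP throughput, rate ≤ MSS / (RTT · √p). The slides do not name this model; it is brought in here as an assumption, with the constant factor taken as 1 and the MSS, RTT, and loss values chosen purely for illustration.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput, in bits/s.

    Uses the simplified Mathis et al. form MSS / (RTT * sqrt(p)) with the
    constant factor taken as 1 (an assumption made for this sketch).
    """
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    if not 0 < loss_rate < 1:
        raise ValueError("loss_rate must be between 0 and 1 (exclusive)")
    return (mss_bytes * 8) / (rtt_s * math.sqrt(loss_rate))


# Same loss rate on both paths; only the round-trip time differs.
# All numbers below are illustrative, not measurements from the talk.
for label, rtt in (("short path, 1 ms RTT", 0.001), ("long path, 80 ms RTT", 0.080)):
    rate = mathis_throughput_bps(mss_bytes=1460, rtt_s=rtt, loss_rate=1e-4)
    print(f"{label}: about {rate / 1e6:.1f} Mbit/s")
```

Because the bound scales with 1/RTT, the same loss rate that is barely noticeable on a short path becomes the limiting factor on a long one, which is why moving large objects closer to the client helps.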


Solution: Place Large Static Objects Near Client

[Diagram. Components: Web Server at a Hosting Provider; Transit Network; Residential Broadband; CDN Data Server (CDN); Browser. The WEB content path from the Web Server is annotated Long Distance / High Latency; the DATA path from the CDN Data Server is annotated Short Distance / Low Latency.]

• CDN provides static content “close” to client
• Web server still manages complex behavior
• Latency goes down
  – Time to page render goes down
  – Static content performance goes up
• Load on web server goes down (no need to serve static content)
• Significant win for web application performance
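A minimal sketch of the split between application and data service follows, assuming a page generator that decides where each link points. The host names, the suffix list, and the `url_for` helper are hypothetical and are not taken from the talk or from any particular CDN product.

```python
import posixpath
from urllib.parse import urljoin

ORIGIN = "https://www.example.org/"  # hypothetical web application host
CDN = "https://cdn.example.net/"     # hypothetical CDN data server
STATIC_SUFFIXES = {".jpg", ".png", ".css", ".js", ".mp4"}  # illustrative list


def url_for(path: str) -> str:
    """Point static objects at the CDN and everything else at the origin."""
    suffix = posixpath.splitext(path)[1].lower()
    base = CDN if suffix in STATIC_SUFFIXES else ORIGIN
    return urljoin(base, path.lstrip("/"))


# The browser simply fetches whatever URLs the page contains.
for path in ("/account/settings", "/video/intro.mp4"):
    print(path, "->", url_for(path))
```

The web application stays in charge of the complex behavior; only the simple job of serving large static objects moves to the server that is close to the client.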


Client Simply Sees Increased Performance

• Client doesn’t see the CDN as a separate thing
  – Web content is all still viewed in a browser
    • Browser fetches what the page tells it to fetch
    • Different content comes from different places
    • User doesn’t know/care

• CDNs provide an architectural solution to a performance problem
  – Not brute-force
  – Work smarter, not harder

[Diagram, two panels. One panel: Browser; The ‘Net; Web Server (WEB). Other panel: Browser; The ‘Net; Web Server (WEB; Rich, Slow); CDN Data Server (DATA; Simple, Fast).]


Architectural Examination of Data Portals

• Common data portal functions (most portals have these)
  – Search/query/discovery
  – Data download method for data access
  – GUI for browsing by humans
  – API for machine access – ideally incorporates search/query + download

• Performance pain is primarily in the data handling piece
  – Rapid increase in data scale eclipsed legacy software stack capabilities
  – Portal servers often stuck in enterprise network

• Can we “disassemble” the portal and put the pieces back together better?
  – Use Science DMZ as a platform for the data piece
  – Avoid placing complex software in the Science DMZ


Legacy Portal Design

[Diagram. Components: WAN; Border Router; Firewall; Enterprise; Portal Server; Filesystem (data store); two perfSONAR hosts. Links are labeled 10GE. Legend: Browsing path; Query path; Data path. Portal server applications: web server, search, database, authentication, data service.]


Next-Generation Portal Leverages Science DMZ

[Diagram. Components: WAN; Border Router; Science DMZ Switch/Router; Firewall; Enterprise; Portal Server; four DTNs labeled “API DTNs (data access governed by portal)”; Filesystem (data store); three perfSONAR hosts. Links are labeled 10GE. Legend: Browsing path; Query path; Data Path; Data Transfer Path; Portal Query/Browse Path. Portal server applications: web server, search, database, authentication.]


Put The Data On Dedicated Infrastructure

• We have separated the data handling from the portal logic

• Portal is still its normal self, but enhanced
  – Portal GUI, database, search, etc. all function as they did before
  – Query returns pointers to data objects in the Science DMZ
  – Portal is now freed from ties to the data servers (run it on Amazon if you want!)

• Data handling is separate, and scalable
  – High-performance DTNs in the Science DMZ
  – Scale as much as you need to without modifying the portal software

• Outsource data handling to computing centers
  – Computing centers are set up for large-scale data
  – Let them handle the large-scale data, and let the portal do the orchestration of data placement
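The idea that a query returns pointers rather than data can be sketched as follows. This is an illustration under stated assumptions, not the design of any real portal: the `DataPointer` type, the catalog contents, the endpoint name `example#dtn`, and the file paths are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPointer:
    endpoint: str  # hypothetical name of the DTN endpoint in the Science DMZ
    path: str      # hypothetical location of the object on the data store


# Hypothetical catalog held by the portal: metadata only, no file contents.
CATALOG = {
    "climate-run-042": DataPointer("example#dtn", "/data/climate/run-042.nc"),
    "sky-survey-007": DataPointer("example#dtn", "/data/sky/field-007.fits"),
}


def search(term: str) -> dict[str, DataPointer]:
    """Return pointers for matching data sets; no file bytes pass through the portal."""
    return {name: ptr for name, ptr in CATALOG.items() if term in name}


for name, ptr in search("climate").items():
    print(f"{name}: fetch {ptr.path} from endpoint {ptr.endpoint}")
```

Because the portal only hands out references, the DTNs behind those references can be scaled or relocated without touching the portal software.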


Scalability Example – Petascale DTN Project

March 2017, L380 Data Set

[Diagram. Four DTN endpoints: alcf#dtn_mira (ALCF), nersc#dtn (NERSC), olcf#dtn_atlas (OLCF), ncsa#BlueWaters (NCSA). Transfer rates shown between the sites: 10.0, 17.6, 14.8, 19.3, 17.4, 17.0, 32.4, 25.3, 18.3, 16.3, 24.1, and 24.0 Gbps.]

Data set: L380
Files: 19260
Directories: 211
Other files: 0
Total bytes: 4442781786482 (4.4T bytes)
Smallest file: 0 bytes (0 bytes)
Largest file: 11313896248 bytes (11G bytes)
Size distribution:
  1 - 10 bytes: 7 files
  10 - 100 bytes: 1 files
  100 - 1K bytes: 59 files
  1K - 10K bytes: 3170 files
  10K - 100K bytes: 1560 files
  100K - 1M bytes: 2817 files
  1M - 10M bytes: 3901 files
  10M - 100M bytes: 3800 files
  100M - 1G bytes: 2295 files
  1G - 10G bytes: 1647 files
  10G - 100G bytes: 3 files
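To put the rates in context, the short calculation below estimates how long the L380 data set would take to move at the slowest and fastest rates on the slide, and checks that the size distribution adds up to the stated file count. It assumes 1 Gbps means 10^9 bits per second and that each rate is sustained for the whole transfer; neither assumption is stated in the slides.

```python
TOTAL_BYTES = 4_442_781_786_482  # "Total bytes" from the slide

# File counts per size bucket, copied from the slide.
FILES_BY_SIZE = {
    "1 - 10": 7, "10 - 100": 1, "100 - 1K": 59, "1K - 10K": 3170,
    "10K - 100K": 1560, "100K - 1M": 2817, "1M - 10M": 3901,
    "10M - 100M": 3800, "100M - 1G": 2295, "1G - 10G": 1647,
    "10G - 100G": 3,
}

# Transfer rates shown on the slide, in Gbps.
RATES_GBPS = [10.0, 17.6, 14.8, 19.3, 17.4, 17.0,
              32.4, 25.3, 18.3, 16.3, 24.1, 24.0]


def transfer_seconds(total_bytes: int, rate_gbps: float) -> float:
    """Time to move total_bytes at a sustained rate (1 Gbps taken as 1e9 bit/s)."""
    return total_bytes * 8 / (rate_gbps * 1e9)


print("files in distribution:", sum(FILES_BY_SIZE.values()))
for rate in (min(RATES_GBPS), max(RATES_GBPS)):
    minutes = transfer_seconds(TOTAL_BYTES, rate) / 60
    print(f"{rate} Gbps: about {minutes:.1f} minutes")
```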


Links and Lists

– ESnet fasterdata knowledge base
  • http://fasterdata.es.net/
– Science DMZ paper
  • http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
– Science DMZ email list
  • [email protected] with subject "subscribe esnet-sciencedmz"
– perfSONAR
  • http://fasterdata.es.net/performance-testing/perfsonar/
  • http://www.perfsonar.net
– Globus
  • https://www.globus.org/


Thanks!

[email protected] (ESnet)
Lawrence Berkeley National Laboratory

http://fasterdata.es.net/

http://my.es.net/

http://www.es.net/


Extra Slides


DTN Cluster Detail

[Diagram. Components: WAN; Border Router; Science DMZ Switch/Router; Firewall; Enterprise; four DTNs labeled “Sealed” DTNs (Globus only, no shell access); Filesystem; two HEAD nodes labeled Cluster Head/Login Nodes; Cluster compute nodes; three perfSONAR hosts. Links are labeled 10GE. Annotation: Configure as DTN Cluster.]


DTN Cluster Design

• Configure all four DTNs as a single Globus endpoint
  – Globus has docs on how to do this
  – https://support.globus.org/entries/71011547-How-do-I-add-multiple-I-O-nodes-to-a-Globus-endpoint-

• Recent options for increased performance
  – Use additional parallel connections
  – Distribute transfers across multiple DTNs (Globus I/O Nodes)
  – Critical – only do this when all DTNs in the endpoint mount the same shared filesystem

• Use the Globus CLI command endpoint-modify
  – Use the --network-use option
  – Adjusts concurrency and parallelism
  – More info at globus.org (http://dev.globus.org/cli/reference/endpoint-modify/)
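The shared-filesystem requirement can be illustrated with a small sketch of spreading files across several DTNs. This is not how Globus schedules transfers internally; the `Dtn` class, the `distribute` helper, the node names, and the mount path are hypothetical and only show why every node must see the same files.

```python
from dataclasses import dataclass, field


@dataclass
class Dtn:
    name: str
    filesystem: str                      # filesystem this node mounts
    assigned: list = field(default_factory=list)
    assigned_bytes: int = 0


def distribute(files: dict, dtns: list) -> list:
    """Assign each file to the DTN with the fewest bytes assigned so far.

    Refuses to run unless every DTN mounts the same filesystem, since any
    node may be asked to read or write any file in the transfer.
    """
    if not dtns:
        raise ValueError("at least one DTN is required")
    if len({d.filesystem for d in dtns}) != 1:
        raise ValueError("all DTNs in the endpoint must mount the same shared filesystem")
    # Place the largest files first so the byte totals stay balanced.
    for path, size in sorted(files.items(), key=lambda kv: kv[1], reverse=True):
        target = min(dtns, key=lambda d: d.assigned_bytes)
        target.assigned.append(path)
        target.assigned_bytes += size
    return dtns


# Illustrative file sizes in bytes; names and mount point are made up.
cluster = [Dtn(f"dtn0{i}", "/global/shared") for i in range(1, 5)]
sizes = {"a.dat": 900, "b.dat": 700, "c.dat": 400, "d.dat": 300, "e.dat": 100}
for node in distribute(sizes, cluster):
    print(node.name, node.assigned, node.assigned_bytes)
```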
