PITTSBURGH SUPERCOMPUTING CENTER
Ultimate Integration
Joseph Lappa
Pittsburgh Supercomputing Center ESCC/Internet2 Joint Techs Workshop
Agenda
• Supercomputing 2004 Conference
• Application
  – Ultimate Integration
• Resource Overview
• Did it work?
• What did we take from it?
Supercomputing 2004
• Annual Conference
  – Supercomputers
  – Storage
• Network hardware
  – Original reason for application
    • Bandwidth Challenge
  – Didn’t apply due to time
Application Requirements
• Runs on Lemieux (PSC’s supercomputer)
• Application Gateways (AGW)
• Cisco CRS-1
  – 40 Gb/sec OC-768 cards
  – Few exist
• Single application
• Be used with another demo on the show floor if possible
Ultimate Integration Application
• Checkpoint Recovery System
  – Program
    • Garden variety Laplace solver instrumented to save its memory state in checkpoint files
    • Checkpoints memory to remote network clients
    • Runs on 34 Lemieux nodes
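The checkpoint idea on this slide can be illustrated with a small, single-process sketch. It is not the PSC program: the grid size, the Jacobi update, the use of `pickle`, and every function name below are assumptions chosen for illustration, and the "remote client" is reduced to a callback that receives the serialised bytes.

```python
import pickle

def make_grid(n):
    """n x n grid: top edge held at 100.0, every other cell starts at 0.0."""
    grid = [[0.0] * n for _ in range(n)]
    grid[0] = [100.0] * n
    return grid

def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its 4 neighbours."""
    n = len(grid)
    new = [row[:] for row in grid]
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            new[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return new

def checkpoint(step, grid):
    """Serialise the solver state to bytes (hypothetical format)."""
    return pickle.dumps({"step": step, "grid": grid})

def restore(blob):
    """Inverse of checkpoint(): returns (step, grid)."""
    state = pickle.loads(blob)
    return state["step"], state["grid"]

def solve(grid, start, stop, every=10, sink=None):
    """Run sweeps start..stop-1; hand a checkpoint to sink every `every` steps."""
    for step in range(start, stop):
        grid = jacobi_step(grid)
        if sink is not None and (step + 1) % every == 0:
            sink(checkpoint(step + 1, grid))
    return grid

if __name__ == "__main__":
    saved = []                                   # stands in for a remote receiver
    full = solve(make_grid(8), 0, 50, sink=saved.append)
    step, grid = restore(saved[1])               # pretend the run died after step 20
    resumed = solve(grid, step, 50)
    print("resumed run matches uninterrupted run:", resumed == full)
```

Because the sweep is deterministic, restarting from any saved state reproduces the uninterrupted result, which is the property a checkpoint recovery system relies on.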
Lemieux TCS System
• 750 Compaq AlphaServer ES45 nodes
  – SMP
    • Four 1 GHz Alpha processors
    • 4 GB of memory
• Interconnection
  – Quadrics Cluster Interconnect
    • Shared memory library
Application Gateways
• 750 GigE connections are very expensive
• Reuse Quadrics network to attach cheap Linux boxes with GigE
  – 15 AGWs
    • Single processor Xeons
    • 1 Quadrics card
    • 2 Intel GigE
  – Each GigE card maxes out at 990 Mb/sec
  – Only need 30 GigE to fill link to TeraGrid
• Web100 kernel
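The sizing argument on this slide is simple arithmetic and can be checked directly. The constants come from the slide; the function name is made up for the sketch.

```python
AGW_COUNT = 15        # gateways listed on the slide
GIGE_PER_AGW = 2      # Intel GigE ports per gateway
MBPS_PER_GIGE = 990   # measured ceiling per GigE card, in Mb/s

def aggregate_mbps(agws, ports_per_agw, mbps_per_port):
    """Total throughput if every port runs at its ceiling."""
    return agws * ports_per_agw * mbps_per_port

ports = AGW_COUNT * GIGE_PER_AGW
total = aggregate_mbps(AGW_COUNT, GIGE_PER_AGW, MBPS_PER_GIGE)
print(f"{ports} GigE ports, {total / 1000:.1f} Gb/s aggregate")
# prints "30 GigE ports, 29.7 Gb/s aggregate"
```

Thirty gateway ports in place of 750 per-node connections is where the cost saving comes from.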
Network
• Cisco 6509
  – Sup720
  – WS-X6748-SFP
  – Two WS-X6704-10GE
    • Used 4 10GE interfaces
• OSPF load balancing was my real worry
  – >30 GE streams over 4 links
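Why load balancing is a worry can be seen in a toy model. The sketch below assumes per-flow hashing across equal-cost links; the slides do not say how the 6509 actually chose a link, and the hash, the addresses, and the port numbers here are all made up. With each stream near 990 Mb/s, any 10GE link that receives more than ten streams is oversubscribed.

```python
import hashlib
from collections import Counter

def pick_link(src_ip, dst_ip, src_port, dst_port, n_links):
    """Map a flow's 4-tuple to one of n_links; a given flow always gets the same link."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_links

def spread(flows, n_links):
    """Count how many flows land on each link."""
    counts = Counter(pick_link(*flow, n_links) for flow in flows)
    return [counts.get(link, 0) for link in range(n_links)]

# 32 hypothetical GigE streams (16 gateways x 2 interfaces), invented addresses.
flows = [(f"10.0.{gw}.{nic}", "10.1.0.1", 5001, 5001)
         for gw in range(16) for nic in (1, 2)]

per_link = spread(flows, 4)
worst = max(per_link)
print("streams per link:", per_link)
print("worst link oversubscribed:", worst * 990 > 10_000)
```

An even split would put 8 streams on each link; a hash gives no such guarantee, so the distribution has to be measured rather than assumed.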
Network
• Cisco CRS-1
  – 40 Gb/sec per slot
  – 16 slots
  – For Demo
    • Two OC-768 cards
      – Ken Goodwin’s and Kevin McGratten’s big worry was the OC-768 transport
    • Two 8 Port 10 GE cards
  – Running production IOS-XR code
  – Had problems with tracking hardware
    • Ran both routers without 2 switching fabrics, with no effect on traffic
Network
• Cisco CRS-1
  – One at Westinghouse Machine Room
  – One on show floor
• Fork lift needed to place it
  – 7 feet tall
  – 939 lbs empty
  – 1657 lbs fully loaded
The Magic Box
• Stratalight
  – OTS 4040 transponder “compresses” the 40 Gb/s signal to fit into the spectral bandwidth of a traditional 10G wave
  – http://www.stratalight.com/
• Uses proprietary encoding techniques
• The Stratalight transponder was connected to the Mux/Demux of the 15454 as an alien wavelength
Time Dependences
• OC-768 wasn’t worked on until one week before the conference
OC-768
Where Does the Data Land?
• Lustre Filesystem
  – http://www.lustre.org/
• Developed by Cluster File Systems
  – http://www.clusterfs.com/
• POSIX compliant, Open Source, parallel file system
• Separates metadata and data objects to allow for speed and scaling
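The split between metadata and data objects can be pictured with a toy model. This is not Lustre's API or on-disk format: the class names, the round-robin striping, and the tiny stripe size are assumptions made only to show why a client can fetch the layout once and then read from several object targets in parallel.

```python
class MetadataServer:
    """Holds only names and layouts, never file contents."""
    def __init__(self):
        self.layouts = {}

    def create(self, name, size, stripe_size, n_targets):
        self.layouts[name] = {"size": size, "stripe_size": stripe_size,
                              "n_targets": n_targets}

    def lookup(self, name):
        return self.layouts[name]

class ObjectTarget:
    """Holds only data chunks, keyed by (file name, stripe index)."""
    def __init__(self):
        self.objects = {}

def write_file(mds, targets, name, data, stripe_size=4):
    """Record the layout on the MDS, then stripe the data round-robin over targets."""
    mds.create(name, len(data), stripe_size, len(targets))
    for idx, off in enumerate(range(0, len(data), stripe_size)):
        targets[idx % len(targets)].objects[(name, idx)] = data[off:off + stripe_size]

def read_file(mds, targets, name):
    """One metadata lookup, then every stripe comes straight from its target."""
    layout = mds.lookup(name)
    n_stripes = -(-layout["size"] // layout["stripe_size"])   # ceiling division
    return b"".join(targets[i % layout["n_targets"]].objects[(name, i)]
                    for i in range(n_stripes))

if __name__ == "__main__":
    mds = MetadataServer()
    osts = [ObjectTarget() for _ in range(5)]
    write_file(mds, osts, "ckpt.0001", b"laplace solver checkpoint bytes")
    print(read_file(mds, osts, "ckpt.0001"))
```

The metadata server is touched once per file, while the bulk bytes spread across all targets, which is the scaling property the slide refers to.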
The Show Floor
• 8 checkpoint servers with 10GigE and InfiniBand connections
• 5 Lustre OSTs connected via InfiniBand, with 2 SCSI disk shelves (RAID5)
• Lustre metadata server (MDS) connected via InfiniBand
The Demo
How well did it run?
• Laplace Solver w/ Checkpoint Recovery
  – Using 16 Application Gateways (32 GigE connections): 31.1 Gb/s
    • Only 32 Lemieux nodes were available
• IPERF
  – Using 17 Application Gateways + 3 single GigE attached machines: 35 Gb/s
• Zero SONET errors reported on interface
• Over 44 TB were transferred
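The two results can be put on a per-connection basis with a few lines of arithmetic. The throughput figures and gateway counts are from the slide; counting two GigE ports per gateway and one per single-attached machine for the iperf run is an assumption, as is using the nominal 40 Gb/s quoted earlier as the link rate.

```python
def per_port_mbps(total_gbps, ports):
    """Average throughput per GigE connection, in Mb/s."""
    return total_gbps * 1000 / ports

def utilisation(total_gbps, link_gbps=40.0):
    """Fraction of the nominal 40 Gb/s OC-768 rate."""
    return total_gbps / link_gbps

laplace_ports = 16 * 2      # 16 AGWs, 2 GigE each (32 connections per the slide)
iperf_ports = 17 * 2 + 3    # assumed: 17 AGWs x 2 ports, plus 3 single-GigE machines

for name, gbps, ports in [("Laplace", 31.1, laplace_ports),
                          ("iperf", 35.0, iperf_ports)]:
    print(f"{name}: {per_port_mbps(gbps, ports):.0f} Mb/s per port, "
          f"{utilisation(gbps):.0%} of 40 Gb/s")
```

Both runs average close to the 990 Mb/sec per-card ceiling, so the gateways, not the OC-768 link, appear to have set the limit.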
The Team
Just Demoware?
• AGWs
  – qsub command now has AGW option
    • Can do accounting (and possibly billing)
    • MySQL database with Web100 stats
  – Validated that AGWs were a cost-effective solution
• OC-768 Metro can be done by mere mortals
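As a rough picture of how per-job accounting could sit on top of stored connection statistics, the sketch below totals bytes per batch job. Everything in it is hypothetical: the table and column names are invented, the rows are made-up sample values, and `sqlite3` stands in for the MySQL database mentioned on the slide so that the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # stand-in for the MySQL database
conn.execute("""CREATE TABLE agw_stats (
    job_id TEXT, agw TEXT, bytes_out INTEGER, seconds REAL)""")

samples = [  # invented rows, for illustration only
    ("job-1", "agw01", 120_000_000_000, 1000.0),
    ("job-1", "agw02", 118_000_000_000, 1000.0),
    ("job-2", "agw01", 40_000_000_000, 400.0),
]
conn.executemany("INSERT INTO agw_stats VALUES (?, ?, ?, ?)", samples)

def usage_by_job(db):
    """Total bytes moved through the gateways, grouped by batch job."""
    rows = db.execute(
        "SELECT job_id, SUM(bytes_out) FROM agw_stats "
        "GROUP BY job_id ORDER BY job_id")
    return dict(rows.fetchall())

print(usage_by_job(conn))
```

A billing step, if one were wanted, would be a further query over the same table.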
Just Demoware??
• Application receiver
  – Laplace solver ran at PSC
  – Checkpoint receiver program tested / run at both NCSA and SDSC
    • Ten IA64 compute nodes as receiver
    • ~10 Gb/sec Network to Network (/dev/null)
      – 990 Mb/sec * 10 streams
Thank You