High-Performance Computing at ZIH
July 2010
Zentrum für Informationsdienste und Hochleistungsrechnen (ZIH)
Dr. Stefanie Maletti
Stefanie Maletti - 2
Dresden University of Technology
Founded in 1828, one of the oldest technical universities in Germany
14 faculties and a number of specialized institutes
35,000 students, 4,200 permanent employees, among them 419 professors
International courses of study, bachelor and master degrees
Largest computer science faculty in Germany
150 million Euro annual third-party funding
http://www.tu-dresden.de
Center for Information Services and HPC (ZIH)
• Central Scientific Unit at TU Dresden
• Competence Center for "Parallel Computing and Software Tools"
• Center for Data-intensive Computing
• Strong commitment to support real users
• Development of algorithms and methods: cooperation with users from all departments
Structure of ZIH
Management: Director Prof. Dr. W. E. Nagel; Deputy Directors Dr. P. Fischer, Dr. M. Müller
Department IAK (Interdisciplinary Function Support and Coordination): Dr. M. Müller
Department NK (Networking and Communication Services): W. Wünsch
Department ZSD (Central Systems and Services): Dr. S. Maletti
Department IMC (Innovative Methods of Computing): PD Dr. A. Deutsch
Department PSW (Programming and Software Tool-Kits): Dr. H. Mix
Department VDR (Distributed and Data-intensive Computing): Dr. Müller-Pfefferkorn
Central Services
ZIH operates
– central systems and services of the TU Dresden
– central HPC resources for Saxony
– D-Grid resources for Germany
Responsibilities of ZIH
• Providing infrastructure and qualified services for TU Dresden and Saxony
• Research topics:
– Architecture and performance analysis of high-performance computers
– Programming methods and techniques for HPC systems
– Software tools to support programming and optimization
– Modeling algorithms of biological processes
– Mathematical models, algorithms, and efficient implementations
• Role of mediator between vendors, developers, and users
• Picking up and preparing new concepts, methods, and techniques
• Teaching and education
Overall Infrastructure: ZIH at a glance
Backup
FC switches: 2x 32 ports
Backup servers: 13 IBM servers (2x IBM x366, 3x IBM x336, 8x IBM x3550)
Tape library: 2x IBM 3584, 20 Ultrium LTO3 drives; capacity per medium 400 GB; 4000 media; 1.6 PB total
Disk system: IBM DS 4300 Turbo, capacity approx. 2x 35 TB (SATA); 2x 2 controllers with 2x FC each; 2x 7 shelves of (13+P) 400 GB disks each; 2x 1 shelf of (12+P+S) 300 GB FC disks (3.6 TB metadata)
SAN: 87 TB
Connected via Gb Ethernet and FC
Clients: servers and PCs on campus; 600 licenses for backup clients, 400 licenses for notebooks, 150 database applications, 20 mail servers
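The archive total on this slide follows directly from the media count: 4000 LTO-3 cartridges at 400 GB each give 1.6 PB. A minimal sketch of the arithmetic, using decimal units as the slide does:

```python
# Capacity arithmetic for the tape library figures above.
# Decimal units (1 PB = 1_000_000 GB), matching the slide.
media_count = 4000     # LTO-3 cartridges in the two IBM 3584 libraries
gb_per_medium = 400    # native LTO-3 capacity per cartridge [GB]

total_pb = media_count * gb_per_medium / 1_000_000
print(total_pb)  # 1.6 PB, as stated on the slide
```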
File Service at ZIH
FC switches: 2x 32 ports
2x file servers as HACMP cluster: IBM pSeries p570 (8x Power5+, 32 GB RAM, 4x 146 GB system disks, 4x FC 4 Gbit/s, 8x Gbit/s Ethernet), 6x GbE
Disk system, 60 TB: IBM DS 4800 (64x 500 GB SATA, 96x 300 GB FC)
Disk system, 36 TB: SUN STK 6540 (32x 1 TB SATA, 16x 300 GB FC)
4x NFS servers: SUN x4140 and x4200
Network Infrastructure
Connection to the German Research Network and the Internet with a bandwidth of 3x 10 Gbit/s
Wireless LAN
Currently 532 access points, extension in progress
High-Performance Computing
HPC resources:
– PC Cluster
– HRSK (HPC/Storage Complex)
– NEC SX6
– Windows HPC Cluster
HRSK: Arguments and Goals
Significant extension of compute capacity for all scientific areas at Technische Universität Dresden
So far: many procurements in the HPC area motivated by numerical performance
Special emphasis in Dresden: data-intensive computing
Goal of the procurement: bandwidth-oriented, flexible, heterogeneous
Budget (HBFG procedure): 15 million Euro
Plans and discussions: 2002/2003; proposal: February 2004; approval: November 2004; procurement: spring 2005
Final decision: June 2005; delivery in two phases, starting October 2005
System to support all scientists in Saxony
An extended machine room was a prerequisite
HRSK Project: HPC/Storage Complex (HRSK)
– HPC component: main memory ≥ 4 TB; 8 GB/s link to the HPC-SAN
– HPC-SAN: disk capacity > 50 TB
– PC farm: 4 GB/s link
– PC-SAN: disk capacity > 50 TB; 4 GB/s link
– Petabyte tape archive: capacity ≥ 1 PB; 1.8 GB/s link
HPC/Storage Complex (HRSK)
– HPC server: SGI Altix 4700, 2048 Intel Montecito cores, main memory 6.5 TB; 8 GB/s link to the HPC-SAN
– HPC-SAN: disk capacity 68 TB
– PC farm: Linux Networx, 2592 AMD Opteron cores, main memory 5.5 TB; 4 GB/s link
– PC-SAN: disk capacity 68 TB; 4 GB/s link
– Petabyte tape archive: capacity 1 PB; 1.8 GB/s link
Main Contractor: SGI
• HPC component: 2048 CPU cores (Intel Itanium 2, Montecito); 6.5 TB main memory with high bandwidth (a major decision criterion)
• SAN for HPC / PC: 68 / 51 TB capacity
• Tape silo: 1 PB capacity
• PC farm: > 700 system boards (AMD Opteron); Infiniband network built from 288-port switches
2005: HRSK Phase 1
SGI Altix 3700 (Merkur): 192 Intel Itanium2 CPUs (1.5 GHz)
LNXI PC Farm (Phobos): 64 nodes with 2 single-core AMD Opteron CPUs each (2.2 GHz)
2006: HRSK Phase 2
SGI Altix 4700 (mars, jupiter, saturn, uranus, neptun): 1024 sockets with Itanium2 Montecito dual-core CPUs (1.6 GHz, 9 MB L3 cache)
LNXI PC Farm (Deimos): AMD Opteron X85 dual-core chips at 2.6 GHz; 384x single-CPU nodes, 232x dual-CPU nodes, 112x quad-CPU nodes
HRSK Computing Performance: Top500 List
November 2006:
– Place 49: Altix 4700 (11.91 TFlops)
– Place 106: Deimos (6.2 TFlops)
June 2007:
– Place 73: Altix 4700 (11.91 TFlops)
– Place 79: Deimos (10.88 TFlops)
June 2008:
– Place 289: Altix
– Place 333: Deimos
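The roughly 13 TFlop/s peak figures quoted later for both machines can be reproduced from cores × clock × flops per cycle. The flops-per-cycle values below are assumptions for these CPU generations (4 for Itanium2 Montecito, 2 for the Opteron K8 family), not taken from the slides:

```python
# Rough peak-performance estimate: cores * clock [GHz] * flops/cycle / 1000.
# The flops-per-cycle values are assumptions, not stated on the slides.
def peak_tflops(cores, ghz, flops_per_cycle):
    return cores * ghz * flops_per_cycle / 1000.0

altix = peak_tflops(2048, 1.6, 4)   # SGI Altix 4700 (Itanium2 Montecito)
deimos = peak_tflops(2592, 2.6, 2)  # LNXI PC farm Deimos (Opteron X85)

print(round(altix, 1), round(deimos, 1))  # 13.1 13.5
```

Both land near the 13 TFlop/s peak quoted on the installation slides; the Top500 places above reflect the lower measured Linpack values.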
New Building: Animation
HRSK Construction Site Trefftz-Bau (1)
HRSK Construction Site Trefftz-Bau (2)
May 2006
HRSK New Building
July 10th, 2006: the new building is ready; start of HRSK installation phase 2
HRSK Installation Phase 2: PC Farm in July 2006
Linux Networx PC Farm, deimos.hrsk.tu-dresden.de
– AMD Opteron X85 dual-core CPUs (2.6 GHz)
– 384x single-socket nodes, 232x dual-socket nodes, 112x quad-socket nodes
– 2592 cores in 1296 sockets overall
– 13 TFlop/s peak performance
– 5.5 TB memory (2 GB ECC memory per core)
– 2 Infiniband networks (communication and I/O)
– 68 TB disks (PC SAN)
– SuSE SLES 10, batch system LSF
– Pathscale compiler, Intel C++/C and Fortran compilers, MKL
– Allinea DDT debugger, Vampir
HRSK Installation Phase 2: Altix in September 2006
SGI Altix 4700, mars.hrsk.tu-dresden.de
– 5 partitions (mars, uranus, saturn, jupiter, neptun)
– 1024 sockets with Itanium2 (Montecito) dual-core CPUs (1.6 GHz, 9 MB L3 cache)
– 13 TFlop/s peak performance
– 6.5 TB memory (2 GB/CPU), NUMA
– 68 TB disks (HPC SAN)
– SuSE SLES 10 incl. SGI ProPack 4, batch system LSF
– Intel C++/C and Fortran compilers, MKL, VTune, Trace Collector, Trace Analyzer
– Allinea DDT debugger, Vampir
HRSK Storage Area Network in September 2006: HPC SAN and PC SAN
– SGI InfiniteStorage 6700 (DDN S2A9500)
– 1280 FC disks, 8 RAID controller pairs (4x 4 Gbit/s)
– 8 GByte/s bandwidth to the HPC server, 4 GByte/s bandwidth to the PC farm
– HPC SAN: 68 TByte capacity; PC SAN: 68 TByte capacity
HRSK Installation Phase 2: Archive in August 2006
Petabyte tape archive: SUN STK SL8500
– 2500 slots, 30 LTO-3 drives, 2500 LTO-3 tapes, 8 robots
HRSK Operational Concept: Security
HRSK firewall; user login is possible on:
– mars.hrsk.tu-dresden.de
– deimos.hrsk.tu-dresden.de
Access only from *.tu-dresden.de
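In practice, such a login is typically done with ssh from a machine inside the campus domain. A minimal sketch; the user name and the use of ssh/scp are assumptions, not stated on the slides:

```shell
# From a host inside *.tu-dresden.de (assumed access method: ssh).
# "myuser" is a hypothetical HPC login name.
ssh myuser@mars.hrsk.tu-dresden.de      # Altix login node
ssh myuser@deimos.hrsk.tu-dresden.de    # PC farm login node

# Copying data in and out works the same way, e.g. with scp:
scp results.tar.gz myuser@deimos.hrsk.tu-dresden.de:~
```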
HRSK Operational Concept: Altix 4700
HPC component Altix 4700:
– Separation of system and users by means of boot CPU sets
– Preparation system mars: 4 CPUs system, 32 CPUs login, 348 CPUs production
– Production system jupiter: 4 CPUs system, 508 CPUs production
– Production system uranus: 4 CPUs system, 508 CPUs production
– Production system saturn: 4 CPUs system, 508 CPUs production
– Graphics and interactive system neptun: 4 CPUs system, 128 CPUs graphics and interactive programming
Production under control of the batch system LSF
– Exclusive usage of CPUs by means of LSF CPU sets
Local disks as temporary storage
Cluster file system: CXFS
HRSK Operational Concept: PC Farm
2 master nodes as system nodes
4 dual-CPU nodes as login nodes (deimos101, deimos102, deimos103, deimos104)
All other nodes (single, dual, quad) are production nodes
Production under control of the batch system LSF
Local disks as temporary storage
Cluster file system: Lustre
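Since production on both machines runs under the LSF batch system, jobs are submitted as batch scripts via `bsub`. A minimal sketch; the resource values, job name, and application are illustrative assumptions, not taken from the slides:

```shell
#!/bin/sh
# Minimal LSF batch script sketch (illustrative values only).
#BSUB -J myjob            # job name (hypothetical)
#BSUB -n 8                # number of CPU slots requested
#BSUB -W 01:00            # wall-clock limit, hh:mm
#BSUB -o myjob.%J.out     # stdout file; %J expands to the job ID

# Local disks serve as temporary storage; a per-job scratch directory
# is one common pattern (the path is an assumption):
SCRATCH=/tmp/$LSB_JOBID
mkdir -p "$SCRATCH"

./my_app --workdir "$SCRATCH"   # hypothetical application

rm -rf "$SCRATCH"
```

Such a script would be submitted with `bsub < myjob.lsf`, and `bjobs` shows the queue state.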
HRSK File Systems
User file systems on mars (HPC-SAN), CXFS:
– /fastfs: 60 TByte, 56 tiers (RAID6 8+2), working space (temporary)
– /work: 8 TByte, 8 tiers (RAID6 8+2), home and software
User file systems on deimos (PC-SAN), Lustre:
– /fastfs: 50 TByte, 48 tiers (RAID6 8+2), working space (temporary)
– /work: 17 TByte, 16 tiers (RAID6 8+2), home and software
HPC-SAN, access to CXFS via NFS server: /hpc_fastfs, /hpc_work
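With /fastfs as temporary working space and /work holding home and software, a common usage pattern is to stage data onto the fast scratch file system for the run and copy results back afterwards. A sketch; the subdirectory names under the mount points are hypothetical:

```shell
# Stage input from home onto the fast scratch file system
# (paths below the mount points are hypothetical examples).
cp -r ~/project/input /fastfs/$USER/input

# ... run the job against /fastfs/$USER ...

# Copy results back: /fastfs is temporary working space and may be
# cleaned up, so results should not stay there.
cp -r /fastfs/$USER/output ~/project/results
```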
Inauguration
April 2nd, 2007
Inauguration
Start of operation
Inauguration
Guided tour with the Minister-President of Saxony, Prof. Dr. Georg Milbradt
Outside View at Night in 2006
Photo: Thomas D. Wurzel, Dresdner Campus Zeitung, issue 34