-
2010 China HPC TOP100
China Mainland HPC Trend Analysis
Nvidia GTC 2011, Taipei, 05/19/2011
Yunquan Zhang
-
HPC TOP100 Background
First list published in 2002
Funded by the National 863 Plan in 2004 and afterwards
Selected by Chinese Science and Technology Reports in 2005, 2006, and 2007
Cited by many international reports on HPC in China
Collaboration with TOP500
Keynote presentations at the US Supercomputing Workshop in 2007 and 2010
-
2010 China HPC TOP100 Authors
Yunquan Zhang, Jiachang Sun, Guoxin Yuan, Linbo Zhang
The Specialty Association of Mathematical & Scientific Software (SAMSS)
Evaluation Center of High Performance Computer, National 863 Plan
China HPC Technical Committee
-
Remarks
Data comes from Mainland China only
Data source codes:
Q: From SAMSS
T: From TOP500 (http://www.top500.org)
C: From IHV
U: From users
S: Linpack extrapolated from a similar system on TOP500 (http://www.top500.org)
Users are responsible for the accuracy of the data they provide; we only perform a sanity check
The list is published every fall
-
Rank | Manufacturer | Computer | Year | Num of Proc | Linpack (Gflops) | Peak (Gflops) | Efficiency
1 | NUDT | Tianhe 1A / 7168x2 Intel Hexa Core Xeon X5670 2.93GHz + 7168 Nvidia Tesla M2050 + 2048 Hex Core FT-1000 / 80Gbps | 2010 | 202,752 | 2,507,000.00 | 4,701,000.00 | 0.533
2 | Dawning | Dawning TC3600 Blade / Intel Hexa Core X5650 + Nvidia Tesla C2050 GPU / QDR Infiniband | 2010 | 120,640 | 1,271,000.00 | 2,984,300.00 | 0.426
3 | IPE, CAS | Mole-8.5 Cluster / 320x2 Intel QC Xeon E5520 2.26GHz + 320x6 Nvidia Tesla C2050 / QDR Infiniband | 2010 | 33,120 | 207,300.00 | 1,138,440.00 | 0.182
4 | Dawning | 5000A / 1920x4 AMD QC Barcelona 1.9GHz / DDR Infiniband / WCCS+Linux | 2008 | 30,720 | 180,600.00 | 233,472.00 | 0.774
5 | Lenovo | 7000 / 1240x2 Intel Xeon QC E5450 3.0GHz / 140x4 Intel Xeon QC X7350 2.93GHz / Infiniband 4xDDR | 2008 | 12,160 | 106,500.00 | 145,293.00 | 0.733
6 | Dawning | Dawning TC3600 Blade / 220x(2 Intel Hexa Core X5650 + 1 Nvidia Tesla C2050) / QDR Infiniband | 2010 | 5,720 | 76,350.38 | 141,389.60 | 0.540
7 | Dawning | Dawning TC3600 Blade / Intel Hexa Core X5650 + Nvidia Tesla C2050 GPU / QDR Infiniband | 2010 | 4,160 | 55,527.55 | 102,828.80 | 0.540
8 | IBM | xSeries x3650 M2 Cluster / Intel Xeon QC E55xx 2.53GHz / Giga-E | 2010 | 8,960 | 51,200.00 | 90,680.00 | 0.565
9 | HP | Cluster Platform 3000 BL460c G6 / Intel Xeon E5540 2.53GHz / Giga-E | 2010 | 7,848 | 41,880.00 | 79,420.00 | 0.527
10 | IBM | BladeCenter HS22 Cluster / Intel Xeon QC GT 2.53GHz / Giga-E | 2009 | 7,168 | 41,270.00 | 72,540.00 | 0.569
2010 China HPC Top 10
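The Efficiency column of the Top 10 list can be cross-checked as Linpack divided by Peak. Below is a minimal Python sketch with the figures copied from the list above; the names `TOP10`, `efficiency`, and `mismatches`, and the 0.001 tolerance, are illustrative assumptions rather than part of the original list.

```python
# (rank, Linpack Gflops, Peak Gflops, listed efficiency), copied from the Top 10 list
TOP10 = [
    (1, 2507000.00, 4701000.00, 0.533),
    (2, 1271000.00, 2984300.00, 0.426),
    (3, 207300.00, 1138440.00, 0.182),
    (4, 180600.00, 233472.00, 0.774),
    (5, 106500.00, 145293.00, 0.733),
    (6, 76350.38, 141389.60, 0.540),
    (7, 55527.55, 102828.80, 0.540),
    (8, 51200.00, 90680.00, 0.565),
    (9, 41880.00, 79420.00, 0.527),
    (10, 41270.00, 72540.00, 0.569),
]


def efficiency(linpack_gflops, peak_gflops):
    """Linpack efficiency: measured performance over theoretical peak."""
    return linpack_gflops / peak_gflops


def mismatches(rows, tol=0.001):
    """Ranks whose listed efficiency differs from Linpack/Peak by more than tol.

    The tolerance is an assumption that allows for three-decimal rounding.
    """
    return [
        rank
        for rank, linpack, peak, listed in rows
        if abs(efficiency(linpack, peak) - listed) > tol
    ]


if __name__ == "__main__":
    # An empty list means every row agrees with Linpack/Peak within the tolerance.
    print(mismatches(TOP10))
```

The three CPU+GPU systems at the top show the lowest ratios in the list (0.533, 0.426, 0.182), while the CPU-only Dawning 5000A reaches 0.774.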
-
China HPC TOP100 Authors with Tianhe 1A
-
International Collaboration
Jack Dongarra (TOP500) and Thomas Sterling (Beowulf, LSU) with Tianhe 1A
-
China HPC TOP100 Performance Analysis
Tianhe 1A from the National University of Defense Technology takes #1 again with Linpack performance of 2.5 PFlops
Total Linpack performance of the TOP100 is 6.23 PFlops, 2.83 times that of 2009
Figure: Total Performance Ratio, 2008 to 2010
Linpack performance of every system is above 9.6 TFlops
Peak performance of every system exceeds 11 TFlops
The top 3 systems are CPU+GPU heterogeneous clusters
98 out of 100 systems are clusters
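The two headline figures on this slide imply a 2009 total and a share for the #1 system. A minimal Python sketch of that arithmetic follows; the constant and function names are illustrative assumptions, and the inputs are the rounded values quoted on the slide, so the results are approximate.

```python
TOTAL_2010_PFLOPS = 6.23  # total Linpack of the 2010 TOP100 (from the slide)
GROWTH_OVER_2009 = 2.83   # 2010 total divided by 2009 total (from the slide)
TIANHE_1A_PFLOPS = 2.5    # rounded Linpack of the #1 system (from the slide)


def implied_previous_total(total_now, growth):
    """Back out last year's total from this year's total and the growth ratio."""
    return total_now / growth


def share_of_total(system, total):
    """Fraction of the list's total Linpack contributed by one system."""
    return system / total


if __name__ == "__main__":
    prev = implied_previous_total(TOTAL_2010_PFLOPS, GROWTH_OVER_2009)
    share = share_of_total(TIANHE_1A_PFLOPS, TOTAL_2010_PFLOPS)
    print(f"Implied 2009 total: {prev:.2f} PFlops")
    print(f"Tianhe 1A share of 2010 total: {share:.1%}")
```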
-
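Tianhe 1A
Peak performance: 4700 TFlops
Linpack performance: 2566 TFlops
23552 processors: 14336 Intel X5670 CPUs, 2048 FT-1000 CPUs, 7168 Nvidia M2050 GPUs
Memory: 262 TB; storage: 2 PB
Power: 4.04 MW
Other figures on the slide (labels not recoverable): 12; 140, 700, 160; 1035, 1090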
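The leading figure 23552 on this slide appears to be the sum of the three processor counts that follow it. A short Python sketch of that check, plus the Linpack-to-peak ratio from the two TFlops figures, is given below; the variable names are illustrative assumptions.

```python
# Processor counts as listed on the slide
PROCESSORS = {
    "Intel X5670 CPU": 14336,
    "FT-1000 CPU": 2048,
    "Nvidia M2050 GPU": 7168,
}
PEAK_TFLOPS = 4700.0
LINPACK_TFLOPS = 2566.0

total_processors = sum(PROCESSORS.values())
linpack_efficiency = LINPACK_TFLOPS / PEAK_TFLOPS

if __name__ == "__main__":
    print(f"Total processors: {total_processors}")
    print(f"Linpack / peak: {linpack_efficiency:.3f}")
```

The Intel CPU count is exactly twice the GPU count, which agrees with the "7168x2" Xeon plus 7168 Tesla configuration given in the Top 10 list.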
-
Dawning Nebulae: 3PFlops (2010)
Ranked #2 on the June 2010 TOP500, Linpack 1.271 PFlops
-
Nebulae HPC Section
Figure: Dawning6000 supercomputer topology (diagram labels: HTC section, HPP, X86 CPU nodes, +SIMD, 8-CPU X86 nodes, I/O)
-
Nebulae features
High reliability: fully redundant design; highly stable in Linpack benchmarking
High performance: peak 3 PFlops; Linpack 1.271 PFlops; ranked #2 in June 2010
High density: 25.7 TFlops in one cabinet
High productivity: HPP architecture
High efficiency: heterogeneous computing platform
Power saving: 489 GFlops/kW; #4 on the Green500
Low cost: self-made components combined with commodity hardware
Intellectual property: CloudBase, TC3600 Blade, ParaStor storage, Cloudview management
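The power figure and the Linpack result on this slide together imply a power draw for the machine. A minimal Python sketch follows, assuming the 489 GFlops/kW figure refers to the same 1.271 PFlops Linpack run; the names are illustrative assumptions.

```python
LINPACK_GFLOPS = 1.271e6  # 1.271 PFlops expressed in GFlops (from the slide)
GFLOPS_PER_KW = 489.0     # power efficiency quoted on the slide


def implied_power_kw(linpack_gflops, gflops_per_kw):
    """Power draw implied by a Linpack result and a GFlops-per-kW figure."""
    return linpack_gflops / gflops_per_kw


if __name__ == "__main__":
    kw = implied_power_kw(LINPACK_GFLOPS, GFLOPS_PER_KW)
    print(f"Implied power during Linpack: {kw / 1000:.2f} MW")
```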
-
Nebulae architecture
-
Nebulae Heterogeneous Computing System
GPGPU TC3600 blade
Peak performance of one chassis: 6.43 TFlops
Linpack performance of one chassis: 3.53 TFlops (DP)
CPU:GPU performance 128:515
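The chassis figures can be checked against each other. The Python sketch below computes the chassis Linpack efficiency and tests one reading of the garbled figure "128515": 128 GFlops of CPU and 515 GFlops of GPU per blade, with ten blades per chassis. Both the per-blade reading and the blade count are assumptions, not statements from the slide.

```python
CHASSIS_PEAK_TFLOPS = 6.43     # from the slide
CHASSIS_LINPACK_TFLOPS = 3.53  # from the slide, double precision

# Assumed reading of "128515": per-blade CPU and GPU peak in GFlops
CPU_GFLOPS_PER_BLADE = 128
GPU_GFLOPS_PER_BLADE = 515
BLADES_PER_CHASSIS = 10        # assumption, not stated on the slide

chassis_efficiency = CHASSIS_LINPACK_TFLOPS / CHASSIS_PEAK_TFLOPS
blade_gflops = CPU_GFLOPS_PER_BLADE + GPU_GFLOPS_PER_BLADE
reconstructed_peak_tflops = BLADES_PER_CHASSIS * blade_gflops / 1000
gpu_fraction = GPU_GFLOPS_PER_BLADE / blade_gflops

if __name__ == "__main__":
    print(f"Chassis Linpack / peak: {chassis_efficiency:.3f}")
    print(f"Reconstructed chassis peak: {reconstructed_peak_tflops:.2f} TFlops")
    print(f"GPU fraction of blade peak: {gpu_fraction:.1%}")
```

Under these assumptions the reconstructed peak equals the 6.43 TFlops quoted for one chassis, which supports the reading but does not prove it.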
-
Figure: Node layout of Mole-8.5 (diagram labels: Tyan S7015 board, 2x E5520 CPUs, 6x C2050 (Fermi) GPUs, PEX8647 switches, Tylersburg 36D chipsets, DDR3 memory, QDR IB, HD, fan)
Bottleneck: DeMem, PCIE, IB
-
Simulation of gas solid flow on multi-scales
Section: 3*10 m, 2D, CFD+EMMS, 1.2M cells, 96 GPUs, quasi-realtime, ~50x speedup
Reactor: 9*40 m, 3D, EMMS, 100M grids, 432 GPUs, ~3s, ~100x* speedup
Cell: 10*48 cm, 2D, DNS, 1M solids, 1G fluids, 576 GPUs, 30~50x speedup
* one C2050 as compared with one core of Intel E5430 at 2.66GHz, both in single precision
-
Rotating drum: 9.6M solids, 270 GPUs, 13.5*1.5 m, 1/9 realtime
Xu et al., submitted to Particuology, 2010
-
Figure: Cluster Share in China HPC TOP100 (y-axis: count, 0 to 100)
-
Manufacturer Systems Share Rmax[TF/s] Rpeak[TF/s] Efficiency NumofProc
Dawning 34 34% 2028.19 4218.89 61.07% 233436
Inspur 5 5% 92.11 115.38 78.30% 10360
Lenovo 3 3% 126.69 182.27 50.83% 16128
Sunway 3 3% 50.74 64.49 80.23% 6096
PowerLeader 2 2% 40.38 51.20 79.00% 4320
NUDT 1 1% 2507.00 4701.00 53.30% 202752
IPE 1 1% 207.30 1138.44 18.20% 33120
DomesticTotal 49 49% 5052.41 10471.67 60.13% 506212
IBM 28 28% 753.01 1328.21 58.13% 133000
HP 19 19% 367.46 629.12 60.93% 65508
Dell 3 3% 47.83 74.60 72.43% 6880
SUN 1 1% 10.46 13.58 66.00% 1200
ImportTotal 51 51% 1178.76 2045.51 64.37% 206588
Total 100 100% 6231.17 12517.59 62.00% 712800
HPC TOP100 Manufacturer Analysis
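The DomesticTotal and ImportTotal rows can be reproduced by summing the per-manufacturer rows. A minimal Python sketch follows, with the figures copied from the table; the names `DOMESTIC`, `IMPORT`, `totals`, and `aggregate_efficiency` are illustrative assumptions. Note that Rmax divided by Rpeak of the summed values does not reproduce the table's Efficiency column, which may instead be an average of per-system efficiencies; that interpretation is an assumption.

```python
# manufacturer: (systems, Rmax TF/s, Rpeak TF/s), copied from the table
DOMESTIC = {
    "Dawning": (34, 2028.19, 4218.89),
    "Inspur": (5, 92.11, 115.38),
    "Lenovo": (3, 126.69, 182.27),
    "Sunway": (3, 50.74, 64.49),
    "PowerLeader": (2, 40.38, 51.20),
    "NUDT": (1, 2507.00, 4701.00),
    "IPE": (1, 207.30, 1138.44),
}
IMPORT = {
    "IBM": (28, 753.01, 1328.21),
    "HP": (19, 367.46, 629.12),
    "Dell": (3, 47.83, 74.60),
    "SUN": (1, 10.46, 13.58),
}


def totals(group):
    """Sum systems, Rmax, and Rpeak over one group of manufacturers."""
    systems = sum(row[0] for row in group.values())
    rmax = sum(row[1] for row in group.values())
    rpeak = sum(row[2] for row in group.values())
    return systems, rmax, rpeak


def aggregate_efficiency(group):
    """Rmax/Rpeak of the summed values (not a per-system average)."""
    _, rmax, rpeak = totals(group)
    return rmax / rpeak


if __name__ == "__main__":
    for name, group in (("Domestic", DOMESTIC), ("Import", IMPORT)):
        systems, rmax, rpeak = totals(group)
        eff = aggregate_efficiency(group)
        print(f"{name}: {systems} systems, Rmax {rmax:.2f}, Rpeak {rpeak:.2f}, Rmax/Rpeak {eff:.2%}")
```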
-
Figure: HPC TOP100 Manufacturer Share Trend, 2002 to 2010 (domestic vs. import; y-axis 0 to 100)
Legend: IBM, DELL, Sunway, PowerLeader, Self Assembled, Juxin, SGI, Dawning, Inspur, Galactic, Huayun, Beijing Computer Center, HP, SUN, Lenovo, Tsinghua Univ., Shanghai Univ., ICT, Others
-
HPC TOP100 Manufacturer Shares By Number of Systems
Dawning, 34
IBM, 28
HP, 19
Inspur, 5
DELL, 3
Lenovo, 3
Sunway, 3
PowerLeader, 2
NUDT, 1
SUN, 1
IPE, 1
2010 HPC TOP100 http://www.samss.org.cn
-
HPC TOP100 Manufacturer Share by Performance
NUDT, 40.23%
Dawning, 32.55%
IBM, 12.08%
HP, 5.90%
IPE, 3.33%
Lenovo, 2.03%
Inspur, 1.48%
Sunway, 0.81%
DELL, 0.77%
PowerLeader, 0.65%
SUN, 0.17%
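The performance shares above follow from each manufacturer's Rmax divided by the list total. A minimal Python sketch follows, using the Rmax values from the manufacturer table; the names `RMAX_TFLOPS` and `performance_shares` are illustrative assumptions.

```python
# Rmax in TF/s per manufacturer, copied from the manufacturer table
RMAX_TFLOPS = {
    "NUDT": 2507.00,
    "Dawning": 2028.19,
    "IBM": 753.01,
    "HP": 367.46,
    "IPE": 207.30,
    "Lenovo": 126.69,
    "Inspur": 92.11,
    "Sunway": 50.74,
    "DELL": 47.83,
    "PowerLeader": 40.38,
    "SUN": 10.46,
}


def performance_shares(rmax_by_vendor):
    """Percentage of total Rmax per manufacturer, rounded to two decimals."""
    total = sum(rmax_by_vendor.values())
    return {name: round(100.0 * rmax / total, 2) for name, rmax in rmax_by_vendor.items()}


if __name__ == "__main__":
    for name, share in performance_shares(RMAX_TFLOPS).items():
        print(f"{name}: {share:.2f}%")
```

A single system, Tianhe 1A, gives NUDT the largest performance share even though NUDT accounts for only one of the 100 systems.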
-
Area #systems Share Linpack[GF/s] Peak[GF/s] Efficiency #ofProc
Energy 17 17% 265508.07 467189.50 59.07% 46100
Industry 15 15% 4299853.48 8516574.64 70.76% 401324
Research 12 12% 476779.40 1491403.64 73.83% 64376
Gaming 9 9% 291100.00 517130.00 55.76% 51136
Government 9 9% 138162.97 266433.60 52.07% 29096
Telecomm 7 7% 187450.40 348690.34 53.84% 37360
Education 7 7% 129689.42 167107.76 77.94% 13624
Weather 5 5% 85589.00 115121.52 74.62% 12192
Bio 4 4% 100894.55 178611.80 63.03% 10864
Internet 4 4% 88469.25 163946.00 53.40% 16600
Logistics 2 2% 43939.10 81960.96 53.95% 8368
Earthquake 2 2% 37372.00 50066.08 76.15% 4608
Visualization 2 2% 31507.37 58988.16 53.40% 6608
Power 2 2% 21726.15 38752.00 56.15% 4240
DDC 1 1% 12115.26 22131.20 54.70% 2080
InternetofThings 1 1% 11095.04 20377.60 54.40% 2176
Finance 1 1% 9830.25 13107.00 75.00% 2048
Total 100 100% 6231171.71 12517591.80 62.00% 712800
HPC TOP100 Application Areas
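The rankings by number of systems and by total performance can be derived directly from the table. A minimal Python sketch follows, with system counts and Linpack values copied from the table and the listed Total row used as the denominator; the names `AREAS`, `top_by`, and `performance_share` are illustrative assumptions.

```python
# (area, systems, Linpack GF/s), copied from the application area table
AREAS = [
    ("Energy", 17, 265508.07),
    ("Industry", 15, 4299853.48),
    ("Research", 12, 476779.40),
    ("Gaming", 9, 291100.00),
    ("Government", 9, 138162.97),
    ("Telecomm", 7, 187450.40),
    ("Education", 7, 129689.42),
    ("Weather", 5, 85589.00),
    ("Bio", 4, 100894.55),
    ("Internet", 4, 88469.25),
    ("Logistics", 2, 43939.10),
    ("Earthquake", 2, 37372.00),
    ("Visualization", 2, 31507.37),
    ("Power", 2, 21726.15),
    ("DDC", 1, 12115.26),
    ("InternetofThings", 1, 11095.04),
    ("Finance", 1, 9830.25),
]
LISTED_TOTAL_LINPACK = 6231171.71  # Total row of the table, GF/s


def top_by(rows, column, n=3):
    """Names of the n areas with the largest value in the given column index."""
    ranked = sorted(rows, key=lambda row: row[column], reverse=True)
    return [row[0] for row in ranked[:n]]


def performance_share(rows, area, total=LISTED_TOTAL_LINPACK):
    """Percentage of the listed total Linpack contributed by one area."""
    linpack = next(row[2] for row in rows if row[0] == area)
    return 100.0 * linpack / total


if __name__ == "__main__":
    print("Top areas by systems:", top_by(AREAS, 1))
    print("Top areas by Linpack:", top_by(AREAS, 2))
    print(f"Industry share of Linpack: {performance_share(AREAS, 'Industry'):.2f}%")
```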
-
HPC TOP100 Application Areas Analysis
The number of application areas has increased compared with previous years
Number of systems: Top areas are energy, industry, and research
Total system performance: Top areas are industry, research,
and gaming
Main users: Energy, industry, research, gaming, and
government
New users: Internet of things, internet, and power
-
Figure: HPC TOP100 Application Area Trend, 2002 to 2010 (y-axis 0 to 100)
-
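HPC TOP100 Application Area System Shares
Energy, 17%; Industry, 15%; Research, 12%; Gaming, 9%; Government, 9%; Telecomm, 7%; Education, 7%; Weather, 5%; Bio, 4%; Internet, 4%; Logistics, 2%; Earthquake, 2%; Visualization, 2%; Power, 2%; DDC, 1%; Internet of Things, 1%; Finance, 1%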
-
HPC TOP100 Application Area Performance Shares
Industry, 69.01%; Research, 7.65%; Gaming, 4.67%; Energy, 4.26%; Telecomm, 3.00%; Government, 2.21%; Education, 2.08%; Bio, 1.62%; Weather, 1.37%; remaining areas, 4.13%
-
HPC TOP100 Multicore Processor Shares
4-core, 81%
6-core, 14%
12-core, 3%
2-core, 2%
-
HPC TOP100 Processor Manufacturer Shares
Intel, 80%
AMD, 19%
IBM, 1%
-
HPC TOP100 Interconnect Shares
Giga-E, 59%
Infiniband, 37%
HyperPlex, 1%
10GE, 1%
Federation, 1%
NUDT Proprietary, 1%
-
Figure: HPC TOP100 Performance Trend (1993-2010); Linpack performance in GFlops on a log scale from 1 to 1E+11, x-axis from 1993 to 2019
-
Trend & Outlook (1)
1993-2010 China HPC performance increase
1993-1996: slow, steady growth
1996-1999: big jump
1999-2001: slow, steady growth
2001-2005: another period of big increase
2005-2007: slow, steady growth again
After 2008: dramatic increase over the next 2-3 years
-
Trend & Outlook (2)
Previous Predictions
2007-2008: System with peak performance of 100 TFlops (Reality: Oct 2008)
2008-2009: Total Linpack performance exceeds the PFlops level (Reality: Oct 2008)
2010-2011: System with peak performance of 1 PFlops (Reality: Oct 2009)
-
Trend & Outlook (3)
Future Predictions
2012-2013: System with peak performance of 10 PFlops
2011-2012: Total Linpack performance reaches 10 PFlops
2014-2015: System with peak performance of 100 PFlops
2013-2014: Total Linpack performance reaches 100 PFlops
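The predictions for total Linpack performance can be translated into the annual growth they require. The Python sketch below starts from the 2010 total of 6.23 PFlops and the 2.83x growth over 2009 quoted earlier in the deck; treating growth as a constant yearly factor is a simplifying assumption, and the function names are illustrative.

```python
import math

TOTAL_2010_PFLOPS = 6.23  # 2010 total Linpack, from the performance analysis slide
OBSERVED_GROWTH = 2.83    # 2010 total over 2009 total, from the same slide


def annual_growth_factor(start, target, years):
    """Constant yearly factor needed to grow from start to target in the given years."""
    return (target / start) ** (1.0 / years)


def years_to_reach(start, target, yearly_factor):
    """Years needed to reach target if performance grows by yearly_factor each year."""
    return math.log(target / start) / math.log(yearly_factor)


if __name__ == "__main__":
    need_10 = annual_growth_factor(TOTAL_2010_PFLOPS, 10.0, 2)    # 10 PFlops by 2012
    need_100 = annual_growth_factor(TOTAL_2010_PFLOPS, 100.0, 4)  # 100 PFlops by 2014
    at_observed = years_to_reach(TOTAL_2010_PFLOPS, 100.0, OBSERVED_GROWTH)
    print(f"Yearly factor for 10 PFlops by 2012: {need_10:.2f}")
    print(f"Yearly factor for 100 PFlops by 2014: {need_100:.2f}")
    print(f"Years to 100 PFlops at 2.83x per year: {at_observed:.1f}")
```

Under this simplification, reaching 100 PFlops in total by 2014 needs performance to roughly double every year, which is slower than the 2.83x growth observed from 2009 to 2010.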
-
Thank You
Contact: Yunquan Zhang, Ph.D.
Emails: [email protected]