China HPC TOP100 Analysis -...


  • 2010 China HPC TOP100

    China Mainland HPC Trend Analysis

    Nvidia GTC 2011, Taibei, 05/19/2011

    (Yunquan Zhang)

    [email protected]

  • HPC TOP100 Background

    First list published in 2002

    Funded by the National 863 Plan in 2004 and afterwards

    Selected by Chinese Science and Technology Reports in 2005, 2006, and 2007

    Referred to by many international reports on China HPC study

    Collaboration with TOP500

    Keynote presentations at the US Supercomputing Workshop in 2007 and 2010

  • 2010 China HPC TOP100 Authors

    Yunquan Zhang, Jiachang Sun, Guoxin Yuan, Linbo Zhang

    The Specialty Association of Mathematical & Scientific Software (SAMSS)

    Evaluation Center of High Performance Computer, National 863 Plan

    China HPC Technical Committee

  • Remarks

    Data source from Mainland China only

    Q: From SAMSS

    T: From TOP500 (http://www.top500.org)

    C: From IHV

    U: From users

    S: Linpack value extrapolated from a similar system on TOP500 (http://www.top500.org)

    Users are responsible for the accuracy of the data they provide; we only ran a sanity check

    The list is published in fall every year

  • 2010 China HPC Top 10

    Rank | Manufacturer | Computer | Installation Site | Year | Num of Proc | Linpack (Gflops) | Peak (Gflops) | Efficiency
    1 | NUDT | Tianhe 1A / 7168x2 Intel Hexa Core Xeon X5670 2.93GHz + 7168 Nvidia Tesla [email protected] + 2048 Hex Core FT-[email protected] / 80Gbps | | 2010 | 202,752 | 2,507,000.00 | 4,701,000.00 | 0.533
    2 | Dawning | Dawning TC3600 Blade / Intel Hexa Core X5650 + Nvidia Tesla C2050 GPU / QDR Infiniband | | 2010 | 120,640 | 1,271,000.00 | 2,984,300.00 | 0.426
    3 | IPE, CAS | Mole-8.5 Cluster / 320x2 Intel QC Xeon E5520 2.26GHz + 320x6 Nvidia Tesla C2050 / QDR Infiniband | | 2010 | 33,120 | 207,300.00 | 1,138,440.00 | 0.182
    4 | Dawning | 5000A / 1920x4 AMD QC Barcelona 1.9GHz / DDR Infiniband / WCCS+Linux | | 2008 | 30,720 | 180,600.00 | 233,472.00 | 0.774
    5 | Lenovo | 7000 / 1240x2 Intel Xeon QC E5450 3.0GHz / 140x4 Intel Xeon QC X7350 2.93GHz / Infiniband 4x DDR | | 2008 | 12,160 | 106,500.00 | 145,293.00 | 0.733
    6 | Dawning | Dawning TC3600 Blade / 220x(2 Intel Hexa Core X5650 + 1 NVidia Tesla C2050) / QDR Infiniband | | 2010 | 5,720 | 76,350.38 | 141,389.60 | 0.540
    7 | Dawning | Dawning TC3600 Blade / Intel Hexa Core X5650 + NVidia Tesla C2050 GPU / QDR Infiniband | | 2010 | 4,160 | 55,527.55 | 102,828.80 | 0.540
    8 | IBM | xSeries x3650 M2 Cluster / Intel Xeon QC E55xx 2.53GHz / Giga-E | | 2010 | 8,960 | 51,200.00 | 90,680.00 | 0.565
    9 | HP | Cluster Platform 3000 BL460c G6 / Intel Xeon E5540 2.53GHz / Giga-E | | 2010 | 7,848 | 41,880.00 | 79,420.00 | 0.527
    10 | IBM | BladeCenter HS22 Cluster / Intel Xeon QC GT 2.53GHz / Giga-E | | 2009 | 7,168 | 41,270.00 | 72,540.00 | 0.569
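
    The Efficiency column of the top-10 list appears to be the ratio of Linpack to peak performance. A minimal Python sketch that recomputes it for the first four systems follows; the dictionary keys and the function name are illustrative labels, not names used in the slides.

```python
# Linpack and peak values in Gflops, copied from the top-10 list.
# The dictionary keys are illustrative labels, not official system names.
top_systems = {
    "Tianhe 1A": (2_507_000.00, 4_701_000.00),
    "Dawning TC3600 Blade (#2)": (1_271_000.00, 2_984_300.00),
    "Mole-8.5": (207_300.00, 1_138_440.00),
    "Dawning 5000A": (180_600.00, 233_472.00),
}


def efficiency(linpack_gflops, peak_gflops):
    """Return Linpack efficiency as the ratio Linpack / Peak."""
    return linpack_gflops / peak_gflops


# Round to three decimals, the precision used in the list.
efficiencies = {
    name: round(efficiency(linpack, peak), 3)
    for name, (linpack, peak) in top_systems.items()
}

for name, eff in efficiencies.items():
    print(f"{name}: {eff:.3f}")
```

    The rounded ratios agree with the listed values for these four rows, which supports reading the column as Linpack divided by Peak.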

  • China HPC TOP100 Authors with Tianhe 1A

  • International Collaboration

    Jack Dongarra (TOP500) and Thomas Sterling (Beowulf, LSU), Tianhe 1A

  • China HPC TOP100 Performance Analysis

    Tianhe 1A from the National University of Defense Technology takes #1 again, with Linpack performance of 2.5 PFlops

    Total Linpack performance of the TOP100 is 6.23 PFlops, 2.83 times that of 2009

    [Figure: Total Performance Ratio, 2008 to 2010; y-axis 0 to 7]

    The Linpack performance of every system is above 9.6 TFlops

    The peak performance of every system exceeds 11 TFlops

    The first 3 systems are CPU+GPU heterogeneous clusters

    98 out of 100 systems are clusters

  • Tianhe 1A

    Peak performance: 4700 TFlops

    Linpack performance: 2566 TFlops

    23552 processors: 14336 Intel X5670 CPU, 2048 FT1000 CPU, 7168 nVIDIA M2050 GPU

    Memory: 262 TB; storage: 2 PB

    Power: 4.04 MW

    140 700 160

    1035 1090

  • Dawning Nebulae: 3PFlops (2010)

    Ranked #2 on the June 2010 TOP500, Linpack 1.271 PFlops

  • Nebulae HPC Section

    [Figure: Dawning6000 supercomputer topology. Labels: HTC section, HPP, X86 CPU, +SIMD, I/O]

  • Nebulae features

    High reliability: fully redundant design; highly stable in Linpack benchmarking

    High performance: peak 3 PetaFLOPs; Linpack 1.271 PetaFLOPs; ranked No. 2 in June 2010

    High density: one cabinet, 25.7 TFlops

    High productivity: HPP architecture

    High efficiency: heterogeneous computing platform

    Power saving: 489 GFLOPs/kW, top 4 in Green500

    Low cost: self-made components combined with commodity hardware

    Intellectual property: CloudBase, TC3600 Blade, ParaStor storage, Cloudview management

  • Nebulae architecture

  • Nebulae Heterogeneous Computing system

    TC3600 GPGPU blade

    Peak performance of one chassis: 6.43 TFlops

    Linpack performance of one chassis (DP): 3.53 TFlops

    CPU : GPU performance = 128 : 515
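
    The chassis peak quoted here can be related to the per-blade configuration given for system #6 in the top-10 list (2 Intel Hexa Core X5650 + 1 Tesla C2050 per blade). The Python sketch below is a rough reconstruction under assumptions that the slides do not state: a 2.66 GHz CPU clock, 4 double-precision flops per core per cycle, a 515 Gflops double-precision GPU peak, and 10 blades per chassis.

```python
# Assumed hardware figures; none of these four values is stated in the slides.
CPU_CLOCK_GHZ = 2.66          # assumed Intel X5650 clock
FLOPS_PER_CYCLE = 4           # assumed double-precision flops per core per cycle
GPU_PEAK_GFLOPS = 515.0       # assumed Tesla C2050 double-precision peak
BLADES_PER_CHASSIS = 10       # assumed chassis size

CORES_PER_CPU = 6             # "Hexa Core", from the top-10 list


def blade_peak_gflops(cpus=2, gpus=1):
    """Peak Gflops of one blade with the given CPU and GPU counts."""
    cpu_peak = CPU_CLOCK_GHZ * CORES_PER_CPU * FLOPS_PER_CYCLE
    return cpus * cpu_peak + gpus * GPU_PEAK_GFLOPS


blade = blade_peak_gflops()
chassis_tflops = BLADES_PER_CHASSIS * blade / 1000.0
system6_gflops = 220 * blade  # system #6 is listed with 220 blades

print(f"blade peak:   {blade:.2f} Gflops")
print(f"chassis peak: {chassis_tflops:.2f} TFlops")
print(f"system #6:    {system6_gflops:.2f} Gflops")
```

    Under these assumptions the chassis figure comes out close to the 6.43 TFlops on the slide, and the 220-blade total is close to the 141,389.60 Gflops peak listed for system #6.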

  • Node layout of Mole-8.5

    [Figure: node layout of Mole-8.5. Labels: CPU0 and CPU1 (2x E5520), DDR3 Mem*3 (x6), two Tylersburg 36D chipsets, PEX8647 switches, GPU1 to GPU3 per chipset (6x C2050, Fermi), QDR IB, Tyan S7015, HD, Mem, Fan]

    Bottleneck: DeMem, PCIE, IB

  • Simulation of gas solid flow on multi-scales

    Section: 3*10m, 2D, CFD+EMMS, 1.2M cells, 96 GPUs, quasi-realtime, ~50x speedup

    Reactor: 9*40m, 3D, EMMS, 100M grids, 432 GPUs, ~3s, ~100x* speedup

    Cell: 10*48cm, 2D, DNS, 1M solids, 1G fluids, 576 GPUs, 30~50x speedup

    * one C2050 as compared with one core of Intel E5430 at 2.66GHz, both in single precision

  • Rotating drum: 9.6M solids, 270GPUs, 13.5*1.5m, 1/9 realtime

    Xu et al., submitted to Particuology, 2010

  • Cluster Share in China HPC TOP100

    [Figure: cluster share; y-axis: count, 0 to 100]

  • HPC TOP100 Manufacturer Analysis

    Group | Manufacturer | Systems | Share | Rmax [TF/s] | Rpeak [TF/s] | Efficiency | Num of Proc
    Domestic | Dawning | 34 | 34% | 2028.19 | 4218.89 | 61.07% | 233436
    Domestic | Inspur | 5 | 5% | 92.11 | 115.38 | 78.30% | 10360
    Domestic | Lenovo | 3 | 3% | 126.69 | 182.27 | 50.83% | 16128
    Domestic | Sunway | 3 | 3% | 50.74 | 64.49 | 80.23% | 6096
    Domestic | PowerLeader | 2 | 2% | 40.38 | 51.20 | 79.00% | 4320
    Domestic | NUDT | 1 | 1% | 2507.00 | 4701.00 | 53.30% | 202752
    Domestic | IPE | 1 | 1% | 207.30 | 1138.44 | 18.20% | 33120
    Domestic | Domestic Total | 49 | 49% | 5052.41 | 10471.67 | 60.13% | 506212
    Import | IBM | 28 | 28% | 753.01 | 1328.21 | 58.13% | 133000
    Import | HP | 19 | 19% | 367.46 | 629.12 | 60.93% | 65508
    Import | Dell | 3 | 3% | 47.83 | 74.60 | 72.43% | 6880
    Import | SUN | 1 | 1% | 10.46 | 13.58 | 66.00% | 1200
    Import | Import Total | 51 | 51% | 1178.76 | 2045.51 | 64.37% | 206588
    | Total | 100 | 100% | 6231.17 | 12517.59 | 62.00% | 712800
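
    The subtotal rows of the manufacturer table can be cross-checked by summing the per-manufacturer rows. The Python sketch below does this; the variable and function names are illustrative. It also computes the aggregate ratio Rmax / Rpeak per group, which does not have to equal the printed Efficiency column if that column is an average of per-system efficiencies (an assumption, since the slides do not say how it was computed).

```python
# (systems, Rmax TF/s, Rpeak TF/s, processors) per manufacturer, from the table.
domestic = {
    "Dawning": (34, 2028.19, 4218.89, 233436),
    "Inspur": (5, 92.11, 115.38, 10360),
    "Lenovo": (3, 126.69, 182.27, 16128),
    "Sunway": (3, 50.74, 64.49, 6096),
    "PowerLeader": (2, 40.38, 51.20, 4320),
    "NUDT": (1, 2507.00, 4701.00, 202752),
    "IPE": (1, 207.30, 1138.44, 33120),
}
imported = {
    "IBM": (28, 753.01, 1328.21, 133000),
    "HP": (19, 367.46, 629.12, 65508),
    "Dell": (3, 47.83, 74.60, 6880),
    "SUN": (1, 10.46, 13.58, 1200),
}


def totals(rows):
    """Sum systems, Rmax, Rpeak and processor counts over a group of rows."""
    systems = sum(r[0] for r in rows.values())
    rmax = round(sum(r[1] for r in rows.values()), 2)
    rpeak = round(sum(r[2] for r in rows.values()), 2)
    procs = sum(r[3] for r in rows.values())
    return systems, rmax, rpeak, procs


domestic_total = totals(domestic)
import_total = totals(imported)

for label, (systems, rmax, rpeak, procs) in (
    ("Domestic", domestic_total),
    ("Import", import_total),
):
    aggregate = 100.0 * rmax / rpeak  # aggregate ratio, not a per-system mean
    print(f"{label}: {systems} systems, Rmax {rmax}, Rpeak {rpeak}, "
          f"{procs} processors, aggregate Rmax/Rpeak {aggregate:.2f}%")
```

    The summed subtotals reproduce the Domestic Total and Import Total rows. Adding the two Rpeak subtotals gives 12517.18 TF/s, slightly below the 12517.59 TF/s printed in the Total row.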

  • HPC TOP100 Manufacturer Share Trend

    [Figure: manufacturer share (0 to 100) by year, 2002 to 2010, grouped into Domestic and Import. Legend: IBM, DELL, Sunway, PowerLeader, Self Assembled, Juxin, SGI, Dawning, Inspur, Galactic, Huayun, Beijing Computer Center, HP, SUN, Lenovo, Tsinghua Univ., Shanghai Univ., ICT, Others]

  • HPC TOP100 Manufacturer Shares By Number of Systems

    Dawning, 34

    IBM, 28

    HP, 19

    Inspur, 5

    DELL, 3

    Lenovo, 3

    Sunway, 3

    PowerLeader, 2

    NUDT, 1

    SUN, 1

    IPE, 1

    2010 HPC TOP100 http://www.samss.org.cn

  • HPC TOP100 Manufacturer Share by Performance

    NUDT, 40.23%

    Dawning, 32.55%

    IBM, 12.08%

    HP, 5.90%

    IPE, 3.33%

    Lenovo, 2.03%

    Inspur, 1.48%

    Sunway, 0.81%

    DELL, 0.77%

    PowerLeader, 0.65%

    SUN, 0.17%
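
    The performance shares on this slide appear to be each manufacturer's Rmax divided by the TOP100 total Rmax from the manufacturer table. A short Python sketch of that calculation, with illustrative variable names, is given below.

```python
# Rmax in TF/s per manufacturer, from the manufacturer table.
rmax_tflops = {
    "NUDT": 2507.00,
    "Dawning": 2028.19,
    "IBM": 753.01,
    "HP": 367.46,
    "IPE": 207.30,
    "Lenovo": 126.69,
    "Inspur": 92.11,
    "Sunway": 50.74,
    "DELL": 47.83,
    "PowerLeader": 40.38,
    "SUN": 10.46,
}

TOTAL_RMAX_TFLOPS = 6231.17  # Total row of the manufacturer table

# Share of total Linpack performance, in percent, rounded to two decimals.
shares = {
    name: round(100.0 * rmax / TOTAL_RMAX_TFLOPS, 2)
    for name, rmax in rmax_tflops.items()
}

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {share:.2f}%")
```

    The single NUDT system accounts for a larger share of total Linpack performance than all 34 Dawning systems combined.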

  • HPC TOP100 Application Areas

    Area | # systems | Share | Linpack [GF/s] | Peak [GF/s] | Efficiency | # of Proc
    Energy | 17 | 17% | 265508.07 | 467189.50 | 59.07% | 46100
    Industry | 15 | 15% | 4299853.48 | 8516574.64 | 70.76% | 401324
    Research | 12 | 12% | 476779.40 | 1491403.64 | 73.83% | 64376
    Gaming | 9 | 9% | 291100.00 | 517130.00 | 55.76% | 51136
    Government | 9 | 9% | 138162.97 | 266433.60 | 52.07% | 29096
    Telecomm | 7 | 7% | 187450.40 | 348690.34 | 53.84% | 37360
    Education | 7 | 7% | 129689.42 | 167107.76 | 77.94% | 13624
    Weather | 5 | 5% | 85589.00 | 115121.52 | 74.62% | 12192
    Bio | 4 | 4% | 100894.55 | 178611.80 | 63.03% | 10864
    Internet | 4 | 4% | 88469.25 | 163946.00 | 53.40% | 16600
    Logistics | 2 | 2% | 43939.10 | 81960.96 | 53.95% | 8368
    Earthquake | 2 | 2% | 37372.00 | 50066.08 | 76.15% | 4608
    Visualization | 2 | 2% | 31507.37 | 58988.16 | 53.40% | 6608
    Power | 2 | 2% | 21726.15 | 38752.00 | 56.15% | 4240
    DDC | 1 | 1% | 12115.26 | 22131.20 | 54.70% | 2080
    Internet of Things | 1 | 1% | 11095.04 | 20377.60 | 54.40% | 2176
    Finance | 1 | 1% | 9830.25 | 13107.00 | 75.00% | 2048
    Total | 100 | 100% | 6231171.71 | 12517591.80 | 62.00% | 712800

  • HPC TOP100 Application Areas Analysis

    The number of application areas has increased compared with previous years

    Number of systems: Top areas are energy, industry, and research

    Total system performance: Top areas are industry, research,

    and gaming

    Main users: Energy, industry, research, gaming, and

    government

    New users: Internet of things, internet, and power
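
    The two rankings stated above (by number of systems and by total performance) can be reproduced from the application areas table. A minimal Python sketch, with illustrative names, follows.

```python
# (number of systems, Linpack GF/s) per application area, from the table.
areas = {
    "Energy": (17, 265508.07),
    "Industry": (15, 4299853.48),
    "Research": (12, 476779.40),
    "Gaming": (9, 291100.00),
    "Government": (9, 138162.97),
    "Telecomm": (7, 187450.40),
    "Education": (7, 129689.42),
    "Weather": (5, 85589.00),
    "Bio": (4, 100894.55),
    "Internet": (4, 88469.25),
    "Logistics": (2, 43939.10),
    "Earthquake": (2, 37372.00),
    "Visualization": (2, 31507.37),
    "Power": (2, 21726.15),
    "DDC": (1, 12115.26),
    "Internet of Things": (1, 11095.04),
    "Finance": (1, 9830.25),
}


def top_areas(column, count=3):
    """Return the `count` areas with the largest value in the given column."""
    ranked = sorted(areas, key=lambda name: areas[name][column], reverse=True)
    return ranked[:count]


top_by_systems = top_areas(0)   # column 0: number of systems
top_by_linpack = top_areas(1)   # column 1: total Linpack performance

print("Top areas by number of systems:", top_by_systems)
print("Top areas by Linpack performance:", top_by_linpack)
```

    Ranking by system count and ranking by performance give different leaders because a few very large systems dominate the performance total.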

  • HPC TOP100 Application Area Trend

    [Figure: application area share (0 to 100) by year, 2002 to 2010]

  • HPC TOP100 Application Area System Shares

    [Figure: pie chart of system share by application area. Slice values: 17%, 15%, 12%, 9%, 9%, 7%, 7%, 5%, 4%, 4%, 2%, 2%, 2%, 2%, 1%, 1%, 1%]

  • HPC TOP100 Application Area Performance Shares

    [Figure: pie chart of performance share by application area. Slice values: 69.01%, 7.65%, 4.67%, 4.26%, 4.13%, 3.00%, 2.21%, 2.08%, 1.62%, 1.37%]

  • HPC TOP100 Multicore Processor Shares

    Cores per processor, share of systems:

    4 cores, 81%

    6 cores, 14%

    12 cores, 3%

    2 cores, 2%

  • HPC TOP100 Processor Manufacturer Shares

    Intel, 80%

    AMD, 19%

    IBM, 1%


  • HPC TOP100 Interconnect Shares

    Giga-E, 59%

    Infiniband, 37%

    HyperPlex, 1%

    10GE, 1%

    Federation, 1%

    NUDT Proprietary, 1%

  • HPC TOP100 Performance Trend (1993-2010)

    [Figure: Linpack performance in GFlops, log scale from 1 to 1E+11, by year; x-axis 1993 to 2019]

  • Trend & Outlook (1)

    1993-2010 China HPC performance increase

    1993-1996: Slow, steady growth

    1996-1999: Big jump

    1999-2001: Slow, steady growth

    2001-2005: Another period of big increase

    2005-2007: Slow, steady growth again

    After 2008: Dramatic increase over the next 2-3 years

  • Trend & Outlook (2)

    Previous Predictions

    2007-2008: System with peak performance of 100 TFlops (Reality: Oct 2008)

    2008-2009: Total Linpack performance reaches the PFlops level (Reality: Oct 2008)

    2010-2011: System with peak performance of 1 PFlops (Reality: Oct 2009)

  • Trend & Outlook (3)

    Future Predictions

    2012-2013: System with peak performance of 10 PFlops

    2011-2012: Total Linpack performance reaches 10 PFlops

    2014-2015: System with peak performance of 100 PFlops

    2013-2014: Total Linpack performance reaches 100 PFlops
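
    The total-Linpack predictions can be compared against a simple extrapolation. The Python sketch below starts from the 2010 total of 6.23 PFlops and assumes, purely for illustration, that the 2009 to 2010 growth factor of 2.83 per year continues unchanged; the slides do not state the model behind the predictions.

```python
import math

TOTAL_2010_PFLOPS = 6.23   # total Linpack performance of the 2010 list
ANNUAL_GROWTH = 2.83       # 2009 -> 2010 ratio; assumed to persist (illustrative)


def years_to_reach(target_pflops,
                   start_pflops=TOTAL_2010_PFLOPS,
                   growth=ANNUAL_GROWTH):
    """Years after 2010 until the total reaches target_pflops at constant growth."""
    return math.log(target_pflops / start_pflops) / math.log(growth)


years_to_10 = years_to_reach(10.0)
years_to_100 = years_to_reach(100.0)

for target, years in ((10.0, years_to_10), (100.0, years_to_100)):
    print(f"{target:.0f} PFlops total: about {years:.1f} years after 2010")
```

    Under this constant-growth assumption the total would pass 10 PFlops within a year of the 2010 list and 100 PFlops between two and three years after it, which is in line with the 2011-2012 and 2013-2014 windows on the slide.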

  • Thank You

    Contact: Yunquan Zhang, Ph.D.

    Emails: [email protected]

    [email protected]