Page 1:

State-of-the-Art Analysis and Perspectives of China HPC Development and Applications

SIAM PP 2012, Savannah, Georgia, USA, 2/17/2012

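Yunquan Zhang (张云泉)

Laboratory of Parallel Software and Computational Science, Institute of Software, Chinese Academy of Sciences; State Key Lab. of Computer Science

Co-authors: Jiachang Sun (孙家昶), Guoxing Yuan (袁国兴), Linbo Zhang (张林波)

[email protected]

Chinese title: 中国大陆高性能计算机的发展与应用趋势分析与展望 (Analysis and Outlook of Development and Application Trends of High Performance Computers in Mainland China)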

Page 2:

Outline

• Background of China HPC TOP100
• Analysis of 2011 China HPC TOP100
• Overview of China 863 key project
• Petascale Applications on TianHe-1A
• Future HPC performance development trends of China
• Summary

Page 3:

China HPC TOP100

• First released in 2002, the list has become the de facto industry standard for HPC ranking in Mainland China, widely adopted by researchers, users, vendors, and government.

• It serves as a procurement index for customers and is cited by many technical reports and project proposals.

• Partially supported by the National 863 Plan key project on high performance computers and core software.

• Technical reports based on the TOP100 were selected as chapters of the Annual Progress Report of China Computer Science and Technology, edited by the CCF, from 2005 to 2007 and from 2009 to 2010.

• In 2004, Prof. David Keyes presented a talk on China HPC development, “Supercomputing in China”, based on the statistics of the China HPC TOP100 list, and did so again in 2008.

• The English version of the TOP100 is exchanged with the editors of the TOP500, Prof. Hans Meuer and Prof. Jack Dongarra.

• The TOP500 and TOP100 websites link to each other, and the TOP500 reported the release of the China HPC TOP100 for two years.

• Invited by the NSF, we gave an invited talk at the HPC in China workshop at SC2007, Reno, USA.

• Invited plenary talk at ISC 2011.

Page 4:

2011 China HPC TOP100 Rank List

Yunquan Zhang (张云泉), Jiachang Sun (孙家昶), Guoxing Yuan (袁国兴), Linbo Zhang (张林波)

The Specialty Association of Mathematical & Scientific Software (SAMSS)

Evaluation Center of High Performance Computer, National 863 Plan

China HPC Technical Committee

Page 5:

Remarks

• Data come from Mainland China only.

• Data source codes used in the list:
  • “Q”: Certified by SAMSS (tested or spot-checked by the association, or accepted by a ministry-level appraisal)
  • “T”: Data published by TOP500 (http://www.top500.org)
  • “C”: From the vendor
  • “U”: Public data from commercial companies and survey forms filled in by users
  • “S”: Extrapolated proportionally from the Linpack value of a larger system of the same model published by TOP500 (http://www.top500.org)

• For data supplied by users or vendors, SAMSS only performs a sanity check; the users or vendors who filled in the survey are responsible for its accuracy.

• The list is published at least once a year, in late October or early November.
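The “S” category estimates a Linpack value by scaling the published result of a larger system of the same model. A minimal sketch of such a proportional estimate follows. Scaling by processor-core count is an assumption made here for illustration (the remarks only say the value is extrapolated), and the function name and the example figures are hypothetical rather than taken from the list.

```python
def extrapolate_linpack(ref_linpack_gflops: float, ref_cores: int, target_cores: int) -> float:
    """Scale a reference Linpack result linearly by core count (assumed rule)."""
    if ref_cores <= 0 or target_cores <= 0:
        raise ValueError("core counts must be positive")
    return ref_linpack_gflops * target_cores / ref_cores


# Hypothetical numbers: a 4096-core reference system measured at 30000 Gflops,
# and a 1024-core system of the same model with no measured value.
estimate = extrapolate_linpack(30000.0, 4096, 1024)
print(f"Estimated Linpack: {estimate:.1f} Gflops")
```

A linear estimate of this kind ignores interconnect and problem-size effects, which is presumably why such entries are flagged separately from measured ones.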

Page 6:

2011 China HPC Top 10

Rank | Vendor | Configuration | Installation Site | Year | App. Area | Num of Proc | Linpack (Gflops) | Peak (Gflops) | Efficiency
1 | NUDT | Tianhe-1A/7168x2 Intel Hexa Core Xeon X5670 2.93GHz + 7168 Nvidia Tesla [email protected]+2048 Hex Core FT-1000@1GHz/NUDT Private Network 80Gbps | National Supercomputing Center at Tianjin | 2010 | Supercomputing Center | 202752 | 2566000.00 | 4701000.00 | 0.546
2 | NPCEC | Sunway BlueLight/8575x16 Core Shenwei 1600@975MHz/QDR Infiniband | National Supercomputing Center at Jinan | 2011 | Supercomputing Center | 137200 | 795900.00 | 1070160.00 | 0.744
3 | NUDT | Tianhe-1A-HN/2048x2 Intel Hexa Core Xeon X5670 2.93GHz + 2048 Nvidia Tesla [email protected]/NUDT Private Network 80Gbps | National Supercomputing Center at Changsha | 2011 | Supercomputing Center | 53248 | 771700.00 | 1343200.00 | 0.575
4 | Sugon | Nebulae/Dawning TC3600 Blade/2560x (2 Intel Hexa Core X5650 + Nvidia Tesla C2050 GPU)/QDR Infiniband | National Supercomputing Center at Shenzhen | 2011 | Supercomputing Center | 52416 | 749200.00 | 1296320.26 | 0.578
5 | IBM | xSeries x3650M3/Intel Xeon X56xx 2.53 GHz/Giga-E | Network Company | 2011 | Internet Service | 113040 | 636985.00 | 1143965.00 | 0.557
6 | IPE, CAS | Mole-8.5 Cluster/320x2 Intel QC Xeon E5520 2.26 GHz + 320x6 Nvidia Tesla C2050/QDR Infiniband | IPE, CAS | 2010 | Scientific Computing | 33120 | 496500.00 | 1138440.00 | 0.436
7 | Sugon | Nebulae/Dawning TC3600 Blade/3040 x 2 Intel Hexa Core X5650/QDR Infiniband | Shenzhen Cloud Computing Center | 2011 | Cloud Computing | 36480 | 342300.00 | 389168.64 | 0.880
8 | IBM | xSeries x3650M3/Intel Xeon X56xx 2.93 GHz/Giga-E | Telecomm | 2011 | Industry | 36336 | 204754.40 | 425856.00 | 0.481
9 | IBM | xSeries x3650M2 Cluster/Intel Xeon QC E55xx 2.53 GHz/Giga-E | Network Company | 2011 | Internet Service | 34688 | 196228.00 | 351044.00 | 0.559
10 | Sugon | Magic Cube/Dawning 5000A/1920x4 AMD QC Barcelona 1.9GHz/DDR Infiniband/WCCS+Linux | Shanghai Supercomputing Center | 2008 | Supercomputing Center | 30720 | 180600.00 | 233472.00 | 0.774
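The efficiency figures in the Top 10 list are consistent with the ratio of Linpack performance to peak performance. The short sketch below recomputes that ratio from the Linpack and peak values given for each rank; the variable and function names are illustrative only.

```python
# (rank, Linpack Gflops, peak Gflops) as given in the 2011 China HPC Top 10
top10 = [
    (1, 2566000.00, 4701000.00),
    (2, 795900.00, 1070160.00),
    (3, 771700.00, 1343200.00),
    (4, 749200.00, 1296320.26),
    (5, 636985.00, 1143965.00),
    (6, 496500.00, 1138440.00),
    (7, 342300.00, 389168.64),
    (8, 204754.40, 425856.00),
    (9, 196228.00, 351044.00),
    (10, 180600.00, 233472.00),
]


def efficiency(linpack_gflops: float, peak_gflops: float) -> float:
    """Linpack efficiency: sustained Linpack performance divided by peak."""
    return linpack_gflops / peak_gflops


efficiencies = {rank: round(efficiency(lp, pk), 3) for rank, lp, pk in top10}
for rank, eff in efficiencies.items():
    print(f"#{rank:2d}: {eff:.3f}")
```

The CPU-only cluster at rank 7 reaches the highest ratio in the list, while the GPU-accelerated systems at ranks 1, 3, 4, and 6 all sit below 0.6.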

Page 7:

China HPC TOP100 Authors with Tianhe 1A

Jiachang Sun, Guoxing Yuan, and Yunquan Zhang of SAMSS on a site visit to Tianhe-1A (天河一号A), the petaflops supercomputer system developed by the National University of Defense Technology.

Page 8:

International Collaboration

Prof. Jack Dongarra of the University of Tennessee, one of the TOP500 authors; Prof. Thomas Sterling of LSU, the father of Beowulf; Xuebin Chi (迟学斌), vice chairman of SAMSS; and Yunquan Zhang (张云泉), secretary-general of SAMSS, on a site visit to Tianhe-1A.

Page 9:

Outline

• Background of China HPC TOP100
• Analysis of 2011 China HPC TOP100
• Overview of China 863 key project
• Petascale Applications on TianHe-1A
• Future HPC performance development trends of China
• Summary

Page 10:

China HPC TOP100 Performance Analysis

• Tianhe-1A from the National University of Defense Technology takes #1 again, with Linpack performance of 2.56 PFlops.

• Total Linpack performance of the TOP100 is 12 PFlops, 1.9 times that of 2010.

• The Linpack performance of every system is above 22.1 TFlops, and every system's peak performance exceeds 25.6 TFlops.

• Four of the top 10 systems (No. 1, No. 3, No. 4, and No. 6) are CPU+GPU heterogeneous clusters.

• 97 of the 100 systems are clusters (98 in 2010).

Page 11:

Cluster Share in China HPC TOP100

[Chart: vertical axis is Count]

Page 12:

Manufacturer Analysis

Origin | Manufacturer | Systems | Share | Rmax [TF/s] | Rpeak [TF/s] | Average Efficiency | Num of Proc
Domestic | Sugon (曙光) | 35 | 35% | 2848.18 | 4544.56 | 61.40% | 363864
Domestic | Inspur (浪潮) | 7 | 7% | 306.93 | 535.39 | 60.50% | 55748
Domestic | Sunway (神威) | 5 | 5% | 1087.80 | 1404.71 | 84.34% | 165512
Domestic | NUDT (国防科大) | 2 | 2% | 3337.70 | 6044.20 | 56.00% | 256000
Domestic | IPE (中科院过程所) | 1 | 1% | 496.50 | 1138.44 | 43.60% | 33120
Domestic | Lenovo (联想) | 1 | 1% | 102.80 | 145.29 | 70.80% | 12160
Domestic | Domestic Total | 51 | 51% | 8204.11 | 13812.59 | 62.90% | 886404
Import | IBM | 35 | 35% | 3264.31 | 6020.59 | 57.60% | 588524
Import | HP | 13 | 13% | 509.51 | 927.77 | 57.60% | 98056
Import | Dell | 1 | 1% | 23.40 | 44.93 | 72.43% | 6880
Import | Import Total | 49 | 49% | 3797.22 | 6993.28 | 57.50% | 690900
Total | Total | 100 | 100% | 12001.33 | 20805.87 | 59.63% | 1577304
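Share by number of systems and share by performance can differ sharply for the same manufacturer. The sketch below derives a performance share for each manufacturer from the Rmax column, using the Rmax value of the Total row (12001.33 TF/s) as the denominator. The names are illustrative, and this is only one plausible way to compute such a share, not necessarily how the list's own charts were produced.

```python
# (manufacturer, systems, Rmax in TF/s) from the manufacturer analysis table
rows = [
    ("Sugon", 35, 2848.18),
    ("Inspur", 7, 306.93),
    ("Sunway", 5, 1087.80),
    ("NUDT", 2, 3337.70),
    ("IPE", 1, 496.50),
    ("Lenovo", 1, 102.80),
    ("IBM", 35, 3264.31),
    ("HP", 13, 509.51),
    ("Dell", 1, 23.40),
]
TOTAL_RMAX = 12001.33  # Rmax of the "Total" row, TF/s

# Fraction of the list's total Linpack performance per manufacturer
perf_share = {name: rmax / TOTAL_RMAX for name, _, rmax in rows}

for name, systems, rmax in sorted(rows, key=lambda r: r[2], reverse=True):
    print(f"{name:7s} {systems:3d} systems  {100 * perf_share[name]:5.1f}% of total Rmax")
```

On these figures NUDT's two systems account for more Linpack performance than Sugon's or IBM's 35 systems each.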

Page 13:

Manufacturer Share Trend

[Chart: series are Import and Domestic]

Page 14:

Manufacturer Shares by Number of Systems

2011 China HPC TOP100 http://www.samss.org.cn

Page 15:

Manufacturer Shares by Performance

2011 China HPC TOP100 http://www.samss.org.cn

Page 16:

HPC TOP100 Application Areas

App Area | # Systems | Share | Linpack [TF/s] | Peak [TF/s] | Efficiency | # of Proc
Internet Service | 21 | 21% | 2133.82 | 3963.18 | 53.30% | 404568
Government | 16 | 16% | 763.91 | 1450.00 | 52.00% | 155648
Education | 9 | 9% | 293.01 | 424.04 | 76.30% | 30740
SC Center | 8 | 8% | 5333.40 | 8892.26 | 66.84% | 502616
Telecomm | 7 | 7% | 474.31 | 923.01 | 53.20% | 88192
Engineering | 6 | 6% | 541.98 | 1026.46 | 54.10% | 95720
Scientific Computing | 5 | 5% | 742.70 | 1455.37 | 67.70% | 56300
On-line Gaming | 5 | 5% | 388.62 | 682.08 | 57.00% | 68648
Weather Forecasting | 5 | 5% | 202.46 | 236.82 | 85.20% | 22064
Energy | 4 | 4% | 112.02 | 208.98 | 59.30% | 13852
Cloud Computing | 3 | 3% | 436.35 | 571.11 | 63.60% | 44300
Service Provider | 2 | 2% | 213.88 | 383.26 | 55.80% | 37872
Power | 2 | 2% | 81.87 | 118.27 | 67.70% | 13440
Semi-Conductor | 2 | 2% | 79.20 | 150.37 | 53.50% | 15352
Bioinformatics | 2 | 2% | 78.93 | 147.76 | 53.00% | 8480
Video | 1 | 1% | 46.38 | 81.79 | 56.70% | 9600
Logistics | 1 | 1% | 31.03 | 58.40 | 53.10% | 5840
Earthquake Engineering | 1 | 1% | 23.27 | 32.69 | 71.20% | 3072
Total | 100 | 100% | 12001.33 | 20805.87 | 59.63% | 1577304

Page 17:

Application Areas Analysis

• The number of application areas has grown to 18, more than in previous years.

• By number of systems, the top 3 areas are internet service, government, and education.

• By total Linpack performance, the top 3 areas are supercomputing centers, internet service, and government.

• Main users: internet service, government, supercomputing centers, and education.

• New users: cloud computing and semiconductor.
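The two rankings above can be reproduced from the application-area table by sorting on the system count and on the total Linpack column. The sketch below does so with values copied from that table; the names are illustrative, and the Linpack figures are used as printed, without any unit conversion.

```python
# (application area, number of systems, total Linpack as printed in the table)
areas = [
    ("Internet Service", 21, 2133.82),
    ("Government", 16, 763.91),
    ("Education", 9, 293.01),
    ("SC Center", 8, 5333.40),
    ("Telecomm", 7, 474.31),
    ("Engineering", 6, 541.98),
    ("Scientific Computing", 5, 742.70),
    ("On-line Gaming", 5, 388.62),
    ("Weather Forecasting", 5, 202.46),
    ("Energy", 4, 112.02),
    ("Cloud Computing", 3, 436.35),
    ("Service Provider", 2, 213.88),
    ("Power", 2, 81.87),
    ("Semi-Conductor", 2, 79.20),
    ("Bioinformatics", 2, 78.93),
    ("Video", 1, 46.38),
    ("Logistics", 1, 31.03),
    ("Earthquake Engineering", 1, 23.27),
]

# Top three areas by system count and by total Linpack performance
top_by_systems = [a[0] for a in sorted(areas, key=lambda a: a[1], reverse=True)[:3]]
top_by_linpack = [a[0] for a in sorted(areas, key=lambda a: a[2], reverse=True)[:3]]

print("Areas:", len(areas))
print("Top 3 by systems:", top_by_systems)
print("Top 3 by Linpack:", top_by_linpack)
```

Supercomputing centers hold only 8 systems but lead the performance ranking, which is why the two top-3 lists differ.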

Page 18:

Application Area Trend

Page 19:

Application Area System Shares

2011 China HPC TOP100 http://www.samss.org.cn

Page 20:

Application Area System Shares

2011 China HPC TOP100 http://www.samss.org.cn

Page 21:

Outline

• Background of China HPC TOP100
• Analysis of 2011 China HPC TOP100
• Overview of China 863 key project
• Petascale Applications on TianHe-1A
• Future HPC performance development trends of China
• Summary

Page 22:

China 863 Program

• The National High-tech R&D Program (863 Program)
  • Proposed by 4 senior Chinese scientists and approved by former leader Mr. Deng Xiaoping in March 1986
  • One of the most important national science and technology R&D programs in China
  • Now a regular national R&D program planned in 5-year terms; the 11th five-year plan has just finished and the 12th five-year plan is beginning

Page 23:

Overview of 863 key project on HPC and Grid

• “High performance computer and core software”
  • 4-year project, May 2002 to Dec. 2005
  • 100 million Yuan funding from the MOST
  • More than 2x associated funding from local government, application organizations, and industry
  • Outcomes: China National Grid (CNGrid)

• “High productivity Computer and Grid Service Environment”
  • Period: 2006-2010
  • 940 million Yuan from the MOST and more than 1B Yuan matching money from other sources

Page 24:

Major R&D activities

• Developing Petaflops Supercomputers

• Building up a grid service environment--CNGrid

• Developing Grid and HPC applications in selected areas

Page 25:

Two phase development

• First phase: two 100 TFlops machines
  • Dawning 5000A for SSC
  • Lenovo DeepComp 7000 for SCCAS

• Second phase: three Petaflops machines
  • Tianhe-1A: NUDT/Inspur/Tianjin Supercomputing Center
  • Dawning 6000: ICT/Dawning/South China Supercomputing Center (Shenzhen)
  • Sunway BlueLight: National Engineering Center on Parallel Computer/Shandong Supercomputing Center

Page 26:

Dawning 5000A (2008)

• China surpassed Japan in HPC performance
• ICT regained the performance crown in China, following Machine-757 (1983) and Dawning 1000 (1995)
• Peak: 233.5 TFlops
• Linpack: 180.6 TFlops (Eff. 77.34%)
• Power: <800 kW
• MPI latency: 1.6 us
• Top 10, Nov 2008

Page 27:

Dawning 5000A

• Constellation based on AMD multicore processors
• Low power CPU and high density blade design
• High performance InfiniBand switch
• 233.472 TFlops peak performance, 180.6 TFlops Linpack performance
• 10th in the TOP500 in Nov. 2008, the fastest machine outside the USA

Page 28:

Lenovo DeepComp 7000

• Hybrid cluster architecture using Intel multicore processors
• Two sets of interconnect: InfiniBand and Gb Ethernet
• SAN connection between I/O nodes and disk array
• 145.965 TFlops peak performance
• 106.5 TFlops Linpack performance
• 19th in the TOP500 in Nov. 2008

Page 29:

Dawning Nebulae: 3PFlops (2010)

Ranked Top500 #2, Linpack 1.271PFlops

Page 30:

Dawning 6000

• Hybrid system
  • Service unit (Nebulae)
    • 9600 Intel 6-core Westmere processors
    • 4800 nVidia Fermi GPGPUs
    • 3 PF peak performance
    • 1.27 PF Linpack performance
    • 2.6 MW
  • Computing unit
    • Domestic processor

Page 31:

Tianhe-1A

• Hybrid system
  • 14336 general purpose units: Intel 6-core processors
  • 7168 acceleration units: NVIDIA Fermi GPUs
  • 2048 service units: FT-1000 processors
  • 80 Gbps NUDT proprietary TH-Net (hierarchical fat tree)
  • Kylin Linux OS
  • MPI + OpenMP/Pthread + CUDA/OpenCL

• 4.7 PFlops peak, 2.57 PFlops Linpack (>50% Eff.)
• 262 TB memory, 2 PB storage
• Water cooling, 4.04 MW (635.15 MF/W)
• 120 compute, 14 storage, 6 communication
• Installed in Aug. 2010
• TOP500 No. 1 in Nov. 2010
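The headline Tianhe-1A figures can be cross-checked with simple arithmetic. The sketch below estimates the peak from the processor counts and recomputes the power efficiency from the Linpack value in the Top 10 list (2566000 Gflops) and the 4.04 MW power figure. Three inputs are assumptions not stated on the slides: 4 double-precision flops per cycle per Xeon core, 515 Gflops double-precision peak per Tesla M2050, and the omission of the FT-1000 contribution.

```python
# Counts and clock from the slides and the Top 10 list
CPUS = 14336              # Intel 6-core Xeon X5670 processors
CPU_CORES = 6
CPU_GHZ = 2.93
GPUS = 7168               # Nvidia Tesla M2050
LINPACK_GFLOPS = 2566000.0
POWER_MW = 4.04

# Assumptions (not stated on the slides)
CPU_FLOPS_PER_CYCLE = 4   # double-precision flops per core per cycle
GPU_DP_GFLOPS = 515.0     # double-precision peak per M2050

cpu_peak_gflops = CPUS * CPU_CORES * CPU_GHZ * CPU_FLOPS_PER_CYCLE
gpu_peak_gflops = GPUS * GPU_DP_GFLOPS
peak_pflops = (cpu_peak_gflops + gpu_peak_gflops) / 1e6  # FT-1000 units ignored

# Power efficiency in Mflops per watt: flops/s divided by watts, scaled by 1e6
mflops_per_watt = (LINPACK_GFLOPS * 1e9) / (POWER_MW * 1e6) / 1e6

print(f"Estimated peak: {peak_pflops:.2f} PFlops")
print(f"Power efficiency: {mflops_per_watt:.2f} MF/W")
```

Under these assumptions the GPUs supply roughly 3.7 of the 4.7 PFlops peak, which helps explain why Linpack efficiency on the hybrid system stays near 55%.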

Page 32:

TH-1A System: From Chips to Entire System

[Diagram labels: chips (X5670, M2050, FT-1000); quad CPU blade; twin GPU blade; compute node (4 CPU + 2 GPU); rack (16 x cn); cabinet (4 x rack); on-line storage; TH-Net; TH-1A system]

Page 33:

TH-1A software stack

Page 34:

Sunway BlueLight MPP

• Designed by the National Engineering Center for Parallel Computer
• Developed for the National Supercomputing Center (Shandong), Jinan, China
• 8704 CPUs, 1.07 Petaflops peak performance
• Linpack 795.9 TFlops, 74.37% efficiency, 741.06 MFlops/W
• InfiniBand QDR 40 Gbps, power consumption 1.07 MW, water cooling
• 16-core SW1600 processor designed in China
• Released at HPC China 2011 in Jinan

Page 35:

Sunway BlueLight Architecture

Parameters:

• SW1600 CPU: 16 cores, 975~1100 MHz, 124.8~140.8 Gflops
• Fat tree, QDR 4x10 Gbps InfiniBand, MPI latency 2 us
• SWCC/C++/Fortran/UPC/MPICC/Mathematical Library
• Storage: 2 PB, peak I/O: 200 GB/s, IOR (~60 GB/s)

Page 36:

Outline

• Background of China HPC TOP100
• Analysis of 2011 China HPC TOP100
• Overview of China 863 key project
• Petascale Applications on TianHe-1A
• Future HPC performance development trends of China
• Summary

Page 37:

Number of Users Profile on TH-1A

[Pie chart: profile of user number. Slice labels: 37%, 20%, 10%, 8%, 7%, 6%, 2%, 2%, 8%. Legend: basic science research (physics, chemistry, astronomy, etc.); bio-medical research; new material and new energy research; computational fluid dynamics; engineering design, simulation and analysis; environment science; weather and climate forecasting; petroleum exploration; animation]

Page 38:

Resource Usage Profile on TH-1A

[Pie chart: profile of resource usage. Slice labels: 24%, 7%, 7%, 2%, 4%, 5%, 8.2%, 41.8%. Legend: petroleum exploration; bio-medical research; new material and new energy research; environment science; basic science research (physics, chemistry, astronomy, etc.); computational fluid dynamics; weather and climate forecasting; animation; engineering design, simulation and analysis]

Page 39:

Parallel Computing Software Platform for Astrophysics

• Joint work
  • Shanghai Astronomical Observatory, CAS (SHAO)
  • Institute of Software, CAS (ISCAS)
  • Shanghai Supercomputer Center (SSC)

• Building a high performance parallel computing software platform for astrophysics research, focusing on planetary fluid dynamics (thermal convection in the Earth’s outer core) and N-body problems

• New parallel computing models and parallel algorithms are studied, validated, and adopted to achieve high performance.

Page 40:

Software Architecture

[Diagram: Software Platform for Astrophysics. Components: Physical and Mathematical Model; Parallel Computing Model; Numerical Methods; Fluid Dynamics; N-body Problem; Improved Preconditioner; Improved Lib. for Collective Communication; SpMV; PETSc; Aztec; FFTW; GSL; MPI; OpenMP; Fortran; C; Lustre; 100T Supercomputer; Web Portal on CNGrid; Software Development; Data Processing; Scientific Visualization]

Page 41:

PETSc Optimized Version 1 (Speedup 4-6)

• Early performance evaluation of the Aztec code and the PETSc code on Dawning 5000A.

• For the 80×80×50 mesh, the Aztec program takes 4-7 times as long to run as the PETSc version, 6 times on average.

• For the 160×160×100 mesh, the Aztec program takes 2-5 times as long to run as the PETSc version, 4 times on average.

[Chart: Mesh 160×160×100 (Dawning 5000A). Runtime (s), 0 to 1600, versus processor cores, 32 to 2048. Series: Aztec, PETSc]

[Chart: Mesh 80×80×50 (Dawning 5000A). Runtime (s), 0 to 400, versus processor cores, 16 to 2048. Series: Aztec, PETSc]

Page 42:

Method 1: Domain Decomposition Ordering Method for Field Coupling

Method 2: Preconditioner for Domain Decomposition Method

Method 3: PETSc Multi-physics Data Structure

PETSc Optimized Version 2 (Speedup 15-26)

Left: mesh 128 x 128 x 96 Right: mesh 192 x 192 x 128 Computation Speedup: 15-26

Strong scalability: Original code normal, New code ideal Test environment: BlueGene/L at NCAR (HPCA2009)

Page 43:

Strong Scalability on Dawning 5000A

[Chart: Dawning 5000A (160×160×100 mesh size). Time (seconds), 0 to 1600, versus processor cores, 32 to 8192. Series: Aztec, PETSc]

Page 44:

Strong Scalability rotmp linear: 192x192x128

[Chart: Time (s), log scale from 1 to 1000, versus number of processor cores, 64 to 8192. Series: BG/L, Dawning 5000A (曙光5000A), DeepComp 7000 (深腾7000). Data labels: 433.6, 212.8, 98.5, 51.1, 26.1, 14.4, 8.3, 4.7; 12.0, 144.8, 65.5, 19.2, 32.3, 69.3, 157.7, 257.1, 13.5, 23.8, 344.7]
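Strong scalability is usually summarized as speedup and parallel efficiency relative to the smallest run. The sketch below applies that calculation to the first run of data labels on this chart (433.6 s down to 4.7 s). Pairing those eight values with the eight core counts on the axis, 64 through 8192, is an assumption, since the transcript does not preserve which series or points the labels belong to.

```python
# Core counts from the chart axis, paired (by assumption) with the first
# eight runtime labels that appear on the slide, in seconds.
cores = [64, 128, 256, 512, 1024, 2048, 4096, 8192]
times = [433.6, 212.8, 98.5, 51.1, 26.1, 14.4, 8.3, 4.7]

base_cores, base_time = cores[0], times[0]

# Speedup and parallel efficiency relative to the 64-core run
speedups = [base_time / t for t in times]
efficiencies = [s / (c / base_cores) for s, c in zip(speedups, cores)]

for c, t, s, e in zip(cores, times, speedups, efficiencies):
    print(f"{c:5d} cores  {t:7.1f} s  speedup {s:6.2f}  efficiency {e:5.1%}")
```

If the pairing holds, the run on 8192 cores is about 92 times faster than on 64 cores, a parallel efficiency of roughly 72% for a 128-fold increase in cores.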

Page 45:

Strong Scalability on TianHe-1A

Page 46:

TianHe-1A Applications Case Study (CPU only)

• A fully implicit shallow water atmospheric model (ISCAS)
  • Using 82,944 cores
  • Parallel efficiency 60%
  • Number of unknowns: 680M

• Petroleum seismic data processing (BGP)
  • GeoEast-lightning single/double-way wave prestack depth migration software
  • Using 85,860 cores
  • 24.6 TB data
  • 16 hours

Page 47:

TianHe-1A Application Case Study (CPU+GPU)

• Direct Numerical Simulation of Turbulent Flow (PKU)
  • GPU-accelerated FFT solver (PKUFFT)
  • Taylor micro-scale Reynolds number up to 1164
  • Grid resolution up to 14336^3
  • 7168 nodes, >3.2 million CUDA cores (>100,000 GPU cores)
  • FFT sustained performance: 30 TFlops (SP) / 17 TFlops (DP)

[Chart legend: Jaguar; PKUFFT (with GPU); MKL (without GPU)]
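The scale of this run can be illustrated with a few lines of arithmetic. The sketch below reads the grid resolution as 14336 cubed (an assumption about a superscript lost in extraction) and counts GPU cores using 448 CUDA cores and 14 streaming multiprocessors per Tesla M2050 (also assumptions, not stated on the slide). The memory figure covers a single single-precision scalar field only and says nothing about the solver's real footprint.

```python
NODES = 7168                 # from the slide, one GPU per node assumed
CUDA_CORES_PER_GPU = 448     # assumption: Tesla M2050
SMS_PER_GPU = 14             # assumption: streaming multiprocessors per M2050
GRID_N = 14336               # assumption: "143363" read as 14336 cubed
BYTES_PER_SP_VALUE = 4

cuda_cores = NODES * CUDA_CORES_PER_GPU
gpu_sms = NODES * SMS_PER_GPU
grid_points = GRID_N ** 3
field_terabytes = grid_points * BYTES_PER_SP_VALUE / 1e12

print(f"CUDA cores: {cuda_cores:,}")
print(f"GPU multiprocessors: {gpu_sms:,}")
print(f"Grid points: {grid_points:.3e}")
print(f"One SP scalar field: {field_terabytes:.2f} TB")
```

Both counts agree with the slide's ">3.2 million" and ">100,000" figures, and a single field of that size already needs close to 12 TB, which only a machine with memory on the order of TH-1A's 262 TB can hold several times over.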

Page 48:

TianHe-1A Application Case Study (CPU+GPU)

• High speed particle collision system simulation
• Force calculation is accelerated by GPU
• 21.9x speedup on a single GPU compared to a single CPU core
• Excellent weak and strong scalability with up to 4096 nodes (106,496 CPU/GPU cores) for problems with up to 11.16 billion atoms
• Embedded Atom Method potential; scaling to the whole system is expected

Page 49:

TianHe-1A Application Case Study (CPU+GPU)

• Trans-scale simulation of silicon deposition process (IPE, CAS)
• Scalable bond-order potential (BOP) for the molecular dynamics simulation of crystalline silicon
• 26 nm × 54 nm × 1560000 nm (1.56 mm), 110.1 billion atoms
• Peak performance 7.38 PFlops (SP) on 7168 × (Tesla M2050 + 2-way Xeon 5670)
• 1.17 PFlops in SP plus 92.1 TFlops in DP on 7168 GPUs and 86,016 CPU cores, 5 TB memory
• 1.87 PFlops (SP) on 7168 GPUs (25.3% of peak)
• 758 flop per step per atom, 44.53 s per 1000-step run

[Figure labels: 1.56 mm, 0.54 nm]
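The sustained performance quoted for the silicon simulation follows from the per-atom cost and the timing on the slide. The sketch below multiplies 758 flop per step per atom by 110.1 billion atoms and 1000 steps, divides by 44.53 s, and compares the slide's 1.87 PFlops with the 7.38 PFlops single-precision peak. Variable names are illustrative.

```python
FLOP_PER_STEP_PER_ATOM = 758
ATOMS = 110.1e9
STEPS = 1000
SECONDS = 44.53
PEAK_SP_PFLOPS = 7.38
REPORTED_PFLOPS = 1.87

# Total floating-point work for the run, then the sustained rate
total_flop = FLOP_PER_STEP_PER_ATOM * ATOMS * STEPS
sustained_pflops = total_flop / SECONDS / 1e15

# Fraction of single-precision peak, using the figure quoted on the slide
fraction_of_peak = REPORTED_PFLOPS / PEAK_SP_PFLOPS

print(f"Sustained: {sustained_pflops:.2f} PFlops")
print(f"Fraction of SP peak: {fraction_of_peak:.1%}")
```

The recomputed rate rounds to the slide's 1.87 PFlops, and the ratio to peak matches the quoted 25.3%.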

Page 50:

Outline

• Background of China HPC TOP100
• Analysis of 2011 China HPC TOP100
• Overview of China 863 key project
• Petascale Applications on TianHe-1A
• Future HPC performance development trends of China
• Summary

Page 51:

• Ten public supercomputing centers
  • Beijing (CAS), Tianjin, Shandong, Shanghai, Shenzhen, Chengdu, Hu’nan, Wuhan, Guangzhou, Chongqing
  • Covering the developed areas of China
  • Growing in industry design and simulation

• Five private centers in mature fields
  • Petroleum, Meteorology, Aerospace, Defense, Energy
  • Related to national security

• Four centers in emerging areas
  • Cyberspace security, Internet service, Sensing China, Triple-play

Demand for petascale HPCs will be growing in the next 5 years

Page 52:

Performance Development Trend of China TOP100 HPC

[Chart: Performance Development Trend of China HPC (1993-2011). Vertical axis: GFlops, log scale from 1 to 1E+13. Horizontal axis: Year, 1993 to 2025. Series: No.1 Linpack, No.1 Peak, Total Perf., each with a trend line]

Page 53:

Trend & Outlook (1)

• China HPC performance increase, 1993-2011
  • 1993-1996: slow, steady growth
  • 1996-1999: first big jump
  • 1999-2001: slow, steady growth
  • 2001-2005: another period of rapid growth
  • 2005-2007: slow, steady growth again
  • 2008-2010: another active growth cycle, expected to last 2-3 years
  • 2011: start of a slow, steady period, expected to last 2-3 years

Page 54:

Trend & Outlook (2)

Previous Predictions (and Reality)

• 2007-2008: system with peak performance of 100 TFlops (reality: Oct 2008)

• 2008-2009: total Linpack performance exceeds 1 PFlops (reality: Oct 2008)

• 2010-2011: system with peak performance of 1 PFlops (reality: Oct 2009, ahead of schedule)

• 2011-2012: total Linpack performance reaches 10 PFlops (reality: Oct 2011)

Page 55:

Trend & Outlook (3)

Future Predictions

• 2012-2013: system with peak performance of 10 PFlops

• 2014-2015: system with peak performance of 100 PFlops

• 2013-2014: total Linpack performance reaches 100 PFlops
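Predictions of this kind rest on extrapolating an exponential trend. As a rough illustration only, the sketch below asks how long the total Linpack performance would take to grow from the 2011 figure of 12 PFlops to 100 PFlops if the 1.9x year-over-year factor reported for 2010 to 2011 simply persisted. This constant-growth model is an assumption made here; it is not the trend fit behind the predictions above, which is not described in the slides.

```python
import math


def years_to_reach(current: float, target: float, annual_factor: float) -> float:
    """Years needed to grow from current to target at a constant annual factor."""
    if current <= 0 or target <= 0 or annual_factor <= 1:
        raise ValueError("need positive values and an annual factor above 1")
    return math.log(target / current) / math.log(annual_factor)


# 12 PFlops total in 2011, growing 1.9x per year (assumed to persist)
years = years_to_reach(12.0, 100.0, 1.9)
print(f"Years after 2011 to reach 100 PFlops: {years:.1f}")
```

The answer is sensitive to the growth factor: a factor of 2.0 instead of 1.9 shortens the estimate by roughly a quarter of a year.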

Page 56:

Outline

• Background of China HPC TOP100
• Analysis of 2011 China HPC TOP100
• Overview of China 863 key project
• Petascale Applications on TianHe-1A
• Future HPC performance development trends of China
• Summary

Page 57:

Summary

• With the right strategies, China won the HPC Olympic Games in 2011, and HPC is genuinely helping China's scientific and economic development.

• Real HPC applications still lag behind those of the US, Europe, and Japan.

• On TianHe-1A, several applications can scale up to 80,000 CPU cores.

• The growth rate of China's HPC performance is the fastest.

• There will be at least 19 major petaflops supercomputing centers within 5 years.

Page 58:

Summary

• The first petaflops supercomputer powered entirely by domestic processors designed in China was released at HPC China 2011 in Jinan.

• According to TOP100 predictions, a supercomputer with 10 Petaflops peak performance will appear before 2013.

• According to TOP100 predictions, a supercomputer with 100 Petaflops peak performance may appear before 2015.

Page 59:

Thank You

• Thanks to Yutong Lu, Chao Yang, and Wei Ge
• Contact: Yunquan Zhang, Ph.D.
• Emails: [email protected], [email protected]