2008 Advanced Workshop on Teaching Reform for Computer Science Programs in Higher Education (May 17-18, Beijing)
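Bilingual Teaching Practice in the Course "Introduction to High Performance Computing"
Xiaoge Wang
Department of Computer Science and Technology
Common Platform Division, National Laboratory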
Outline
• General overview
• Courseware examples
• Experience and reflections
• Looking ahead
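General Overview
Three stages in the development of the "Introduction to High Performance Computing" course:
• Specialized course taught in English (required course, 1998-2002)
– Positioning: replaced the "Specialized English" course.
– Feature: teaching was organized around high performance computing, with the use of English as the main goal.
• Specialized course taught in English (elective course, 2003-2006)
– Positioning: teaching specialized knowledge is the goal; English is the means.
– Feature: equal emphasis on specialized knowledge and English training.
• Specialized course taught bilingually (elective course, 2007- )
– Positioning: teaching specialized knowledge is the goal; bilingual instruction is the means.
– Feature: equal emphasis on specialized knowledge and English training, with more attention to how well the specialized knowledge is learned.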
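General Overview
Course organization
• Textbook: original English-language edition
– A textbook rather than a monograph
– A reprint edition is available in China
• Class hours: 32 per semester
• Format: lectures + discussion + labs + homework + exam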
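General Overview
Bilingual requirements:
• For the teacher
– Advance notice of upcoming material
– Lecturing: English, with Chinese explanation where appropriate
– Setting questions: English
• For students
– Preview before class
– Speaking in class: English, with Chinese explanation where appropriate
– Homework: English is encouraged (bonus points)
– Exam: open book, dictionaries and notes allowed; answers in English with a small amount of Chinese annotation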
Courseware Examples
• Course introduction
• Lecture notes example
• Homework example
• Exam questions and answer sheet examples
Introduction to High Performance Computing
Xiaoge Wang
Course Syllabus
• Textbook:
[1] Ian Foster, "Designing and Building Parallel Programs" (Posts & Telecom Press, English edition)
• References:
[1] Ananth Grama, Anshul Gupta, George Karypis, Vipin Kumar, "Introduction to Parallel Computing" (China Machine Press, Chinese and English editions)
[2] Timothy G. Mattson, Beverly A. Sanders, Berna L. Massingill, "Patterns for Parallel Programming" (Tsinghua University Press, translated edition)
[3] Michael Quinn, "Parallel Programming in C with MPI and OpenMP" (Tsinghua University Press, reprint edition)
Course Syllabus
• Objectives:
– Answer the questions:
  • What is HPC? Why HPC?
  • How to do HPC?
– Learn some basic concepts, algorithms and tools of HPC.
– Improve English skills through instruction, discussion, homework, and presentation.
[Figure: the three parts of HPC covered in the course]
• Concepts: task partition, scheduling, performance model
• Algorithms: linear algebra, search, sort
• Tools: MPI, OpenMP, HPF
Course Syllabus
• Grade Policy:
– Homework 45%
– Classroom Performance 15%
– Final exam 40%
– No tolerance for cheating
• Office Hour (English corner):
– Tuesday, 8-9 pm, FIT Building, room 3-412
Lesson One: Introduction
Introduction
• What is HPC?
• Current development of HPC
• Overview of concepts
What is HPC?
• Definition
• Components
• Applications
What is HPC? -- Definitions
Definitions of High Performance Computing on the Web:
– A branch of computer science that concentrates on developing supercomputers and software to run on supercomputers. A main area of this discipline is developing parallel processing algorithms and software: programs that can be divided into little pieces so that each piece can be executed simultaneously by separate processors. (www.angelfire.com/anime3/internet/programming.htm)
– The field of high performance computing (HPC) comprises computing applications on (parallel) supercomputers and computer clusters. Most ideas for the new wave of grid computing were originally borrowed from HPC. (en.wikipedia.org/wiki/High_Performance_Computing)
What is HPC? --Components
• Hardware:
– Supercomputer, cluster, switch, network
• Software:
– OS, shared/distributed memory management, file systems, parallel programming tools
• Algorithm:
– Parallel/distributed algorithm design
What is HPC? --Applications
• Modern science and engineering
– Grand challenges: quantum chemistry, cosmology, astrophysics, CFD, material design, biology, genome sequencing, global weather and environmental modeling, ...
• Information Technology
– Web services, data mining, search engines, information retrieval, ...
Current Development of HPC
• Trends in computer design
• Trends in networking
• Trends in software design
Current Development of HPC --Trends in Computer Design
• High Performance is still an important goal.
• Multicore technology is maturing
• Multiprocessor is still the main architecture.
• Multicomputer is becoming the foundation of the large scale Cyber-Infrastructure (CI).
From http://www.top500.org/
Earth Simulator
• Based on the NEC SX architecture: 640 nodes, each node with 8 vector processors (8 Gflop/s peak per processor), 2 ns cycle time, and 16 GB shared memory.
• A total of 5120 processors, 40 TFlop/s peak, and 10 TB memory.
• A single-stage crossbar (1800 miles of cable, 83,000 copper cables), 16 GB/s cross-section bandwidth.
• 700 TB disk space
• 1.6 PB mass store
• Area of computer = 4 tennis courts, 3 floors
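The totals on this slide follow from the per-node figures. The short Python check below is only a sketch of that arithmetic; the variable names are illustrative, and it assumes decimal units for Flop/s and 1 TB = 1024 GB for memory.

```python
# Per-node figures quoted on the Earth Simulator slide.
NODES = 640
PROCESSORS_PER_NODE = 8
PEAK_GFLOPS_PER_PROCESSOR = 8
MEMORY_GB_PER_NODE = 16

total_processors = NODES * PROCESSORS_PER_NODE
# Peak performance in TFlop/s (assumption: 1 TFlop/s = 1000 Gflop/s).
peak_tflops = total_processors * PEAK_GFLOPS_PER_PROCESSOR / 1000
# Total memory in TB (assumption: 1 TB = 1024 GB).
total_memory_tb = NODES * MEMORY_GB_PER_NODE / 1024

print(total_processors)  # 5120
print(peak_tflops)       # 40.96
print(total_memory_tb)   # 10.0
```

The slide rounds 40.96 TFlop/s down to 40 TFlop/s.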
BlueGene/L
• Site: DOE/NNSA/LLNL
• System Model: eServer Blue Gene Solution
• Vendor: IBM
• Application area: Research
• Main Memory: 32768 GB
• Installation Year: 2005
• Operating System: CNK/SLES 9
• Interconnect: Proprietary
• Processor: PowerPC 440 700 MHz (2.8 GFlops)
BlueGene/L
• BlueGene/L boasts a peak speed of over 360 teraFLOPS, a total memory of 32 tebibytes, total power of 1.5 megawatts, and machine floor space of 2,500 square feet. The full system has 65,536 dual-processor compute nodes. Multiple communications networks enable extreme application scaling:
• Nodes are configured as a 32 x 32 x 64 3D torus; each node is connected in six different directions for nearest-neighbor communications
• A global reduction tree supports fast global operations such as global max/sum in a few microseconds over 65,536 nodes
• Multiple global barrier and interrupt networks allow fast synchronization of tasks across the entire machine within a few microseconds
• 1,024 gigabit-per-second links to a global parallel file system to support fast input/output to disk
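To make the torus description concrete, here is a small Python sketch that lists the six nearest neighbours of a node in a 32 x 32 x 64 torus and checks the node count and peak speed. The function name and the (x, y, z) coordinate scheme are assumptions made for illustration; they are not IBM's actual node addressing. The 2.8 GFlops per processor figure is taken from the specification slide.

```python
DIMS = (32, 32, 64)  # torus shape quoted on the slide

def torus_neighbors(coord, dims=DIMS):
    """Return the six nearest neighbours of `coord` in a 3D torus."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # Wrap around at the edges: this is what makes it a torus.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbors.append(tuple(n))
    return neighbors

node_count = DIMS[0] * DIMS[1] * DIMS[2]
# Two processors per node at 2.8 GFlops each, converted to TFlops.
peak_tflops = node_count * 2 * 2.8 / 1000

print(node_count)                   # 65536
print(torus_neighbors((0, 0, 0)))
print(round(peak_tflops))           # 367
```

The result of about 367 TFlops is consistent with the "over 360 teraFLOPS" quoted above.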
BlueGene/L by IBM
HPC in China
According to the HPC Top500 statistics (June 2007):
Country            Count   Share %   Rmax Sum (GF)   Rpeak Sum (GF)   Processor Sum
US                 281     56.20%    3079489         4444621          816680
China (mainland)   13      2.60%     96403           174954           27660
The top system installed in China is ranked 43rd in the Top500 (IBM).
See also: http://www.samss.org.cn/ for the China Top 100 supercomputer list.
HPC Facilities in Tsinghua
• TH-Discovery 3
– Architecture:
  • Cluster with 128 nodes, 256 CPUs, 1.3 TFLOP/s peak performance.
  • Node: HP Server rx2600, 4GB PC2100 DDR-SDRAM memory quad (4x1GB DIMMs)
  • Storage: 200 TB
– Software:
  • Red Hat Linux AS 3.0 ia64, kernel 2.4.21-20.EL
  • LSF Job Management System
  • MPI for parallel programming
  • Mathematical libraries
  • ChinaGrid Monitor
Current Development of HPC -- Networking
• High speed inter-connections
– Proprietary (IBM, Cray)
– Commercial products: InfiniBand, Ethernet, Myrinet, Quadrics, ...
• Internet
– Internet usage in China
Internet Users in China
Data source: 《中国互联网络发展状况统计报告》 (2007.1), the annual report on China's internet development status, published by the China Internet Network Information Center (CNNIC)
Comparison with two other sets of data (from the CIA World Factbook):
World: 1,018,057,389 / 6,602,224,175 = 15.4%
US: 208,824,428 / 301,139,947 = 69.3%
China: 137,000,000 / 1,321,851,888 = 10.4%
[Chart: Internet users in China, in units of 10,000 persons; data source: CNNIC]
Date      Users
2000.12   2250
2001.12   3370
2002.12   5910
2003.12   7950
2004.12   9400
2005.12   11100
2006.12   13700
Internet Users in China
Facts:
• The growth rate of the number of users has slowed down.
• Internet users make up only 10.4% of the total population (8.5% the previous year).
[Chart: annual growth rate of the number of Internet users in China; data source: CNNIC]
Date      Growth rate
2001.12   49.8%
2002.12   75.4%
2003.12   34.5%
2004.12   18.2%
2005.12   18.1%
2006.12   23.4%
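The growth rates in this chart can be recomputed from the year-end user counts in the CNNIC chart on the earlier slide (units of 10,000 persons). The Python sketch below shows the calculation; the function and variable names are illustrative.

```python
# Year-end Internet user counts in China, in units of 10,000 persons (CNNIC).
users = {
    "2000.12": 2250,
    "2001.12": 3370,
    "2002.12": 5910,
    "2003.12": 7950,
    "2004.12": 9400,
    "2005.12": 11100,
    "2006.12": 13700,
}

def yearly_growth(series):
    """Percentage growth of each entry over the previous one, to one decimal."""
    dates = list(series)
    return {
        cur: round(100 * (series[cur] / series[prev] - 1), 1)
        for prev, cur in zip(dates, dates[1:])
    }

growth = yearly_growth(users)
for date, pct in growth.items():
    print(date, f"{pct}%")
```

The computed values match the six percentages plotted on the slide.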
Internet Machines
• Internet hosts: China (43rd): 232,780 (2006) vs. US (1st): 195,139,000 (2005). Source: http://www.cia.gov/cia/publications/factbook/fields/2184.html
[Chart: computers connected to the Internet in China, in units of 10,000 machines; data source: CNNIC]
Date      Computers
2000.12   892
2001.12   1254
2002.12   2083
2003.12   3089
2004.12   4160
2005.12   4950
2006.12   5940
Internet Machines
[Chart: annual growth rates, 2001.12 to 2006.12, of the total number of computers online, computers online via leased line, and computers online via dial-up; data source: CNNIC]
Facts: The growth rate increased slightly. Dial-up and leased-line connections decreased, while broadband connections increased.
Bandwidth for International Links
• Total international bandwidth reached 256,696 Mbps, an increase of 120,590 Mbps.
• The growth rate is 88.6%.
[Chart: bandwidth of international links, in Mbps; data source: CNNIC]
Date      Bandwidth (Mbps)
2000.12   2799
2001.12   7598
2002.12   9380
2003.12   27216
2004.12   74429
2005.12   136106
2006.12   256696
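As a check on the two figures quoted on this slide, the growth rate can be worked out from the last two chart values, taking 136,106 Mbps as the 2005.12 value:

```latex
\[
\text{growth rate}
  = \frac{256696 - 136106}{136106}
  = \frac{120590}{136106}
  \approx 0.886 = 88.6\%
\]
```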
Current Development of HPC –Grid Applications
• Image processing Grid
– Medical image for diagnosis
– Remote sensing image processing and application
– Digital human
• Bio-Informatics Grid
– Resources (computation power) sharing
• Online-Courses Grid
– Online courseware sharing
– Online course broadcast
• Computational Fluid Dynamics Grid
– Software simulation tools sharing
• Information Processing Grid
– Digital Museum
History of Computing in Tsinghua
• First computer degree program: 1956
• Establishment of the Department of Computer Science and Technology: 1978
• Establishment of the Computing Center: 1975
– Single user computer: DJS130
– Imported mainframes: Honeywell, Fujitsu, IBM
– PC Labs and Campus information systems
• Establishment of Common Platform Division in TNLIST: 2004
– TH-Discovery3 (2005)
Computer Systems Research
Computers made in Tsinghua:
– 1959-1964: J-911, vacuum tube
– 1960: Analog computer
– 1966: J-112, transistor
– 1972-1974: J-724, a real-time computer
– 1974: DJS-100 Series, integrated circuit
– 1987: THUDS, concurrent computer, transputer
– 1993: RISC processor
– 1998: Linux Cluster, peak 32 Gflops
– 2003: TH-MANS, a massive storage networked system
– 2005: TH-Discovery3, 25th in China Top100
HPC Activities
Computational Science and Engineering Research (23 ongoing projects by Jan. 2006)
– Recombination rate estimation and hotspot detection in the human genome
– The molecular evolution of microRNAs
– Microstructures and Thermo-physical Properties of Alloy Melts and Their Effects on Solidification Structures
– Parallel Computing of Fast Multipole Boundary Element Method
– Investigation on the integral equation method for the numerical computation of electromagnetic fields
– DNS of multiphase flow with mesh-less method
– Efficient sub-graph mining algorithm and its applications
– Theoretical study of the catalytic dissociation of hydrogen on Ni-Fe alloy surfaces
– Investigation on unfolding dynamics of the smallest protein
– Pattern recognition and molecular validation on alternatively spliced genes
– Computation optimization of thermo-acoustic engine
Overview of Concepts
• Parallel machine models
• Parallel programming models
• Parallel algorithm examples
Parallel Machine Models:
The requirements
• General: Allow the study of algorithms and programming languages to be independent of improvements in architecture.
• Simple: To facilitate understanding and programming.
• Realistic: To ensure that programs developed for the model execute with reasonable efficiency on real computers.
The Von Neumann Computer:
• A central processing unit (CPU)
• A storage unit (memory)
• A control unit
• I/O unit
The Multiplicity --From Von Neumann Machine to Modern Parallel Machines
• Multiple computers:
• Multiple CPU:
• Multiple function units
• Multiple instruction execution:
• Multiple levels of cache
• Multiple ……
Flynn’s Taxonomy
                               Data stream: single     Data stream: multiple
Instruction stream: single     SISD: uniprocessor      SIMD: processor array, pipelined vector processors
Instruction stream: multiple   MISD: systolic array    MIMD: multiprocessors, multicomputers
Parallel Programming Models
Additional Properties of Parallel Software:
• Concurrency: each node executes its own program.
• Scalability: the number of nodes could vary.
• Locality: the cost of accesses to local memory is less than the cost of accesses to remote memory.
Parallel Program Requirements:
A good parallel program has:
– Concurrency: Ability to perform many actions simultaneously.
– Locality: High ratio of local memory access to remote memory access.
– Scalability: Resilience to increasing processor counts.
Example
Bridge construction: A bridge is to be assembled from girders being constructed at a foundry.
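[Figure: (a) foundry and bridge connected by a channel carrying girders; (b) foundry and bridge connected by a channel carrying girders, plus a channel carrying requests]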
A Parallel Programming Model:
Tasks and channels:
– One or more tasks, which can execute concurrently.
– A task encapsulates a sequential program, local memory, and an interface to its environment (in-ports and out-ports).
– Four additional functions of a task: send messages, receive messages, create new tasks, and terminate.
– Channels: message queues connecting in-port/out-port pairs.
– The mapping (of tasks to physical processors) does not affect the semantics of a program.
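The foundry and bridge example can be sketched with this model. The Python below is only an illustration, not the course's implementation: it assumes that threads stand in for tasks, that a `queue.Queue` stands in for a channel, and that a `None` message marks the end of the stream. The function names are hypothetical.

```python
import queue
import threading

def foundry(out_port, n_girders):
    """Producer task: sends girders on its out-port, then an end marker."""
    for i in range(n_girders):
        out_port.put(f"girder-{i}")
    out_port.put(None)  # end-of-stream marker (an assumption of this sketch)

def bridge(in_port, assembled):
    """Consumer task: receives girders from its in-port until the marker."""
    while True:
        girder = in_port.get()  # blocks until a message arrives
        if girder is None:
            break
        assembled.append(girder)

channel = queue.Queue()  # message queue linking the out-port to the in-port
assembled = []
tasks = [
    threading.Thread(target=foundry, args=(channel, 5)),
    threading.Thread(target=bridge, args=(channel, assembled)),
]
for t in tasks:
    t.start()
for t in tasks:
    t.join()
print(assembled)
```

Because the channel is a first-in, first-out queue with a single sender, the girders arrive in the order they were sent, regardless of how the two threads are scheduled.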
Other Models:
• Message-Passing: similar to the tasks and channels model.
• Shared-memory:
• Data parallel:A+B, 2*A, ...
• Other models:
– PRAM, BSP, C3, LogP, ...
[Figure: foundry and bridge with a storage element]
Parallel Algorithm Examples
Scientific Computing:
1. Mathematical model of real world problems: PDE, ODE, etc.
2. Numerical solution of mathematical problems:
Discrete methods: finite difference, finite elements, etc.
Solving linear equations: Direct method or iterative method.
3. Implementation of numerical methods.
Finite Differences:
To solve the equation $f''(x) = 0$, use the finite difference method:
$f'(x) \approx [f(x+h) - f(x)]/h$ (forward difference)
$f'(x) \approx [f(x) - f(x-h)]/h$ (backward difference)
$f''(x) = \{[f(x+h) - f(x)] - [f(x) - f(x-h)]\}/h^2 + O(h^2) = [f(x-h) - 2f(x) + f(x+h)]/h^2 + O(h^2)$
Discretize:
$f(x_{i-1}) - 2f(x_i) + f(x_{i+1}) = 0$, $i = 0, 1, \ldots, n-1$
Use an iterative method to solve the equations:
$f(x_i)^{(t+1)} = [f(x_{i-1})^{(t)} + 2f(x_i)^{(t)} + f(x_{i+1})^{(t)}]/4$, $t = 0, 1, \ldots, T-1$; $i = 0, 1, \ldots, n-1$
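A minimal sequential sketch of this iteration in Python is shown below. The slide does not say how the two end points are handled, so the sketch assumes they are held fixed as boundary values; the function name `relax` and the sample data are illustrative.

```python
def relax(values, steps):
    """Apply X_i <- (X_{i-1} + 2*X_i + X_{i+1}) / 4 to the interior points.

    Assumption of this sketch: the two end points are fixed boundary values.
    """
    x = list(values)
    for _ in range(steps):
        new = x[:]  # every update reads only values from the previous step
        for i in range(1, len(x) - 1):
            new[i] = (x[i - 1] + 2 * x[i] + x[i + 1]) / 4
        x = new
    return x

start = [0.0, 0.0, 0.0, 0.0, 4.0]  # end points fixed at 0 and 4
result = relax(start, 200)
print([round(v, 3) for v in result])
```

With these boundary values the iterates approach the straight line 0, 1, 2, 3, 4, which satisfies the discretized equation at every interior point.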
Finite Differences:
• A vector X is used to contain N points of f(x) on the problem domain.
• Create N tasks, one for each point.
• Each task is given an initial value $f(x_i)^{(0)}$ and computes $f(x_i)^{(t)}$, $t = 1, 2, \ldots, T$. At each step it:
– Sends its data $f(x_i)^{(t)}$ on its left and right outports,
– Receives $f(x_{i-1})^{(t)}$ and $f(x_{i+1})^{(t)}$ from its left and right inports, and
– Uses these values to compute $f(x_i)^{(t+1)}$.
[Figure: a row of tasks numbered 1 to 8]
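The task-per-point algorithm can be sketched with threads as tasks and queues as channels. This is an illustration under stated assumptions rather than the course's code: the end-point tasks keep their values fixed, each ordered pair of neighbouring tasks gets its own `queue.Queue`, and the names `point_task` and `run_tasks` are hypothetical.

```python
import queue
import threading

def point_task(i, n, value, steps, chan, results):
    """Task for grid point i: exchange values with neighbours, then update."""
    neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
    for _ in range(steps):
        for j in neighbours:
            chan[(i, j)].put(value)  # send on each out-port
        # Blocking receive on each in-port, left neighbour first.
        received = [chan[(j, i)].get() for j in neighbours]
        if len(neighbours) == 2:  # interior point
            value = (received[0] + 2 * value + received[1]) / 4
        # End points keep their value (assumed fixed boundary).
    results[i] = value

def run_tasks(values, steps):
    n = len(values)
    # One channel (message queue) per ordered pair of neighbouring tasks.
    chan = {(i, j): queue.Queue()
            for i in range(n) for j in (i - 1, i + 1) if 0 <= j < n}
    results = [None] * n
    tasks = [threading.Thread(target=point_task,
                              args=(i, n, values[i], steps, chan, results))
             for i in range(n)]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
    return results

final = run_tasks([0.0, 0.0, 0.0, 0.0, 4.0], 200)
print([round(v, 3) for v in final])
```

Each task sends before it receives and the queues are unbounded, so no task blocks on a send. The first-in, first-out order of each channel keeps the step-t message paired with the step-t receive without any global barrier.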
Pair-wise interactions:
The computation of all N(N-1) pair-wise interactions $I(X_i, X_j)$, $i \neq j$, between N data, $X_0, X_1, \ldots, X_{N-1}$.
Parallel algorithm:
1. Create N tasks.
2. Task i is given $X_i$ and is responsible for computing the interactions $I(X_i, X_j)$, $i \neq j$.
Q: How many communication channels are needed?
Pair-wise interactions:
Answer #1: N(N-1) channels. Task i sends $X_i$ on its N-1 outports and receives $X_j$, $j \neq i$, from its N-1 inports.
[Figure: eight tasks numbered 0 to 7]
Pair-wise interactions:
Answer #2: N channels. Each task sends the most recently received data to its outport. Repeat N-1 times.
[Figure: eight tasks numbered 0 to 7]
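Answer #2 can be simulated in a few lines of Python. The sketch assumes the N channels connect the tasks in a ring (task i sends to task i+1, wrapping around) and runs the steps synchronously in one process. The `interaction` function is a hypothetical stand-in for I(a, b).

```python
def interaction(a, b):
    """Hypothetical pair-wise interaction I(a, b); here a simple difference."""
    return a - b

def ring_interactions(data):
    """Each task i accumulates the sum of I(X_i, X_j) over all j != i."""
    n = len(data)
    totals = [0] * n
    held = list(data)  # the value each task will send next
    for _ in range(n - 1):
        # Every task sends its held value on its single out-port to the next
        # task in the ring, so task i now holds what task i-1 held before.
        held = [held[(i - 1) % n] for i in range(n)]
        for i in range(n):
            totals[i] += interaction(data[i], held[i])
    return totals

print(ring_interactions([1, 2, 4, 8]))  # [-11, -7, 1, 17]
```

After step s, task i holds the datum that started at task i-s, so N-1 steps bring every other datum past every task exactly once.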
Pair-wise interactions:
Answer #3 (symmetric case): N+N channels. Each task sends the most recently received data and the associated accumulator on its outport. Repeat (N-1)/2 times.
[Figure: eight tasks numbered 0 to 7]
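The symmetric variant can be simulated in the same style. This sketch makes several assumptions that the slide does not state: N is odd, the interaction satisfies I(a, b) = -I(b, a), the first N channels form a ring, and the second N channels return each accumulator to the task that owns its datum. All names are illustrative.

```python
def interaction(a, b):
    """Hypothetical antisymmetric interaction: I(a, b) == -I(b, a)."""
    return a - b

def symmetric_ring(data):
    """Each pair is computed once; both partners receive their share."""
    n = len(data)
    assert n % 2 == 1, "this sketch assumes an odd number of tasks"
    totals = [0] * n
    # A travelling packet: (index of the owning task, datum, accumulator).
    packets = [(i, data[i], 0) for i in range(n)]
    for _ in range((n - 1) // 2):
        # Pass every packet one position around the ring.
        packets = [packets[(i - 1) % n] for i in range(n)]
        for i in range(n):
            owner, value, acc = packets[i]
            f = interaction(data[i], value)
            totals[i] += f  # local contribution I(X_i, X_owner)
            packets[i] = (owner, value, acc - f)  # I(X_owner, X_i) = -f
    # Second set of N channels: return each accumulator to its owner.
    for owner, _, acc in packets:
        totals[owner] += acc
    return totals

print(symmetric_ring([1, 2, 4, 8, 16]))  # [-26, -21, -11, 9, 49]
```

Each unordered pair is evaluated only once, so the ring is traversed for (N-1)/2 steps instead of N-1, at the cost of the extra return channels.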
Search:
procedure search(A)
begin
    if (solution(A)) then
        score = eval(A)
        report solution and score
    else
        foreach child A(I) of A
            search(A(I))
        endfor
    endif
end
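A runnable Python rendering of this pseudocode is sketched below. The problem it searches (bit strings of length 3, scored by their number of 1s) is a made-up example, and the callback parameters `is_solution`, `evaluate`, `children`, and `report` are illustrative names, not part of the slide.

```python
def search(node, is_solution, evaluate, children, report):
    """Depth-first tree search following the slide's pseudocode."""
    if is_solution(node):
        report.append((node, evaluate(node)))  # report solution and score
    else:
        for child in children(node):
            search(child, is_solution, evaluate, children, report)

# Hypothetical problem: bit strings of length 3, scored by their number of 1s.
found = []
search(
    "",
    is_solution=lambda s: len(s) == 3,
    evaluate=lambda s: s.count("1"),
    children=lambda s: [s + "0", s + "1"],
    report=found,
)
print(len(found))           # 8
print(found[0], found[-1])  # ('000', 0) ('111', 3)
```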
Search:
• A single task is created for the root of the tree.
• Create a new task for each search call.
• Create a channel for each new task to return to its parent any solutions located in its sub-tree.
Q: Can the search be terminated completely when a solution is found?
Parameter study:
• A range of different input parameters is read from an input file.
• The same computation is performed using each input value.
• The results of the different computations are written to an output file.
[Figure: inputs x1, x2, x3, … pass through y = f(x) to give outputs y1, y2, y3, …]
Parameter study:
Case 1: The execution time per problem is constant and each processor has the same computation power.
[Figure: inputs x1, x2, …, x8 are divided among four tasks, each computing y = f(x), to give outputs y1, y2, …, y8]
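For Case 1, a static split of the parameters is enough, since every problem costs the same and every processor is equally fast. The Python sketch below divides eight parameters into four equal blocks. It assumes the parameter count is divisible by the number of workers, uses a thread pool as a stand-in for processors, and uses a hypothetical `f`.

```python
from concurrent.futures import ThreadPoolExecutor

def f(x):
    """Hypothetical computation performed for each parameter."""
    return x * x

def run_block(block):
    """One worker handles one contiguous block of parameters."""
    return [f(x) for x in block]

params = [1, 2, 3, 4, 5, 6, 7, 8]
workers = 4
size = len(params) // workers  # assumes len(params) is divisible by workers
blocks = [params[i * size:(i + 1) * size] for i in range(workers)]

with ThreadPoolExecutor(max_workers=workers) as pool:
    # Executor.map returns results in the order of the submitted blocks.
    parts = list(pool.map(run_block, blocks))

results = [y for part in parts for y in part]
print(results)  # [1, 4, 9, 16, 25, 36, 49, 64]
```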
Parameter study:
Case 2: The execution time per problem is not constant and/or each processor does not have the same computation power
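[Figure: an input task (I) supplies parameters x1, x2, x3, x4, … to four worker tasks (W), which send their results y to an output task (O)]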
Parameter study:
• Non-deterministic
Q: In what order are the computed results written?
Q: On which processor is y5 = f(x5) computed?
• Prefetching
Q: A worker that has sent a request to the input task has to wait for the parameter to arrive. Could the worker keep working while waiting for the response from the input task?
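For Case 2, workers can pull parameters on demand, so faster workers simply take more of them. The Python sketch below is one possible arrangement, not the course's implementation: a shared input queue plays the input task, a shared output queue plays the output task, `None` is an assumed stop marker, and `f` is hypothetical. Results are tagged with their parameter because the arrival order is not deterministic.

```python
import queue
import threading

def f(x):
    """Hypothetical computation performed for each parameter."""
    return x * x

def worker(inputs, outputs):
    """Repeatedly take the next parameter until the stop marker arrives."""
    while True:
        x = inputs.get()
        if x is None:  # stop marker (an assumption of this sketch)
            break
        outputs.put((x, f(x)))  # tag the result with its parameter

params = list(range(1, 9))
n_workers = 4
inputs, outputs = queue.Queue(), queue.Queue()
for x in params:
    inputs.put(x)
for _ in range(n_workers):
    inputs.put(None)  # one stop marker per worker

threads = [threading.Thread(target=worker, args=(inputs, outputs))
           for _ in range(n_workers)]
for t in threads:
    t.start()
for t in threads:
    t.join()

results = {}
while not outputs.empty():  # safe here: all workers have finished
    x, y = outputs.get()
    results[x] = y
print(sorted(results.items()))
```

Which worker computes a given y, and the order in which results reach the output queue, can change from run to run. Keying the results by x is one way to keep the final output well defined.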
Summary
• Overview of this course
• Overview of HPC development
• Overview of concepts
– Machine model
– Program model
– Algorithms
That’s all for today.
Next class: Programming with MPI
Thanks
Good Bye
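Reflections
(1) "The original flavor"
"Kumarajiva, of the fifth century A.D., one of the greatest translators of Buddhist texts into Chinese, said that the work of translation is just like chewing food to feed to others. If one cannot chew the food oneself, one has to eat food that others have already chewed. After being chewed in this way, however, the food is bound to be poorer in taste and flavor than the original."
"A translation, after all, is only an interpretation."
Quoted from Feng Youlan, "A Short History of Chinese Philosophy"
(2) "Practice makes perfect"
Foreign-language ability comes from being pushed; foreign-language proficiency comes from practice; foreign-language potential comes from being tapped.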
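Looking Ahead
• External conditions keep improving
– Original-edition textbooks are being published
– Courseware resources are becoming richer
– Demand for English is growing
– Supervising authorities have policies that encourage it
• The people involved are increasingly capable
– English proficiency is improving across the board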
Comments and corrections are welcome. Thank you!