HPC https://www.hpci-office.jp/pages/seminar_text
[Figure: CPUs made up of multiple cores, with the processing divided among the cores. Labels: OpenMP, MPI.]
MPI (Message Passing Interface) and OpenMP
(cont.)
OpenMP MPI
parallel
(parallel region)
Hello OpenMP world
(*) Compile with OpenMP enabled:
    intel fortran: ifort -openmp hello.f
    GNU: gfortran -fopenmp hello.f
    login-node: frtpx -Kopenmp hello.f
(**) Set the number of threads (csh): setenv OMP_NUM_THREADS 4
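A minimal sketch of what a "Hello OpenMP world" program might look like. It is written in free-form Fortran, so it is assumed to be saved as hello.f90 rather than the fixed-form hello.f named in the compile lines; the program name and the exact message layout are illustrative.

```fortran
! hello.f90: minimal OpenMP example (free-form source, file name is an assumption)
program hello
  implicit none

  !$omp parallel
  ! Every thread of the team executes this statement once.
  write(*,*) 'Hello OpenMP world'
  !$omp end parallel
end program hello
```

Built with, for example, gfortran -fopenmp hello.f90 and run with OMP_NUM_THREADS set to 4, this prints the message once per thread. Built without the OpenMP flag, the directives are treated as comments and the message appears once.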
OpenMP execution model: fork and join
[Figure: a single thread reaches !$omp parallel and forks into a team of threads; each thread executes the Write statement and prints "Hello OpenMP world"; the threads join again at !$omp end parallel. Without OpenMP the write statement runs once.]
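Work sharing inside a Parallel region
Constructs that divide work among the threads of a Parallel region in OpenMP:
    do
    single
    sections
    workshare (Fortran only; for fortran90 array syntax)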
DO
[Figure: a DO loop with n=4000 iterations divided into four consecutive blocks, one per thread, ending at 1000, 2000, 3000 and 4000.]
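The division of the n=4000 iterations can be observed directly. The sketch below is not from the slides: it assumes schedule(static) and records, per thread, the first and last iteration that thread executed. The names do_split, first and last are illustrative.

```fortran
program do_split
  use omp_lib
  implicit none
  integer, parameter :: n = 4000
  integer :: i, tid, nt
  integer, allocatable :: first(:), last(:)

  nt = omp_get_max_threads()
  allocate(first(0:nt-1), last(0:nt-1))
  first = n + 1
  last  = 0

  ! Each thread only touches its own element of first/last, so no race occurs.
  !$omp parallel do schedule(static) private(i, tid) shared(first, last)
  do i = 1, n
     tid = omp_get_thread_num()
     first(tid) = min(first(tid), i)
     last(tid)  = max(last(tid), i)
  end do
  !$omp end parallel do

  do tid = 0, nt - 1
     if (last(tid) > 0) then
        write(*,'(a,i0,a,i0,a,i0)') 'thread ', tid, ': i = ', &
             first(tid), ' .. ', last(tid)
     end if
  end do
end program do_split
```

With four threads, a static schedule typically reports the blocks 1..1000, 1001..2000, 2001..3000 and 3001..4000.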
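Shared and Private variables
In OpenMP every variable used in a parallel region is either shared or private
    shared: the arrays V(:), X(:), Y(:)
    private: the loop index i
In a parallel do, variables are shared by default, and the index of the parallelized do loop is private
The attributes can also be stated explicitly:
    private(i) shared(V, X, Y)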
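A small sketch of a loop with these attributes written out. The slides do not show the loop body, so the operation V(i) = X(i) + Y(i), the array size and the initial values are assumptions made for illustration.

```fortran
program vec_add
  implicit none
  integer, parameter :: n = 4000
  real(8) :: V(n), X(n), Y(n)
  integer :: i

  X = 1.0d0
  Y = 2.0d0

  ! i is private (one copy per thread); V, X, Y are shared by all threads.
  !$omp parallel do private(i) shared(V, X, Y)
  do i = 1, n
     V(i) = X(i) + Y(i)     ! hypothetical loop body
  end do
  !$omp end parallel do

  write(*,*) 'sum(V) =', sum(V)
end program vec_add
```

Because every iteration writes a different element of V, sharing the arrays is safe here; only the loop index needs a per-thread copy.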
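Shared variables
    A shared variable has a single instance, which every thread reads and writes
Private variables
    Each thread has its own copy of a private variable
    The private copy of i does not inherit the value i had before the parallel region; it starts out undefined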
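Default attributes in Fortran
    Private ( i ): the index of a Parallel DO loop, and the indices of DO loops nested inside it
    Shared ( X(:) ): everything else, by default
    The default can be changed with default(shared), default(private), default(none)
        With none, every variable must be given an attribute explicitly
    In a routine called from a parallel region, a local variable y belongs to the calling thread, and an argument X(i) keeps the attribute it has in the caller
    Shared: COMMON/SAVE variables (n, ymax, a), module variables, and variables with the save attribute
    Private: the remaining local variables of the called routine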
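As an illustration of default(none), the sketch below lists every variable used in the loop. The loop body and the names a and X are assumptions; the named constant n is not a variable and therefore needs no attribute.

```fortran
program default_none
  implicit none
  integer, parameter :: n = 1000
  real(8) :: X(n), a
  integer :: i

  a = 2.0d0
  X = 1.0d0

  ! With default(none) the compiler rejects any variable that is used in the
  ! region but not listed in a private or shared clause.
  !$omp parallel do default(none) private(i) shared(X, a)
  do i = 1, n
     X(i) = a * X(i)
  end do
  !$omp end parallel do

  write(*,*) 'X(1), X(n) =', X(1), X(n)
end program default_none
```

Removing a from the shared clause should turn a silent default into a compile-time error, which is the main reason to use default(none) while parallelizing existing code.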
PRIVATE/SHARED
OpenMP
OpenMP
exit goto continue
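Reduction
A Reduction accumulates a result of the form r = r op expr over the loop iterations
    r: reduction variable, expr: an expression that does not reference r
Inside the loop each thread works on a private copy of the reduction variable; the copies are combined with op at the end
!$omp parallel do reduction(op : r1 [ , r2] )
    op : reduction operator (+ , * , - , .and. , .or. , .eqv. , .neqv. , max, min, iand, ior, ieor)
    r1 [ , r2] : reduction variable(s)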
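Reduction (cont.)
Initial value of the private copy of a Reduction variable, by operator:

Operator   Initial value
+          0
*          1
-          0
.and.      .true.
.or.       .false.
.eqv.      .true.
.neqv.     .false.
max        smallest representable value
min        largest representable value
iand       all bits set to 1
ior        0
ieor       0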
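A short sketch of two reductions in one loop, a sum and a maximum. The array contents and the variable names total and xmax are illustrative, not taken from the slides.

```fortran
program reduction_demo
  implicit none
  integer, parameter :: n = 4000
  real(8) :: X(n), total, xmax
  integer :: i

  do i = 1, n
     X(i) = real(i, 8)
  end do

  total = 0.0d0
  xmax  = -huge(xmax)

  ! Each thread accumulates into private copies of total and xmax
  ! (initialised to 0 and to the smallest representable value);
  ! the copies are combined when the loop ends.
  !$omp parallel do reduction(+ : total) reduction(max : xmax)
  do i = 1, n
     total = total + X(i)
     xmax  = max(xmax, X(i))
  end do
  !$omp end parallel do

  write(*,*) 'total =', total, ' xmax =', xmax
end program reduction_demo
```

Without the reduction clauses, total and xmax would be shared and the concurrent updates would race.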
(cont.)
!$omp parallel do
(load balancing)
[Figure: thread timeline in which some threads sit idle while they wait for the others to finish.]
critical
critical
shared private/reduction
atomic critical
flush
ordered
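The difference between atomic and critical can be sketched with two shared counters. This is an illustrative example, not the one from the slides; in real code a reduction would usually be the better choice for a simple counter.

```fortran
program sync_demo
  implicit none
  integer, parameter :: n = 10000
  integer :: i, count_atomic, count_critical

  count_atomic   = 0
  count_critical = 0

  !$omp parallel do private(i) shared(count_atomic, count_critical)
  do i = 1, n
     ! atomic protects a single update of one memory location
     !$omp atomic
     count_atomic = count_atomic + 1

     ! critical protects an arbitrary block; only one thread at a time enters
     !$omp critical
     count_critical = count_critical + 1
     !$omp end critical
  end do
  !$omp end parallel do

  write(*,*) count_atomic, count_critical
end program sync_demo
```

Both counters should end at n for any number of threads. Dropping either directive makes the corresponding update a race, and the result may then come out smaller.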
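OpenMP API (runtime library routines)
    include 'omp_lib.h'   (Fortran)
    use omp_lib           (Fortran90)

Routine               Type      Returns
omp_get_thread_num    integer   number of the calling thread, 0 to [number of threads - 1]
omp_get_max_threads   integer   maximum number of threads available to a parallel region
omp_get_num_threads   integer   number of threads in the current team
omp_in_parallel       logical   .true. inside an active parallel region, .false. otherwise
omp_get_wtime         real*8    wall clock time in seconds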
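The routines listed above can be tried out with a sketch like the following. The program name and the output layout are illustrative; the critical section is there only to keep the lines of different threads from interleaving.

```fortran
program api_demo
  use omp_lib
  implicit none
  real(8) :: t0, t1

  write(*,*) 'max threads :', omp_get_max_threads()
  write(*,*) 'in parallel :', omp_in_parallel()

  t0 = omp_get_wtime()
  !$omp parallel
  !$omp critical
  write(*,*) 'thread', omp_get_thread_num(), 'of', omp_get_num_threads(), &
             ' in parallel: ', omp_in_parallel()
  !$omp end critical
  !$omp end parallel
  t1 = omp_get_wtime()

  write(*,*) 'elapsed [s] :', t1 - t0
end program api_demo
```

Outside the parallel region omp_get_num_threads would return 1, which is why omp_get_max_threads is the routine to use when sizing per-thread work arrays before the region starts.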
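Conditional compilation
    A line that starts with !$ is compiled only when OpenMP is enabled; without OpenMP it is an ordinary comment
    It plays the same role as "#ifdef _OPENMP"
    A routine such as omp_get_max_threads exists only in the OpenMP runtime, so a build without OpenMP fails on the call
    Prefixing such lines with "!$" lets the same source build with and without OpenMP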
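A minimal sketch of the !$ sentinel in free-form source. The variable name nthreads is illustrative.

```fortran
program cond_compile
  !$ use omp_lib
  implicit none
  integer :: nthreads

  nthreads = 1
  ! The next line is compiled only when OpenMP is enabled.
  !$ nthreads = omp_get_max_threads()

  write(*,*) 'threads available:', nthreads
end program cond_compile
```

Compiled without the OpenMP flag, both !$ lines are comments and the program reports 1; compiled with it, the program reports the value returned by the runtime.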
OpenMP
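Force f(i) acting on particle i (action and reaction)
[Figure: six particles, numbered 1 to 6. Particle 1 feels f12, f13, f14, f15 and f16; by action and reaction the partner particles feel f21 = -f12, -f13, -f14, -f15 and -f16.]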
OpenMP
n = 6
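The iteration i=1 updates f of the partner particles as well as f(1)
OpenMP
Different threads can therefore update the same element, such as f(4) or f(6), at the same time
Declare f as a reduction variable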
OpenMP
Work split 24 : 6     speedup = 30/24 = 1.25
Work split 18 : 12    speedup = 30/18 = 1.67
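The two work splits appear to be consistent with n = 6 particles on two threads if every pair (i, j) is counted as two updates of f: 15 pairs give 30 updates, a block distribution of i = 1..5 gives 24 and 6, and a cyclic distribution gives 18 and 12. The sketch below follows that reading, which is an inference and not stated in the slides. The positions, the force law, the work counter and the choice of schedule(static, 1) are all assumptions.

```fortran
program pair_force
  use omp_lib
  implicit none
  integer, parameter :: n = 6
  real(8) :: x(n), f(n), fij
  integer :: i, j, tid, work(0:1)

  do i = 1, n
     x(i) = real(i*i, 8)        ! hypothetical particle positions
  end do
  f = 0.0d0
  work = 0

  ! f is an array reduction: each thread sums into its own copy of f.
  ! work(tid) is only touched by thread tid, so it can stay shared.
  !$omp parallel do num_threads(2) schedule(static, 1) &
  !$omp&   private(i, j, fij, tid) shared(x, work) reduction(+ : f)
  do i = 1, n - 1
     tid = omp_get_thread_num()
     do j = i + 1, n
        fij  = x(j) - x(i)      ! hypothetical pair force f_ij
        f(i) = f(i) + fij       ! force on i
        f(j) = f(j) - fij       ! reaction on j: f_ji = -f_ij
        work(tid) = work(tid) + 2
     end do
  end do
  !$omp end parallel do

  write(*,*) 'f updates per thread:', work
  write(*,*) 'sum of all forces   :', sum(f)
end program pair_force
```

With two threads the cyclic schedule should report 18 and 12 updates; replacing schedule(static, 1) with schedule(static) typically gives 24 and 6. The forces sum to zero either way, since every f_ij is paired with its reaction.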
OpenMP
PRIVATE REDUCTION
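Limit on the speedup (Amdahl's law)
    If only a fraction p of the run time is parallelized, the remaining (1-p) stays serial
    Even with an infinite number of processors (N → ∞), the run time cannot drop below 1-p of the original
    The speedup is therefore bounded by 1/(1-p)
    Example: with 80% parallelized ( p = 0.8 ) the bound is 1/0.2 = 5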
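(cont.)
$$ s = \frac{1}{(1-p) + p/N} $$
    p: parallelized fraction of the run time, N: number of processors
[Figure: speedup s against the number of processors N, for N = 1 to 8, with curves for 1-p = 1/2 (50% parallel), 1-p = 1/4 (75%), 1-p = 1/8 (88%) and 1-p = 0 (100%). Values marked on the curves: 2.3, 2.9, 4.3.]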
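(cont.)
$$ s = \frac{1}{(1-p) + p/N} $$
    p: parallelized fraction of the run time, N: number of processors
[Figure: speedup s against N on logarithmic axes, for N = 1 to 10^5, with curves for 1-p = 1/10 (90% parallel), 1-p = 1/100 (99%), 1-p = 1/1000 (99.9%), 1-p = 1/10000 (99.99%), 1-p = 1/100000 (99.999%) and 1-p = 0 (100%).]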
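The curves can be reproduced numerically from s = 1/((1-p) + p/N). The sketch below tabulates the speedup for the parallel fractions of the first plot; the program and function names are illustrative.

```fortran
program amdahl
  implicit none
  integer :: k
  integer, parameter :: nlist(4) = (/ 1, 2, 4, 8 /)

  write(*,'(a)') '    N   p=0.50   p=0.75  p=0.875'
  do k = 1, size(nlist)
     write(*,'(i5,3f9.2)') nlist(k), speedup(0.5d0, nlist(k)), &
          speedup(0.75d0, nlist(k)), speedup(0.875d0, nlist(k))
  end do
  ! Upper bound for N -> infinity with p = 0.8
  write(*,'(a,f6.2)') 'limit for p = 0.8: ', 1.0d0 / (1.0d0 - 0.8d0)

contains

  ! Amdahl's law: s = 1 / ((1-p) + p/N)
  pure function speedup(p, n) result(s)
    real(8), intent(in) :: p
    integer, intent(in) :: n
    real(8) :: s
    s = 1.0d0 / ((1.0d0 - p) + p / real(n, 8))
  end function speedup
end program amdahl
```

For N = 8 the formula gives about 1.78, 2.91 and 4.27 for p = 0.5, 0.75 and 0.875, which agrees with the values 2.9 and 4.3 marked on the plot.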
References
OpenMP specifications: http://openmp.org/wp/openmp-specifications/
    version 3.0; compiler support for OpenMP: gcc 4.4: v3.0, gcc 4.7: v3.1, intel 11.0: v3.0, 12.1: v3.1
Tutorial (Fortran/C): https://computing.llnl.gov/tutorials/openMP/
Book: B. Chapman, Using OpenMP, The MIT Press (Fortran)
OpenMP C/C++
C/C++ OpenMP OpenMP CPU
1 Example of OpenMP parallelization 2 Example of an access conflict
Demo
RIST FX10
OpenMP
x(80000)
(A) (B) (C)
OpenMP
OpenMP
(A) vi
(B) OpenMP vi (A)
OpenMP
OpenMP
(C) vi (B)
OpenMP SCHEDULE
OpenMP
(A) (B) (C)
(a)OpenMP (b) b0, b1
temp private
temp
do private
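Why a scratch variable such as temp has to be private can be sketched with a loop that swaps two arrays. The arrays a and b and the swap itself are assumptions for illustration; the slides only name temp.

```fortran
program temp_private
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(n), b(n), temp
  integer :: i

  do i = 1, n
     a(i) = real(i, 8)
     b(i) = -real(i, 8)
  end do

  ! temp is written and read in every iteration, so each thread needs its
  ! own copy; a shared temp could be overwritten by another thread between
  ! the two statements that use it.
  !$omp parallel do private(i, temp) shared(a, b)
  do i = 1, n
     temp = a(i)
     a(i) = b(i)
     b(i) = temp
  end do
  !$omp end parallel do

  write(*,*) a(1), b(1), a(n), b(n)
end program temp_private
```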
Sections
Sections
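A minimal sketch of Sections, in which independent blocks of work are handed to different threads. The two array assignments are placeholders for any pair of independent tasks.

```fortran
program sections_demo
  implicit none
  integer, parameter :: n = 1000
  real(8) :: a(n), b(n)

  !$omp parallel sections shared(a, b)
  !$omp section
  a = 1.0d0          ! executed by one thread
  !$omp section
  b = 2.0d0          ! may run at the same time on another thread
  !$omp end parallel sections

  write(*,*) sum(a), sum(b)
end program sections_demo
```

Each section runs exactly once, so with more threads than sections the extra threads have nothing to do.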
Single Program, Multiple Data streams (SPMD)
FirstPrivate/LastPrivate Threadprivate
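COMMON and SAVE variables with Threadprivate
Threadprivate
    Gives each thread its own copy of a common block, SAVE variable or module variable
    The copy stays global within a thread but is not shared between threads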
Threadprivate (cont.)
Threadprivate common common subroutine
private
common subroutine
common equivalence threadprivate
common
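Copyin and copyprivate
Copyin: used with threadprivate variables
    On entry to a parallel region, the master thread's value of a threadprivate variable is copied to the copies of all other threads
Copyprivate: used with Single
    At the end of a Single construct, the value a private variable received inside the Single is copied to the same variable in all other threads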
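Both clauses can be seen together in a sketch like the following. The variable names base and val and the values are illustrative.

```fortran
program copy_demo
  implicit none
  integer, save :: base
  integer :: val
  !$omp threadprivate(base)

  base = 100                     ! set on the master thread only

  ! copyin: every thread's copy of base starts with the master's value
  !$omp parallel copyin(base) private(val)

  !$omp single
  val = base + 1                 ! computed by one thread only
  !$omp end single copyprivate(val)
  ! copyprivate: val is now set in every thread's private copy

  !$omp critical
  write(*,*) 'base =', base, ' val =', val
  !$omp end critical
  !$omp end parallel
end program copy_demo
```

Every thread should print base = 100 and val = 101. Without copyin, base would be undefined on all threads except the master; without copyprivate, val would be defined only on the thread that executed the Single.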
ThreadPrivate
Threadprivate id routine_A, routine_B
id
common threadprivate
routine_A routine_B
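A sketch of how id might be passed between routine_A and routine_B through a threadprivate common block. The common block name /tp/ and the bodies of the two routines are assumptions; the slides give only the names id, routine_A and routine_B.

```fortran
program tp_demo
  implicit none
  integer :: id
  common /tp/ id
  !$omp threadprivate(/tp/)

  !$omp parallel
  call routine_A()     ! each thread stores a value in its own copy of id
  call routine_B()     ! the same thread reads its own value back
  !$omp end parallel
end program tp_demo

subroutine routine_A()
  use omp_lib
  implicit none
  integer :: id
  common /tp/ id
  !$omp threadprivate(/tp/)
  id = omp_get_thread_num()
end subroutine routine_A

subroutine routine_B()
  implicit none
  integer :: id
  common /tp/ id
  !$omp threadprivate(/tp/)
  !$omp critical
  write(*,*) 'id =', id
  !$omp end critical
end subroutine routine_B
```

The threadprivate directive is repeated in every program unit that declares the common block. Without it the block would be shared, and all threads would overwrite the same id.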
COMMON SAVE
-save READ WRITE I/O Fortran
write
critical