CONTENIDO - acarus.uson.mxacarus.uson.mx/CAR-2016/curso-car.pdf · • Facilita el estudio de...

CONTENTS

HPC SYSTEMS AT UNISON

• Basic concepts
• Statistics (TOP500)
• Use in science
• Supercomputing in Mexico
• Supercomputing at Unison
• Questions?

What is supercomputing?

• It is the most advanced computing technology for numerical calculation.
• It lets researchers carry out, reliably and quickly, trillions (10^12) of floating-point operations per second to study large-scale problems.

Name         FLOPS
megaflops    10^6
gigaflops    10^9
teraflops    10^12
petaflops    10^15
exaflops     10^18
zettaflops   10^21
yottaflops   10^24

FLOPS = FLoating point Operations Per Second
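As a small illustration of the prefixes in the table above, the sketch below converts a raw rate in FLOPS into the largest prefix that keeps the value at or above 1. The function name `format_flops` is only illustrative; the two sample values are figures quoted elsewhere in these slides (2.3 petaflop/s for Jaguar and 5,324.8 GFLOPS for the Ocotillo CPU nodes).

```python
# SI prefixes from the table above, largest first.
PREFIXES = [
    ("yotta", 1e24), ("zetta", 1e21), ("exa", 1e18), ("peta", 1e15),
    ("tera", 1e12), ("giga", 1e9), ("mega", 1e6),
]


def format_flops(flops: float) -> str:
    """Format a rate in FLOPS with the largest prefix that keeps the value >= 1."""
    for name, factor in PREFIXES:
        if flops >= factor:
            return f"{flops / factor:.2f} {name}flops"
    return f"{flops:.0f} flops"


if __name__ == "__main__":
    print(format_flops(2.3e15))    # Jaguar figure quoted in the slides
    print(format_flops(5324.8e9))  # Ocotillo CPU peak quoted in the slides
```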


• It makes it possible to study phenomena and conditions that were out of reach less than 30 years ago.
• Its applications have opened new lines of scientific research worldwide in areas such as engineering, medicine, geophysics, geography, astronomy, chemistry, atmospheric sciences, and nuclear sciences, among others.


How can these phenomena be explained?


TOP500 organization, November 2015


RANK  SYSTEM                 CORES      RMAX (TFLOP/S)  RPEAK (TFLOP/S)  POWER (KW)
1     Tianhe-2 (MilkyWay-2)  3,120,000  33,862.7        54,902.4         17,808
2     Titan                  560,640    17,590.0        27,112.5         8,209
3     Sequoia                1,572,864  17,173.2        20,132.7         7,890
4     K computer             705,024    10,510.0        11,280.4         12,660
5     Mira                   786,432    8,586.6         10,066.3         3,945
6     Trinity                301,056    8,100.9         11,078.9         (not listed)
7     Piz Daint              115,984    6,271.0         7,788.9          2,325
8     Hazel Hen              185,088    5,640.2         7,403.5          (not listed)
9     Shaheen II             196,608    5,537.0         7,235.2          2,834
10    Stampede               462,462    5,168.1         8,520.1          4,510

Site and system details:

1. National Super Computer Center in Guangzhou, China. Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P. NUDT.
2. DOE/SC/Oak Ridge National Laboratory, United States. Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x. Cray Inc.
3. DOE/NNSA/LLNL, United States. Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom. IBM.
4. RIKEN Advanced Institute for Computational Science (AICS), Japan. K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect. Fujitsu.
5. DOE/SC/Argonne National Laboratory, United States. Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom. IBM.
6. DOE/NNSA/LANL/SNL, United States. Trinity - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect. Cray Inc.
7. Swiss National Supercomputing Centre (CSCS), Switzerland. Piz Daint - Cray XC30, Xeon E5-2670 8C 2.600GHz, Aries interconnect, NVIDIA K20x. Cray Inc.
8. HLRS - Höchstleistungsrechenzentrum Stuttgart, Germany. Hazel Hen - Cray XC40, Xeon E5-2680v3 12C 2.5GHz, Aries interconnect. Cray Inc.
9. King Abdullah University of Science and Technology, Saudi Arabia. Shaheen II - Cray XC40, Xeon E5-2698v3 16C 2.3GHz, Aries interconnect. Cray Inc.
10. Texas Advanced Computing Center/Univ. of Texas, United States. Stampede - PowerEdge C8220, Xeon E5-2680 8C 2.700GHz, Infiniband FDR, Intel Xeon Phi SE10P. Dell.
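Two ratios are commonly read off a list like this one: how much of the theoretical peak a system sustains (RMAX / RPEAK) and how much sustained performance it delivers per watt. The sketch below computes both for a few rows copied from the list; the function names are illustrative, and `None` stands for a power figure that is absent from the slide.

```python
# (system, Rmax in TFLOP/s, Rpeak in TFLOP/s, power in kW or None if not listed)
SYSTEMS = [
    ("Tianhe-2", 33862.7, 54902.4, 17808),
    ("Titan", 17590.0, 27112.5, 8209),
    ("Sequoia", 17173.2, 20132.7, 7890),
    ("K computer", 10510.0, 11280.4, 12660),
    ("Trinity", 8100.9, 11078.9, None),
]


def efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of the theoretical peak that the benchmark run sustained."""
    return rmax_tflops / rpeak_tflops


def gflops_per_watt(rmax_tflops, power_kw):
    """Sustained GFLOP/s per watt; TFLOP/s divided by kW has the same units."""
    if power_kw is None:
        return None
    return rmax_tflops / power_kw


if __name__ == "__main__":
    for name, rmax, rpeak, power in SYSTEMS:
        gpw = gflops_per_watt(rmax, power)
        gpw_text = "n/a" if gpw is None else f"{gpw:.2f} GFLOPS/W"
        print(f"{name:12s} efficiency {efficiency(rmax, rpeak):.1%}  {gpw_text}")
```

Both ratios vary widely across the rows, which is why the list reports RMAX, RPEAK and POWER separately.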

http://www.top500.org/site/50365

http://www.top500.org/system/177999




















TOP500 organization

MPP: Massively Parallel Processing


#2

TITAN


Columns: Code description · Example science problem · Programming model for acceleration · Libraries · Performance information · Point of contact

LAMMPS
• Code description: LAMMPS is a molecular dynamics, general statistical mechanics based code applicable to bioenergy problems. http://lammps.sandia.gov/
• Example science problem: Coarse-grained molecular dynamics simulation of bulk heterojunction polymer blend films used, e.g., within organic photovoltaic devices.
• Programming model for acceleration: OpenCL or CUDA
• Performance information: Speedup is 1X to 7.4X on 900 nodes, comparing XK7 to XE6. The performance variation is strongly dependent upon the number of atoms per node. This algorithm is mixed precision on GPU, double precision on CPU.
• Point of contact: Mike Brown, ORNL

WL-LSMS
• Code description: Wang-Landau (WL) - Linear Scaling Multiple Scattering (LSMS). A first principles density functional theory code (local density approximation) used to study magnetic materials.
• Example science problem: Simulation of the magnetic phase transition in nickel.
• Programming model for acceleration: CUDA or CUDA and Libraries
• Libraries: GPU: CULA, LibSciACC, cuBLAS. CPU: BLAS, LAPACK.
• Performance information: XK7 vs XE6 speedup is 3.5X. Benchmark runs from 321 (321 WL walkers, 1024 atoms).
• Point of contact: Markus Eisenbach, ORNL

S3D
• Code description: Direct numerical simulation of compressible, reacting flows for combustion science.
• Example science problem: Temporal jet simulation of dimethyl-ether combustion.
• Programming model for acceleration: OpenACC
• Performance information: XK7 vs XE6 speedup is 2X.
• Point of contact: Ramanan Sankaran, ORNL

CAM-SE
• Code description: Community Atmosphere Model - Spectral Elements. http://earthsystemcog.org/projects/dcmip-2012/cam-se
• Example science problem: High-resolution atmospheric climate simulation using CAM5 physics and the MOZART chemistry package.
• Programming model for acceleration: CUDA Fortran
• Point of contact: Matt Norman, ORNL

DENOVO
• Code description: DENOVO is a three-dimensional, massively parallel, deterministic radiation transport code. It is capable of solving both shielding and criticality problems on high-performance computing platforms.
• Example science problem: Reactor eigenvalue problem.
• Programming model for acceleration: CUDA
• Performance information: XK7 CPU-only vs. XK7 (CPU+GPU) for the Denovo Sweep part only, on nearly 18K nodes.
• Point of contact: Tom Evans (ORNL), Wayne Joubert (ORNL)


National Center for Computational Sciences (NCCS)
Oak Ridge National Laboratory, USA

Jaguar

• 18,688 dual hex-core AMD Opteron compute nodes, 224,256 cores.
• 300 TB of RAM.
• 2,300 trillion floating-point operations per second (2.3 petaflop/s).


Jaguar: Engineering Projects

• High-Fidelity Simulations for Clean and Efficient Combustion of Alternative Fuels. Jacqueline Chen, Sandia National Laboratories: 30,000,000 hours.
• Clean and Efficient Coal Gasifier Designs using Large-Scale Simulations. Madhava Syamlal, National Energy Technology Laboratory: 13,000,000 hours.
• Landmark Direct Numerical Simulations of Separation and Transition for Aerospace-Relevant Wall-Bounded Shear Flows. Hermann Fasel, University of Arizona: 500,000 hours.
• Petascale Simulation of Nano-Electronic Devices. Gerhard Klimeck, Purdue University: 5,000,000 hours.
• Propulsor Analyses for a Greener, High Bypass Ratio, Aircraft Gas Turbine Engine. Robert Maleki, Pratt & Whitney: 1,500,000 hours.


Jaguar

In astrophysics in particular, a group of ORNL researchers led by Anthony Mezzacappa is developing the first three-dimensional (3D) model to study in detail the supernova explosion produced by the core collapse of a massive star, with emphasis on the particular case of Supernova 1987.


Jaguar

Chimera project:
• Hydrodynamics code (MVH3/VH1)
• Neutrino transport code (MGFLDTRANS)
• Nuclear kinetics code (XNET)


Jaguar

The Chimera project requested 60 million processor hours, that is, more than 6,800 years of CPU time. In other words, a computer with a single-core processor would need almost 7,000 years to use up this processing time.

Parallel computing:
• 1,000 cores: 7 years
• 10,000 cores: about 8 months
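The arithmetic behind these figures can be sketched as follows, assuming ideal (perfectly linear) scaling, which real codes rarely achieve, and a 365-day year. The function name is illustrative. Under those assumptions 60 million CPU hours come to roughly 6,849 years on one core, about 6.8 years on 1,000 cores, and a little over 8 months on 10,000 cores.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def wall_clock_years(cpu_hours, cores):
    """Wall-clock years needed to spend cpu_hours on `cores` cores,
    assuming the work divides evenly with no parallel overhead."""
    return cpu_hours / cores / HOURS_PER_YEAR


if __name__ == "__main__":
    request = 60_000_000  # CPU hours requested by the Chimera project
    for cores in (1, 1_000, 10_000):
        years = wall_clock_years(request, cores)
        print(f"{cores:>6} cores: {years:10.2f} years ({years * 12:9.1f} months)")
```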


High Performance Computing at Los Alamos National Laboratory - Cray

ExaScale

Name         FLOPS
megaflops    10^6
gigaflops    10^9
teraflops    10^12
petaflops    10^15
exaflops     10^18
zettaflops   10^21
yottaflops   10^24

HPC infrastructure in Mexico

Rank  Institution            System              Cores   TFLOPS
1     CINVESTAV              ABACUS              8820    277.5
2     IPICYT-CNS             Thubat-Kaal - IBM   2640    107.3
3     UAM-Iztapalapa - LSV   YOLTLA              2160    43.2
4     CINVESTAV              XIUHCOATL           19608   24.9
5     UNAM-DGTIC             MIXTLI - HP         1024    21.3
6     UAM-Iztapalapa - LSV   Cluster Aitzaloa    2160    18.4
7     UNISON-ACARUS          Ocotillo - Dell     512     14.9
8     UNAM-DGTIC             KANBALAM - HP       1368    7.1
9     IPICYT-CNS             Cluster IBM E-1350  1350    4.7
10    INMEGEN                IBM                 960     2

TOP500 entry #500: 164.8 TFLOPS (06/2015) and 206.4 TFLOPS (11/2015)
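To put the national table next to the TOP500 entry threshold, the sketch below lists which systems meet each cutoff. It assumes the TFLOPS column is directly comparable to the performance of TOP500 entry #500, which the slides do not state (the column may hold theoretical peak rather than measured values), so the result is indicative only.

```python
# (system, TFLOPS) as listed in the national table above
MEXICO = [
    ("ABACUS", 277.5), ("Thubat-Kaal", 107.3), ("YOLTLA", 43.2),
    ("XIUHCOATL", 24.9), ("MIXTLI", 21.3), ("Cluster Aitzaloa", 18.4),
    ("Ocotillo", 14.9), ("KANBALAM", 7.1), ("Cluster IBM E-1350", 4.7),
    ("INMEGEN IBM", 2.0),
]

# Performance of TOP500 entry #500 on each list, in TFLOPS
THRESHOLDS = {"06/2015": 164.8, "11/2015": 206.4}


def systems_above(threshold_tflops):
    """Names of the listed systems at or above the given TFLOPS threshold."""
    return [name for name, tflops in MEXICO if tflops >= threshold_tflops]


if __name__ == "__main__":
    for edition, cutoff in THRESHOLDS.items():
        print(edition, cutoff, systems_above(cutoff))
    print("Sum of the ten systems:", round(sum(t for _, t in MEXICO), 1), "TFLOPS")
```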

ACARUS

The High Performance Computing Area of the Universidad de Sonora (ACARUS) was created in 2001 to support the work of the university's academic groups. Its importance rests on two pillars:

• Having equipment that makes frontier research possible.
• Having the scientific computing software regarded in the field as standard.


SERVICES / FUNCTIONS

Provide supercomputing infrastructure to the users who need it
Promote high-performance technological development
ProDeTAR
Infrastructure upgrades
Scientific software licensing
Promote the use of ACARUS
Administer the hardware and software
Provide information and advisory services to high-performance computing users
Plan and organize training courses
ProCCAR
Continuing training program
Diploma course in Supercomputing
Social service projects
Offer specialized technical support
Host technical and academic visits
Design and maintain the ACARUS website
Maintain inter-institutional collaboration ties


ACADEMIC ACTIVITIES

COLLABORATIONS

AMCAV

Red Mexicana de Supercómputo (Mexican Supercomputing Network)

AVAILABLE INFRASTRUCTURE

• Scientific cluster with 512 CPU cores and 8 GPUs: OCOTILLO
• Scientific cluster with 72 dual nodes: MEZQUITE
• Experimental cluster with 16 nodes: CHOYA
• PCs for training
• Videoconferencing equipment
• Projection equipment
• External storage units
• Peripheral equipment


TRAINING ROOM

INFRASTRUCTURE: SOFTWARE

• OS: Unix and Linux
• Simulation and imaging: IDL and Matlab
• Symbolic computation: Mathematica
• Compute-intensive processing: Fortran, C, and Gaussian98
• GIS: Idrisi, Cartalinx, and Arcinfo
• Statistics: SAS and EQS
• Libraries: BLAS, LAPACK, and ATLAS

All of these are considered among the best available in their fields and are highly regarded in academia, government offices, and companies worldwide.


OCOTILLO

Project: Upgrade of the high-performance computing infrastructure of the Universidad de Sonora

Program: Support for the Strengthening and Development of Scientific and Technological Infrastructure, CONACYT

CHALLENGE: DEPLOYING A HIGH-PERFORMANCE CLUSTER FOR SCIENTIFIC PRODUCTION

HPC UNISON


INTEGRATED SOLUTION

• 1 MASTER NODE
• 8 CPU COMPUTE NODES
• 2 SCIENTIFIC VISUALIZATION NODES
• 1 CPU/GPU NODE
• 1 STORAGE SYSTEM, 50 TB
• INFINIBAND QDR NETWORK
• GIGABIT ETHERNET MANAGEMENT NETWORK
• 1 KVM MONITORING SYSTEM
• UPS
• RACK


HPC UNISON

NODES: MASTER AND CPU COMPUTE

CPU compute nodes (4 x):
• 8 AMD Opteron 6282SE, 2.6 GHz = 128 cores
• 256 GB RAM
• 8 TB 7.2K RPM
• SAS 6 Gbps
• 64 cores x 8 servers x 4 flops x 2.6 GHz = 5,324.8 GFLOPS

Master node:
• 2 Intel Xeon E5680, 3.3 GHz = 12 cores
• 24 GB RAM
• 1.5 TB 15K RPM
• SCSI 6 Gbps

InfiniBand switch, 40 Gbps
Ethernet switch, 10 Gbps
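The peak figure on this slide follows the usual rule of thumb: cores x servers x floating-point operations per cycle x clock rate. A minimal sketch of that product, using the slide's own values (the function name is illustrative, and the 4 flops per cycle per core is taken from the slide as given):

```python
def peak_gflops(cores_per_server, servers, flops_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS: every core retires flops_per_cycle
    floating-point operations on every clock cycle."""
    return cores_per_server * servers * flops_per_cycle * clock_ghz


if __name__ == "__main__":
    # 64 cores x 8 servers x 4 flops x 2.6 GHz, as on the slide
    print(f"{peak_gflops(64, 8, 4, 2.6):,.1f} GFLOPS")
```

This is an upper bound; sustained performance on real workloads is lower.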


HPC UNISON

SCIENTIFIC VISUALIZATION NODES

Master node:
• 2 Intel Xeon E5680, 3.3 GHz = 12 cores
• 24 GB RAM
• 1.5 TB 15K RPM
• SCSI 6 Gbps

Ethernet switch, 10 Gbps

Visualization nodes (2 x):
• 1 Xeon E5620, 2.4 GHz = 4 cores
• 128 GB RAM
• NVIDIA Quadro 5000, 2.5 GB RAM = 352 cores
• 600 GB 10K RPM
• SCSI 6 Gbps
• 718 GFLOPS / GPU


HPC UNISON

GPGPU NODE

• 8 NVIDIA Tesla M2070Q, 1.55 GHz, 448 cores each = 3,584 cores
• 6 GB dedicated RAM per GPU
• 8 GPGPUs x 1,024 GFLOPS = 8,192 GFLOPS

[Figure: CPU compute nodes and GPU compute nodes]
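Adding up the per-component peaks quoted on the hardware slides gives a figure close to the 14.9 TFLOPS listed for Ocotillo in the national table. The sketch below does that sum. It assumes, without confirmation from the slides, that each of the 2 visualization nodes holds one Quadro 5000 and that the listed total is a plain sum of CPU and GPU peaks; the variable names are illustrative.

```python
# Per-component peaks in GFLOPS, as quoted on the hardware slides
CPU_COMPUTE_PEAK = 64 * 8 * 4 * 2.6  # 64 cores x 8 servers x 4 flops x 2.6 GHz
TESLA_PEAK = 8 * 1024                # 8 GPGPUs x 1,024 GFLOPS
QUADRO_PEAK = 2 * 718                # assumption: one Quadro 5000 per visualization node


def total_peak_tflops():
    """Sum of the quoted CPU and GPU peaks, converted from GFLOPS to TFLOPS."""
    return (CPU_COMPUTE_PEAK + TESLA_PEAK + QUADRO_PEAK) / 1000


if __name__ == "__main__":
    print(f"Aggregate peak: {total_peak_tflops():.2f} TFLOPS")
```

The master node and the visualization-node CPUs are left out because the slides quote no peak for them.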


PARALLEL STORAGE

Control nodes (2 x):
• 2 Xeon E5620, 2.4 GHz = 8 cores
• 48 GB RAM
• 600 GB 15K RPM
• SCSI 6 Gbps

Disk arrays (2 x):
• 24 TB 7.2K RPM
• SAS 6 Gbps


UPS, CABLING AND RACK

• 18 kVA
• BATTERY BANK 2X
• 42 U
• 16 PORTS
• CONSOLE

IMPLEMENTATION RESULT


INSTALLED SOFTWARE

Administration:
• Linux CentOS 6.2 operating system
• Torque 4.0.2 (Resource Manager)
• Maui 3.3.1 (Cluster Scheduler)
• Ganglia 3.5.2 (Monitoring System)

Message-passing libraries:
• MVAPICH 1.9a2 (MPI over InfiniBand)
• Open MPI 1.4.5 (Open Source High Performance Message Passing Library)
• MPICH2 1.5 (Message Passing Interface)
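On a cluster managed by Torque, users typically describe a job in a short script with `#PBS` directives and launch the MPI program from it. The sketch below only generates the text of such a script; it is a hypothetical illustration, not the site's actual submission template. The job name, the executable `./my_app` and the walltime are made-up values, and 64 cores per node is taken from the compute-node slide.

```python
def torque_script(job_name, nodes, cores_per_node, walltime, command):
    """Build the text of a Torque/PBS job script that runs `command` under
    mpirun with one MPI process per requested core."""
    total_processes = nodes * cores_per_node
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                          # job name
        f"#PBS -l nodes={nodes}:ppn={cores_per_node}",  # nodes and cores per node
        f"#PBS -l walltime={walltime}",                 # wall-clock limit, HH:MM:SS
        "cd $PBS_O_WORKDIR",                            # directory the job was submitted from
        f"mpirun -np {total_processes} {command}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 2 compute nodes, 64 cores each, 1 hour limit
    print(torque_script("demo", 2, 64, "01:00:00", "./my_app"))
```

The resulting file would be handed to the resource manager with `qsub`.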


Compilers and math libraries:
• Open64 5.0 (C, Fortran Compilers)
• ACML 5.1.0 (AMD Core Math Library)
• CUDA 5 (Compute Unified Device Architecture)
• FFTW 3.3.1 (Fastest Fourier Transform in the West)
• GSL 1.9 (GNU Scientific Library)

Applications:
• NWChem 6.1.1 (High-Performance Computational Chemistry Software)
• GROMACS 4.5.5 (Groningen Machine for Chemical Simulations)
• NAMD 2.8 (Nanoscale Molecular Dynamics)
• GAMESS (General Atomic and Molecular Electronic Structure System)


SUPPORT TEAM: SUPERCOMPUTING AND VISUALIZATION LABORATORY AT UAM-I. THANK YOU!

IDEAL ADMINISTRATION TEAM

• 1 PROJECT MANAGER
• 1 TECHNICAL ADMINISTRATOR
• 2 SOLUTIONS ADMINISTRATORS
• TECHNICAL SUPPORT:
• 1 CPU COMPUTE
• 1 GPU COMPUTE
• 1 SCIENTIFIC VISUALIZATION
• 1 LUSTRE
• 1 INFINIBAND/ETHERNET NETWORK

QUESTIONS?

[email protected]

[email protected]

www.acarus.uson.mx
