Opportunities for Biological Consortia on HPCx Code Capabilities and Performance HPCx and CCP Staff


Opportunities for Biological Consortia on HPCx

Code Capabilities and Performance

HPCx and CCP Staff

http://www.ccp.ac.uk/

http://www.hpcx.ac.uk/


Royal Institution, 6th November 2003. HPCx/Biology Discussions

Welcome to the Meeting

• Background
– HPCx

• Objectives
– to consider whether there is a case to bid

• Agenda
– Introduction to the HPCx service
– Overview of Code Performance
– Contributed Presentations
– Invited Presentation
– Discussion


Outline

• Overview of Code Capabilities and Performance
– Macromolecular simulation
  • DL_POLY, AMBER, CHARMM, NAMD
– Localised basis molecular codes
  • Gaussian, GAMESS-UK, NWChem
– Local basis periodic code
  • CRYSTAL
– Plane wave periodic codes
  • CASTEP
  • CPMD (Alessandro Curioni talk)

• Note: consortium activity is not limited to these codes.


The DL_POLY Molecular Dynamics Simulation Package

Bill Smith


DL_POLY Background

• General purpose parallel MD code
• Developed at Daresbury Laboratory for CCP5, 1994-today
• Available free of charge (under licence) to University researchers world-wide
• DL_POLY versions:
– DL_POLY_2
  • Replicated Data, up to 30,000 atoms
  • Full force field and molecular description
– DL_POLY_3
  • Domain Decomposition, up to 1,000,000 atoms
  • Full force field but no rigid body description.


DL_POLY Force Field

• Intermolecular forces
– All common van der Waals potentials
– Sutton Chen many-body potential
– 3-body angle forces (SiO2)
– 4-body inversion forces (BO3)
– Tersoff potential -> Brenner

• Intramolecular forces
– Bonds, angles, dihedrals, inversions

• Coulombic forces
– Ewald* & SPME (3D), HK Ewald* (2D), Adiabatic shell model, Reaction field, Neutral groups*, Truncated Coulombic

• Externally applied field
– Walled cells, electric field, shear field, etc.

* Not in DL_POLY_3


Boundary Conditions

• None (e.g. isolated macromolecules)

• Cubic periodic boundaries

• Orthorhombic periodic boundaries

• Parallelepiped periodic boundaries

• Truncated octahedral periodic boundaries*

• Rhombic dodecahedral periodic boundaries*

• Slabs (i.e. x,y periodic, z nonperiodic)


Algorithms and Ensembles

Algorithms

• Verlet leapfrog

• RD-SHAKE

• Euler-Quaternion*

• QSHAKE*

• [All combinations]

* Not in DL_POLY_3

Ensembles

• NVE

• Berendsen NVT

• Hoover NVT

• Evans NVT

• Berendsen NPT

• Hoover NPT

• Berendsen NσT

• Hoover NσT


Migration from Replicated to Distributed data
DL_POLY-3: Domain Decomposition

[Figure: domains labelled A, B, C and D]

• Distribute atoms, forces across the nodes
– More memory efficient, can address much larger cases (10^5-10^7)

• SHAKE and short-range forces require only neighbour communication
– communications scale linearly with number of nodes

• Coulombic energy remains global
– strategy depends on problem and machine characteristics
– Adopt Smooth Particle Mesh Ewald scheme
  • includes Fourier transform of the smoothed charge density (reciprocal space grid typically 64x64x64 - 128x128x128)

An alternative FFT algorithm has been designed to reduce communication costs
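The idea of distributing atoms across nodes can be illustrated with a minimal sketch. It assumes an orthorhombic periodic box split into a regular grid of blocks, one block per node; the function name `assign_domains`, the box size and the 2x2x2 grid are hypothetical choices for illustration and are not taken from DL_POLY_3.

```python
import numpy as np

def assign_domains(positions, box, grid):
    """Return the (ix, iy, iz) block index that owns each atom.

    Illustrative only: assumes an orthorhombic periodic box cut into
    grid[0] x grid[1] x grid[2] equal blocks.
    """
    positions = np.asarray(positions, dtype=float)
    box = np.asarray(box, dtype=float)
    grid = np.asarray(grid, dtype=int)
    # Wrap coordinates into the periodic box, then scale to block units.
    frac = np.mod(positions, box) / box
    idx = np.floor(frac * grid).astype(int)
    # Guard against frac == 1.0 caused by floating-point rounding.
    return np.minimum(idx, grid - 1)

rng = np.random.default_rng(0)
box = (40.0, 40.0, 40.0)          # hypothetical box edge lengths
grid = (2, 2, 2)                  # hypothetical 8-node decomposition
atoms = rng.uniform(0.0, 40.0, size=(1000, 3))

owners = assign_domains(atoms, box, grid)
ranks = np.ravel_multi_index(tuple(owners.T), grid)  # one rank per atom
counts = np.bincount(ranks, minlength=8)             # atoms held per node
```

Each node then stores only the atoms in its own block, which is where the memory saving over replicated data comes from; neighbour communication is still needed for atoms close to a block boundary.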


Migration from Replicated to Distributed data
DL_POLY-3: Coulomb Energy Evaluation

• Conventional routines (e.g. fftw) assume plane or column distributions

• A global transpose of the data is required to complete the 3D FFT, and additional costs are incurred re-organising the data from the natural block domain decomposition.

• An alternative FFT algorithm has been designed to reduce communication costs.
– the 3D FFT is performed as a series of 1D FFTs, each involving communications only between blocks in a given column
– More data is transferred, but in far fewer messages
– Rather than all-to-all, the communications are column-wise only

[Figure: plane and block data distributions]
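The statement that a 3D FFT can be performed as a series of 1D FFTs can be checked with a short serial sketch. This is not the DL_POLY_3 parallel algorithm: it only demonstrates the separability that such a scheme relies on, using NumPy on a small hypothetical 8x8x8 grid.

```python
import numpy as np

def fft3d_by_axes(grid):
    """Apply 1D FFTs along x, then y, then z."""
    out = np.asarray(grid, dtype=complex)
    for axis in range(3):
        # Each pass transforms independent 1D lines along one axis.
        out = np.fft.fft(out, axis=axis)
    return out

rng = np.random.default_rng(1)
charge = rng.standard_normal((8, 8, 8))   # stand-in for a charge grid

result = fft3d_by_axes(charge)
matches = np.allclose(result, np.fft.fftn(charge))
```

In a block decomposition, each 1D pass only needs the data lying along one line of blocks, which is why communication can be restricted to a column of processors rather than an all-to-all transpose.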


DL_POLY_2 & 3 Differences

• Rigid bodies not in _3

• MSD not in _3

• Tethered atoms not in _3

• Standard Ewald not in _3

• HK_Ewald not in _3

• DL_POLY_2 I/O files work in _3 but NOT vice versa

• No multiple timestep in _3


DL_POLY_2 Developments

• DL_MULTI - Distributed multipoles

• DL_PIMD - Path integral (ionics)

• DL_HYPE - Rare event simulation

• DL_POLY - Symplectic versions 2/3

• DL_POLY - Multiple timestep

• DL_POLY - F90 re-vamp


DL_POLY_3 on HPCx

• Test case 1 (552,960 atoms, 300 Δt)
– NaKSi2O5 - disilicate glass
– SPME (128^3 grid) + 3-body terms, 15,625 LC
– 32-512 processors (4-64 nodes)


DL_POLY_3 on HPCx

• Test case 2 (792,960 atoms, 10 Δt)
– 64 x Gramicidin (354) + 256,768 H2O
– SHAKE + SPME (256^3 grid), 14,812 LC
– 16-256 processors (2-32 nodes)
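The atom count for test case 2 can be reproduced with a few lines of arithmetic. The sketch assumes that "(354)" is the number of atoms per gramicidin molecule and that each water contributes three atoms; both readings are assumptions, although they do recover the quoted total.

```python
gramicidin_copies = 64
atoms_per_gramicidin = 354      # assumed meaning of "(354)"
water_molecules = 256_768
atoms_per_water = 3             # assumed three-site water

total_atoms = (gramicidin_copies * atoms_per_gramicidin
               + water_molecules * atoms_per_water)

# Processor and node ranges quoted for this test case.
processors_per_node = 16 // 2
```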


DL_POLY People

• Bill Smith: DL_POLY_2 & _3 & GUI
– [email protected]

• Ilian Todorov: DL_POLY_3
– [email protected]

• Maurice Leslie: DL_MULTI
– [email protected]

• Further Information:
– W. Smith and T.R. Forester, J. Molec. Graphics, (1996), 14, 136
– http://www.cse.clrc.ac.uk/msi/software/DL_POLY/index.shtml
– W. Smith, C.W. Yong, P.M. Rodger, Molecular Simulation (2002), 28, 385


AMBER, NAMD and Gaussian

Lorna Smith and Joachim Hein


AMBER

• AMBER (Assisted Model Building with Energy Refinement)
– A molecular dynamics program, particularly for biomolecules
– Weiner and Kollman, University of California, 1981.

• Current version: AMBER7

• Widely used suite of programs
– Sander, Gibbs, Roar

• Main program for molecular dynamics: Sander
– Basic energy minimiser and molecular dynamics
– Shared memory version: only for SGI and Cray
– MPI version: master / slave, replicated data model


AMBER - Initial Scaling

[Figure: speed-up against number of processors (0 to 144)]

• Factor IX protein with Ca++ ions – 90906 atoms


Current developments - AMBER

• Bob Duke
– Developed a new version of Sander on HPCx
– Originally called AMD (Amber Molecular Dynamics)
– Renamed PMEMD (Particle Mesh Ewald Molecular Dynamics)

• Substantial rewrite of the code
– Converted to Fortran90, removed multiple copies of routines, …
– Likely to be incorporated into AMBER8

• We are looking at optimising the collective communications: the reduction / scatter


Optimisation – PMEMD

[Figure: time (seconds) against number of processors (0 to 300) for PMEMD and Sander7]


NAMD

• NAMD
– molecular dynamics code designed for high-performance simulation of large biomolecular systems.
– Theoretical and Computational Biophysics Group, University of Illinois at Urbana-Champaign.

• Versions 2.4, 2.5b and 2.5 available on HPCx

• One of the first codes to be awarded a capability incentive rating – bronze


NAMD Performance

• Benchmarks from Prof Peter Coveney
• TCR-peptide-MHC system


NAMD Performance


Molecular Simulation - NAMD Scaling

• Parallel, object-oriented MD code
• High-performance simulation of large biomolecular systems
• Scales to 100s of processors on high-end parallel platforms

[Figure: speedup against number of CPUs (0 to 512) for the IBM SP/Regatta-H and Compaq AlphaServer ES45/1000, compared with linear scaling]

• standard NAMD ApoA-I benchmark, a system comprising 92,442 atoms, with 12 Å cutoff and PME every 4 time steps.

• scalability improves with larger simulations: speedup of 778 on 1024 CPUs of TCS-1 in a 327K particle simulation of F1-ATPase.

http://www.ks.uiuc.edu/Research/namd/
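A quoted speedup is easier to compare across machines when expressed as parallel efficiency, that is, speedup divided by the number of CPUs. The helper below is a hypothetical illustration applied to the F1-ATPase figure quoted above.

```python
def parallel_efficiency(speedup, n_cpus):
    """Fraction of ideal (linear) scaling actually achieved."""
    return speedup / n_cpus

# Speedup of 778 on 1024 CPUs, as quoted for the F1-ATPase run.
efficiency = parallel_efficiency(778, 1024)
efficiency_percent = round(100 * efficiency)
```

On these numbers the run achieves roughly three quarters of linear scaling.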


Performance Comparison

• Performance comparison between AMBER, CHARMM and NAMD

• See: http://www.scripps.edu/brooks/Benchmarks/

• Benchmark
– dihydrofolate reductase protein in an explicit water bath with cubic periodic boundary conditions.
– 23,558 atoms


Performance


Gaussian

• Gaussian 03
– Performs semi-empirical and ab initio molecular orbital calculations.
– Gaussian Inc, www.gaussian.com

• Shared memory version available on HPCx
– Limited to the size of a logical partition (8 processors)
– Phase 2 upgrade will allow access to 32 processors

• Task farming option


CRYSTAL and CASTEP

Ian Bush and Martin Plummer


Crystal

• Electronic structure and related properties of periodic systems

• All electron, local Gaussian basis set, DFT and Hartree-Fock

• Under continuous development since 1974

• Distributed to over 500 sites world wide

• Developed jointly by Daresbury and the University of Turin


Crystal Functionality

• Basis Set
– LCAO - Gaussians

• All electron or pseudopotential

• Hamiltonian
– Hartree-Fock (UHF, RHF)
– DFT (LSDA, GGA)
– Hybrid funcs (B3LYP)

• Techniques
– Replicated data parallel
– Distributed data parallel

• Forces
– Structural optimization

• Direct SCF

• Visualisation
– AVS GUI (DLV)

• Properties
– Energy
– Structure
– Vibrations (phonons)
– Elastic tensor
– Ferroelectric polarisation
– Piezoelectric constants
– X-ray structure factors
– Density of States / Bands
– Charge/Spin Densities
– Magnetic Coupling
– Electrostatics (V, E, EFG classical)
– Fermi contact (NMR)
– EMD (Compton, e-2e)


Benchmark Runs on Crambin

• Very small protein from Crambe abyssinica - 1284 atoms per unit cell

• Initial studies using STO-3G (3948 basis functions)

• Improved to 6-31G** (12354 functions)

• All calculations Hartree-Fock

• As far as we know the largest HF calculation ever converged


Crambin - Parallel Performance

• Fit measured data to Amdahl’s law to obtain estimate of speed up

• Increasing the basis set size increases the scalability

• About 700 speed up on 1024 processors for 6-31G**

• Takes about 3 hours instead of about 3 months

• 99.95% parallel

[Figure: speed-up against number of processors (0 to 1024) for the 6-31G** (12,354 GTOs), 6-31G (7,194 GTOs) and STO-3G (3,948 GTOs) basis sets, compared with linear scaling]
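The Amdahl's law estimate quoted on this slide can be reproduced directly. The sketch below uses the standard form of the law, speedup = 1 / (s + p/N) with parallel fraction p and serial fraction s = 1 - p; the function name is a hypothetical choice, and the 90-day reading of "about 3 months" is an assumption.

```python
def amdahl_speedup(parallel_fraction, n_procs):
    """Predicted speedup on n_procs for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)

# 99.95% parallel on 1024 processors, as quoted for 6-31G**.
speedup_1024 = amdahl_speedup(0.9995, 1024)

# Assumed: "about 3 months" taken as 90 days of serial run time.
serial_hours = 90 * 24
parallel_hours = serial_hours / speedup_1024
```

With these inputs the predicted speedup is close to 680 and the run time comes out at a little over 3 hours, in line with the "about 700" and "about 3 hours" figures on the slide.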


Results – Electrostatic Potential

• Charge density isosurface coloured according to potential

• Useful to determine possible chemically active groups


Futures - Rusticyanin

• Rusticyanin (Thiobacillus ferrooxidans) has 6284 atoms and is involved in redox processes

• We have just started calculations using over 33000 basis functions

• In collaboration with S.Hasnain (DL) we want to calculate redox potentials for rusticyanin and associated mutants


What is Castep?

• First principles (DFT) materials simulation code
– electronic energy
– geometry optimization
– surface interactions
– vibrational spectra
  • materials under pressure, chemical reactions
– molecular dynamics

• Method (direct minimization)
– plane wave expansion of valence electrons
– pseudopotentials for core electrons


HPCx: biological applications

• Examples currently include:
– NMR of proteins
– hydroxyapatite (major component of bone)
– chemical processes following stroke

• Possibility of treating systems with a few hundred atoms on HPCx

• May be used in conjunction with classical codes (e.g. DL_POLY) for detailed QM treatment of ‘features of interest’


Castep 2003 HPCx performance gain

[Figure: job time against total number of processors (80, 160, 240, 320) for an Al2O3 120 atom cell with 5 k-points, comparing Jan-03 with the current 'best']


Castep 2003 HPCx performance gain

[Figure: job time against total number of processors (128, 256, 512) for an Al2O3 270 atom cell with 2 k-points, comparing Jan-03 with the current 'best']


HPCx: biological applications

• Castep (version 2) is written by:
– M Segall, P Lindan, M Probert, C Pickard, P Hasnip, S Clark, K Refson, V Milman, B Montanari, M Payne.
– ‘Easy’ to understand top-level code.

• Castep is fully maintained and supported on HPCx

• Castep is distributed by Accelrys Ltd

• Castep is licensed free to UK academics by the UKCP consortium (contact [email protected])


CHARMM, NWChem and GAMESS-UK

Paul Sherwood


NWChem

• Objectives
– Highly efficient and portable MPP computational chemistry package
– Distributed Data: scalable with respect to chemical system size as well as MPP hardware size
– Extensible Architecture
  • Object-oriented design
    – abstraction, data hiding, handles, APIs
  • Parallel programming model
    – non-uniform memory access, global arrays
  • Infrastructure
    – GA, Parallel I/O, RTDB, MA, …
– Wide range of parallel functionality essential for HPCx

• Tools
– Global arrays:
  • portable distributed data tool
  • Used by CCP1 groups (e.g. MOLPRO)
– PeIGS:
  • parallel eigensolver
  • guaranteed orthogonality of eigenvectors

[Figure labels: single, shared data structure; physically distributed data]


Distributed Data SCF

Pictorial representation of the iterative SCF process in (i) a sequential process, and (ii) a distributed data parallel process: MOAO represents the molecular orbitals, P the density matrix and F the Fock or Hamiltonian matrix.

[Figure: (i) Sequential: guess orbitals, MOAO, dgemm, P, integrals (VXC, VCoul, V1e), F, sequential eigensolver, convergence test. (ii) Distributed data: guess orbitals, MOAO, ga_dgemm, P, integrals (VXC, VCoul, V1e), F, PeIGS, convergence test.]
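The iterative cycle in the figure (orbitals, matrix multiply to form P, build F, diagonalise, test convergence) can be sketched as a toy serial loop. This is not NWChem or GAMESS-UK code: it assumes an orthonormal basis, and the Fock build is a hypothetical stand-in in which F depends on P only through its diagonal. The names `toy_scf`, `coupling` and `mix` are illustrative.

```python
import numpy as np

def toy_scf(h_core, n_occ, coupling=0.5, mix=0.5, tol=1e-8, max_iter=200):
    """Toy self-consistent loop in an orthonormal basis (illustrative)."""
    n = h_core.shape[0]
    density = np.zeros((n, n))            # starting guess for P
    new_density = density
    for iteration in range(1, max_iter + 1):
        # Build F from P: stand-in for the integral, VCoul and VXC step.
        fock = h_core + coupling * np.diag(np.diag(density))
        # Eigensolver step (PeIGS plays this role in the parallel case).
        _, orbitals = np.linalg.eigh(fock)
        occupied = orbitals[:, :n_occ]
        # P = C_occ C_occ^T: the matrix multiply (dgemm) step.
        new_density = occupied @ occupied.T
        if np.max(np.abs(new_density - density)) < tol:
            return new_density, iteration, True
        # Damped update to help the iteration settle.
        density = mix * new_density + (1.0 - mix) * density
    return new_density, max_iter, False

# Hypothetical 6-site chain with nearest-neighbour coupling.
n_sites = 6
h_core = -(np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
density, n_iter, converged = toy_scf(h_core, n_occ=3)
```

In the distributed data version the same steps run on matrices spread across processors, so the matrix multiply and the eigensolver are the operations that need parallel replacements.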


NWChem

NWChem Capabilities (Direct, Semi-direct and conventional):
– RHF, UHF, ROHF using up to 10,000 basis functions; analytic 1st and 2nd derivatives.
– DFT with a wide variety of local and non-local XC potentials, using up to 10,000 basis functions; analytic 1st and 2nd derivatives.
– CASSCF; analytic 1st and numerical 2nd derivatives.
– Semi-direct and RI-based MP2 calculations for RHF and UHF wave functions using up to 3,000 basis functions; analytic 1st derivatives and numerical 2nd derivatives.
– Coupled cluster, CCSD and CCSD(T) using up to 3,000 basis functions; numerical 1st and 2nd derivatives of the CC energy.
– Classical molecular dynamics and free energy simulations with the forces obtainable from a variety of sources


Case Studies - Zeolite Fragments

• DFT Calculations with Coulomb Fitting
– Basis (Godbout et al.): DZVP - O, Si; DZVP2 - H
– Fitting Basis: DGAUSS-A1 - O, Si; DGAUSS-A2 - H

• NWChem & GAMESS-UK
– Both codes use auxiliary fitting basis for coulomb energy, with 3 centre 2 electron integrals held in core.

• Si8O7H18: 347/832
• Si8O25H18: 617/1444
• Si26O37H36: 1199/2818
• Si28O67H30: 1687/3928


DFT Coulomb Fit - NWChem

[Figures: measured time (seconds) against number of CPUs (32, 64, 128) for Si26O37H36 (1199/2818) and Si28O67H30 (1687/3928) on CS7 AMD K7/1000 + SCI, CS9 P4/2000 + Myrinet 2k, CS2 QSNet Alpha Cluster/667, SGI Origin 3800/R14k-500, IBM SP/p690 and AlphaServer SC ES45/1000]


Memory-driven Approaches: NWChem - DFT (LDA): Performance on the IBM SP/p690

Zeolite ZSM-5

• DZVP Basis (DZV_A2) and Dgauss A1_DFT Fitting basis:
– AO basis: 3554
– CD basis: 12713

• IBM SP/p690, wall time (13 SCF iterations):
– 64 CPUs = 9,184 seconds
– 128 CPUs = 3,966 seconds

• MIPS R14k-500 CPUs (Teras), wall time (13 SCF iterations):
– 64 CPUs = 5,242 seconds
– 128 CPUs = 3,451 seconds

• 3-centre 2e-integrals = 1.00 x 10^12
• Schwarz screening = 6.54 x 10^9
• % 3c 2e-ints. in core = 100%
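The wall times quoted for ZSM-5 can be turned into the gain obtained by doubling the CPU count, where a factor of 2 would be ideal. The dictionary layout and names below are hypothetical; the timings are those listed on the slide.

```python
# Wall time in seconds for 13 SCF iterations, keyed by CPU count.
wall_times = {
    "IBM SP/p690": {64: 9184, 128: 3966},
    "MIPS R14k-500 (Teras)": {64: 5242, 128: 3451},
}

def doubling_gain(times, small=64, large=128):
    """Ratio of wall times when going from `small` to `large` CPUs."""
    return times[small] / times[large]

gains = {machine: doubling_gain(t) for machine, t in wall_times.items()}
```

The IBM figure comes out above 2, that is, better than linear for this step. One plausible reading, consistent with the slide's "memory-driven" title, is that the larger run has enough aggregate memory to hold all the 3-centre integrals in core, but the slide does not state this explicitly.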


† M.F. Guest, J.H. van Lenthe, J. Kendrick, K. Schoffel & P. Sherwood, with contributions from R.D. Amos, R.J. Buenker, H.J.J. van Dam, M. Dupuis, N.C. Handy, I.H. Hillier, P.J. Knowles, V. Bonacic-Koutecky, W. von Niessen, R.J. Harrison, A.P. Rendell, V.R. Saunders, A.J. Stone and D. Tozer.

GAMESS-UK

• GAMESS-UK is the general purpose ab initio molecular electronic structure program for performing SCF-, MCSCF- and DFT-gradient calculations, together with a variety of techniques for post Hartree Fock calculations.

– The program is derived from the original GAMESS code, obtained from Michel Dupuis in 1981 (then at the National Resource for Computational Chemistry, NRCC), and has been extensively modified and enhanced over the past decade.

– This work has included contributions from numerous authors†, and has been conducted largely at the CCLRC Daresbury Laboratory, under the auspices of the UK's Collaborative Computational Project No. 1 (CCP1). Other major sources that have assisted in the on-going development and support of the program include various academic funding agencies in the Netherlands, and ICI plc.

• Additional information on the code may be found from links at: http://www.dl.ac.uk/CFS


GAMESS-UK features 1.

– Hartree Fock:
  • Segmented/GC + spherical harmonic basis sets
  • SCF-Energies and Gradients: conventional, in-core, direct
  • SCF-Frequencies: numerical and analytic 2nd derivatives
  • Restricted, unrestricted open shell SCF and GVB.

– Density Functional Theory
  • Energies + gradients, conventional and direct including Dunlap fit
  • B3LYP, BLYP, BP86, B97, HCTH, B97-1, FT97 & LDA functionals
  • Numerical 2nd derivatives (analytic implementation in testing)

– Electron Correlation:
  • MP2 energies, gradients and frequencies, Multi-reference MP2, MP3 Energies
  • MCSCF and CASSCF Energies, gradients and numerical 2nd derivatives
  • MR-DCI Energies, properties and transition moments (semi-direct module)
  • CCSD and CCSD(T) Energies
  • RPA (direct) and MCLR excitation energies / oscillator strengths, RPA gradients
  • Full-CI Energies
  • Green's functions calculations of IPs.
  • Valence bond (Turtle)


GAMESS-UK features 2.

– Molecular Properties:
  • Mulliken and Lowdin population analysis, Electrostatic Potential-Derived Charges
  • Distributed Multipole Analysis, Morokuma Analysis, Multipole Moments
  • Natural Bond Orbital (NBO) + Bader Analysis
  • IR and Raman Intensities, Polarizabilities & Hyperpolarizabilities
  • Solvation and Embedding Effects (DRF)
  • Relativistic Effects (ZORA)

– Pseudopotentials:
  • Local and non-local ECPs.

– Visualisation: tools include CCP1 GUI

– Hybrid QM/MM (ChemShell + CHARMM QM/MM)

– Semi-empirical: MNDO, AM1, and PM3 hamiltonians

– Parallel Capabilities:
  • MPP and SMP implementations (GA tools)
  • SCF/DFT energies, gradients, frequencies
  • MP2 energies and gradients
  • Direct RPA


Parallel Implementation of GAMESS-UK

• Extensive use of Global Array (GA) Tools and Parallel Linear Algebra from NWChem Project (EMSL)

• SCF and DFT
– Replicated data, but …
– GA Tools for caching of I/O for restart and checkpoint files
– Storage of 2-centre 2-e integrals in DFT Jfit
– Linear Algebra (via PeIGS, DIIS/MMOs, Inversion of 2c-2e matrix)

• SCF and DFT second derivatives
– Distribution of <vvoo> and <vovo> integrals via GAs

• MP2 gradients
– Distribution of <vvoo> and <vovo> integrals via GAs

• Direct RPA Excited States
– Replicated data with parallelisation of direct integral evaluation


GAMESS-UK: DFT Calculations

• Valinomycin (DFT HCTH): basis DZVP2_A2 (Dgauss), 1620 GTOs
• Cyclosporin (DFT B3LYP): basis 6-31G*, 1855 GTOs

[Figures: elapsed time (seconds) and speedup against number of CPUs (32 to 128) for both cases on the SGI Origin 3800/R14k-500, IBM SP/Regatta-H and AlphaServer ES45/1000, with speedup compared against linear scaling]


DFT Analytic 2nd Derivatives Performance
IBM SP/p690, HP/Compaq SC ES45/1000 and SGI O3800

(C6H4(CF3))2: Basis 6-31G (196 GTO)

Terms from MO 2e-integrals in GA storage (CPHF & pert. Fock matrices); calculation dominated by CPHF.

[Figure: elapsed time (seconds) against number of CPUs (32, 64, 128) for CS14 PIII/1000 + Myrinet (1 CPU), SGI Origin3800/R14k-500 (B3LYP), IBM SP/p690 (B3LYP and HCTH) and AlphaServer ES45/1000 (B3LYP and HCTH)]


CHARMM

• CHARMM (Chemistry at HARvard Macromolecular Mechanics) is a general purpose molecular mechanics, molecular dynamics and vibrational analysis package for modelling and simulation of the structure and behaviour of macromolecular systems (proteins, nucleic acids, lipids etc.)

• Supports energy minimisation and MD approaches using a classical parameterised force field.

• J. Comp. Chem. 4 (1983) 187-217

• Parallel Benchmark - MD Calculation of Carboxy Myoglobin (MbCO) with 3830 Water Molecules.

• QM/MM model for study of reacting species
– incorporate the QM energy as part of the system into the force field
– coupling between GAMESS-UK (QM) and CHARMM.


Parallel CHARMM Benchmark

Benchmark MD Calculation of Carboxy Myoglobin (MbCO) with 3830 Water Molecules: 14026 atoms, 1000 steps (1 ps), 12-14 Å shift.

[Figures: speedup against number of CPUs (8 to 64), compared with linear scaling, for CS1 PIII/450 + FE/LAM, CS2 QSNet Alpha Cluster/667, CS10 P4/2666 + Myrinet, Cray T3E/1200E and SGI Origin 3800/R14k-500; and a bar chart at 16, 32 and 64 CPUs for CS2 QSNet Alpha Cluster/667, CS9 P4/2000 + Myrinet 2k, CS12 P4/2400 + Gbit Ether, CS10 P4/2666 + Myrinet, SGI Origin 3800/R14k-500, AlphaServer SC ES45/1000 and IBM SP/p690]


QM/MM Applications

Triosephosphate isomerase (TIM)

• Central reaction in glycolysis, catalytic interconversion of DHAP to GAP
• Demonstration case within QUASI (Partners UZH, and BASF)

• QM region 35 atoms (DFT BLYP)
– include residues with possible proton donor/acceptor roles
– GAMESS-UK, MNDO, TURBOMOLE

• MM region (4,180 atoms + 2 link)
– CHARMM force-field, implemented in CHARMM, DL_POLY

T128 (IBM SP/Regatta-H) = 143 secs

[Figure: measured time (seconds) against number of CPUs (8, 16, 32, 64) for CS9 P4/2000 + Myrinet 2k, SGI Origin3800/R14k-500, AlphaServer SC ES45/1000 and IBM SP/Regatta-H]


Sampling Methods

– Multiple independent simulations
– Replica exchange: Monte Carlo exchange of configurations between an ensemble of replicas at different temperatures
– Combinatorial approach to ligand binding
– Replica path method: simultaneously optimise a series of points defining a reaction path or conformational change, subject to path constraints.
  • Suitable for QM and QM/MM Hamiltonians
  • Parallelisation per point
  • Communication is limited to adjacent points on the path - global sum of energy function

[Figure: energy E against reaction co-ordinate, with path points assigned to processors labelled P0 to P36]

Collaboration with Bernie Brooks (NIH)
http://www.cse.clrc.ac.uk/qcg/chmguk
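The replica exchange step listed above is usually decided with a Metropolis test on a pair of replicas. The sketch below uses the standard acceptance rule, min(1, exp[(β_i - β_j)(E_i - E_j)]) with β = 1/(k_B T); the function names, the energy units (kcal/mol) and the example values are assumptions for illustration and do not come from the CHARMM/GAMESS-UK work described here.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def swap_probability(e_i, e_j, t_i, t_j, k_b=K_B):
    """Metropolis probability of exchanging replicas i and j."""
    beta_i = 1.0 / (k_b * t_i)
    beta_j = 1.0 / (k_b * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)

def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)

# Hypothetical neighbouring replicas at 300 K and 310 K.
p_always = swap_probability(-90.0, -100.0, 300.0, 310.0)
p_sometimes = swap_probability(-100.0, -90.0, 300.0, 310.0)
```

Because each replica runs independently between exchange attempts and only energies need to be compared, this kind of sampling maps naturally onto large processor counts.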


Summary

• Many of the codes used by the community have quite poor scaling

• Best cases
– large quantum calculations (Crystal, DFT etc)
– very large MD simulations (NAMD)

• For a credible consortium bid we need to focus on applications which have
– acceptable scaling now (perhaps involving migration to new codes, e.g. NAMD)
– heavy CPU or memory demands (e.g. CRYSTAL)
– potential for algorithmic development to exploit 1000s of processors (e.g. pathway optimisation, Monte Carlo etc)