Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix


1

Parallel Software for SemiDefinite Programming with Sparse Schur Complement Matrix

Makoto Yamashita @ Tokyo-Tech
Katsuki Fujisawa @ Chuo University
Mituhiro Fukuda @ Tokyo-Tech
Yoshiaki Futakata @ University of Virginia
Kazuhiro Kobayashi @ National Maritime Research Institute
Masakazu Kojima @ Tokyo-Tech
Kazuhide Nakata @ Tokyo-Tech
Maho Nakata @ RIKEN

ISMP 2009 @ Chicago [2009/08/26]

2

Extremely Large SDPs Arising from various fields

Quantum Chemistry
Sensor Network Problems
Polynomial Optimization Problems

Most computation time is related to Schur complement matrix (SCM)

[SDPARA] Parallel computation for the SCM, in particular for a sparse SCM.

3

Outline

1. SemiDefinite Programming and Schur complement matrix

2. Parallel Implementation
3. Parallelism for the sparse Schur complement
4. Numerical Results
5. Future works

4

Standard form of SDP
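The formulas on this slide did not survive the transcript. The following is a reconstruction of a standard primal-dual SDP pair, with data C, A_1, ..., A_m in S^n and b in R^m; the notation may differ from the original slides.

```latex
\begin{aligned}
\text{(P)}\quad & \min_{X \in \mathbb{S}^n} \; C \bullet X
  && \text{s.t.}\; A_p \bullet X = b_p \;\; (p = 1,\dots,m), \;\; X \succeq 0,\\[2pt]
\text{(D)}\quad & \max_{y \in \mathbb{R}^m,\, S \in \mathbb{S}^n} \; b^{\mathsf T} y
  && \text{s.t.}\; \sum_{p=1}^{m} y_p A_p + S = C, \;\; S \succeq 0,
\end{aligned}
\qquad\text{where } A \bullet B = \operatorname{trace}(A^{\mathsf T} B).
```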

5

Primal-Dual Interior-Point Methods
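The body of this slide is also missing from the transcript. As a generic outline of the framework (not copied from the slide), primal-dual interior-point methods follow the central path defined by the perturbed optimality conditions

```latex
A_p \bullet X = b_p \;\;(p = 1,\dots,m), \qquad
\sum_{p=1}^{m} y_p A_p + S = C, \qquad
X S = \mu I, \qquad X \succ 0, \; S \succ 0,
```

linearizing them at each iteration to obtain a search direction (dX, dy, dS), taking a damped step that keeps X and S positive definite, and driving mu toward zero.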

6

Computation for Search Direction

Schur complement matrix ⇒ Cholesky Factorization

Exploitation of sparsity in:

1. ELEMENTS (forming the Schur complement matrix)
2. CHOLESKY (factorizing it)
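The equations of this slide are likewise not in the transcript. As a sketch (the exact scaling depends on the search direction used), eliminating dX and dS from the linearized optimality conditions leaves an m x m positive-definite system in dy, the Schur complement equation:

```latex
B \, dy = r, \qquad
B_{pq} = A_p \bullet \bigl( X A_q S^{-1} \bigr), \quad p, q = 1, \dots, m .
```

ELEMENTS denotes the formation of B; CHOLESKY denotes its factorization.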

7

Bottlenecks on Single Processor

Apply Parallel Computation to the Bottlenecks

Times in seconds on a single Opteron 246 (2.0 GHz):

              LiOH            HF
m             10592           15018
ELEMENTS       6150 ( 43%)    16719 ( 35%)
CHOLESKY       7744 ( 54%)    20995 ( 44%)
TOTAL         14250 (100%)    47483 (100%)

8

SDPARA: the parallel version of SDPA (a generic SDP solver), built on MPI & ScaLAPACK.

Row-wise distribution for ELEMENTS
Parallel Cholesky factorization for CHOLESKY

http://sdpa.indsys.chuo-u.ac.jp/sdpa/

9

Row-wise distribution for evaluation of the Schur complement matrix

Suppose 4 CPUs are available: each CPU computes only its assigned rows.
No communication between CPUs; efficient memory management.
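A minimal sketch of the idea (illustrative, not SDPARA's source code; the element routine is a placeholder):

```cpp
// Row-wise ELEMENTS distribution: each MPI rank evaluates only its own rows of the
// Schur complement matrix B, so no communication is needed during this phase.
#include <mpi.h>
#include <vector>

// Placeholder for the real element computation B_{pq} = A_p . (X A_q S^{-1}).
double schur_element(int p, int q) { return (p == q) ? 1.0 : 0.0; }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int m = 1000;                         // number of constraints (rows of B)
    std::vector<std::vector<double>> myRows;    // only the rows owned by this rank

    for (int p = rank; p < m; p += nprocs) {    // cyclic row assignment
        std::vector<double> row(m - p);
        for (int q = p; q < m; ++q)             // B is symmetric: upper triangle only
            row[q - p] = schur_element(p, q);
        myRows.push_back(row);
    }
    // The rows stay where they were computed until the redistribution before CHOLESKY.
    MPI_Finalize();
    return 0;
}
```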

10

Parallel Cholesky factorization

We adopt ScaLAPACK for the Cholesky factorization of the Schur complement matrix.
We redistribute the matrix from the row-wise distribution to a two-dimensional block-cyclic distribution.

Redistribution
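The redistribution is needed because ScaLAPACK's dense Cholesky (pdpotrf) expects a two-dimensional block-cyclic layout. A minimal sketch of that ownership rule (the block size and process-grid shape below are example values, not SDPARA's):

```cpp
// Which process in a Pr x Pc grid owns entry (i, j) of the Schur complement matrix
// when nb x nb blocks are dealt out cyclically along both grid dimensions.
#include <cstdio>

struct Owner { int prow, pcol; };

Owner blockCyclicOwner(int i, int j, int nb, int Pr, int Pc) {
    return { (i / nb) % Pr, (j / nb) % Pc };
}

int main() {
    const int nb = 64, Pr = 2, Pc = 2;          // example block size and process grid
    Owner o = blockCyclicOwner(1000, 70, nb, Pr, Pc);
    std::printf("entry (1000, 70) -> process (%d, %d)\n", o.prow, o.pcol);
    return 0;
}
```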

11

Computation time on SDP from Quantum Chemistry [LiOH]

[Chart: computation time in seconds (log scale, 1 to 100,000) vs. number of processors (1, 4, 16, 64) for TOTAL, ELEMENTS, and CHOLESKY. Platform: AIST Super Cluster, Opteron 246 (2.0 GHz), 6 GB memory/node.]

12

Scalability on SDP from Quantum Chemistry [NF]

[Chart: scalability (speed-up) vs. number of processors (1 to 64) for TOTAL, ELEMENTS, and CHOLESKY.]

TOTAL: 29x speed-up
ELEMENTS: 63x
CHOLESKY: 39x

The parallelization of ELEMENTS is very effective.

13

Sparse Schur complement matrix

Schur complement matrix becomes very sparse for some applications.

⇒ The simple row-wise distribution loses its efficiency.
[Sparsity patterns: from Control Theory (density 100%), from Sensor Network (density 2.12%)]

14

Sparseness of Schur complement matrix

Many applications have a diagonal block structure.

15

Exploitation of Sparsity in SDPA

The evaluation formula is switched row by row among three variants:

F1
F2
F3
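The three formulas themselves did not survive the transcript. As a reconstruction of the idea behind SDPA's sparsity exploitation (not copied from the slide, and with the same caveat about the search direction as above): for row p, using the symmetry of B, B_{pq} = A_q • (X A_p S^{-1}). F1 forms U = X A_p S^{-1} by dense matrix products and takes B_{pq} = A_q • U for every q, while F3 touches only the nonzeros of the data matrices:

```latex
B_{pq} \;=\;
\sum_{(\alpha,\beta):\,(A_q)_{\alpha\beta} \neq 0} (A_q)_{\alpha\beta}
\sum_{(\gamma,\delta):\,(A_p)_{\gamma\delta} \neq 0}
X_{\beta\gamma}\,(A_p)_{\gamma\delta}\,\bigl(S^{-1}\bigr)_{\delta\alpha}.
```

F2 is an intermediate variant; the cheapest formula is selected per row from the nonzero counts of the data matrices.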

16

ELEMENTS for Sparse Schur complement

Example from the slide: rows of the sparse Schur complement matrix with estimated costs 150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, and 3 are distributed over three CPUs.

Load on each CPU:
CPU1: 190
CPU2: 185
CPU3: 188
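One simple way to obtain such a balance is a greedy rule over the estimated row costs; whether SDPARA uses exactly this heuristic is an assumption, but on the costs above it reproduces the slide's loads (190/185/188, in a different order):

```cpp
// Greedy load balancing of ELEMENTS rows over CPUs (illustrative sketch).
#include <algorithm>
#include <cstdio>
#include <functional>
#include <vector>

int main() {
    // Estimated cost of each row of the sparse Schur complement matrix (from the slide).
    std::vector<int> cost = {150, 40, 30, 20, 135, 20, 70, 10, 50, 5, 30, 3};
    const int nCPU = 3;
    std::vector<long> load(nCPU, 0);

    // Longest-processing-time rule: heaviest row first, each row to the least-loaded CPU.
    std::sort(cost.begin(), cost.end(), std::greater<int>());
    for (int c : cost) {
        int cpu = static_cast<int>(std::min_element(load.begin(), load.end()) - load.begin());
        load[cpu] += c;
    }
    // With these costs the loads come out as 190, 188, 185: the same balance as the slide.
    for (int i = 0; i < nCPU; ++i)
        std::printf("CPU%d: %ld\n", i + 1, load[i]);
    return 0;
}
```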

17

CHOLESKY for Sparse Schur complement

Parallel sparse Cholesky factorization implemented in MUMPS.
MUMPS adopts the multifrontal method.

[Same example matrix and row costs as on the previous slide.]

Memory storage on each processor should be contiguous.
The distribution used for ELEMENTS matches this method.
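A minimal sketch of driving MUMPS on a tiny symmetric positive definite system, patterned after the C example in the MUMPS documentation (SDPARA's integration feeds the sparse Schur complement matrix instead of the centralized toy input used here):

```cpp
#include <mpi.h>
#include <dmumps_c.h>

#define JOB_INIT       (-1)
#define JOB_END        (-2)
#define USE_COMM_WORLD (-987654)

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int myid;
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    // 2x2 SPD matrix [[4, 1], [1, 3]]; lower triangle in 1-based coordinate format.
    MUMPS_INT n = 2, nz = 3;
    MUMPS_INT irn[] = {1, 2, 2};
    MUMPS_INT jcn[] = {1, 1, 2};
    double    a[]   = {4.0, 1.0, 3.0};
    double    rhs[] = {1.0, 2.0};

    DMUMPS_STRUC_C id;
    id.job = JOB_INIT;
    id.par = 1;
    id.sym = 1;                        // 1 = symmetric positive definite
    id.comm_fortran = USE_COMM_WORLD;
    dmumps_c(&id);

    if (myid == 0) {                   // centralized assembled input lives on the host
        id.n = n; id.nz = nz; id.irn = irn; id.jcn = jcn; id.a = a; id.rhs = rhs;
    }
    id.job = 6;                        // analysis + multifrontal factorization + solve
    dmumps_c(&id);

    id.job = JOB_END;                  // free MUMPS internal data
    dmumps_c(&id);
    MPI_Finalize();
    return 0;
}
```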

18

Computation time for SDPs from Polynomial Optimization Problem

[Chart: computation time in seconds (log scale) vs. number of processors (1 to 32) for TOTAL, ELEMENTS, and CHOLESKY. Platform: tsubasa, Xeon E5440 (2.83 GHz), 8 GB memory/node.]

Parallel sparse Cholesky achieves mild scalability. ELEMENTS attains a 24x speed-up on 32 CPUs.

19

ELEMENTS Load-balance on 32 CPUs

Only the first processor has slightly heavier computation.

[Bar chart: computation time in seconds (0 to 0.4) and number of distributed elements (0 to 1,400,000) per processor, for 32 processors.]

20

Automatic selection of sparse / dense SCM

Dense parallel Cholesky achieves higher scalability than sparse parallel Cholesky, so dense becomes better for many processors.
We estimate both computation times from the computation cost and the scalability.

[Chart: computation time in seconds vs. number of processors (1 to 32) for auto, dense, and sparse.]
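The slide does not give the cost model, so the following sketch only illustrates the shape of such a rule; the flop counts and speed-up curves are invented placeholders, not SDPARA's actual model:

```cpp
// Toy sketch of the sparse/dense CHOLESKY selection: estimate both times from a cost
// and a scalability model and pick the smaller.
#include <algorithm>
#include <cstdio>
#include <initializer_list>

enum class Chol { Dense, Sparse };

Chol selectCholesky(double denseFlops, double sparseFlops, int nprocs) {
    // Assumed behaviour: dense ScaLAPACK Cholesky scales well, while the sparse
    // multifrontal factorization saturates at a small processor count.
    double denseSpeedup  = 0.8 * nprocs;
    double sparseSpeedup = std::min<double>(nprocs, 4.0);
    return (denseFlops / denseSpeedup < sparseFlops / sparseSpeedup) ? Chol::Dense
                                                                     : Chol::Sparse;
}

int main() {
    // With these made-up costs the choice switches from sparse to dense at 16 processors.
    const double denseFlops = 3.0e9, sparseFlops = 1.0e9;
    for (int p : {1, 2, 4, 8, 16, 32})
        std::printf("%2d procs -> %s\n", p,
                    selectCholesky(denseFlops, sparseFlops, p) == Chol::Dense
                        ? "dense" : "sparse");
    return 0;
}
```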

21

Sparse/Dense CHOLESKY for a small SDP from POP

[Chart: CHOLESKY time in seconds (log scale) vs. number of processors (1 to 32) for auto, dense, and sparse. Platform: tsubasa, Xeon E5440 (2.83 GHz), 8 GB memory/node.]

Only on 4 CPUs does the automatic selection fail (the scalability of the sparse Cholesky is unstable at 4 CPUs).

22

Numerical Results

Comparison with PCSDP
Sensor Network Problems generated by SFSDP
Multi-threading: Quantum Chemistry

23

SDPs from Sensor Network

#sensors 1,000 (m = 16,450; density 1.23%)

#CPU        1       2       4       8       16
SDPARA     28.2    22.1    16.7    13.8    27.3
PCSDP      M.O.    1527     887     591     368

#sensors 35,000 (m = 527,096; density )

#CPU        1       2       4       8       16
SDPARA     1080     845     614     540     506
PCSDP      Memory Over. if #sensors >= 4,000

(time unit: second. M.O. = Memory Over.)

24

MPI + Multi Threading for Quantum Chemistry

N.4P.DZ.pqgt11t2p (m = 7,230)

[Chart: computation time in seconds (log scale, 100 to 100,000) vs. number of nodes (1 to 16) for PCSDP and SDPARA with 1, 2, 4, and 8 threads per node.]

64x speed-up on 16 nodes x 8 threads.

25

Concluding Remarks & Future works

1. New parallel schemes for the sparse Schur complement matrix
2. Reasonable scalability
3. Extremely large-scale SDPs with a sparse Schur complement matrix

Improvement of multi-threading for the sparse Schur complement matrix