High Performance Solvers for Semidefinite Programs Makoto Yamashita @ Tokyo Tech Katsuki Fujisawa @ Chuo Univ Mituhiro Fukuda @ Tokyo Tech Kazuhiro Kobayashi @ NMRI Kazuhide Nakata @ Tokyo Tech Maho Nakata @ RIKEN KSIAM Annual Meeting @ Jeju 2011/11/25 (2011/11/25-2011/11/26) This talk is supported by Ewha University

Page 1

High Performance Solvers for Semidefinite Programs

Makoto Yamashita @ Tokyo Tech
Katsuki Fujisawa @ Chuo Univ
Mituhiro Fukuda @ Tokyo Tech
Kazuhiro Kobayashi @ NMRI
Kazuhide Nakata @ Tokyo Tech
Maho Nakata @ RIKEN

KSIAM Annual Meeting @ Jeju, 2011/11/25 (2011/11/25-2011/11/26)

This talk is supported by Ewha University

Page 2

Our interests & SDPA Family

How fast can we solve SDPs?
How large an SDP can we solve?
How accurately can we solve SDPs?

SDPA Homepage http://sdpa.sf.net/

SDPA family:
SDPA: base solver
SDPA-M: Matlab
SDPA-GMP: multiple precision
SDPA-C: structural sparsity
SDPARA: parallel
SDPARA-C: parallel, structural sparsity

Page 3

SDPA Online Solver

1. Log in to the online solver
2. Upload your problem
3. Press the 'Execute' button
4. Receive the result via Web/Mail

http://sdpa.sf.net/ ⇒ Online Solver

Page 4

Outline

1. SDP Applications
2. Primal-Dual Interior-Point Methods
3. Inside of SDPARA (Large & Fast)
4. Inside of SDPA-GMP (Accurate)
5. Conclusion

Page 5

SDP Applications

Control Theory
Quantum Chemistry
Sensor Network Localization Problem
Polynomial Optimization

Page 6

SDP Applications 1. Control Theory

We want the system to stay stable against swinging.

Stability Condition ⇒ Lyapunov Condition ⇒ SDP
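
As a standard textbook illustration (the linear system $\dot{x} = A x$ and the matrix variable $P$ are assumptions introduced here, not taken from the slide), the Lyapunov condition for stability can be written as a feasibility problem over symmetric matrices:

```latex
\text{find } P \in S^n \quad \text{s.t.} \quad P \succ O, \quad A^T P + P A \prec O
```

Both conditions are linear matrix inequalities in $P$, which is why a stability check of this kind can be posed as an SDP.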

Page 7

SDP Applications 2. Quantum Chemistry

Ground state energy
Locate electrons

Schrödinger Equation ⇒ Reduced Density Matrix ⇒ SDP

Page 8

SDP Applications 3. Sensor Network Localization

Distance Information ⇒ Sensor Locations

Protein Structure

Page 9

SDP Applications 4. Polynomial Optimization

min: Polynomial s.t. Polynomial constraints

For example,

$\min: f(x) = \sum_{i=1}^{n-1} \left( 100 (x_{i+1} - x_i^2)^2 + (1 - x_i)^2 \right), \quad x \in R^n$

NP-hard in general
Very good lower bound by SDP relaxation method
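
The example objective on this slide reads as the generalized Rosenbrock function. A minimal sketch for evaluating it is given below; the function name `generalized_rosenbrock` is hypothetical and the code only evaluates the polynomial, it does not perform any SDP relaxation.

```python
def generalized_rosenbrock(x):
    """Evaluate sum_{i=1}^{n-1} 100*(x_{i+1} - x_i^2)^2 + (1 - x_i)^2."""
    return sum(
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    )


# Every term is a square, so 0 is a lower bound; it is attained at x = (1, ..., 1).
print(generalized_rosenbrock([1.0, 1.0, 1.0, 1.0]))
print(generalized_rosenbrock([0.0, 0.0, 0.0]))
```

Because the objective is a sum of squares, any valid lower bound for this instance cannot exceed 0, which gives a simple sanity check for a relaxation.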

Page 10

SDP Applications

Control Theory
Quantum Chemistry
Polynomial Optimization
Sensor Network Localization Problem

Many Applications
How Large & How Fast & How Accurate

Page 11

Standard form

$(P) \quad \min\ C \bullet X \quad \text{s.t.} \quad A_k \bullet X = b_k \ (k = 1, \ldots, m), \quad X \succeq O$

$(D) \quad \max\ \sum_{k=1}^{m} b_k z_k \quad \text{s.t.} \quad \sum_{k=1}^{m} A_k z_k + Y = C, \quad Y \succeq O$

The variables are $X \in S^n$, $Y \in S^n$, $z \in R^m$.
The inner product is $X \bullet Y = \sum_{i,j=1}^{n} X_{ij} Y_{ij}$.
The size is roughly determined by $m$, the number of equality constraints in $(P)$, and $n$, the size of $X$ and $Y$.

Our target: $m \ge 30{,}000$
Ordinary solver: $m \le 10{,}000$
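
As a short worked step (standard weak duality, added here for illustration), take any $X$ feasible for $(P)$ and any $(Y, z)$ feasible for $(D)$ in the standard form $\min C \bullet X$ s.t. $A_k \bullet X = b_k$, $X \succeq O$ and $\max \sum_k b_k z_k$ s.t. $\sum_k A_k z_k + Y = C$, $Y \succeq O$. The gap between the two objective values is then:

```latex
C \bullet X - \sum_{k=1}^{m} b_k z_k
  = \Bigl( \sum_{k=1}^{m} A_k z_k + Y \Bigr) \bullet X - \sum_{k=1}^{m} (A_k \bullet X)\, z_k
  = X \bullet Y \ \ge\ 0
```

The last inequality holds because $X$ and $Y$ are both positive semidefinite, so the primal value never falls below the dual value.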

Page 12

Primal-Dual Interior-Point Methods

[Figure: feasible region for $X \in S^n$, $Y \in S^n$, $z \in R^m$; the iterates $(X^0, Y^0, z^0)$, $(X^1, Y^1, z^1)$, $(X^2, Y^2, z^2)$ follow the central path toward the optimal $(X^*, Y^*, z^*)$, each moving by a search direction $(dX, dY, dz)$ toward a target]

Page 13

Schur Complement Matrix

Schur Complement Equation:

$B\,dz = r$
$dY = D - \sum_{j=1}^{m} A_j\,dz_j$
$\widehat{dX} = (R - X\,dY)\,Y^{-1}, \quad dX = (\widehat{dX} + \widehat{dX}^T)/2$

where the Schur Complement Matrix $B$ has the elements $B_{ij} = (X A_i Y^{-1}) \bullet A_j$

1. ELEMENTS (Evaluation of SCM)
2. CHOLESKY (Cholesky factorization of SCM)
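
The ELEMENTS step can be sketched in a few lines for dense toy data, using the formula B_ij = (X A_i Y^{-1}) • A_j. This is only an illustration under stated assumptions: the helper names are hypothetical, the inverse of Y is passed in directly, and the sparsity handling of the real solver is ignored.

```python
def matmul(P, Q):
    """Product of two square matrices given as lists of rows."""
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inner(P, Q):
    """Inner product P . Q = sum over i, j of P_ij * Q_ij."""
    return sum(p * q for row_p, row_q in zip(P, Q) for p, q in zip(row_p, row_q))


def schur_complement_matrix(X, Y_inv, A):
    """Return B with B[i][j] = (X A_i Y^{-1}) . A_j for all i, j."""
    m = len(A)
    B = []
    for i in range(m):
        G = matmul(matmul(X, A[i]), Y_inv)  # shared by every element of row i
        B.append([inner(G, A[j]) for j in range(m)])
    return B


# Toy data (hypothetical): n = 2, m = 2.
X = [[2, 0], [0, 1]]
Y_inv = [[1, 0], [0, 3]]
A = [[[1, 0], [0, 0]], [[0, 1], [1, 0]]]
B = schur_complement_matrix(X, Y_inv, A)
print(B)
```

The product `G` is computed once per row and reused for every column, so each row of B can be evaluated without reference to any other row.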

Page 14

Computation time on single processor

            Control    POP
ELEMENTS      22228    668
CHOLESKY       1593   1992
Total         23986   2713

Time unit is second, SDPA 7, Xeon 5460 (3.16GHz)

ELEMENTS and CHOLESKY take more than 95% of the total time.
SDPARA replaces these bottlenecks by parallel computation:
ELEMENTS ⇒ Row-wise distribution
CHOLESKY ⇒ Two-dimensional block-cyclic distribution

Page 15

Row-wise distribution

All rows are independent
Assign processors in a cyclic manner
Simple idea ⇒ Very EFFICIENT
High scalability

Example: $B \in S^{8 \times 8}$, $B_{ij} = (X A_i Y^{-1}) \bullet A_j$

[Figure: the 8 rows of B assigned to Processor1 through Processor4 in a cyclic manner]
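
The cyclic assignment of rows to processors can be sketched as follows. The function name `rows_for_processor` and the 0-based ranks are assumptions made for illustration; the actual implementation distributes the work over MPI processes.

```python
def rows_for_processor(m, num_procs, rank):
    """Rows (0-based) of an m-row matrix handled by processor `rank`, cyclically."""
    return list(range(rank, m, num_procs))


m, num_procs = 8, 4
assignment = {rank: rows_for_processor(m, num_procs, rank)
              for rank in range(num_procs)}
for rank, rows in assignment.items():
    # Print with 1-based numbering to match the slide's processor labels.
    print(f"Processor{rank + 1}: rows {[r + 1 for r in rows]}")
```

No processor needs data from another to evaluate its own rows, so this step needs no communication during the evaluation itself. If only the upper triangle of the symmetric matrix is evaluated, row i has fewer elements than row i-1, and a cyclic assignment mixes long and short rows on every processor, which is one plausible reason for the good load balance.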

Page 16

Block Algorithm for Cholesky factorization

Triangular Factorization: $B = U^T U$ (U: upper triangular matrix)

$\begin{pmatrix} B_{11} & B_{12} \\ B_{12}^T & B_{22} \end{pmatrix} = \begin{pmatrix} U_{11}^T & O \\ U_{12}^T & U_{22}^T \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ O & U_{22} \end{pmatrix} = \begin{pmatrix} U_{11}^T U_{11} & U_{11}^T U_{12} \\ U_{12}^T U_{11} & U_{12}^T U_{12} + U_{22}^T U_{22} \end{pmatrix}$

with $B_{11} \in S^{p}$, $B_{22} \in S^{m-p}$ (e.g. $p = 4$)

1. $B_{11} = U_{11}^T U_{11}$ (small Cholesky factorization)
2. $U_{12} = U_{11}^{-T} B_{12}$ (block update)
3. $B_{22} \leftarrow B_{22} - U_{12}^T U_{12}$ (block update)

Block Updates ⇒ Parallel Computing
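
The three steps of the block algorithm can be sketched serially in plain Python. This is an illustration only, not the ScaLAPACK routine used in practice; the function names and the small test matrix are hypothetical.

```python
import math


def cholesky_upper(B):
    """Unblocked factorization B = U^T U with U upper triangular."""
    n = len(B)
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        diag = B[i][i] - sum(U[k][i] ** 2 for k in range(i))
        U[i][i] = math.sqrt(diag)
        for j in range(i + 1, n):
            U[i][j] = (B[i][j] - sum(U[k][i] * U[k][j] for k in range(i))) / U[i][i]
    return U


def block_cholesky_upper(B, p):
    """Blocked factorization B = U^T U using leading blocks of size p."""
    m = len(B)
    if m <= p:
        return cholesky_upper(B)
    q = m - p
    B11 = [row[:p] for row in B[:p]]
    B12 = [row[p:] for row in B[:p]]
    B22 = [row[p:] for row in B[p:]]
    # Step 1: small Cholesky factorization B11 = U11^T U11.
    U11 = cholesky_upper(B11)
    # Step 2: U12 = U11^{-T} B12, by forward substitution with U11^T.
    U12 = [[0.0] * q for _ in range(p)]
    for i in range(p):
        for c in range(q):
            s = sum(U11[k][i] * U12[k][c] for k in range(i))
            U12[i][c] = (B12[i][c] - s) / U11[i][i]
    # Step 3: B22 <- B22 - U12^T U12, then factorize the updated block.
    S = [[B22[i][j] - sum(U12[k][i] * U12[k][j] for k in range(p))
          for j in range(q)] for i in range(q)]
    U22 = block_cholesky_upper(S, p)
    # Assemble U = [[U11, U12], [O, U22]].
    U = [[0.0] * m for _ in range(m)]
    for i in range(p):
        U[i][:p] = U11[i]
        U[i][p:] = U12[i]
    for i in range(q):
        U[p + i][p:] = U22[i]
    return U


# Hypothetical symmetric, diagonally dominant (hence positive definite) matrix.
B = [[4.0, 1.0, 1.0, 0.0],
     [1.0, 5.0, 1.0, 1.0],
     [1.0, 1.0, 6.0, 1.0],
     [0.0, 1.0, 1.0, 7.0]]
U = block_cholesky_upper(B, p=2)
residual = max(abs(sum(U[k][i] * U[k][j] for k in range(4)) - B[i][j])
               for i in range(4) for j in range(4))
print(residual)
```

Steps 2 and 3 consist of independent column and element updates, which is the part that a parallel implementation can spread over processors.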

Page 17

Two-dimensional block-cyclic distribution (TDBCD)

ScaLAPACK library
Redistribution from the row-wise distribution to TDBCD requires network communication
Cholesky on TDBCD is much faster than on the row-wise distribution

Example: $B \in S^{8 \times 8}$ on 4 processors (each entry is the processor that owns the element)

1 1 2 2 1 1 2 2
1 1 2 2 1 1 2 2
3 3 4 4 3 3 4 4
3 3 4 4 3 3 4 4
1 1 2 2 1 1 2 2
1 1 2 2 1 1 2 2
3 3 4 4 3 3 4 4
3 3 4 4 3 3 4 4

[Figure: the row-wise distribution of B over Processor1 through Processor4, for comparison]
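
The 8 x 8 ownership pattern of the two-dimensional block-cyclic distribution can be reproduced with a small index mapping. The block size of 2 and the 2 x 2 processor grid are inferred from the pattern on the slide; the function name `owner` is hypothetical and this is not the ScaLAPACK API.

```python
def owner(i, j, block=2, grid_rows=2, grid_cols=2):
    """Processor (numbered from 1) that owns element (i, j), with 0-based indices."""
    proc_row = (i // block) % grid_rows
    proc_col = (j // block) % grid_cols
    return proc_row * grid_cols + proc_col + 1


layout = [[owner(i, j) for j in range(8)] for i in range(8)]
for row in layout:
    print(" ".join(str(p) for p in row))
```

Each processor owns blocks spread over the whole matrix, so the work stays balanced as the factorization proceeds toward the lower-right corner.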

Page 18

Numerical Results of SDPARA

Quantum Chemistry (m = 7230, SCM = 100%), middle size
SDPARA 7.3.1, Xeon X5460, 3.16GHz x2, 48GB memory

Time in seconds (plotted on a logarithmic scale in the original chart):

Servers         1      4     16
ELEMENTS    28678   7192   1826
CHOLESKY      548    131     47
Total       29700   7764   2294

ELEMENTS 15x speedup
CHOLESKY 12x speedup
Total 13x speedup

Very FAST!!
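
The speedup figures can be checked from the chart values, reading them as ELEMENTS 28678 / 7192 / 1826, CHOLESKY 548 / 131 / 47 and Total 29700 / 7764 / 2294 seconds on 1 / 4 / 16 servers. The parallel efficiency (speedup divided by the number of servers) is an extra quantity not shown on the slide.

```python
times = {
    "ELEMENTS": {1: 28678, 4: 7192, 16: 1826},
    "CHOLESKY": {1: 548, 4: 131, 16: 47},
    "Total": {1: 29700, 4: 7764, 16: 2294},
}


def speedup(component, servers):
    """Time on 1 server divided by time on `servers` servers."""
    return times[component][1] / times[component][servers]


for name in times:
    s = speedup(name, 16)
    print(f"{name}: {s:.1f}x speedup on 16 servers, efficiency {s / 16:.0%}")
```

ELEMENTS scales almost linearly, while CHOLESKY scales less well, which is consistent with the factorization requiring communication between servers.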

Page 19

Acceleration by Multiple Threading

Modern processors have multiple cores
Multiple threading is becoming common

Two-level Parallel Computing

[Figure: the rows of B assigned to Processor1:Thread1, Processor1:Thread2, Processor2:Thread1 and Processor2:Thread2 (2 processors x 2 threads on each processor)]

Page 20

Comparison with PCSDP

PCSDP: developed by Ivanov & de Klerk

SDP: B.2P Quantum Chemistry (m = 7230, SCM = 100%)
Xeon X5460, 3.16GHz x2 (8 cores), 48GB memory

Servers        1       2       4      8     16
PCSDP      53768   27854   14273   7995   4050
SDPARA      5983    3002    1680    901    565

Time unit is second

SDPARA is 8x faster by MPI & Multi-Threading (Two-level parallelization)

Page 21

Extremely Large-Scale SDPs

16 Servers [Xeon X5670 (2.93GHz), 128GB Memory]

Problem          m         SCM    Time
Esc32_b (QAP)    198,432   100%   129,186 seconds (1.5 days)

Other solvers can handle only $m \le 30{,}000$

The LARGEST solved SDP in the world
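
A rough memory estimate shows why a problem of this size needs several servers. It assumes that SCM = 100% means a fully dense Schur complement matrix, stored as a full m x m array of 8-byte doubles; symmetric storage, workspaces and the other matrices are ignored, so the numbers are only indicative.

```python
def dense_scm_bytes(m, bytes_per_entry=8):
    """Bytes needed for a full m x m matrix of double-precision entries."""
    return m * m * bytes_per_entry


m = 198_432
servers = 16
total_bytes = dense_scm_bytes(m)
print(f"whole matrix: {total_bytes / 1e9:.0f} GB")
print(f"per server:   {total_bytes / servers / 1e9:.1f} GB")
```

Under these assumptions the matrix alone exceeds the 128GB of a single server but fits once it is spread over 16 servers.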

Page 22

Numerical Accuracy

One weak point of PDIPM

The iterates converge to the optimal $(X^*, Y^*, z^*)$: $\lim_{k \to \infty} (X^k, Y^k) = (X^*, Y^*)$, $X^* Y^* = O$

PDIPM requires $(X^k)^{-1}$ & $(Y^k)^{-1}$, for example, $B_{ij} = (X A_i Y^{-1}) \bullet A_j$

Eventually, numerical trouble (often, Cholesky fails)

Page 23

Numerical Precision

Ordinary double precision in C or C++: accuracy = $10^{-16}$

64bit = 1bit (sign) + 11bit (exponent) + 52bit (fraction); with the implicit leading bit, the significand has 53 bits

[ a | b | c ]   value = $(-1)^a \times 2^b \times 1.c$

Arbitrary precision in GMP library

[ a | b | c ]   We can set the number of bits of the fraction part arbitrarily (for example, 200bit = $10^{-53}$)

Replace BLAS (Basic Linear Algebra Subprograms) by MPLAPACK (Multiple precision LAPACK) ⇒ SDPA-GMP
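
The effect of a longer fraction part can be illustrated with Python's standard `decimal` module. It is used here only as a stand-in for arbitrary-precision arithmetic; it is not GMP and not what SDPA-GMP uses, and the 60-digit setting is an arbitrary choice for the example.

```python
import sys
from decimal import Decimal, localcontext

# A double carries 53 significand bits (52 stored plus the implicit leading bit).
print(sys.float_info.mant_dig)

# In double precision a perturbation of 1e-20 is lost when added to 1.
double_sum = 1.0 + 1e-20
print(double_sum == 1.0)

# With 60 significant decimal digits the same perturbation is kept.
with localcontext() as ctx:
    ctx.prec = 60
    high_sum = Decimal(1) + Decimal("1e-20")
print(high_sum == Decimal(1))
```

The same loss of small contributions is what makes a nearly singular matrix hard to factorize in double precision.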

Page 24

Numerically Hard Test Problem

PDIPM is stable if Slater's condition holds:

$\exists X \succ O \quad \text{s.t.} \quad A_k \bullet X = b_k \ (k = 1, \ldots, m)$

Graph Partition Problem, with a parameter on the right-hand side (written $\alpha \ge 0$ here):

$\min: C \bullet X \quad \text{s.t.} \quad ee^T \bullet X = \alpha, \quad e_i e_i^T \bullet X = 1 \ (i = 1, \ldots, n), \quad X \succeq O$

For $\alpha = 0$ the problem has no interior: the set $\{ X : ee^T \bullet X = 0, \ e_i e_i^T \bullet X = 1, \ X \succ O \}$ is empty

Small $\alpha$ ⇒ Numerically Hard

Page 25

Numerical Results of SDPA-GMP

Small parameter ⇒ Numerically Hard

Parameter   Solver      Accuracy    Time (second)
1.0e-1      SDPA        1.08e-8     2.03
            SDPA-GMP    4.80e-48    77760.19
1.0e-15     SDPA        1.63e-7     2.26
            SDPA-GMP    2.97e-48    82115.52
0           SDPA        5.26e-9     2.36
            SDPA-GMP    7.29e-24    105325.74

SDPA-GMP uses 300 digits
24 digits even for the no-interior case

Page 26

Conclusion

SDPARA ⇒ How Fast & How Large: 100 times & $m \approx 200{,}000$
SDPA-GMP ⇒ How Accurate: $10^{-48}$

http://sdpa.sf.net/ & Online solver

Thank you very much for your attention.