AIST Super Green Cloud: Lessons Learned from the Operation and the Performance Evaluation of HPC Cloud

Ryousei Takano, Yusuke Tanimura, Akihiko Oota, Hiroki Oohashi, Keiichi Yusa, Yoshio Tanaka
National Institute of Advanced Industrial Science and Technology (AIST), Japan
ISGC 2015 @ Taipei, 20 March 2015
Introduction
• HPC Cloud is a promising HPC platform.
• Virtualization is a key technology.
  – Pro: a customized software environment, elasticity, etc.
  – Con: large overheads, which spoil I/O performance.
• VMM-bypass I/O technologies, e.g., PCI passthrough and SR-IOV, can significantly mitigate the overhead (see the sketch below).
• "99% of HPC jobs running on US NSF computing centers fit into one rack." -- M. Norman, UCSD
• Current virtualization technologies are feasible enough to support systems of that scale.
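For concreteness, here is a minimal sketch of handing a host PCI device (e.g., an InfiniBand HCA or an SR-IOV virtual function) to a running KVM guest through the libvirt Python bindings. The domain name and PCI address are hypothetical placeholders, not values from ASGC.

```python
import libvirt

# Hostdev XML for a passthrough device; 0000:81:00.0 is a placeholder address.
HOSTDEV_XML = """
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
  </source>
</hostdev>
"""

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("hpc-vm-001")   # hypothetical guest name
# Attach the device to the live guest; it then drives the NIC/HCA directly,
# bypassing the VMM on the I/O path.
dom.attachDeviceFlags(HOSTDEV_XML, libvirt.VIR_DOMAIN_AFFECT_LIVE)
conn.close()
```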
LINPACK on ASGC

[Figure: LINPACK performance (TFLOPS) vs. number of nodes (0-128), physical cluster vs. virtual cluster]

Performance degradation: 5.4 - 6.6%
Efficiency* on 128 nodes: physical cluster 90%, virtual cluster 84%
*) Rmax / Rpeak
(IEEE CloudCom 2014)
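As a sanity check, these efficiency figures can be reproduced from the node specification given later in the deck (two 10-core Xeon E5-2680v2 at 2.8 GHz; Ivy Bridge does 8 double-precision flops/cycle/core with AVX):

```python
# Back-of-the-envelope LINPACK check: Rpeak per node and for 128 nodes.
flops_per_cycle = 8            # DP flops/cycle/core (AVX, Ivy Bridge)
cores_per_node = 2 * 10        # two 10-core Xeon E5-2680v2
clock_ghz = 2.8

rpeak_node = flops_per_cycle * cores_per_node * clock_ghz    # 448 GFLOPS
rpeak_128 = rpeak_node * 128 / 1000                          # 57.34 TFLOPS

print(f"Rpeak (128 nodes): {rpeak_128:.2f} TFLOPS")
print(f"Rmax, physical (90%): {0.90 * rpeak_128:.1f} TFLOPS")  # ~51.6
print(f"Rmax, virtual  (84%): {0.84 * rpeak_128:.1f} TFLOPS")  # ~48.2
```

The same per-node figure also reproduces the system-wide peak quoted later: 448 GFLOPS x 155 nodes = 69.44 TFLOPS.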
Introduction (cont'd)
• HPC Clouds are heading for hybrid-cloud and multi-cloud systems, where users can execute their applications anytime and anywhere they want.
• Vision of AIST HPC Cloud: "Build once, run everywhere"
• AIST Super Green Cloud (ASGC): a fully virtualized HPC system
Outline
• AIST Super Green Cloud (ASGC) and HPC Cloud service
• Lessons learned from the first six months of operation
• Experiments
• Conclusion
Vision of AIST HPC Cloud: "Build Once, Run Everywhere"

[Figure: virtual cluster templates are deployed as virtual clusters across an academic cloud, a private cloud, and a commercial cloud]

Feature 1: Create a customized virtual cluster easily.
Feature 2: Build a virtual cluster once, and run it everywhere on clouds.
Usage Model of AIST Cloud

Allow users to customize their virtual clusters (ease of use):
1. Select a template of a virtual machine (e.g., Web apps, Big Data, HPC).
2. Install the required software packages.
3. Save the user-customized template in the repository (take a snapshot); a sketch of this step follows.

Virtual clusters are then deployed from the VM template files, launching virtual machines only when necessary.
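Step 3 could look roughly like the following against the CloudStack API, shown here with the third-party "cs" Python client. The endpoint, credentials, and names are hypothetical; sgc-tools wraps calls of this kind, and this is not its actual code.

```python
from cs import CloudStack

api = CloudStack(endpoint="https://asgc.example/client/api",  # hypothetical
                 key="API_KEY", secret="SECRET_KEY")

# Templates are created from a stopped VM's ROOT volume.
vm = api.listVirtualMachines(name="my-hpc-node")["virtualmachine"][0]
api.stopVirtualMachine(id=vm["id"])        # async job; wait before continuing

root = api.listVolumes(virtualmachineid=vm["id"], type="ROOT")["volume"][0]
api.createTemplate(name="hpc-custom",      # lands in the image repository
                   displaytext="CentOS + user software stack",
                   ostypeid=vm["guestosid"],
                   volumeid=root["id"])
```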
Elastic Virtual Cluster

[Figure: sgc-tools asks the cloud controller to create a virtual cluster (a frontend node plus compute nodes sharing NFS and a job scheduler) from a virtual cluster template in the image repository. On ASGC (InfiniBand/Ethernet), the cluster scales in and out as jobs are submitted (in operation). Templates can also be imported/exported so the same cluster runs on a public cloud over Ethernet, with jobs submitted to its own frontend node (under development).]

A scale-out step might look like the sketch below.
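This is an illustrative sketch only, assuming the "cs" CloudStack client and TORQUE's qmgr on the frontend node; the offering, template, zone IDs, and node name are placeholders, not the actual sgc-tools implementation.

```python
import subprocess
from cs import CloudStack

api = CloudStack(endpoint="https://asgc.example/client/api",  # hypothetical
                 key="API_KEY", secret="SECRET_KEY")

# Start one more compute node from the virtual cluster template
# (asynchronous job; in practice, poll it until the VM is running).
api.deployVirtualMachine(serviceofferingid="HPC_OFFERING_ID",
                         templateid="CLUSTER_TEMPLATE_ID",
                         zoneid="ZONE_ID",
                         name="cmp-node-04")

# Register the new node with the TORQUE server on the frontend node
# (per-node core counts are configured in TORQUE's server nodes file).
subprocess.run(["qmgr", "-c", "create node cmp-node-04"], check=True)
```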
ASGC Hardware Spec.

Compute node:
  CPU: Intel Xeon E5-2680v2 2.8 GHz (10 cores) x 2
  Memory: 128 GB DDR3-1866
  InfiniBand: Mellanox ConnectX-3 (FDR)
  Ethernet: Intel X520-DA2 (10 GbE)
  Disk: Intel SSD DC S3500 600 GB

Network switches:
  InfiniBand: Mellanox SX6025
  Ethernet: Extreme BlackDiamond X8

• The 155-node cluster consists of Cray H2312 blade servers.
• The theoretical peak performance is 69.44 TFLOPS.
ASGC Software Stack

Management stack:
– CentOS 6.5 (QEMU/KVM 0.12.1.2)
– Apache CloudStack 4.3 + our extensions
  • PCI passthrough/SR-IOV support (KVM only)
  • sgc-tools: virtual cluster construction utility
– RADOS cluster storage

HPC stack (virtual cluster):
– Intel Cluster Studio SP1 1.2.144
– Mellanox OFED 2.1
– TORQUE job scheduler 4.2.8
Storage Architecture

[Figure: 155 compute nodes connected to the compute network (InfiniBand FDR) and the management network (10/1 GbE); the VMDI storage cluster (RADOS, RADOS gateway) and NFS staging server sit on the 10 GbE data network behind the BDX8 switch.]

• No shared storage/filesystem: each VM's primary storage is the local SSD.
• VMDI (Virtual Machine Disk Image) storage provides secondary storage:
  – RADOS storage cluster
  – RADOS gateway
  – NFS secondary staging server
• User data lives on user-attached storage.
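Since the RADOS gateway exposes an S3-compatible interface, standard S3 clients can inspect the VMDI secondary storage. A minimal sketch with boto3; the endpoint, credentials, and bucket name are placeholders, not the actual ASGC configuration.

```python
import boto3

# Talk to the RADOS gateway as if it were S3 (hypothetical endpoint).
s3 = boto3.client("s3",
                  endpoint_url="http://vmdi-rgw.asgc.example:7480",
                  aws_access_key_id="ACCESS_KEY",
                  aws_secret_access_key="SECRET_KEY")

# List stored disk-image objects.
for obj in s3.list_objects_v2(Bucket="vmdi-templates").get("Contents", []):
    print(obj["Key"], obj["Size"])
```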
Outline
• AIST Super Green Cloud (ASGC) and HPC Cloud service
• Lessons learned from the first six months of operation
  – CloudStack on a supercomputer
  – Cloud service for HPC users
  – Utilization
• Experiments
• Conclusion
Overview of ASGC Operation
• Operation started in July 2014.
• Accounts: 30+
  – The main users are materials scientists and genome scientists.
• Utilization: < 70%
• 95% of the total usage time is spent running HPC VM instances.
• Hardware failures: 19 (memory, motherboard, power supply)
CloudStack on a Supercomputer
• A supercomputer is not designed for cloud computing.
  – Cluster management software is troublesome.
• We can launch a highly productive system in a short development time by leveraging open-source system software.
• Software maturity of CloudStack:
  – Our storage architecture is slightly uncommon: we use local SSDs as primary storage and an S3-compatible object store as secondary storage.
  – We discovered and resolved several serious bugs.
Software Maturity

CloudStack issue | Our action | Status
cloudstack-agent jsvc gets too large virtual memory space | Patch | Fixed
listUsageRecords generates NullPointerExceptions for expunging instances | Patch | Fixed
Duplicate usage records when listing large numbers of records / small page sizes return duplicate results | Backporting | Fixed
Public key content is overridden by the template's metadata when you create an instance | Bug report | Fixed
Migration of a VM with volumes in local storage to another host in the same cluster is failing | Backporting | Fixed
Negative ref_cnt of template(snapshot/volume)_store_ref results in an out-of-range error in MySQL | Patch (not merged) | Fixed
[S3] Parallel deployment makes the reference count of a cache in the NFS secondary staging store negative (-1) | Patch (not merged) | Unresolved
Can't create a proper template from a VM in an S3 secondary storage environment | Patch | Fixed
Fails to attach a volume (made from a snapshot) to a VM using local storage as primary storage | Bug report | Unresolved
Cloud Service for HPC Users
• SaaS is best if the target application is clear.
• IaaS is quite flexible, but it is difficult for application users to manage an HPC environment from scratch.
• To bridge this gap, sgc-tools is introduced on top of the IaaS service.
• We believe it works well, although some minor problems remain.
• To improve the maintainability of VM templates, the idea of "Infrastructure as code" can help (a toy sketch follows).
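One reading of the "Infrastructure as code" remark: keep the template's software list as version-controlled data and rebuild templates from it, rather than hand-customizing snapshots. A toy sketch; the base image and package names are illustrative only.

```python
# Declarative description of a VM template (hypothetical contents).
TEMPLATE_SPEC = {
    "base": "centos-6.5-hpc",
    "packages": ["mellanox-ofed", "torque-client", "lammps"],
}

def build_script(spec: dict) -> str:
    """Render the spec into the shell commands a template builder would run."""
    lines = [f"# build from base image: {spec['base']}"]
    lines += [f"yum install -y {pkg}" for pkg in spec["packages"]]
    return "\n".join(lines)

print(build_script(TEMPLATE_SPEC))
```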
Utilization
• Efficient use of limited resources is required.
• A virtual cluster holds its resources whether or not the user fully utilizes them.
• sgc-tools does not support system-wide queuing, so users need to check resource availability themselves.
• Introducing a global scheduler, e.g., the Condor VM universe, could solve this problem (sketched below).
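With a global scheduler in the Condor VM universe style, a whole VM image becomes a queued job that starts only when resources are actually free. A hedged sketch using recent versions of the htcondor Python bindings; the image file and sizes are placeholders.

```python
import htcondor

# Describe a VM-universe job: the disk image itself is the job payload.
submit = htcondor.Submit({
    "universe":  "vm",
    "vm_type":   "kvm",
    "vm_memory": "4096",                        # MB
    "vm_disk":   "hpc-node.qcow2:vda:w:qcow2",  # hypothetical image file
    "log":       "vm_job.log",
})

# Queue it; HTCondor decides when and where the VM boots.
htcondor.Schedd().submit(submit)
```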
Outline
• AIST Super Green Cloud (ASGC) and HPC Cloud service
• Lessons learned from the first six months of operation
• Experiments
  – Deployment time
  – Performance evaluation of SR-IOV
• Conclusion
Virtual Cluster Deployment

[Figure: deployment time (seconds) vs. number of nodes (up to 35) for three configurations: no cache, cache on NFS/SS, and cache on the local node. Template images flow from RADOS to the NFS secondary staging server (SS) and then to the compute node.]

Breakdown (seconds):
  Device attach (before OS boot): 90
  OS boot: 90
  FS creation (mkfs): 90
  Transfer from RADOS to SS
  Transfer from SS to the local node
Benchmark Programs

Micro benchmark:
– Intel MPI Benchmarks (IMB) version 3.2.4
  • Point-to-point
  • Collectives: Allgather, Allreduce, Alltoall, Bcast, Reduce, Barrier

Application-level benchmark:
– LAMMPS Molecular Dynamics Simulator, version 28 June 2014
  • EAM benchmark, 100x100x100 atoms
MPI Point-to-Point Communication

[Figure: IMB point-to-point bandwidth (MB/s) and execution time (microseconds) vs. message size for bare metal (BM), PCI passthrough, and SR-IOV. Peak bandwidth: BM 6.00 GB/s, PCI passthrough 5.72 GB/s, SR-IOV 5.73 GB/s.]

The overhead is less than 5% with large messages, though it is up to 30% with small messages.
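A point-to-point test in the spirit of the IMB PingPong above can be reproduced in a few lines of mpi4py; run it across two nodes with "mpirun -np 2 python pingpong.py". This is a generic sketch, not the benchmark configuration used here.

```python
import time
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
buf = bytearray(1 << 20)   # 1 MiB message
reps = 100

comm.Barrier()
t0 = time.perf_counter()
for _ in range(reps):
    if rank == 0:
        comm.Send(buf, dest=1)
        comm.Recv(buf, source=1)
    else:
        comm.Recv(buf, source=0)
        comm.Send(buf, dest=0)
elapsed = time.perf_counter() - t0

if rank == 0:
    # PingPong bandwidth: message size over half the round-trip time,
    # i.e., 2 * reps * len(buf) / elapsed, reported in MB/s.
    print(f"bandwidth: {2 * reps * len(buf) / elapsed / 1e6:.1f} MB/s")
```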
MPI Collectives

Barrier time [microseconds]:
  BM: 6.87 (1.00)
  PCI passthrough: 8.07 (1.17)
  SR-IOV: 9.36 (1.36)

[Figure: execution time (microseconds) vs. message size for Allgather, Allreduce, Alltoall, Bcast, and Reduce, comparing BM, PCI passthrough, and SR-IOV]

The performance of SR-IOV is comparable to that of PCI passthrough, although unexpected performance degradation is often observed.
LAMMPS: MD Simulator
• EAM benchmark:
  – Fixed problem size (1M atoms)
  – #processes: 20 - 160
• vCPU pinning reduces performance fluctuation (see the sketch below).
• The performance overhead of PCI passthrough and SR-IOV is about 13%.

[Figure: execution time (seconds) vs. number of processes (20-160) for BM, PCI passthrough, and SR-IOV, each with and without vCPU pinning]
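The vCPU pinning mentioned above can be applied from the host with the libvirt Python bindings, pinning each guest vCPU to a distinct physical core so the host scheduler stops migrating them. The domain name is a hypothetical placeholder, and a simple 1:1 core mapping is assumed.

```python
import libvirt

conn = libvirt.open("qemu:///system")
dom = conn.lookupByName("hpc-vm-001")   # hypothetical guest
host_cpus = conn.getInfo()[2]           # number of physical CPUs on the host

# Pin guest vCPU i to host core i (assumes enough host cores).
for vcpu in range(dom.maxVcpus()):
    cpumap = tuple(i == vcpu for i in range(host_cpus))
    dom.pinVcpu(vcpu, cpumap)
conn.close()
```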
Findings
• The performance of SR-IOV is comparable to that of PCI passthrough, although unexpected performance degradation is often observed.
• vCPU pinning improves performance for HPC applications.
Outline
• AIST Super Green Cloud (ASGC) and HPC Cloud service
• Lessons learned from the first six months of operation
• Experiments
• Conclusion
Conclusion and Future Work
• ASGC is a fully virtualized HPC system.
• We can launch a highly productive system in a short development time by leveraging state-of-the-art open-source system software.
  – Extensions: PCI passthrough/SR-IOV support, sgc-tools
  – Bug fixes
• Future research direction: data movement is key.
  – Efficient data management and transfer methods
  – Federated identity management