Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data Centers

  • date post

    29-Aug-2014
  • Category

    Technology

High-Performance Grid and Cloud Computing Workshop

Transcript of Ninja Migration: An Interconnect transparent Migration for Heterogeneous Data Centers

  • Ninja Migration: An Interconnect-transparent Migration for Heterogeneous Data Centers
    High-Performance Grid and Cloud Computing Workshop, May 20, 2013, Boston
    Ryousei Takano, Hidemoto Nakada, Takahiro Hirofuchi, Yoshio Tanaka, and Tomohiro Kudoh
    Information Technology Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Japan
  • Background
    - HPC cloud is a promising HPC platform.
    - VM migration is useful for improving flexibility and maintainability in cloud computing.
    [Figure: VM1, VM2, and VM3 moving between hosts. Use cases: maintenance, fault tolerance, energy-efficient VM placement, disaster recovery]
  • Constraints on VM migration
    - Migration with a VMM-bypass I/O device: it can greatly reduce the overhead of virtualization, but it is not under the control of a VMM.
    [Figure: VM1, VM2, and VM3 attached to Infiniband]
  • Impact of VMM-bypass I/O
    Performance evaluation of HPC cloud:
    - (Para-)virtualized I/O incurs a large overhead.
    - PCI passthrough significantly mitigates the overhead.
    [Figure: the overhead of I/O virtualization on the NAS Parallel Benchmarks 3.3.1, class C, 64 processes. Execution time [seconds] for BT, CG, EP, FT, and LU; series: BMM (IB), BMM (10GbE), KVM (IB), KVM (virtio). BMM: Bare Metal Machine]
    [Figure: KVM (virtio): VM1 with guest OS and guest driver, VMM with physical driver, 10GbE NIC. KVM (IB): VM1 with guest OS and physical driver, VMM, IB QDR HCA]
  • Constraints on VM migration
    - Migration with a VMM-bypass I/O device: it can greatly reduce the overhead of virtualization, but it is not under the control of a VMM.
    - Heterogeneity of interconnect devices: a VM assigned to an Infiniband device cannot migrate to an Ethernet machine.
    [Figure: VM1, VM2, and VM3 between an Infiniband cluster and an Ethernet cluster]
  • Challenge
    - Goal: migrate a cluster of VMs between heterogeneous data centers, i.e., interconnect-transparent migration.
    - Challenge: how do we realize it with the minimal overhead of virtualization during normal operation?
    - (Para-)virtualized devices suffer from the overhead.
  • Outline
    - Introduction
    - Ninja migration: interconnect-transparent migration
    - Experiment
    - Conclusion
  • Interconnect-transparent migration
    [Figure: VM1, VM2, and VM3 run on the Infiniband cluster during normal operation, move to the Ethernet cluster by fallback migration (fallback operation), and return by recovery migration]
    Use cases: transparent fail-over to another cluster for maintenance, evacuation from a disaster-stricken data center, etc.
  • Requirements
    1. Detach VMM-bypass I/O devices only when VM migration is required.
    2. Global coordination among distributed VMs before migration.
    3. Change an application's transport protocol for the available device after migration.
    Our approach: leverage the knowledge of an application to ensure cooperation between migration and a communication layer inside the guest OS.
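Requirement 3 can be pictured with a minimal Python sketch. It is not the authors' implementation: the device names, the transport labels, and the `select_transport` helper are all hypothetical and only illustrate choosing a transport for whichever device the guest can see after migration.

```python
# Hypothetical mapping from interconnect device to a transport label.
# The labels are illustrative only, not taken from the slides.
TRANSPORT_FOR_DEVICE = {
    "infiniband": "openib",
    "ethernet": "tcp",
}

# Assumed preference order: use the VMM-bypass device whenever it is present.
DEVICE_PREFERENCE = ["infiniband", "ethernet"]


def select_transport(available_devices):
    """Pick the transport for the best device visible after migration."""
    for device in DEVICE_PREFERENCE:
        if device in available_devices:
            return TRANSPORT_FOR_DEVICE[device]
    raise RuntimeError("no usable interconnect device found")


# Normal operation on the Infiniband cluster: both devices are visible.
print(select_transport({"infiniband", "ethernet"}))
# After fallback migration to the Ethernet cluster: only Ethernet remains.
print(select_transport({"ethernet"}))
```

The point of the sketch is that the choice is made inside the guest, by the communication layer, rather than by the VMM.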
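  • SymVirt: Symbiotic Virtualization
    - Existing VM migration (black-box approach). Pro: portability.
    - VM migration with SymVirt (gray-box approach). Pro: performance.
    [Figure: in the black-box approach, the VMM alone handles migration, global coordination, and device setup for the VM and its NIC. In the gray-box approach, the application uses VMM-bypass I/O, and the application and the VMM share migration, global coordination, and device setup through cooperation]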
  • Ninja migration
    In conjunction with VM migration, the MPI system is in charge of:
    - global coordination among MPI processes
    - changing a transport protocol
    [Figure: three VMs, each with an application and an MPI system on top of a VMM and a NIC. Ninja migration spans migration, global coordination, and device setup]
  • Implementation
    - No modification to either the MPI system or applications
    - Open MPI user-level checkpoint/restart framework (SELF)
      - Global coordination protocol
      - Re-establishes connections among MPI processes after migration
    [Figure: the application and the MPI runtime with the SymVirt coordinator (SELF component) run in guest OS mode; the SymVirt controller/agent runs in VMM mode. Sequence: confirm, detach, migration, re-attach, confirm, linkup, covering global coordination, device setup, and migration]
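The sequence in the implementation figure (coordinate, detach, migrate, re-attach, link up) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the SymVirt code: the function name, the phase names, and the rule that re-attach and link-up happen only when the destination has a VMM-bypass device are all hypothetical.

```python
def ninja_migration_log(vms, src_bypass, dst_bypass):
    """Return an ordered log of (phase, vm) events for one migration.

    Global coordination is modeled by finishing each phase on every VM
    before any VM starts the next phase.
    """
    phases = ["coordinate"]
    if src_bypass:
        # The VMM-bypass device is detached only when migration is required.
        phases.append("detach")
    phases.append("migrate")
    if dst_bypass:
        # Assumption: a bypass device is re-attached and brought up
        # only if the destination host has one.
        phases += ["reattach", "linkup"]
    return [(phase, vm) for phase in phases for vm in vms]


# Fallback migration: Infiniband cluster to Ethernet cluster.
for event in ninja_migration_log(["vm1", "vm2"], src_bypass=True, dst_bypass=False):
    print(event)
```

Keeping the phases in lockstep across VMs mirrors the requirement that distributed VMs coordinate globally before any device is detached.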
  • Outline
    - Introduction
    - Ninja migration: interconnect-transparent migration
    - Experiment
    - Conclusion
  • Experiment
    - The overhead of Ninja migration
      - We used 8 VMs on a cluster.
      - We migrated VMs once during a benchmark execution.
    - Fallback and recovery migration between an Infiniband cluster and an Ethernet cluster
      - Infiniband: VMM-bypass I/O (PCI passthrough)
      - Ethernet: para-virtualized I/O (virtio_net)
    - Two benchmark programs written in MPI
      - memtest: a simple memory-intensive benchmark
      - NAS Parallel Benchmarks (NPB) version 3.3.1
  • Experimental Setting
    We used a 16-node Infiniband cluster.
    Blade server: Dell PowerEdge M610
      CPU            Intel quad-core Xeon E5540 2.53 GHz x2
      Chipset        Intel 5520
      Memory         48 GB DDR3
      InfiniBand     Mellanox ConnectX (MT26428)
      10 GbE         Broadcom NetXtreme II (BMC57711)
    Blade switch
      InfiniBand     Mellanox M3601Q (QDR, 16 ports)
      10 GbE         Dell M8024
    Host machine environment
      OS             Debian 7.0
      Linux kernel   3.2.18
      QEMU/KVM       1.1-rc3
      MPI            Open MPI 1.6
      OFED           1.5.4.1
      Compiler       gcc/gfortran 4.4.6
    VM environment
      VCPU           8
      Memory         20 GB
  • Result: memtest
    The overhead of Ninja migration:
    - The migration time depends on the memory footprint.
    - Both hotplug and link-up times are almost constant.
    [Figure: stacked bars of execution time [seconds] versus memory footprint]
      Memory footprint   migration   hotplug   link-up
      2 GB               35.9        14.6      28.5
      4 GB               38.7        13.5      28.5
      8 GB               44.2        12.5      28.5
      16 GB              53.7        11.3      28.6
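A rough way to quantify "depends on the memory footprint" is a least-squares line through the chart values. The Python sketch below assumes that the footprint-dependent series in the chart (35.9, 38.7, 44.2, 53.7 seconds at 2, 4, 8, 16 GB) is the migration time; the `fit_line` helper is hypothetical and the fit is only an illustration, not an analysis from the slides.

```python
# Values read from the memtest chart; treating this series as the
# migration time is an assumption based on the slide text.
footprint_gb = [2, 4, 8, 16]
migration_s = [35.9, 38.7, 44.2, 53.7]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(footprint_gb, migration_s)
print(f"{slope:.2f} s per GB, {intercept:.1f} s fixed cost")
```

Under that assumption the fit gives about 1.3 seconds per additional gigabyte on top of a fixed cost of roughly 34 seconds.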
  • Result: link-up time
    Times in seconds:
      src. device -> dest. device   hotplug   link-up
      Infiniband -> Infiniband      3.88      29.9
      Ethernet -> Infiniband        1.15      29.8
      Ethernet -> Ethernet          0.13      0.00
      Infiniband -> Ethernet        2.80      0.00
    - Focus on the link-up time.
    - Note: the source and the destination are the same node.
    - If the destination has an Infiniband device, the link-up time is not a negligible overhead.
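The table values can be combined into a per-migration device overhead. The Python sketch below simply adds hotplug and link-up time for each source and destination pair; the `device_overhead` helper is hypothetical, and treating the two columns as additive is an assumption.

```python
# (source device, destination device) -> (hotplug [s], link-up [s]),
# copied from the link-up time table.
MEASURED = {
    ("infiniband", "infiniband"): (3.88, 29.9),
    ("ethernet", "infiniband"): (1.15, 29.8),
    ("ethernet", "ethernet"): (0.13, 0.00),
    ("infiniband", "ethernet"): (2.80, 0.00),
}


def device_overhead(src, dst):
    """Sum of hotplug and link-up time for one migration, in seconds."""
    hotplug, linkup = MEASURED[(src, dst)]
    return hotplug + linkup


for (src, dst) in MEASURED:
    print(f"{src} -> {dst}: {device_overhead(src, dst):.2f} s")
```

Seen this way, any migration that ends on an Infiniband device pays roughly 30 seconds for link-up alone, while migrations that end on Ethernet pay only the hotplug cost.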
  • Result: NPB (64 proc., Class D)
    - There is no overhead during normal operations.
    - The overhead is proportional to the memory footprint.
    [Figure: execution time [seconds] for baseline and migration runs of BT, CG, FT, and LU; bars are split into migration, hotplug, linkup, and application. Overhead labels: BT +8%, CG +14%, FT +37%, LU +11%]
    Transferred memory size during VM migration [MB]:
      BT     CG     FT      LU
      4417   3394   15678   2348
  • Fallback/Recovery migration: memtest
    [Figure: two charts of execution time [seconds] per step (steps 1 to 31), split into overhead and application time. Phases: 4 hosts (IB), 2 hosts (TCP), 4 hosts (IB), 4 hosts (TCP). Left: total 4 proc. (1 proc. / VM). Right: total 32 proc. (8 proc. / VM)]
  • Outline
    - Introduction
    - Ninja migration: interconnect-transparent migration
    - Experiment
    - Conclusion
  • Related Work
    - Heterogeneous VM migration
      - Vagrant supports a live migration across heterogeneous VMM. [P. Liu 08]
      - Ninja migration provides interconnect-transparent migration.
    - VM migration with VMM-bypass I/O devices
      - Driver-level: shadow driver [A. Kadav 09], Nomad [W. Huang 07]
      - Runtime-level: Ninja migration
  • Conclusion
    - We propose an interconnect-transparent migration mechanism to migrate a bunch of VMs between heterogeneous data centers.
    - We demonstrate the implementation called Ninja migration.
      - VMs can migrate between an IB cluster and an Ethernet cluster without restarting an application.
      - It has no performance overhead during normal operations.
  • Future work
    - Demonstrate the scalability.
    - Investigate the very long link-up time of Infiniband.
    - Design a runtime-agnostic (MPI free) implementation.
    This work was partly supported by JSPS KAKENHI Grant Number 24700040.