Live migration: pros, cons and gotchas -- Pavel Emelyanov

Live migrating a container: pros, cons and gotchas. Pavel Emelyanov, Linux-Piter, Saint Petersburg, 2015

Page 1: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Live migrating a container:

pros, cons and gotchas

Pavel Emelyanov

Linux-Piter, Saint Petersburg, 2015

Page 2: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Agenda

• Why you might want to live migrate a container

• Why (and how) to avoid live migration

• Why container live migration is so complex

Page 3: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Migration in a nutshell

• Get object state

• Send state information

• Restore object copy from state
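The three steps above can be sketched on a toy object. This is only an illustration: the `Counter` class, the JSON encoding and the function names are assumptions made for the example, not anything taken from CRIU or P.Haul.

```python
import json


class Counter:
    """A toy 'object' whose whole state is a single integer."""

    def __init__(self, value=0):
        self.value = value

    def tick(self):
        self.value += 1


def get_state(obj):
    # Step 1: capture everything needed to recreate the object.
    return {"value": obj.value}


def send_state(state):
    # Step 2: serialize the state; a real migration would push these
    # bytes over the network to the destination node.
    return json.dumps(state).encode("utf-8")


def restore(wire_bytes):
    # Step 3: build a fresh copy of the object from the received state.
    state = json.loads(wire_bytes.decode("utf-8"))
    return Counter(state["value"])


source = Counter()
source.tick()
source.tick()
copy = restore(send_state(get_state(source)))
```

The copy is a distinct object that carries the same state as the source at the moment the state was taken.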

Page 4: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Why you might want to live migrate a container

• Load balancing

• Updating the kernel

– Can avoid live migration: checkpoint/restore (C/R) is enough

• Updating or replacing hardware

• Spectacular

Page 5: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Why avoid live migration

Page 6: Live migration: pros, cons and gotchas -- Pavel Emelyanov

How to avoid live migration

• Balance the cause of the load, e.g. network traffic

• Microservices architecture

• Crash-driven updates

• Planned downtime

Page 7: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Making live migration live

• State saving, transferring and restoring happen with tasks frozen

– A “scattered” container is too complex

• Save & restore state quicker and quicker

• Avoid transferring large amounts of data

Page 8: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Making live migration live

• Optimizing save/restore is a long-running task

– “Dirty” bits

– Fewer system calls

– O(1) algorithms

• Big memory transfers can easily be moved out of the frozen period

– Memory pre-copy

– Memory post-copy

Page 9: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Pre-copy

• Copy memory while tasks are running

• Track memory changes

• goto again
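The loop above can be simulated with page counts alone. The sketch below is an assumption-laden model, not CRIU code: `dirty_per_round`, `threshold` and `max_rounds` are hypothetical parameters introduced for the example.

```python
def pre_copy(total_pages, dirty_per_round, threshold, max_rounds):
    """Simulate iterative memory pre-copy.

    dirty_per_round(pages_sent) returns how many pages the still-running
    tasks dirtied while pages_sent pages were being copied.
    Returns (rounds_done, pages_left_for_the_frozen_copy).
    """
    to_send = total_pages              # the first round copies all memory
    rounds = 0
    while to_send > threshold and rounds < max_rounds:
        sent = to_send                 # copy memory while tasks are running
        # track memory changes: only dirtied pages need another round
        to_send = min(total_pages, dirty_per_round(sent))
        rounds += 1                    # "goto again"
    return rounds, to_send


# Workload that dirties a quarter of what was just copied: converges.
converging = pre_copy(1000, lambda sent: sent // 4, threshold=10, max_rounds=20)

# Workload that always dirties 500 pages per round: never gets below the threshold.
stuck = pre_copy(1000, lambda sent: 500, threshold=10, max_rounds=20)
```

In the first case only a handful of pages remain to be copied while the tasks are frozen; in the second the loop gives up after `max_rounds` with 500 pages still dirty.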

Page 10: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Pre-copy

• Pros:

– Safe: once migrated, source node can disappear

• Cons:

– Unpredictable: iterations may take a long time

– Not guaranteed: the amount of “dirty” memory may remain large in the next round

Page 11: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Post-copy

• Migrate all but memory

• Turn on “network swap” on destination
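The “network swap” idea can be pictured as memory that is filled in on first access. The following is a minimal sketch under that assumption; `LazyMemory` and its `fetch_page` callback are hypothetical names, and a plain dictionary stands in for the source node.

```python
class LazyMemory:
    """Destination-side memory that pulls pages from the source on demand."""

    def __init__(self, fetch_page):
        self._fetch_page = fetch_page   # callable: page number -> bytes
        self._local = {}                # pages already present on the destination
        self.remote_fetches = 0

    def read(self, page_no):
        if page_no not in self._local:  # the "page fault" case
            self._local[page_no] = self._fetch_page(page_no)
            self.remote_fetches += 1
        return self._local[page_no]


# The source node's memory, modelled as a dict for the example.
source_pages = {0: b"code", 1: b"heap", 2: b"stack"}

mem = LazyMemory(source_pages.__getitem__)
first = mem.read(1)    # goes to the source
again = mem.read(1)    # served locally
```

Only the first access to a page pays the remote round trip, and every such fetch still depends on the source being reachable.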

Page 12: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Post-copy

• Pros:

– Predictable: time to migrate can be well estimated

• Cons:

– Unsafe: if the source node dies, the container on the destination dies too

– Application slows down after migration

Page 13: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Live migration at length

• Memory pre-copy (iteratively, optional)

• Freeze + Save state

• Copy state

• Restore from state + Unfreeze and resume

• Memory post-copy (optional)
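The sequence above can be written down as an ordered plan that also records which steps run while the tasks are frozen. This is a sketch only: `migration_plan` and `frozen_steps` are hypothetical helpers, and treating the restore as part of the frozen window is an assumption of the example.

```python
def migration_plan(pre_copy_rounds=0, post_copy=False):
    """Return the migration steps in order as (phase, description) pairs."""
    plan = []
    for i in range(pre_copy_rounds):                 # optional, iterative
        plan.append(("running", f"pre-copy memory, round {i + 1}"))
    plan.append(("frozen", "freeze tasks and save state"))
    plan.append(("frozen", "copy state to destination"))
    plan.append(("frozen", "restore from state"))
    plan.append(("running", "unfreeze and resume on destination"))
    if post_copy:                                    # optional
        plan.append(("running", "post-copy remaining memory on demand"))
    return plan


def frozen_steps(plan):
    """Steps during which the application is not running anywhere."""
    return [step for phase, step in plan if phase == "frozen"]


plan = migration_plan(pre_copy_rounds=2, post_copy=True)
```

Pre-copy and post-copy add steps only in the running phase, so they do not lengthen the list of frozen steps.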

Page 14: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Gotchas

VS

Page 15: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Things to deal with

• VM

– Environment: virtual hardware, paravirt

– CPU

– Memory

• Container

– Environment: cgroups, namespaces

– Processes and other animals

– Memory

Page 16: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Memory pre-copy

• VM

– All memory at hand

– Flat address space

• Container

– Memory

● is scattered over the processes

● may or may not be shared

● may or may not be mapped to disk files
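On Linux, each process exposes its mappings in `/proc/<pid>/maps`, and those lines already show the shared/private and file-backed distinctions. The sketch below classifies such lines; the sample lines are made up, and the rule "non-zero inode means file-backed" is a simplification assumed for the example.

```python
def classify_mapping(maps_line):
    """Classify one line of /proc/<pid>/maps.

    Line format: address perms offset dev inode [pathname]
    """
    fields = maps_line.split(None, 5)
    perms = fields[1]
    inode = int(fields[4])
    path = fields[5].strip() if len(fields) > 5 else ""
    return {
        "shared": perms[3] == "s",      # 's' = shared, 'p' = private
        "file_backed": inode != 0,      # simplification: inode 0 means anonymous
        "path": path,
    }


# Made-up sample lines in the /proc/<pid>/maps format.
sample = [
    "00400000-0040b000 r-xp 00000000 08:01 1234 /bin/cat",
    "7f1c00000000-7f1c00021000 rw-p 00000000 00:00 0",
    "7f1c10000000-7f1c10001000 rw-s 00000000 00:05 42 /dev/shm/ring",
]
results = [classify_mapping(line) for line in sample]
```

A migration tool has to walk such a list for every process in the container, which is part of what makes the container case harder than a single flat VM address space.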

Page 17: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Save state

• VM

– Hardware state

● Tree of ~100 objects

● Fixed amount of data for each

• Container

– State of all objects

● Graph of up to ~1000 objects

● Each has a different amount of data and a different API for reading it

Page 18: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Restore from state

• VM

– Copy memory in place, write state into devices

• Container

– Creation of many small objects

– Not all have a sane API for creation

● Creation sequence can be non-trivial
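One way to picture a non-trivial creation sequence is as a dependency graph that must be sorted before anything is created. The sketch below uses Python's standard `graphlib`; the objects and dependencies in `deps` are invented for illustration and are not CRIU's actual restore order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical objects; each maps to the set of objects that must exist first.
deps = {
    "net namespace": set(),
    "mount namespace": set(),
    "init task": {"net namespace", "mount namespace"},
    "child task": {"init task"},
    "open file": {"child task", "mount namespace"},
    "socket": {"child task", "net namespace"},
}


def creation_order(dependencies):
    """Return an order in which every object comes after its prerequisites."""
    return list(TopologicalSorter(dependencies).static_order())


order = creation_order(deps)
```

If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`; a real restore would then need some way to break the cycle, for example by creating an object in a temporary form first.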

Page 19: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Memory post-copy

• UserfaultFD from Andrea Arcangeli

• VM

– Merged into Linux 4.3

• Container

– Non-cooperative work of uffd monitor and client, need further patching

Page 20: Live migration: pros, cons and gotchas -- Pavel Emelyanov

And we also need this, this and this!

• Check CPU compatibility

• Check and load necessary kernel modules (iptables, filesystems)

• A non-shared filesystem must be copied

• Roll-back on source node if something fails in between

– Keep tasks frozen after dump, kill after restore
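Two of the items above lend themselves to a short sketch: the CPU check and the roll-back rule. Both functions below are illustrative assumptions. `cpu_compatible` compares the `flags` line of x86-style `/proc/cpuinfo` text, and `migrate` takes hypothetical callbacks for the dump, restore, kill and resume actions.

```python
def cpu_flags(cpuinfo_text):
    """Extract the feature flags from /proc/cpuinfo-style text (x86 layout)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def cpu_compatible(src_cpuinfo, dst_cpuinfo):
    # The destination must offer every feature the source offered.
    return cpu_flags(src_cpuinfo) <= cpu_flags(dst_cpuinfo)


def migrate(dump, restore, kill_source, resume_source):
    """Run dump and restore; roll back on the source if the restore fails."""
    dump()                  # source tasks stay frozen after the dump
    try:
        restore()
    except Exception:
        resume_source()     # roll back: let the frozen source tasks continue
        return False
    kill_source()           # kill the source copy only after a good restore
    return True
```

Keeping the source tasks frozen rather than killing them at dump time is what makes the roll-back possible: until the restore succeeds, an intact copy still exists on the source.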

Page 21: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Implementation

• CRIU: performance critical and low-level actions

– Save & restore state

– Memory pre/post copy

• P.Haul: puts everything together

– Runs checks

– Orchestrates all C/R steps

– Deals with the filesystem
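For orientation, the CRIU side of this split can be driven from the `criu` command line with the `pre-dump`, `dump` and `restore` actions. The sketch below only builds argument lists and runs nothing. The directory layout (`pre0`, `pre1`, `final`) and the `criu_commands` helper are assumptions of the example, and it is not a description of how P.Haul itself invokes CRIU.

```python
def criu_commands(pid, workdir, pre_dump_rounds=1):
    """Build criu argument lists for an iterative dump followed by a restore."""
    cmds = []
    prev = None
    for i in range(pre_dump_rounds):
        cmd = ["criu", "pre-dump", "--tree", str(pid),
               "--images-dir", f"{workdir}/pre{i}"]
        if prev is not None:
            # Path to the previous round, given relative to --images-dir.
            cmd += ["--prev-images-dir", f"../{prev}"]
        cmds.append(cmd)
        prev = f"pre{i}"

    final = ["criu", "dump", "--tree", str(pid),
             "--images-dir", f"{workdir}/final"]
    if prev is not None:
        final += ["--track-mem", "--prev-images-dir", f"../{prev}"]
    cmds.append(final)

    # On the destination, after all image directories have been copied over.
    cmds.append(["criu", "restore", "--images-dir", f"{workdir}/final"])
    return cmds


cmds = criu_commands(1234, "/tmp/img", pre_dump_rounds=2)
```

Everything around these calls, such as copying the image directories between nodes, running the checks and handling the filesystem, is the part the slide assigns to P.Haul.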

Page 22: Live migration: pros, cons and gotchas -- Pavel Emelyanov

P.Haul goals

• Provide a container live migration engine using CRIU for OpenVZ/Docker/LXC/whatever

• Perform necessary pre-checks (e.g. CPU compatibility)

• Organize memory pre-copy and/or post-copy

• Take care of file-system migration (if needed)

Page 23: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Under the hood (top-level overview)

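[Diagram: message flow between the source (src) and destination (dst) nodes, each running docker -d, p.haul and CRIU. Labels on the diagram: migrate; check (CPUs, kernels); pre-dump, memory (pre-copy); FS, FS copy; dump, other images; restore; memory, lazy mem (post-copy); kill; done. A bracket marks the freeze time.]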

Page 24: Live migration: pros, cons and gotchas -- Pavel Emelyanov

More info

• http://criu.org

• http://criu.org/P.Haul

[email protected]

• +CriuOrg / @__criu__

• https://github.com/xemul/(criu|p.haul)

Page 25: Live migration: pros, cons and gotchas -- Pavel Emelyanov

Thank you!

[email protected]
