Virtual Machine Boot-up Analysis

Virtual Machine Boot-up Analysis Abderrahmane Benbachir Dec 12, 2016 École Polytechnique de Montréal Laboratoire DORSAL


POLYTECHNIQUE MONTREAL – Abderrahmane Benbachir (Abder)

Agenda

•  Introduction
•  Research objectives
•  Investigation
•  Results
•  Future work


Introduction

Boot-up process

•  BIOS: Basic I/O
•  Master Boot Record: Execute GRUB
•  Boot loader: Execute Kernel
•  Kernel: Execute /sbin/init, initramfs root
•  Init process: Start run level
•  Run level: Executed from /etc/rc.d/rc*.d

Annotations on the diagram:

•  LTTng, Ftrace, Dtrace, ...
•  Ftrace partially handles this part
•  Still a dark place


Research objectives

•  Analyze the early boot-up of virtual machines

•  Detect boot-up issues


Investigation


Where & How to store traces during boot-up?

We already know how to trace: LTTng, Ftrace, Dtrace, Perf.


Investigation

Where & How to store traces during boot-up?

Keep them inside Guest

•  Recording Timestamp
•  Synchronization
•  Buffer size
•  Memory or Disk

VS

Offload them to the Host

•  Guest or Host Timestamp?
•  Synchronization
•  Transfer Channels (Network, Shared Memory, Paravirt)
•  Transfer Speed $$$


Investigation

Where & How to store traces during boot-up?

Keep them inside Guest

•  Light weight Tracers

VS

Offload them to the Host

•  Network: Sockets
•  Shared Memory: Qemu Hypertrace
•  Paravirtualization API: Hypercalls

Must be explored


Qemu Hypertrace

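By Lluís Vilanova

[Figure: Qemu Hypertrace architecture. Components shown: Host Kernel with KVM.KO, Qemu (Hypertrace Softmmu), and VM#1 (Kernel & User space). Shared memory between Guest & Qemu provides the Control, Data channel and Config areas. Arrow labels: Write, Notify, Wake up, Read, Emit User space events.]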


Hypercall

[Figure: hypercall path. Layers shown: VM#1 (User space, Paravirtualized Guest OS), Qemu, Host Kernel with KVM.KO, and Hardware. Arrows labeled "Hypercall" lead out of the guest. Annotations: "Direct Execution of Guest Requests" and "This layer is not involved".]

Benchmarks

Qemu Hypertrace Overhead (KVM Enabled)

[Figure: density plot of the overhead. X-axis: nanoseconds (ticks from 1950 to 2100); y-axis: density (0.00 to 0.05). Annotation: 1.99 us : writing to control channel.]

•  Mean: 2070 ns
•  Median: 1995 ns
•  Std deviation: 1580 ns

Benchmarks

Hypercall Overhead

[Figure: density plot of the overhead. X-axis: nanoseconds (ticks from 300 to 360); y-axis: density (0.00 to 0.15). Annotations: 310 ns : kvm_exit + kvm_hypercall + kvm_entry; 323 ns.]

•  Mean: 340 ns
•  Median: 310 ns
•  Std deviation: 833 ns
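To put the two benchmark results side by side, the following sketch turns the reported median costs into a rough per-event budget. The medians (1995 ns and 310 ns) come from the slides; the event count of one million is a purely hypothetical workload, and the helper names are illustrative.

```python
# Median per-event costs reported on the benchmark slides, in nanoseconds.
MEDIAN_COST_NS = {"qemu_hypertrace": 1995, "hypercall": 310}


def max_events_per_second(cost_ns):
    """Upper bound on the event rate if each event costs `cost_ns`."""
    return 1e9 / cost_ns


def added_time_ms(n_events, cost_ns):
    """Total time added by tracing `n_events` events, in milliseconds."""
    return n_events * cost_ns / 1e6


# Hypothetical workload: one million traced events during boot-up.
N_EVENTS = 1_000_000
for channel, cost in MEDIAN_COST_NS.items():
    print(channel, added_time_ms(N_EVENTS, cost), "ms added")

# How many times cheaper a hypercall is than a Hypertrace control write.
ratio = MEDIAN_COST_NS["qemu_hypertrace"] / MEDIAN_COST_NS["hypercall"]
```

Under these assumptions a hypercall is roughly six times cheaper per event than a write to the Hypertrace control channel.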


Hypercalls Implementation

How to trace through Hypercalls

•  Hook into the Ftrace “function graph” callbacks for function entry & exit
•  Only trace the Host
•  Use only the host timestamp: no synchronization required
•  Dump kernel symbols to map addresses to function names
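The symbol-mapping step above can be sketched as follows. This is a minimal illustration, assuming the guest symbols were dumped in the `/proc/kallsyms` text format (address, type, name); the symbol names in the sample table are invented placeholders, and the helper names are hypothetical rather than taken from the author's implementation.

```python
import bisect

# Sample dump in /proc/kallsyms format. The symbol names are placeholders.
SAMPLE_KALLSYMS = """\
ffffffff81000000 T example_start
ffffffff8106b470 T example_function_a
ffffffff8106c290 T example_function_b
"""


def parse_kallsyms(text):
    """Return a sorted list of (address, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols


def resolve(symbols, address):
    """Map an address to the closest symbol at or below it."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    if index < 0:
        return None  # address lies below the first known symbol
    return symbols[index][1]


symbols = parse_kallsyms(SAMPLE_KALLSYMS)
# A 64-bit kernel address as it would appear in decimal in a trace payload.
print(resolve(symbols, 18446744071579284592))
```

A binary search keeps each lookup cheap, which matters when a function-graph trace contains millions of events.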


Hypercalls Implementation

[13:59:00.035571626] (+0.000000494) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284592, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035572670] (+0.000000519) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579288208, a1 = 1, a2 = 4198, a3 = 3 }
[13:59:00.035573178] (+0.000000508) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579292656, a1 = 1, a2 = 5227, a3 = 2 }
[13:59:00.035573686] (+0.000000508) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579922416, a1 = 1, a2 = 6251, a3 = 1 }
[13:59:39.016994097] (+0.000000482) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4197127, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035584321] (+0.000000531) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579265552, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035586871] (+0.000000490) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284528, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035587383] (+0.000000525) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284592, a1 = 1, a2 = 40, a3 = 4 }
[13:59:39.016994570] (+0.000000473) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4197127, a1 = 1, a2 = 0, a3 = 0 }
[13:59:39.016995047] (+0.000000477) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4196150, a1 = 0, a2 = 0, a3 = 0 }
[13:59:39.016996002] (+0.000000481) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4196518, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035585361] (+0.000000503) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284592, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035585863] (+0.000000502) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071587317584, a1 = 0, a2 = 0, a3 = 0 }
[13:59:39.017128748] (+0.000000734) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4196518, a1 = 1, a2 = 0, a3 = 0 }
[13:59:39.017129250] (+0.000000502) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4197127, a1 = 1, a2 = 0, a3 = 0 }

Host Kernel Trace with LTTng

Entry

Exit

Hypercall number

hypercall payload = 5 args x 64 bits = 40 Bytes

hypercall(nr, a0, a1, a2, a3)
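As a rough illustration of how such a host trace could be post-processed, the sketch below parses one `kvm_x86_hypercall` line of LTTng text output into its five arguments and checks the 40-byte payload size stated above. The regular expression and helper names are assumptions made for this example; the slide does not spell out the meaning of `a0` to `a3`, so the code only extracts them and does not interpret them.

```python
import re
import struct

# Matches the text form of a kvm_x86_hypercall event as shown in the trace.
EVENT_RE = re.compile(
    r"\[(?P<time>[\d:.]+)\].*?kvm_x86_hypercall:.*?"
    r"nr = (?P<nr>\d+), a0 = (?P<a0>\d+), a1 = (?P<a1>\d+), "
    r"a2 = (?P<a2>\d+), a3 = (?P<a3>\d+)"
)

FIELDS = ("nr", "a0", "a1", "a2", "a3")


def parse_event(line):
    """Return a dict with the timestamp and the five hypercall arguments."""
    match = EVENT_RE.search(line)
    if match is None:
        return None  # not a kvm_x86_hypercall line
    event = {name: int(match.group(name)) for name in FIELDS}
    event["time"] = match.group("time")
    return event


def payload_size(event):
    """Size in bytes of the five 64-bit values carried by one hypercall."""
    return len(struct.pack("<5Q", *(event[name] for name in FIELDS)))


SAMPLE_LINE = (
    "[13:59:00.035572670] (+0.000000519) abder-pc kvm_x86_hypercall: "
    "{ cpu_id = 3 }, { nr = 1000, a0 = 18446744071579288208, "
    "a1 = 1, a2 = 4198, a3 = 3 }"
)
event = parse_event(SAMPLE_LINE)
print(event["nr"], hex(event["a0"]), payload_size(event))
```

Keeping the parser separate from any interpretation of the arguments makes it easy to plug in a different payload layout later.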


Function graph Chrome Trace-Viewer

Callstacks of Guest Kernel space

300 us
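A function-graph view like this one can be fed to the Chrome Trace-Viewer by emitting duration events in the Trace Event JSON format, where "B" marks a begin and "E" an end, with timestamps in microseconds. The sketch below assumes the entry and exit events have already been decoded into (timestamp in ns, function name, is_entry) tuples; that tuple layout, the function names and the sample values are hypothetical.

```python
import json


def to_chrome_trace(events, pid=1, tid=1):
    """Convert (ts_ns, name, is_entry) tuples into Trace Event JSON data."""
    trace_events = []
    for ts_ns, name, is_entry in events:
        trace_events.append({
            "name": name,
            "ph": "B" if is_entry else "E",  # begin / end of a duration
            "ts": ts_ns / 1000.0,            # the format expects microseconds
            "pid": pid,
            "tid": tid,
        })
    return {"traceEvents": trace_events}


# Hypothetical nested calls: outer_function calls inner_function.
SAMPLE_EVENTS = [
    (0, "outer_function", True),
    (500, "inner_function", True),
    (1500, "inner_function", False),
    (3000, "outer_function", False),
]

trace = to_chrome_trace(SAMPLE_EVENTS)
print(json.dumps(trace, indent=2))
```

Because begin and end events nest by timestamp within one thread, the viewer rebuilds the call stack without any explicit depth field.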


Hybrid tracing technique: Kernel space + User space

Function Graph Overhead (Sorted by median)

[Figure: bar chart of tracing overhead per traced function, sorted by median. X-axis: function call; y-axis: overhead % (0 to 100). The function names on the x-axis start with __do_page_fault, handle_mm_fault, filemap_map_pages and do_async_page_fault.]

Overhead by function (sorted by median)

[Figure: Function Graph Overhead (Sorted by median), detail for overhead between 0 and 10 %. Y-axis: overhead %. The functions shown run from __do_page_fault, handle_mm_fault and filemap_map_pages through page_add_file_rmap.]

Function Graph Overhead (Sorted by median)

[Figure: detail for overhead between 90 and 94 %. Y-axis: overhead %. The functions shown run from hrtimer_active and generic_file_llseek_size through hrtick_update.]
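The slides do not state how the overhead percentage was computed. One plausible reading, sketched below, is the share of each traced call's duration that is spent in the two tracing hypercalls (entry and exit), ranked by the per-function median. The fixed cost of 2 x 310 ns, the function names and all duration samples are assumptions for illustration only, not measured data.

```python
from statistics import median

# Assumption: one entry and one exit hypercall per call, at the median cost
# reported on the benchmark slide.
TRACING_COST_NS = 2 * 310


def overhead_percent(traced_duration_ns, cost_ns=TRACING_COST_NS):
    """Share of a traced call's duration attributable to tracing."""
    return 100.0 * cost_ns / traced_duration_ns


def rank_by_median_overhead(samples):
    """Return (function, median overhead %) pairs, lowest overhead first."""
    ranked = [
        (name, median(overhead_percent(d) for d in durations))
        for name, durations in samples.items()
    ]
    ranked.sort(key=lambda item: item[1])
    return ranked


# Hypothetical traced durations in nanoseconds, tracing cost included.
SAMPLES = {
    "short_function": [660, 670, 680],
    "long_function": [62000, 31000, 124000],
}

ranked = rank_by_median_overhead(SAMPLES)
for name, value in ranked:
    print(f"{name}: {value:.2f} %")
```

With this definition, long-running functions land at the low end of the chart and very short functions approach the 90 % range, which matches the general shape of the plots.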

Future Work

•  Apply the hypercall technique to trace the full kernel boot-up

•  Explore this solution with nested virtual machines

•  Can we trace the boot loader too?


Questions

https://github.com/abenbachir

Thanks to Professor Michel Dagenais and our partners EfficiOS and Ericsson.


References

Hypercall Implementation : https://gist.github.com/abenbachir/344822b5ba9fc5ac384cdec3f087e018

QEMU Hypertrace Patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?state=&q=Hypertrace&archive