Virtual Machine Boot-up Analysis
Abderrahmane Benbachir, Dec 12, 2016
École Polytechnique de Montréal, Laboratoire DORSAL
POLYTECHNIQUE MONTREAL – Abderrahmane Benbachir (Abder)
Agenda
• Introduction
• Research objectives
• Investigation
• Results
• Future work
Introduction
Boot-up process
BIOS: Basic I/O
Master Boot Record: Execute GRUB
Boot loader: Execute Kernel
Kernel: Execute /sbin/init, initramfs root
Init process: Start run level
Run level: Executed from /etc/rc.d/rc*.d
LTTng, Ftrace, Dtrace, ...
Ftrace partially handles this part
Still a dark place
Research objectives
• Analyze the early boot-up of virtual machines
• Detect boot-up issues
Investigation
Source : https://s-media-cache-ak0.pinimg.com/736x/99/66/28/996628fa4292de7175d7893ca31d7796.jpg
Where & how to store traces during boot-up?
We already know how to trace: LTTng, Ftrace, Dtrace, Perf.
Investigation
Where & how to store traces during boot-up?
Keep them inside the guest
• Recording timestamp
• Synchronization
• Buffer size
• Memory or disk
VS
Offload them to the host
• Guest or host timestamp?
• Synchronization
• Transfer channels (network, shared memory, paravirtualization)
• Transfer speed $$$
Investigation
Where & how to store traces during boot-up?
Keep them inside the guest
• Lightweight tracers
VS
Offload them to the host
• Network: sockets
• Shared memory: Qemu Hypertrace
• Paravirtualization API: hypercalls
Must be explored
Qemu Hypertrace
By Lluís Vilanova
[Diagram labels: Host kernel; KVM.KO; Qemu; VM#1 (kernel & user space); Hypertrace softmmu; shared memory between guest and Qemu; control, data channel and config; emit user-space events; read, write, notify, wake up]
Hypercall
[Diagram labels: Hardware; Host kernel; KVM.KO; Qemu; VM#1 user space; paravirtualized guest OS; "Hypercall" arrows; "Direct execution of guest requests"; "This layer is not involved"]
Benchmarks
Qemu Hypertrace Overhead (KVM Enabled)
[Density plot: x-axis nanoseconds (1950 to 2100), y-axis density (0.00 to 0.05)]
1.99 us: writing to control channel
Mean 2070 ns
Median 1995 ns
Std deviation 1580 ns
Benchmarks
Hypercall Overhead
[Density plot: x-axis nanoseconds (300 to 360), y-axis density (0.00 to 0.15); marked value 323 ns]
310 ns: kvm_exit + kvm_hypercall + kvm_entry
Mean 340 ns
Median 310 ns
Std deviation 833 ns
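A rough way to read the two benchmark slides side by side is to divide one summary by the other. The sketch below only reuses the mean, median and standard deviation printed on the slides; the dictionary and function names are illustrative, and the two benchmarks may not measure strictly equivalent operations.

```python
# Summary statistics copied from the two benchmark slides, in nanoseconds.
hypertrace_ns = {"mean": 2070, "median": 1995, "std": 1580}
hypercall_ns = {"mean": 340, "median": 310, "std": 833}


def ratio(metric):
    """Return how many times larger the Hypertrace figure is for a metric."""
    return hypertrace_ns[metric] / hypercall_ns[metric]


median_ratio = ratio("median")
mean_ratio = ratio("mean")

# A standard deviation larger than the mean hints at a long tail of slow samples.
hypercall_has_long_tail = hypercall_ns["std"] > hypercall_ns["mean"]

print(f"median ratio: {median_ratio:.2f}x")
print(f"mean ratio: {mean_ratio:.2f}x")
print(f"hypercall std > mean: {hypercall_has_long_tail}")
```

The median is probably the more representative figure here, since both standard deviations are large compared with the medians.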
Hypercalls Implementation
How to trace through hypercalls
• Hook into the Ftrace “function graph” callbacks for function entry and exit
• Only trace the host
• Use only the host timestamp: no synchronization required
• Dump kernel symbols to resolve each function name
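The symbol-mapping step can be sketched offline. The example assumes the guest symbols were dumped in the `/proc/kallsyms` layout (`address type name`) and that each hypercall carries a function address; the sample addresses and the `hypothetical_*` names are invented for illustration and do not come from the slides.

```python
import bisect


def parse_symbols(text):
    """Parse '<hex address> <type> <name>' lines into sorted address and name lists."""
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        entries.append((int(parts[0], 16), parts[2]))
    entries.sort()
    addresses = [address for address, _ in entries]
    names = [name for _, name in entries]
    return addresses, names


def resolve(addresses, names, address):
    """Return the symbol whose start address is the closest one at or below address."""
    index = bisect.bisect_right(addresses, address) - 1
    return names[index] if index >= 0 else None


# Hypothetical symbol dump, for illustration only.
SAMPLE_DUMP = """\
ffffffff81000000 T hypothetical_func_a
ffffffff81000100 t hypothetical_func_b
ffffffff81000400 T hypothetical_func_c
"""

addresses, names = parse_symbols(SAMPLE_DUMP)
print(resolve(addresses, names, 0xFFFFFFFF81000100))
```

A sorted list with binary search keeps each lookup logarithmic, which matters when every traced function call has to be resolved.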
Hypercalls Implementation
Host Kernel Trace with LTTng
[13:59:00.035571626] (+0.000000494) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284592, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035572670] (+0.000000519) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579288208, a1 = 1, a2 = 4198, a3 = 3 }
[13:59:00.035573178] (+0.000000508) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579292656, a1 = 1, a2 = 5227, a3 = 2 }
[13:59:00.035573686] (+0.000000508) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579922416, a1 = 1, a2 = 6251, a3 = 1 }
[13:59:39.016994097] (+0.000000482) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4197127, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035584321] (+0.000000531) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579265552, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035586871] (+0.000000490) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284528, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035587383] (+0.000000525) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284592, a1 = 1, a2 = 40, a3 = 4 }
[13:59:39.016994570] (+0.000000473) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4197127, a1 = 1, a2 = 0, a3 = 0 }
[13:59:39.016995047] (+0.000000477) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4196150, a1 = 0, a2 = 0, a3 = 0 }
[13:59:39.016996002] (+0.000000481) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4196518, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035585361] (+0.000000503) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071579284592, a1 = 0, a2 = 0, a3 = 0 }
[13:59:00.035585863] (+0.000000502) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 1000, a0 = 18446744071587317584, a1 = 0, a2 = 0, a3 = 0 }
[13:59:39.017128748] (+0.000000734) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4196518, a1 = 1, a2 = 0, a3 = 0 }
[13:59:39.017129250] (+0.000000502) abder-pc kvm_x86_hypercall: { cpu_id = 3 }, { nr = 2000, a0 = 4197127, a1 = 1, a2 = 0, a3 = 0 }
Annotations on the trace: Entry, Exit, Hypercall number
hypercall(nr, a0, a1, a2, a3)
hypercall payload = 5 args x 64 bits = 40 bytes
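One of the trace lines above can be decoded with a short script, which also checks the 40-byte payload arithmetic. The regular expression, the function names and the reading of the fields are assumptions made for this sketch: `a0` is treated as a function address and `a1` as an entry (0) or exit (1) flag, which the slides suggest but do not state. Packing the five values with `struct` only illustrates the size; a hypercall passes its arguments in registers.

```python
import re
import struct

# Matches the textual form of the kvm_x86_hypercall events shown in the trace.
EVENT_RE = re.compile(
    r"\[(?P<time>[\d:.]+)\]\s+\(\+(?P<delta>[\d.]+)\)\s+(?P<host>\S+)\s+"
    r"kvm_x86_hypercall:\s+\{ cpu_id = (?P<cpu_id>\d+) \},\s+"
    r"\{ nr = (?P<nr>\d+), a0 = (?P<a0>\d+), a1 = (?P<a1>\d+), "
    r"a2 = (?P<a2>\d+), a3 = (?P<a3>\d+) \}"
)


def parse_event(line):
    """Return the numeric fields of one trace line, or None if it does not match."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    event = {key: int(match.group(key)) for key in ("cpu_id", "nr", "a0", "a1", "a2", "a3")}
    event["time"] = match.group("time")
    return event


def payload_size(event):
    """Size in bytes of nr plus four arguments, each stored as an unsigned 64-bit value."""
    packed = struct.pack("<5Q", event["nr"], event["a0"], event["a1"], event["a2"], event["a3"])
    return len(packed)


LINE = (
    "[13:59:00.035572670] (+0.000000519) abder-pc kvm_x86_hypercall: "
    "{ cpu_id = 3 }, { nr = 1000, a0 = 18446744071579288208, a1 = 1, a2 = 4198, a3 = 3 }"
)

event = parse_event(LINE)
print(hex(event["a0"]), "exit" if event["a1"] else "entry", payload_size(event))
```

Printing `a0` in hexadecimal makes it easier to compare against a symbol dump, which usually lists addresses in that form.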
Function graph: Chrome Trace-Viewer
Callstacks of guest kernel space
[Screenshot: callstacks rendered in Chrome Trace-Viewer; scale label 300 us]
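Entry and exit events can be turned into a file that Chrome's trace viewer can load. The sketch below assumes the viewer's JSON Trace Event Format, where `"ph": "B"` and `"ph": "E"` mark the begin and end of a duration and `ts` is in microseconds. The event tuples, function names and the fixed `pid`/`tid` values are hypothetical, not taken from the talk.

```python
import json


def to_trace_events(events, pid=1, tid=1):
    """Convert (timestamp_ns, function_name, is_exit) tuples to a trace document."""
    trace_events = []
    for timestamp_ns, name, is_exit in events:
        trace_events.append({
            "name": name,
            "ph": "E" if is_exit else "B",  # begin or end of a duration event
            "ts": timestamp_ns / 1000,      # the format expects microseconds
            "pid": pid,
            "tid": tid,
        })
    return {"traceEvents": trace_events}


# Hypothetical nested calls: outer() calls inner().
events = [
    (0, "outer", False),
    (500, "inner", False),
    (1500, "inner", True),
    (4000, "outer", True),
]

document = to_trace_events(events)
print(json.dumps(document, indent=2))
```

Because begin and end events nest by timestamp, the viewer can draw the call stack without the producer tracking depth itself.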
[Bar chart "Function Graph Overhead (Sorted by median)": x-axis function call (guest kernel function names, starting at __do_page_fault), y-axis overhead % (0 to 100)]
Overhead by function (sorted by median)
[Bar chart "Function Graph Overhead (Sorted by median)": x-axis function call (__do_page_fault through page_add_file_rmap), y-axis overhead % (0.0 to 10.0)]
[Bar chart "Function Graph Overhead (Sorted by median)": x-axis function call (hrtimer_active through hrtick_update), y-axis overhead % (90 to 94)]
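The "sorted by median" ordering used in the overhead charts can be reproduced with a few lines. This sketch takes per-function overhead samples as given and does not define how overhead is measured, since the slides do not say. The sample values and function names are hypothetical, and ascending order is an assumption.

```python
from statistics import median


def sort_by_median(samples):
    """Return (function, median overhead %) pairs, lowest median first."""
    medians = {name: median(values) for name, values in samples.items() if values}
    return sorted(medians.items(), key=lambda item: item[1])


# Hypothetical overhead samples in percent; not taken from the slides.
samples = {
    "func_small": [92.0, 93.5, 91.0],
    "func_large": [2.0, 3.0, 40.0],
    "func_mid": [50.0, 55.0],
}

ranking = sort_by_median(samples)
for name, value in ranking:
    print(f"{name}: {value:.1f} %")
```

Sorting on the median rather than the mean keeps a single slow sample, such as the 40.0 in `func_large`, from moving a function far along the axis.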
Future Work
• Apply the hypercall technique to trace the full kernel boot-up
• Explore this solution with nested virtual machines
• Can we trace the boot loader too?
Questions
[email protected] https://github.com/abenbachir
Thanks to Professor Michel Dagenais and our partners EfficiOS and Ericsson.