Performance monitoring and analysis using perf and BPF

www.huawei.com HUAWEI TECHNOLOGIES CO., LTD. Performance Monitoring and Analysis Using perf+BPF Wang Nan (王楠) / [email protected] 2016/08/17

Transcript of Performance monitoring and analysis using perf and BPF

Page 1: Performance monitoring and analysis using perf and BPF

www.huawei.com

HUAWEI TECHNOLOGIES CO., LTD.

Performance Monitoring and Analysis

Using perf+BPF

Wang Nan (王楠) / [email protected]

2016/08/17

Page 2: Performance monitoring and analysis using perf and BPF

HUAWEI TECHNOLOGIES CO., LTD. Huawei proprietary. No spread without permission. Page 2

kTap overview

Announced at LinuxCon Japan 2013

Inspired by DTrace and SystemTap

In-kernel Lua virtual machine

Lua Frontend

Source: http://events.linuxfoundation.org/sites/events/files/lcjpcojp13_zhangwei.pdf

Page 3: Performance monitoring and analysis using perf and BPF


kTap overview

Almost merged into Linux 3.13

But rejected by Ingo Molnar

Main reason: the kernel already has a BPF virtual machine

Source: http://lwn.net/Articles/572788/

Source: http://lwn.net/Articles/572793/

Page 4: Performance monitoring and analysis using perf and BPF


kTap overview

We restarted in 2014

            kTap          perf + BPF
VM          Lua VM        BPF
Language    Lua           C
Frontend    Stand-alone   perf
Upstream    Not yet       Almost done

Page 5: Performance monitoring and analysis using perf and BPF

Contents

Motivation

Highlight features

perf+BPF

Backward ring buffer

The next big thing

Summary

Page 6: Performance monitoring and analysis using perf and BPF


Motivation: Tracing

[Figure: Android Draw Profiler chart, with one frame annotated "What happened in this frame?"] Source: https://developer.android.com/tools/performance/profile-gpu-rendering/index.html

[Figure: TraceCompass view of the trace, with playbackStart and playbackEnd markers]

TraceCompass result: RenderThread takes most of the cycles!

* systrace can do a similar thing, but we don't want to be bound to Android

Page 7: Performance monitoring and analysis using perf and BPF


Motivation: Tracing

Basic workflow

Tracing:

# perf record -e sched:sched_switch --exclude-perf -e raw_syscalls:* --exclude-perf -a sleep 10

Converting (from perf.data to CTF):

# perf data convert --to-ctf ./out.ctf

Converting (from CTF to R table, thanks to libbabeltrace):

# python3 convert.py ./out.ctf > ./out.data

# R
...
> read.table("./out.data", header=TRUE, ...)

Format      Tools
perf.data   perf report, perf script, flamegraph
CTF         babeltrace, TraceCompass, ad-hoc Python script
R table     ad-hoc R script

Page 8: Performance monitoring and analysis using perf and BPF


Motivation: Tracing

Requirements for tracer

Fine grained (syscall, sched:switch, samples with call stack)

Affordable data volume

(The two requirements conflict.)

Page 9: Performance monitoring and analysis using perf and BPF


Motivation: Tracing

Requirements for tracer

Fine grained (syscall, sched:switch, samples with call stack)

Affordable data volume

(The two requirements conflict.)

Filtering: drop unneeded records

Aggregating: profiling during recording (e.g. building a histogram)

Use BPF
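A minimal sketch of the two ideas above, written as plain user-space C rather than a real BPF program: events from other processes are dropped, and the remaining ones are folded into a power-of-two latency histogram instead of being stored one by one. The names `struct event`, `struct summary`, `log2_slot`, `on_event` and the latency field are hypothetical and are not part of perf or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HIST_SLOTS 32

/* Hypothetical event: which process raised it and how long it took. */
struct event {
	uint32_t pid;
	uint64_t latency_ns;
};

/* Aggregated result: a few counters instead of one record per event. */
struct summary {
	uint64_t hist[HIST_SLOTS];	/* slot i counts values in [2^i, 2^(i+1)) */
	uint64_t dropped;		/* events removed by the filter */
};

/* floor(log2(v)), clamped to the last slot; 0 and 1 both map to slot 0. */
unsigned log2_slot(uint64_t v)
{
	unsigned slot = 0;

	while (v > 1 && slot < HIST_SLOTS - 1) {
		v >>= 1;
		slot++;
	}
	return slot;
}

/* Filtering, then aggregating, for a single event. */
void on_event(struct summary *s, const struct event *e, uint32_t wanted_pid)
{
	assert(s != NULL && e != NULL);
	if (e->pid != wanted_pid) {
		s->dropped++;		/* filtering: drop unneeded records */
		return;
	}
	s->hist[log2_slot(e->latency_ns)]++;	/* aggregating */
}
```

The data volume no longer grows with the number of events: whatever the event rate, the output is bounded by the size of `struct summary`.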

Page 10: Performance monitoring and analysis using perf and BPF


BPF (Berkeley Packet Filter)

BPF for event filtering

Event sources: tracepoints, kprobes, uprobes

Flow: event -> perf event -> BPF VM -> perf.data, or drop (return 0)

Rules: if pid == 1234 …

BPF byte code: … cmp r0, 1234

Page 11: Performance monitoring and analysis using perf and BPF


BPF (Berkeley Packet Filter)

Features

In-kernel just-in-time compiling: x86_64 and ARM64

In-kernel verifier: memory access checks, restricted API, not Turing complete (programs are guaranteed to halt)

BPF maps

Workflow

myscript.c -> (llvm) -> byte code -> (bpf(2)) -> loaded -> (ioctl) -> attached

Page 12: Performance monitoring and analysis using perf and BPF


Contents

Motivation

Highlight features

perf+BPF

Backward ring buffer

The next big thing

Summary

Page 13: Performance monitoring and analysis using perf and BPF


Improved BPF support

perf supports BPF scripts

# perf record -e myscript.c -a

/* filter.c */
#define SEC(NAME) __attribute__((section(NAME), used))

SEC("func=sys_write")
int func(void *ctx) {
	/* the low 32 bits of the helper's return value hold the pid */
	u64 pid = get_current_pid_tgid() & 0xffffffff;
	char fmt[] = "pid %d calls sys_write";

	if (pid == 1234)
		return 1;	/* non-zero: record the event */
	else
		trace_printk(fmt, sizeof(fmt), pid);
	return 0;		/* zero: drop the event */
}

# perf record -e ./filter.c -a sleep 10
# perf script
rtkit-daemon 1234 [002] 122474.586545: ...
rtkit-daemon 1234 [004] 122475.345923: ...
...

Loading .c requires external clang:

# cat ~/.perfconfig
[llvm]
	clang-path=/path/to/clang

Pre-compile to .o for smart phone:

# clang -c myscript.c -target bpf -O2
...
# perf record -e myscript.o ...

Page 14: Performance monitoring and analysis using perf and BPF


Improved BPF support

What can we do in BPF scripts?

Fetch arguments

Read from PMU through perf event

Output to perf.data

Page 15: Performance monitoring and analysis using perf and BPF


Improved BPF support

Fetch arguments

Fetching function parameters:

SEC("func=sys_write fd")
int func(void *ctx, int err, int fd) {
	...
}

Fetching struct member:

SEC("func=null_lseek file->f_mode offset orig")
int bpf_func__null_lseek(void *ctx, int err,
			 unsigned long f_mode,
			 unsigned long offset,
			 unsigned long orig) {
	...
}

Prologue generation (pseudocode, x86_64 registers):

void *file = ctx->di;
unsigned long offset = ctx->si;
unsigned long orig = ctx->dx;
unsigned long f_mode;
bpf_probe_read(&f_mode, sizeof(f_mode), file + offsetof(struct file, f_mode));
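The generated prologue can be pictured with an ordinary C program. The sketch below is a user-space model only: `struct demo_file`, `struct demo_regs`, `struct demo_args`, `demo_probe_read` and `demo_prologue` are hypothetical stand-ins for the kernel's `struct file`, the saved registers, and the bounded-copy helper, assuming the x86_64 convention in which the first three arguments arrive in di, si and dx.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct file. */
struct demo_file {
	long f_pos;
	unsigned int f_mode;
};

/* Hypothetical stand-in for the saved argument registers (x86_64). */
struct demo_regs {
	uintptr_t di, si, dx;
};

/* Arguments in the form the body of the BPF function would receive them. */
struct demo_args {
	unsigned long f_mode, offset, orig;
};

/* Models a bounded copy from a probed address; always succeeds here. */
int demo_probe_read(void *dst, size_t size, const void *src)
{
	memcpy(dst, src, size);
	return 0;
}

/* Turns raw registers into the named arguments requested in SEC(...). */
struct demo_args demo_prologue(const struct demo_regs *ctx)
{
	struct demo_args args = { 0, 0, 0 };
	unsigned int mode = 0;
	const char *file;

	assert(ctx != NULL);
	file = (const char *)ctx->di;	/* 1st argument: struct file pointer */

	/* file->f_mode: copy exactly sizeof(f_mode) bytes at the member offset */
	demo_probe_read(&mode, sizeof(mode),
			file + offsetof(struct demo_file, f_mode));

	args.f_mode = mode;
	args.offset = (unsigned long)ctx->si;	/* 2nd argument */
	args.orig = (unsigned long)ctx->dx;	/* 3rd argument */
	return args;
}
```

The copy size is taken from the destination variable, so the read cannot write past it.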

Page 16: Performance monitoring and analysis using perf and BPF


Improved BPF support

Read from PMU through perf event

struct bpf_map_def SEC("maps") pmu_map = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(int),
	.max_entries = __NR_CPUS__,
};
...
val = perf_event_read(&pmu_map, get_smp_processor_id());
...

# perf record -a -i -e cycles/period=0x7fffffffffffffff,name=cyc/ \
    -e './test_bpf_map_2.c/map:pmu_map.event=cyc/'

Page 17: Performance monitoring and analysis using perf and BPF


Improved BPF support

Output to perf.data

struct bpf_map_def SEC("maps") __bpf_stdout__ = {
	.type = BPF_MAP_TYPE_PERF_EVENT_ARRAY,
	.key_size = sizeof(int),
	.value_size = sizeof(u32),
	.max_entries = __NR_CPUS__,
};
...
char output_str[] = "Raise a BPF event!";
err = perf_event_output(ctx, &__bpf_stdout__,
			get_smp_processor_id(),
			&output_str, sizeof(output_str));

...

# perf trace -e nanosleep --ev test_bpf_stdout.c usleep 1

0.007 ( 0.007 ms): usleep/729 nanosleep(rqtp: 0x7ffc5bbc5fe0) ...

0.007 ( ): __bpf_stdout__:Raise a BPF event!..)

0.008 ( ): perf_bpf_probe:func_begin:(ffffffff81112460))

0.069 ( ): __bpf_stdout__:Raise a BPF event!..)

0.070 ( ): perf_bpf_probe:func_end:(ffffffff81112460 <- ffffffff81003d92))

0.072 ( 0.072 ms): usleep/729 ... [continued]: nanosleep()) = 0

Page 18: Performance monitoring and analysis using perf and BPF


Example: Per-function IPC

Question: what's the IPC (instructions per cycle) of function X?

/* uprobe at function entry: read instructions and cycles */
SEC("exec=/path/to/exec;"
    "f_entry=functionX")
int f_entry(void *ctx) {
	...
	cycles = bpf_perf_event_read(&cycles_pmu, ...);
	instructions = bpf_perf_event_read(&instructions_pmu, ...);
	bpf_map_update_elem(... &cycles, ...);
	bpf_map_update_elem(... &instructions, ...);
	...
	return 0;
}

/* uprobe at function exit: read again, compute, output */
SEC("exec=/path/to/exec;"
    "f_return=functionX%return")
int f_return(void *ctx) {
	...
	cycles = bpf_perf_event_read(&cycles_pmu, ...);
	instructions = bpf_perf_event_read(&instructions_pmu, ...);
	...
	bpf_output_trace_data(&output, sizeof(output));
	...
	return 0;
}
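The "compute" step at function exit is elided on the slide. One possible way to do it is sketched below as plain C: subtract the counter values saved at entry from those read at exit, then divide. BPF programs cannot use floating point, so the sketch returns the ratio scaled by 1000. `struct counters` and `ipc_milli` are hypothetical names, not perf or kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical snapshot of the two PMU counters at one probe point. */
struct counters {
	uint64_t cycles;
	uint64_t instructions;
};

/*
 * Instructions per cycle between two snapshots, scaled by 1000
 * (e.g. 1500 means an IPC of 1.5). Returns 0 when no cycles elapsed.
 */
uint64_t ipc_milli(struct counters at_entry, struct counters at_return)
{
	uint64_t delta_cycles, delta_instructions;

	/* counters only count upward between the two probes */
	assert(at_return.cycles >= at_entry.cycles);
	assert(at_return.instructions >= at_entry.instructions);

	delta_cycles = at_return.cycles - at_entry.cycles;
	delta_instructions = at_return.instructions - at_entry.instructions;
	if (delta_cycles == 0)
		return 0;
	return delta_instructions * 1000 / delta_cycles;
}
```

For example, 4000 instructions retired over 2000 cycles gives 2000, i.e. an IPC of 2.0.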

Page 19: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

How to trace the outlier?

Continuous perf record

Offline searching

[Figure: Android Draw Profiler chart, with one frame annotated "What happened in this frame?"] Source: https://developer.android.com/tools/performance/profile-gpu-rendering/index.html

Page 20: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

How to trace the outlier?

Dump the ring buffer to perf.data when the outlier is observed!

Problem: reading from the perf ring buffer

[Figure: Android Draw Profiler chart, with one frame annotated "What happened in this frame?"] Source: https://developer.android.com/tools/performance/profile-gpu-rendering/index.html

Page 21: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

perf ring buffer

[Figure: ring buffer holding Event 1, Event 2, Event 3 along the write direction, with the write pointer and read pointer marked]

Page 22: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

perf ring buffer

[Figure: ring buffer holding Event 1, Event 2, Event 3 along the write direction, with the write pointer and read pointer marked]

Each event: header (type, size, …) followed by payload

Page 23: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

perf ring buffer: overwrite

[Figure: in overwrite mode, Event 4 wraps around and overwrites the beginning of Event 1; Event 2 and Event 3 are intact; only the write pointer is marked]

Where to start reading?

Page 24: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

Backward perf ring buffer

[Figure: backward ring buffer holding Event 1, Event 2, Event 3; the write direction is opposite to the read direction, and reading starts at the write pointer]

Page 25: Performance monitoring and analysis using perf and BPF


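Tracing rare outliers

Backward perf ring buffer

[Figure: backward ring buffer after Event 4 is written; Event 4 wraps around and overwrites part of the oldest event, while reading from the write pointer still reaches Event 4, Event 3 and Event 2]

Events 4, 3 and 2 can be retrieved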
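The backward buffer can be modelled in a few lines of C. This is a simplified sketch, not the kernel's layout: each record here is a one-byte size header followed by its payload (real perf records start with a larger header that also carries the size), and `struct backward_rb`, `rb_write` and `rb_read_newest_first` are hypothetical names. Because the head moves toward lower addresses, the newest record's header always sits at the head, so a reader can walk from newest to oldest and stop at the first record that no longer fits in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 16u			/* must be a power of two */
#define RB_MASK (RB_SIZE - 1u)

struct backward_rb {
	uint8_t data[RB_SIZE];		/* zero-initialised by the caller */
	uint64_t head;			/* decreases on every write */
};

/* Append one record: [size][payload...], placed below the current head. */
void rb_write(struct backward_rb *rb, const uint8_t *payload, uint8_t len)
{
	uint8_t size;

	assert(len >= 1 && len < RB_SIZE);
	size = (uint8_t)(len + 1u);	/* header byte + payload */
	rb->head -= size;		/* unsigned wrap-around is intended */
	rb->data[rb->head & RB_MASK] = size;
	for (uint8_t i = 0; i < len; i++)
		rb->data[(rb->head + 1u + i) & RB_MASK] = payload[i];
}

/*
 * Walk from the newest record toward older ones. Stores the first payload
 * byte of each intact record in out[] and returns how many were found.
 */
size_t rb_read_newest_first(const struct backward_rb *rb,
			    uint8_t *out, size_t max)
{
	uint64_t pos = rb->head;
	size_t consumed = 0, n = 0;

	while (consumed < RB_SIZE && n < max) {
		uint8_t size = rb->data[pos & RB_MASK];

		/* 0: never written; too large: tail was overwritten */
		if (size == 0 || consumed + size > RB_SIZE)
			break;
		out[n++] = rb->data[(pos + 1u) & RB_MASK];
		pos += size;
		consumed += size;
	}
	return n;
}
```

With 5-byte records in a 16-byte buffer, writing events 1 to 4 leaves the header of event 1 intact but its payload partly overwritten by event 4; the reader detects this from the size check and returns events 4, 3 and 2.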

Page 26: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

perf supports the backward ring buffer:

perf record --overwrite, --switch-output, --tail-synthesize

perf runs in the background

Silent: no CPU and I/O cost

Captures a fine-grained trace close to outliers

# perf record -m 4 -e raw_syscalls:* -g --overwrite --switch-output --tail-synthesize \
    dd if=/dev/zero of=/dev/null &
[1] 30498
# kill -s SIGUSR2 30498
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2016081303042824 ]
# kill -s SIGUSR2 30498
[ perf record: dump data: Woken up 1 times ]
[ perf record: Dump perf.data.2016081303054379 ]
# ls -l ./perf.data.*
-rw------- 1 root root 38637 Aug 13 03:04 ./perf.data.2016081303042824
-rw------- 1 root root 38581 Aug 13 03:05 ./perf.data.2016081303054379
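The signal-triggered dump follows a common POSIX pattern, sketched below. This is not perf's source code: `install_dump_handler` and `poll_dump_request` are hypothetical names, and the sketch only shows the usual approach of setting a flag in the handler and doing the real work (here, snapshotting the ring buffer) from the main loop.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler, consumed by the main loop. */
static volatile sig_atomic_t dump_requested;

static void on_sigusr2(int signo)
{
	(void)signo;
	dump_requested = 1;	/* only async-signal-safe work in a handler */
}

/* Install the SIGUSR2 handler; returns 0 on success, -1 on failure. */
int install_dump_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigusr2;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGUSR2, &sa, NULL);
}

/*
 * Called from the main loop. Returns 1 once per received request, at which
 * point the caller would write the current buffer contents to a new file.
 */
int poll_dump_request(void)
{
	if (!dump_requested)
		return 0;
	dump_requested = 0;
	return 1;
}
```

Between signals the process has nothing to do, which matches the "silent" behaviour described on the slide.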

Page 27: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

Tracing-based profiling

[Figure: Android Draw Profiler chart, with one frame annotated "What happened in this frame?"] Source: https://developer.android.com/tools/performance/profile-gpu-rendering/index.html

How to find the outlier?

Dump data after the outlier is observed

perf record --overwrite --switch-output

Send SIGUSR2 to perf

Page 28: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

Tracing-based profiling

[Figure: Android Draw Profiler chart, with one frame annotated "What happened in this frame?"] Source: https://developer.android.com/tools/performance/profile-gpu-rendering/index.html

How to find the outlier?

Dump data after the outlier is observed

perf record --overwrite

Send SIGUSR2 to perf

Good enough?

How to detect the outlier? External profiler

How to trigger data dumping? Script / by hand

Page 29: Performance monitoring and analysis using perf and BPF


Tracing rare outliers

Tracing-based profiling

[Figure: Android Draw Profiler chart, with one frame annotated "What happened in this frame?"] Source: https://developer.android.com/tools/performance/profile-gpu-rendering/index.html

How to find the outlier?

Dump data after the outlier is observed

perf record --overwrite

Send SIGUSR2 to perf

Good enough?

How to detect the outlier? BPF script

How to trigger data dumping? BPF script
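The slide does not show what the detection logic looks like. A minimal sketch, in plain C rather than BPF, assuming an outlier is a frame that exceeds a fixed time budget: one probe records a timestamp when the frame starts, a second probe compares the elapsed time with the budget when it ends. `struct frame_tracker`, `frame_begin`, `frame_end_is_outlier` and the 60 Hz budget are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a 60 Hz display, so one frame has about 16.67 ms. */
#define FRAME_BUDGET_NS 16666667ull

struct frame_tracker {
	uint64_t start_ns;
	bool in_frame;
};

/* Probe at frame start: remember when the frame began. */
void frame_begin(struct frame_tracker *t, uint64_t now_ns)
{
	assert(t != NULL);
	t->start_ns = now_ns;
	t->in_frame = true;
}

/*
 * Probe at frame end: true when the frame overran its budget, which is
 * the point where a trace dump would be triggered.
 */
bool frame_end_is_outlier(struct frame_tracker *t, uint64_t now_ns)
{
	assert(t != NULL);
	if (!t->in_frame)
		return false;	/* end without a matching begin */
	t->in_frame = false;
	return now_ns - t->start_ns > FRAME_BUDGET_NS;
}
```

Only the rare frames for which this returns true need to trigger a dump, so the overwrite buffer is read just when it contains something interesting.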

Page 30: Performance monitoring and analysis using perf and BPF


Contents

Motivation

Highlight features

perf+BPF

Backward ring buffer

The next big thing

Summary

Page 31: Performance monitoring and analysis using perf and BPF


The next big thing

[Figure: four-step build-up of the planned perf architecture
Step 1: perf, script.c, clang
Step 2: perf, script.c, clang, LLVM, script.o, BPF loader, obj
Step 3: perf, script.c, clang, LLVM, BPF loader, obj, JIT, perf hooks
Step 4: perf, script.x, one-liner, clang, LLVM, BPF loader, obj, JIT, perf hooks, other frontends]

Next big thing: integrating clang and LLVM into perf

Eliminate external clang dependency: good for Android

Tracing, actions and data processing in one script: not only for profiling, but also dynamic tuning

Better frontend: supports one-liners, easy to use

Page 32: Performance monitoring and analysis using perf and BPF


Contents

Motivation

Highlight features

perf+BPF

Backward ring buffer

The next big thing

Summary

Page 33: Performance monitoring and analysis using perf and BPF


Summary

perf and BPF integration

Compile BPF using external clang

Load and attach to kernel events

Many BPF features

Backward ring buffer

perf runs in the background

Dump trace when SIGUSR2 is received

Combine them together

Improve perf to better support tracing-based profiling

Capture rare outliers

All of the above are ready in Linux v4.8-rc1

Page 34: Performance monitoring and analysis using perf and BPF


Questions?

Page 35: Performance monitoring and analysis using perf and BPF

Thank you www.huawei.com
