ARM 64bit has come!
Tetsuyuki Kobayashi
2014.5.23 Japan Technical Jamboree
2014.5.25 Updated for Kernel/VM Tankentai (カーネル/VM探検隊)
The latest version of these slides is available here:
http://www.slideshare.net/tetsu.koba/presentations
Who am I?
20+ years involved in embedded systems
  10 years in real-time OS, such as iTRON
  10 years in embedded Java Virtual Machine
  Now GCC, Linux, QEMU, Android, …
Blogs
  http://d.hatena.ne.jp/embedded/ (Personal)
  http://blog.kmckk.com/ (Corporate)
  http://kobablog.wordpress.com/ (English)
Twitter @tetsu_koba
Today's topics
Introduction to ARM 64bit
  Does not cover everything, only what I find interesting :)
Try aarch64 using QEMU
ARMv8 terminology
AArch64: 64-bit mode
  1 instruction set: A64
    A64: 32-bit fixed-length instructions
AArch32: 32-bit mode
  Upward compatible with the ARMv7-A architecture
  2 instruction sets: A32, T32
    A32: ARM, 32-bit fixed-length instructions
    T32: Thumb2, 16-bit/32-bit instructions
ARM64 is not an official name
  It is the name used in the kernel source: arch/arm64
Exception level
4 levels, with typical usage:
  EL0: User application
  EL1: Kernel of OS
  EL2: Hypervisor
  EL3: Secure monitor
Switching between AArch64 and AArch32 happens only on a change of exception level
Cf. PL0-PL2 (privilege levels) in ARMv7
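As a small illustration of the four levels above, here is a sketch that decodes an exception level from a raw CurrentEL register value and maps it to its typical role. It assumes the ARMv8 layout in which CurrentEL keeps the level in bits [3:2]. The function names are made up for this example, and reading the register itself (which is not possible from EL0) is left out.

```c
#include <stdint.h>

/* Hypothetical helper: extract the exception level from a raw CurrentEL
 * value, assuming the level is stored in bits [3:2]. */
static unsigned el_from_currentel(uint64_t currentel)
{
    return (unsigned)((currentel >> 2) & 0x3u);
}

/* Typical usage of each level, as listed on the slide. */
static const char *el_typical_usage(unsigned el)
{
    switch (el) {
    case 0: return "User application";
    case 1: return "Kernel of OS";
    case 2: return "Hypervisor";
    case 3: return "Secure monitor";
    default: return "invalid";
    }
}
```

For example, a raw value of 0x4 decodes to EL1, the level at which an OS kernel typically runs.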
AArch64 execution model
R0 – R30: 64-bit general purpose registers
  Wn: lower 32 bits
  Xn: full 64 bits
  Register number 31 means the zero register (XZR, WZR) or SP, depending on the instruction
SP: Stack Pointer
  Must be 16-byte aligned
  WSP for the lower 32 bits
PC: Program Counter
  Cannot be used as the destination of a calculation
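The relation between Wn and Xn can be modelled in plain C. This is only a sketch: read_w and write_w are names invented here, and the zero-extension on write is an architectural rule that the slide itself does not state.

```c
#include <stdint.h>

/* Model of reading Wn: the lower 32 bits of the 64-bit register Xn. */
static uint32_t read_w(uint64_t x)
{
    return (uint32_t)x;
}

/* Model of writing Wn: the value is zero-extended, so the upper
 * 32 bits of Xn become zero (rule assumed from the architecture,
 * not taken from the slide). */
static uint64_t write_w(uint64_t old_x, uint32_t value)
{
    (void)old_x;             /* the previous upper half is discarded */
    return (uint64_t)value;
}
```

So reading W from 0x1122334455667788 gives 0x55667788, and writing 5 to W leaves X equal to 5 no matter what X held before.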
AArch64 execution model (cont.)
V0 – V31: 128-bit registers
  For floating point and SIMD
  AArch64 must have an FPU. No calling standard for soft-float.
Scalar
  Bn, Hn, Sn, Dn, Qn
Vector
  Vn.8B, Vn.16B, Vn.4H, Vn.8H, Vn.2S, Vn.4S, Vn.1D, Vn.2D
FPCR: Floating Point Control Register
FPSR: Floating Point Status Register
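The vector arrangements can be pictured as different views of the same 128 bits. The sketch below uses a C union for the four full-width views and a plain loop for a lane-wise add in the 4S arrangement. The type and function names are hypothetical, and it assumes that float is 32 bits and double is 64 bits, as on common platforms.

```c
#include <stdint.h>

/* One 128-bit V register seen through the full-width arrangements. */
typedef union {
    uint8_t  b[16];  /* Vn.16B */
    uint16_t h[8];   /* Vn.8H  */
    float    s[4];   /* Vn.4S  */
    double   d[2];   /* Vn.2D  */
} vreg128;

/* Lane-wise add in the 4S arrangement, written as a scalar loop. */
static vreg128 add_4s(vreg128 a, vreg128 b)
{
    vreg128 r;
    for (int i = 0; i < 4; i++)
        r.s[i] = a.s[i] + b.s[i];
    return r;
}
```

On real hardware a single SIMD instruction operating on the .4S arrangement would do the work of this loop. The loop is only there to show which lanes are combined.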
AArch64 addressing model
Without tag: 64-bit virtual address
With tag: 8-bit tag + 56-bit virtual address
  The tag is ignored on load/store/branch
  Good for implementing dynamically typed languages
Effective virtual address length is 48 bits.
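To make the tag layout concrete, here is a minimal sketch that packs an 8-bit tag into bits [63:56] of a 64-bit address value and takes it out again. The helper names are invented for this example. It assumes a user-space address whose top byte is zero, and it strips the tag in software so that it also works on machines where the hardware does not ignore the tag.

```c
#include <stdint.h>

#define TAG_SHIFT 56
#define ADDR_MASK ((UINT64_C(1) << TAG_SHIFT) - 1)

/* Put an 8-bit tag into bits [63:56] of a 64-bit address value. */
static uint64_t tag_set(uint64_t addr, uint8_t tag)
{
    return (addr & ADDR_MASK) | ((uint64_t)tag << TAG_SHIFT);
}

/* Read the tag back from the top byte. */
static uint8_t tag_get(uint64_t tagged)
{
    return (uint8_t)(tagged >> TAG_SHIFT);
}

/* Strip the tag in software, leaving the lower 56 bits. */
static uint64_t tag_clear(uint64_t tagged)
{
    return tagged & ADDR_MASK;
}
```

A language runtime could, for instance, keep a small type code in the tag and still use the tagged value as an address when the tag is ignored by the hardware.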
Calling standard (AAPCS64)
R30 = LR (Link Register)
R29 = FP (Frame Pointer)
Parameter passing
  R0 – R7 for integers and pointers
  V0 – V7 for floating point
Callee must preserve
  R19 – R29, SP
  V8 – V15 (lower 64 bits only)
No calling standard for soft-float
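A function with eight integer and eight floating-point parameters shows how far register passing goes. This is a sketch with a made-up function name. The register assignment in the comment is what AAPCS64 would be expected to give, on the assumption that integer and floating-point arguments are allocated to their register files independently.

```c
/* Eight integer arguments and eight floating-point arguments.
 * Under AAPCS64, a..h would be passed in the first eight general
 * registers (as W0-W7 for int) and p..w in V0-V7 (as D0-D7 for
 * double), so no argument needs to go on the stack. */
double sum16(int a, int b, int c, int d, int e, int f, int g, int h,
             double p, double q, double r, double s,
             double t, double u, double v, double w)
{
    return (a + b + c + d + e + f + g + h)
         + (p + q + r + s + t + u + v + w);
}
```

A ninth integer or a ninth floating-point argument would be the first one to be passed on the stack.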
A64 instruction set
Brand-new, clean design for the 64-bit architecture
Not every instruction is conditional; there is only a very small set of "conditional data processing" instructions
No equivalent of Thumb2's IT instruction.
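One member of that small set is the conditional select, CSEL. The C sketch below writes a clamp with conditional expressions only. A compiler targeting A64 may turn each of them into a compare followed by a conditional select instead of a branch, but the actual output depends on the compiler and its options, so take the comments as an expectation rather than a guarantee. The function name is invented for this example.

```c
/* Clamp v to the range [lo, hi] using conditional expressions. */
int clamp_int(int v, int lo, int hi)
{
    int t = (v < lo) ? lo : v;   /* candidate for cmp + csel */
    return (t > hi) ? hi : t;    /* candidate for cmp + csel */
}
```

With lo = 0 and hi = 10, the inputs -3, 5, and 42 give 0, 5, and 10.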
No multiple load/store
No load/store-multiple instructions for GP registers, such as LDM/STM, PUSH/POP
Instead, there are register-pair load/store instructions, such as LDP/STP
YIELD instruction
A NOP that hints the current thread is not doing anything important
Used in spin loops; can trigger context switching in SMT (Simultaneous Multi-Threading)
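A spin lock is the usual place for such a hint. The sketch below uses C11 atomics and emits YIELD through inline assembly only when built for AArch64 with GCC or Clang; on other targets cpu_relax does nothing, so the code stays portable. The names spinlock, spin_lock, spin_unlock, and cpu_relax are chosen for this example and are not taken from any particular library.

```c
#include <stdatomic.h>

/* Emit YIELD on AArch64; do nothing elsewhere so the sketch stays portable. */
static inline void cpu_relax(void)
{
#if defined(__aarch64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("yield" ::: "memory");
#endif
}

typedef struct { atomic_flag locked; } spinlock;

static void spin_lock(spinlock *l)
{
    /* test_and_set returns the previous value: true means the lock is held. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        cpu_relax();
}

static void spin_unlock(spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

A lock is set up with `spinlock l = { ATOMIC_FLAG_INIT };` and then used with matching spin_lock and spin_unlock calls.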
Sample #1 source
#include <stdio.h>

int main()
{
    int i;
    for (i = 5; i >= 0; i--) {
        printf("count down: %d\n", i);
    }
    return 0;
}
Sample #1 Thumb2
000083f8 <main>:
    83f8: b570        push   {r4, r5, r6, lr}
    83fa: 2405        movs   r4, #5
    83fc: f248 456c   movw   r5, #33900       ; 0x846c
    8400: f2c0 0500   movt   r5, #0
    8404: 2601        movs   r6, #1
    8406: 4630        mov    r0, r6
    8408: 4629        mov    r1, r5
    840a: 4622        mov    r2, r4
    840c: f7ff ef7a   blx    8304 <_init+0x38>
    8410: 3c01        subs   r4, #1
    8412: f1b4 3fff   cmp.w  r4, #4294967295  ; 0xffffffff
    8416: d1f6        bne.n  8406 <main+0xe>
    8418: 2000        movs   r0, #0
    841a: bd70        pop    {r4, r5, r6, pc}
Sample #1 A64
0000000000400440 <main>:
    400440: a9be7bfd   stp   x29, x30, [sp,#-32]!
    400444: 910003fd   mov   x29, sp
    400448: a90153f3   stp   x19, x20, [sp,#16]
    40044c: 90000014   adrp  x20, 400000 <_init-0x3c0>
    400450: 528000b3   mov   w19, #0x5        // #5
    400454: 911a0294   add   x20, x20, #0x680
    400458: 2a1303e2   mov   w2, w19
    40045c: 52800020   mov   w0, #0x1         // #1
    400460: aa1403e1   mov   x1, x20
    400464: 97ffffeb   bl    400410 <__printf_chk@plt>
    400468: 51000673   sub   w19, w19, #0x1
    40046c: 3100067f   cmn   w19, #0x1
    400470: 54ffff41   b.ne  400458 <main+0x18>
    400474: 52800000   mov   w0, #0x0         // #0
    400478: a94153f3   ldp   x19, x20, [sp,#16]
    40047c: a8c27bfd   ldp   x29, x30, [sp],#32
    400480: d65f03c0   ret
Sample #2 source
int iaload(int *base, int index)
{
    return base[index];
}

long long laload(long long *base, int index)
{
    return base[index];
}

char ibload(char *base, int index)
{
    return base[index];
}

short isload(short *base, int index)
{
    return base[index];
}
Sample #2 Thumb2
00000000 <iaload>:
     0: f850 0021   ldr.w    r0, [r0, r1, lsl #2]
     4: 4770        bx       lr
     6: bf00        nop

00000008 <laload>:
     8: eb00 01c1   add.w    r1, r0, r1, lsl #3
     c: e9d1 0100   ldrd     r0, r1, [r1]
    10: 4770        bx       lr
    12: bf00        nop

00000014 <ibload>:
    14: 5c40        ldrb     r0, [r0, r1]
    16: 4770        bx       lr

00000018 <isload>:
    18: f930 0011   ldrsh.w  r0, [r0, r1, lsl #1]
    1c: 4770        bx       lr
    1e: bf00        nop
Sample #2 A64
0000000000000000 <iaload>:
     0: b861d800   ldr   w0, [x0,w1,sxtw #2]
     4: d65f03c0   ret

0000000000000008 <laload>:
     8: f861d800   ldr   x0, [x0,w1,sxtw #3]
     c: d65f03c0   ret

0000000000000010 <ibload>:
    10: 3861c800   ldrb  w0, [x0,w1,sxtw]
    14: d65f03c0   ret

0000000000000018 <isload>:
    18: 7861d800   ldrh  w0, [x0,w1,sxtw #1]
    1c: d65f03c0   ret
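The addressing form `[x0,w1,sxtw #n]` used above can be spelled out in C. This sketch computes the effective address the way I read the operand: sign-extend the 32-bit index to 64 bits, scale it by the element size, and add it to the base. The function name ea_sxtw is made up for this example.

```c
#include <stdint.h>

/* Effective address of base[index] for elements of (1 << shift) bytes,
 * modelling the operand "[x0, w1, sxtw #shift]". */
static uint64_t ea_sxtw(uint64_t base, int32_t index, unsigned shift)
{
    int64_t extended = (int64_t)index;   /* sxtw: sign-extend W1 to 64 bits */
    /* Scale in unsigned arithmetic so a negative index wraps correctly. */
    return base + (uint64_t)extended * ((uint64_t)1 << shift);
}
```

With base 0x1000 and 4-byte elements, index -1 gives 0xffc, which is why the sign extension matters for an int index on a 64-bit machine.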
Sample #3 source
double range(double x, double min, double max)
{
    if (x < min)
        return min;
    else if (x > max)
        return max;
    else
        return x;
}
Sample #3 Thumb2
00000000 <range>:
     0: eeb4 0bc1   vcmpe.f64   d0, d1
     4: eef1 fa10   vmrs        APSR_nzcv, fpscr
     8: d407        bmi.n       1a <range+0x1a>
     a: eeb4 0bc2   vcmpe.f64   d0, d2
     e: eef1 fa10   vmrs        APSR_nzcv, fpscr
    12: bfc8        it          gt
    14: eeb0 0b42   vmovgt.f64  d0, d2
    18: 4770        bx          lr
    1a: eeb0 0b41   vmov.f64    d0, d1
    1e: 4770        bx          lr
Sample #3 A64
0000000000000000 <range>:
     0: 1e612010   fcmpe  d0, d1
     4: 540000a4   b.mi   18 <range+0x18>
     8: 1e622010   fcmpe  d0, d2
     c: 1e604041   fmov   d1, d2
    10: 5400004c   b.gt   18 <range+0x18>
    14: 1e604001   fmov   d1, d0
    18: 1e604020   fmov   d0, d1
    1c: d65f03c0   ret
Cache control
Application level cache instructions
  Data cache
    DC CVAU
    DC CVAC
    DC CIVAC
  Instruction cache
    IC IVAU
No need to call a kernel syscall
  JIT friendly
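From C, a JIT would normally not write these instructions by hand. The sketch below uses the GCC/Clang builtin `__builtin___clear_cache` after copying generated code into place. As far as I understand, on AArch64 Linux this ends up in user-level cache maintenance instructions like the ones above rather than in a system call, but that is an assumption about the toolchain. publish_code is a hypothetical helper, and the caller is assumed to have mapped dst as both writable and executable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical JIT helper: copy freshly generated machine code into dst
 * and make it visible to instruction fetch. The caller is assumed to
 * have mapped dst writable and executable. */
static void publish_code(char *dst, const char *code, size_t len)
{
    memcpy(dst, code, len);
    /* GCC/Clang builtin: synchronise data and instruction caches
     * for the address range [dst, dst + len). */
    __builtin___clear_cache(dst, dst + len);
}
```

On targets with coherent instruction caches the builtin can compile to nothing, so the helper is harmless there.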
Preloading cache
PRFM <prfop>, addr|label
  <prfop>  ::= <type><target><policy>
  <type>   ::= PLD | PST | PLI
  <target> ::= L1 | L2 | L3
  <policy> ::= KEEP | STRM
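In C the portable way to ask for a preload is the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`. The sketch below sums an array while prefetching a fixed distance ahead. A compiler targeting AArch64 would typically lower a read prefetch with high locality to a PRFM with a PLD type and KEEP policy, though the exact choice is up to the compiler. PREFETCH_AHEAD is an arbitrary value picked for illustration, not a tuned one.

```c
#include <stddef.h>

/* Arbitrary prefetch distance in elements, chosen for illustration only. */
#define PREFETCH_AHEAD 16

long sum_with_prefetch(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0 (read), locality = 3 (keep in cache) */
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same with or without it; only the timing can differ.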
Non-temporal load/store
LDNP/STNP
  Hint that the data is unlikely to be accessed again (like streaming)
AArch32
Upward compatible with ARMv7
Added cryptography extension
Added some other new instructions, aligned with AArch64
Removed Jazelle, ThumbEE
Let's try AArch64 using QEMU
QEMU 2.0 supports aarch64 user mode emulation
Ubuntu 14.04 has QEMU 2.0 and a cross compiler for aarch64
$ sudo apt-get install qemu-user-static
$ sudo apt-get install g++-aarch64-linux-gnu
Prepare gdb for aarch64
$ sudo apt-get build-dep gdb
$ wget http://ftp.gnu.org/gnu/gdb/gdb-7.7.1.tar.bz2
$ tar xf gdb-7.7.1.tar.bz2
$ mkdir obj
$ cd obj
$ ../gdb-7.7.1/configure --target=aarch64-linux-gnu
$ make
$ sudo make install
Execute by qemu and connect gdb
$ aarch64-linux-gnu-gcc -g a.c
$ export QEMU_LD_PREFIX=/usr/aarch64-linux-gnu/
$ qemu-aarch64-static -g 1234 ./a.out

$ aarch64-linux-gnu-gdb ./a.out
...
(gdb) target remote :1234
(gdb) b main
(gdb) c
(gdb) x/i $pc
=> 0x4005a0 <main>: stp x29, x30, [sp,#-48]!
(gdb)
DEMO
References
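ARMv8 Technology Preview
ARMv8 Instruction Set Overview
ARM® Architecture Reference Manual
Procedure Call Standard for the ARM 64-bit Architecture (AArch64)
Overview of the ARM 64-bit ARMv8 architecture (ARM 64bit ARMv8 のアーキテクチャの概要)
Compiling and running ARM 64-bit (aarch64) code on Ubuntu 14.04 (Ubuntu 14.04 で arm 64bit (aarch64) のコードをコンパイルして動かしてみる)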
Any comments?
@tetsu_koba
Thank you for listening!