Intro. to static analysis

55
Intro. To Sta+c Analysis C.K. Chen 2014.09.27

description

Basic Static Analysis to Understand Program

Transcript of Intro. to static analysis

Page 1: Intro. to static analysis

Intro.  To  Sta+c  Analysis

C.K.  Chen  2014.09.27

Page 2: Intro. to static analysis

Who  am  I •  C.K  Chen  (陳仲寬)  –  P.H.D  Student  in  DSNS  Lab,  NCTU  –  BambooFox  CTF  –  Research  in  

•  Reverse  Engineering  •  Malware  Analysis  •  Virtual  Machine  

•  About  this  work  –  CHI-­‐WEI  WANG,  CHIA-­‐WEI  WANG,  CHONG-­‐KUAN  CHEN    

2

Page 3: Intro. to static analysis

About  DSNS

•  謝續平教授  •  實驗室研究方向  – 惡意程式分析  – 虛擬機器  – 數位鑑識  – 網路安全  

3

Page 4: Intro. to static analysis

Outline

•  Intro.  To  Sta+c  Analysis  •  Common  Tools  inLinux  for  Sta+c  Analysis  •  Disassemble  •  Reverse  Assambly  to  C    – Fundamental  ASM  

•  IDA  Pro    – Prace+ce  

•  Tips  for  Sta+c  Analtsis  

Page 5: Intro. to static analysis

Intr.  to  Sta+c  Analysis

•  Sta+c  analysis  – Analysis  malware  without  execu+on  

•  Dynamic  analysis  – Execute  malware  inside  controllable  environment  and  monitor  it’s  behavior

Page 6: Intro. to static analysis

Informa+on  from  Sta+c  Analysis

•  What  informa+on  we  can  get  from  sta+c  analysis

Page 7: Intro. to static analysis

Informa+on  from  Sta+c  Analysis

•  What  informa+on  we  can  get  from  sta+c  analysis  – File  Structure  – Binary  Code  – Related  Module  – Suspicious  String

Page 8: Intro. to static analysis

Informa+on  from  Sta+c  Analysis

•  What  we  cannot  get?  – Register  Value  – Memory  Value  – Packed  Code  – Encrypted  Message

Page 9: Intro. to static analysis

Usage  of  sta+c  analysis

•  In  normal  case,  there  are  some  problems  that  sta+c  analysis  is  involved  – Reverse:  Windows,  Linux  – Pwned(Exploit):  Linux,  Windows(rare)  

•  Complement  to  the  dynamic  analysis

Page 10: Intro. to static analysis

First  step  to  Sta+c  Analysis

•  There  are  some  Linux  commands  that  can  give  useful  informa+on  of  file  – Strings  – Objdump  – Hexdump  – File  

Page 11: Intro. to static analysis

Linux  Command

•  Strings  –  For  each  file  given,  GNU  strings  prints  the  printable  character  sequences  that  are  at  least  4  characters  long  and  are  followed  by  an  unprintable  character.    

–  Get  clues  of  file

Page 12: Intro. to static analysis

Linux  Command

•  File  – File  tests  each  argument  in  an  a`empt  to  classify  it.  There  are  three  sets  of  tests,  performed  in  this  order:  filesystem  tests,  magic  number  tests,  and  language  tests.  The  first  test  that  succeeds  causes  the  file  type  to  be  printed.    

Page 13: Intro. to static analysis

Linux  Command

•  Hexdump  – The  hexdump  u+lity  is  a  filter  which  displays  the  specified  files,  or  the  standard  input,  if  no  files  are  specified,  in  a  user  specified  format.  

– Hex,  Oct,  Char,  …..

Page 14: Intro. to static analysis

Linux  Command

•  ldd  -­‐  print  shared  library  dependencies  – Loading  library  – Loca+on  of  library  file  – Loading  address  of  library    

Page 15: Intro. to static analysis

Linux  Command

•  Objdump  – Dump  informa+on  of  ELF  file  – Rich  informa+on  can  be  dumped  – Can  used  to  build  simplest  malware  analysis  system

Page 16: Intro. to static analysis

Objdump

Page 17: Intro. to static analysis

Disassemble

•  objdump  -­‐D

Page 18: Intro. to static analysis

Global  Offset  Table

•  objdump  –R  – Key  of  sharing  library  in  linux  – GOT  Hijack

Page 19: Intro. to static analysis

Disassemble

•  Disassemble  is  a  procedure  to  convert  binary  machine  code  into  assembly  code

Page 20: Intro. to static analysis

Code  Discovery  Problem

•  In  the  binary  file,  instruc+ons  and  data  may  hybrid  in  the  sec+on.  –  It  is  not  easy  to  discover  instruc+ons  in  the  binary  – Especial  for  variable-­‐length  instruc+on  set  like  x86

Page 21: Intro. to static analysis

Linear  sweep

•  Starts  usually  to  disassemble  from  the  first  byte  of  the  code  sec+on  in  a  linear  fashion  

•  Disassembles  one  instruc+on  afer  another  un+l  the  end  of  the  sec+on  is  reached  

•  Do  not  understand  program  flow  •  objdump  

Page 22: Intro. to static analysis

Recursive  traversal •  instruc+on  classified  as  

–  Sequen+al  flow:  pass  execu+on  to  the  next  instruc+on  that  immediately  follows  –  Condi+onal  branching:  if  the  condi+on  is  true  the  branch  is  taken  and  the  instruc+on  

pointer  must  change  to  reflect  the  target  of  the  branch,  otherwise  it  con+nues  in  a  linear  fashion  (jnz,  jne,  .  .  .  ).  In  sta+c  context  this  algorithm  disassemble  both  paths  

–  Uncondi+onal  branching:  the  branch  is  taken  without  any  condi+on;  the  algorithm  follows  the  (execu+on)  flow  (jmp)  

–  Func+on  call:  are  like  uncondi+onal  jumps  but  they  return  to  the  instruc+on  immediately  following  the  call  

–  Return:  every  instruc+ons  which  may  modify  the  flow  of  the  program  add  the  target  address  to  a  list  of  deferred  disassembly.  When  a  return  instruc+on  is  reached  an  address  is  popped  from  the  list  and  the  algorithm  con+nues  from  there  (recursive  algorithm).  

•  Some  issue  –  Indirect  code  invoca+ons  –  Does  returning  from  a  call  always  allow  for  a  faithful  disassembly

Page 23: Intro. to static analysis

Problem  of  Disassembly

•  Remember  that  disassembler  may  not  always  true  

•  Linear  sweep  

•  Recursive  traversal

       jmp  .des+na+on          db  0x6a  ;  garbage  byte  technique    .des+na+on:            pop  eax

eb  01    jmp  0x401003    6a  58    push  0x58  

push  DWORD  .des+na+on  jmp  DWORD  [esp]  db  0x6a  ;  garbage  byte  technique  .des+na+on:  pop  eax  

push  DWORD  .des+na+on  jmp  DWORD  [esp]  push  0x58  

Page 24: Intro. to static analysis

Reverse  Assambly  to  C

•  Registers  Architecture  

•  The  EIP  register  contains  the  address  of  the  next  instruc+on  to  be  executed  if  no  branching  is  done.

Page 25: Intro. to static analysis

Memory  Layout •  Stack  

–  Not  maintain  in  Executable  –  Local  Variable  

•  Heap  –  Not  maintain  in  Executable  –  Dynamic  Allocate  Memory  

•  BSS  Sec+on  –  Unini+alized  Data  –  Global  variables  and  sta+c  variables    

that  are  ini+alized  to  zero  or  do  not    have  explicit  ini+aliza+on  in  source  code  

•  Data  Sec+on  –  Ini+alized  Data  –  Global  variables  and  sta+c    

variables  

Page 26: Intro. to static analysis

Variables

•   Disassembled  code  for  local  and  global  variables

Page 27: Intro. to static analysis

Local  Variables/Arguments

•  Caller  push  argument  into  stack  •  Caller  push  eip  by  call    instruc+on  

•  Callee  save/push  the  caller’s  ebp  

•  Callee  reserve  space  for  local  variables  – sub

Stack  Growing  Direc+on

Page 28: Intro. to static analysis

Data  Movement •  MOV  dst,  src  –  Src  <=  dst  

•  LEA  dst,  src  –  Load  effec+ve  address  of  operand  into  specified  register  

–  To  calculate  the  address  of  a  variable  which  doesn't  have  a  fixed  address  

•  Example  – mov  eax,  [ebp  -­‐  4]  <=  get  content  in  [ebp  -­‐  4]  – mov  eax,  ebp  –  4  <=  wrong,  no  such  instruc+on  –  lea  eax,  [ebp  -­‐  4]  <=  get  address  of  [ebp  -­‐  4]

Page 29: Intro. to static analysis

Arithme+c  Operator

•  add  dest,  src  •  sub  dest,  src  •  mul  arg  •  div  – DIV  r/m8    – DIV  r/m16    – DIV  r/m32  

•  inc  •  dec

Page 30: Intro. to static analysis

Control  Instruc+ons •  Flag,  each  instruc+on  updates  some  field  of  flag  for  future  

branch  •  test  

–  Performs  a  bit-­‐wise  logical  AND  –  sets  the  ZF(zero),  SF(sign)  and  PF(parity)  flags  

•  cmp  –  Performs  a  comparison  opera+on  between  arg1  and  arg2  –  Set  SF,    ZF,  PF,  CF,  OF  and  AF  

Page 31: Intro. to static analysis

Branch  Instruc+on

•  JE  Jump  if  Equal  ZF=1    •  JNE  Jump  if  Not  Equal  ZF=0    •  JG  Jump  if  Greater  (ZF=0)  AND  (SF=OF)    •  JGE  Jump  if  Greater  or  Equal  SF=OF    •  JL  Jump  if  Less  SF≠OF    •  JLE  Jump  if  Less  or  Equal  (ZF=1)  OR  (SF≠OF)

Page 32: Intro. to static analysis

Stack  Opera+on

•  Stack  is  the  LIFO  data  structure  – PUSH:  put  data  into  top  of  stack  – POP:  get  data  from  top  of  stack

Page 33: Intro. to static analysis

Func+on  Call

•  Call  – Similar  to  jmp,  but  a  CALL  stores  the  current  EIP  on  the  stack  

•  RET    – Load  the  address  in  esp,  and  jump  to  that  address  

•  RET  num  –  Increase  esp  by  num  – Load  the  address  in  esp,  and  jump  to  that  address  

Page 34: Intro. to static analysis

Func+on  Pro

•  Func+on  Prologue  – Store  current  EBP  – Save  ESP  to    current  EBP  

– Leave  space  for    local  variables    

•  Func+on  Epilogue  – Set  ESP  to  EBP  – Restore  EBP  

Page 35: Intro. to static analysis

Calling  Conven+on •  The  transi+on  of  func+on  arguments  must  be  maintain  by  assembly  programmer,  

but  most  case  maintain  by  compiler  •  Stdcall    

–  func+on  arguments  are  passed  from  right  to  lef    –  the  calleé  is  in  charge  of  cleaning  up  the  stack.    –  Return  values  are  stored  in  EAX.    

•  cdecl    –  The  cdecl  (short  for  c  declara+on)  is  a  calling  conven+on  that  originates  from  the  C  

programming  language  and  is  used  by  many  C  compilers  for  the  x86  architecture.  –  The  main  difference  of  cdecl  and  stdcall  is  that  in  a  cdecl,  the  caller,  not  the  calleé,  is  

responsible  for  cleaning  up  the  stack.  •   pascal    

–  The  pascal  calling  conven+on  origins  from  the  Pascal  programming  language  –  The  main  difference  between  it  and  stdcall  is  that  the  parameters  are  pushed  to  the  stack  

from  lef  to  right.    •  fastcall    

–  The  fastcall  is  a  non-­‐standardized  calling  conven+on.  –  the  fastcall  conven+on  tends  to  load  them  into  registers.  This  results  in  less  memory  

interac+on  and  increases  the  performance  of  a  call.

Page 36: Intro. to static analysis

Func+on  Call  Structure

•   Func+on  Call  Structure  

•   

Page 37: Intro. to static analysis

Branch  Structure

•  Branch  Structure

Page 38: Intro. to static analysis

Do-­‐For  loop  

•  Do-­‐For  loop  

Page 39: Intro. to static analysis

IDA  Pro

•  IDA  Pro  is  the  most  well-­‐known  dissemble/decompile  tool  for  reversing  – Disassemble  – Friendly  GUI  – Decopiler  – Debugger

Page 40: Intro. to static analysis

Overview

Assembly  and  Control  Flow  View

Message  View

Control  Flow  View

Page 41: Intro. to static analysis

Func+onality(1)

Convert  Current  Loca+on  •  DATA  •  Instruc+on  •  String  •  Self-­‐defined  Data  

Structure  •  Array

Convert  Oprand  •  Offset  •  Hex/Oct/Dec/Bin  •  Constant  Char  •  Segment-­‐based  

Var  •  Stack-­‐based  Var  •  ….

Fun  Call  Window

Xref  Table

Graph  

Once  The  disassemble  make  mistake,  you  can  fix  it  yourself

Page 42: Intro. to static analysis

Func+onality(2)

Export  Func+on  •  List  func+ons  

export  to  other  Binary  

•  DLL,  entry  point

Import  Func+on  •  Func+ons  

included  from  other  files  

•  Import  func+on  can  help  you  to  guess  the  behavior  of  program

Names  •  Func+on  

Name  •  Variable  

Names  •  Strings  •  For  problem  

with  debugger  informa+on  inside,  names  can  be  useful

Strings  •  All  strings  use  •  For  some  easy  

problem,  this  can  help  you  to  get  flag  

•  For  other  problem,  it  s+ll  give  you  quick  look  to  program

Page 43: Intro. to static analysis

Useful  Hotkeys

Func,on Hotkey

1 Strings   Shif+F12  

2 Jump  to  operand   Enter

3 Jump  to  previous  posi+on   ESC

4 Jump  to  next  posi+on   Ctrl+Enter  

5 Jump  to  address   G

6 Jump  to  entry  point   Ctrl+E

7 Sequence  of  bytes   Alt+B

•  List  of  useful  hotkeys

Page 44: Intro. to static analysis

Prac+ce

•  Reverse  encryp+on  algo  in  bot.exe  – sub_418f50  – h`p://140.113.216.151/bot.exe

Page 45: Intro. to static analysis

Decompiler

•  Decompiler  can  help  you  to  transfer  assembly  into  C  code  – More  easy  to  read  

Page 46: Intro. to static analysis

But

•  Decompiler  result  is  not  perfect  – Most  of  +me  is  buggy  – Lack  of  source  code  level  informa+on  

•  May  not  support  All  playorm  – Arm  – X86  – X64  – …..

Page 47: Intro. to static analysis

Reversing  Concept

•  Iden+fy  important  part  of  program  •  Backward  tracking  user  data  •  Forward  tracking  interes+ng  API  func+on  •  Convert  back  to  C  code

Page 48: Intro. to static analysis

Iden+fy  important  part  of  program  

•  Iden+fy  what  you  interes+ng  – Strings:  ‘flag’,  ‘key’,  ….  – Func+on  to  read  input:  scanf(),  gets(),…  – Func+on  for  network  communica+on:  recv(),  send()  

– Read/Write  file    – …..  

Page 49: Intro. to static analysis

Backward  tracking  user  data  

•  Most  program  vulns  must  be  trigger  by  user  input  – You  can  not(or  difficult)  a`ack  a  func+on  independent  to  your  input  

•  Keep  track  about  variables  affected  by  your  input  – Data  Propagate  

•  Data  Dependency

Page 50: Intro. to static analysis

Forward  tracking  interes+ng  API  func+on

•  Most  vulns  are  cause  by  some  certain  func+ons  –  strcpy()  – memcpy()  –  scanf()  –  priny()  –  strcat()  – …..  

•  Try  to  trigger  these  func+ons  •  Analysis  control  flow  and  make  strategy  to  enforce  program  goto  these  func+ons

Page 51: Intro. to static analysis

Convert  back  to  C  code

1.  Gather  informa+on  –  IAT  –  Strings  – Dynamic  analysis  

2.  Iden+fy  func+on  of  interest  3.  Iden+fy  CALLs  4.  Iden+fy  algorithms  and  data  structures  5.  Pseudo-­‐code  it!  6.  Rename  func+on(s),  argument(s),  variable(s)  

Page 52: Intro. to static analysis

Problem  of  sta+c  analysis

•  Encryp+on/Self  Modified  Code  •  Lack  of  run+me  informa+on  •  Take  a  lot  of  +me  to  understand  program  L  

Page 53: Intro. to static analysis

Advantage

•  Why  we  s+ll  needed  sta+c  analysis?  – Give  you  very  first  concept  of  program  – Overview  of  program  flow  – Hybrid  with  dynamic  analysis    

Page 54: Intro. to static analysis

Summary

•  This  course  brings  the  basic  idea  of  sta+c  analysis  

•  Intro.  some  tool  for  sta+c  analysis  •  Basic  ASM  •  How  to  reverse  asm  to  c  – Func+on  call  – Memory  

•  Some  +ps  for  sta+c  analysis

Page 55: Intro. to static analysis

Q&A