Post on 14-Jan-2016

(Static) Program Analysis

Jang-Wu Jo (조장우), Department of Computer Engineering, Dong-A University

Motivation

What should computer engineers do?

• The production of reliable software, its maintenance, and safe evolution year after year (up to 20 to 30 years)

Hardware and Software

Hardware
• Over the last 25 years, computer hardware has seen its performance multiplied by 10^6

Software
• The size of the programs executed by these computers has grown in similar proportions

Example
• MS Windows
– > 30 000 000 lines
– > 30 000 known bugs

Software Development

Software development
• Large-scale computer programming is very difficult
• Reasoning about large programs is very difficult
• Errors are quite frequent

Idea
• Use the computer to find programming errors
• How can computers be programmed to analyze the behavior of software before, and without, executing it?
• This is essential for safety-critical systems (e.g. planes, trains, launchers, nuclear plants, …)

Program Analysis

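Program analysis is a technique for estimating the run-time properties of a program before execution, automatically and safely [이광근 04].

• Before execution: the program is not run
• Automatically: a program analyzes the program
• Safely: every possible behavior is covered
• Estimating: behaviors that never actually occur are also included

Without approximation, this is impossible.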

Program Analysis vs Testing

Testing
• Proves the presence of bugs relative to a specification
• Some bugs may be missed
• Nothing can be concluded about correctness when no bug is found

Program analysis
• Proves the absence of bugs relative to a specification
• No bug is ever missed
• Inconclusive situations may exist (undecidability) -> a report is either a real bug or a false alarm
• Correctness follows when no bug is found

Program Analysis Techniques

Data flow analysis

Constraint-based analysis

Type-based analysis
• Type and effect systems
• Type qualifiers

Abstract interpretation

Model checking

…

Questions about behavior of programs

• Is the variable x initialized before it is read?
• Will the value of x be read in the future?
• What is a lower and upper bound on the value of the integer variable x?
• At which program points could x be assigned its current value?
• Can the pointer p be null?
• Which variables can p point to?
• Do p and q point to disjoint structures in the heap?
…
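The bounds question can be made concrete with an interval domain. The following is a minimal sketch in C, not a method taken from these slides: the names `Interval`, `interval_add`, `interval_join`, and `analyze_example` are hypothetical, and saturating the bounds at `INT_MIN`/`INT_MAX` is an assumption made only to keep the arithmetic defined.

```c
#include <limits.h>

/* Closed integer interval [lo, hi]: a hypothetical abstract value for x. */
typedef struct { int lo, hi; } Interval;

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Addition that clamps at INT_MIN/INT_MAX instead of overflowing. */
static int sat_add(int a, int b) {
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}

/* Abstract x + y: every non-overflowing concrete sum lies in the result. */
Interval interval_add(Interval a, Interval b) {
    Interval r = { sat_add(a.lo, b.lo), sat_add(a.hi, b.hi) };
    return r;
}

/* Join at a control-flow merge: the smallest interval containing both inputs. */
Interval interval_join(Interval a, Interval b) {
    Interval r = { imin(a.lo, b.lo), imax(a.hi, b.hi) };
    return r;
}

/* Analyzes the fragment:  if (c) x = 1; else x = 10;  x = x + 5;  */
Interval analyze_example(void) {
    Interval then_x = { 1, 1 };
    Interval else_x = { 10, 10 };
    Interval merged = interval_join(then_x, else_x);  /* [1, 10] */
    Interval five = { 5, 5 };
    return interval_add(merged, five);                /* [6, 15] */
}
```

For the fragment in the comment, `analyze_example()` returns [6, 15]. The only values x can actually take are 6 and 15, so the interval also contains values that never occur; that loss of precision is what makes the estimate both safe and computable.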

Why are the answers interesting?

Ensure correctness• Verify behavior• Catch bugs early

Increase efficiency• Compiler optimization• Resource usage

Rice’s theorem, 1953

Any nontrivial property of the behavior of programs in a Turing-complete language is undecidable!

Can we decide if a variable has a constant value?
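One way to see why the answer is "no" in general: make the assignment to the variable depend on whether some other computation terminates. The sketch below uses the Collatz iteration purely as a stand-in for an arbitrary computation; `collatz_halts` and `probe` are hypothetical names, and the argument itself applies to any loop whose termination is unknown.

```c
/* Stand-in for "run an arbitrary computation": iterate the Collatz map.
   Returns 1 if the iteration reaches 1; otherwise it does not return.
   The counter is volatile so the compiler may not assume the loop terminates. */
static int collatz_halts(unsigned long long start) {
    volatile unsigned long long n = start;
    while (n != 1) {
        unsigned long long cur = n;
        /* Unsigned arithmetic wraps on overflow, so this is always defined. */
        n = (cur % 2 == 0) ? cur / 2 : 3 * cur + 1;
    }
    return 1;
}

int probe(unsigned long long start) {
    int x = 17;
    if (collatz_halts(start)) {
        x = 18;  /* executed only if the loop above terminates */
    }
    return x;
}
```

Here x keeps the value 17 for the whole execution exactly when the loop never terminates. An analyzer that could always decide "x is constant" would therefore also decide termination of the embedded computation, which is impossible in general.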

Approximation

Approximate answers may be decidable!

The approximation must be conservative:
• either "yes" or "no" must always be the correct answer
• which direction depends on the client application
• the useful answer must always be correct
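For the constant-value question, a conservative analysis can be sketched with a three-level abstract value: "no value yet", "surely the constant c", and "don't know". This is an illustrative sketch under assumed names (`ConstVal`, `cv_join`, `is_surely_constant`); it is not claimed to be the formulation used in the lecture.

```c
/* Abstract value of one variable for a constant-propagation style analysis. */
typedef enum { CV_UNDEF, CV_CONST, CV_NOT_CONST } CvKind;
typedef struct { CvKind kind; int value; } ConstVal;

ConstVal cv_undef(void)  { ConstVal v = { CV_UNDEF, 0 }; return v; }
ConstVal cv_const(int c) { ConstVal v = { CV_CONST, c }; return v; }

/* Merge the facts arriving from two control-flow paths. */
ConstVal cv_join(ConstVal a, ConstVal b) {
    if (a.kind == CV_UNDEF) return b;
    if (b.kind == CV_UNDEF) return a;
    if (a.kind == CV_CONST && b.kind == CV_CONST && a.value == b.value) return a;
    ConstVal unknown = { CV_NOT_CONST, 0 };  /* give up rather than guess */
    return unknown;
}

/* The "yes" answer: returns 1 and stores the constant only when it is certain.
   A return of 0 means "don't know", not "definitely varies". */
int is_surely_constant(ConstVal v, int *out) {
    if (v.kind != CV_CONST) return 0;
    *out = v.value;
    return 1;
}
```

A client such as constant folding acts only on the "yes" answer. For `if (c) x = 1; else x = 2;` the join yields "don't know" even if c happens to be always true at runtime, which costs an optimization but never produces wrong code.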

Example approximations

Decide if a function is ever called at runtime:
• if "no", remove the function from the code
• if "yes", don't do anything
• the "no" answer must always be correct if given

Decide if a cast (A)x will always succeed:
• if "yes", don't generate a runtime check
• if "no", generate the runtime check for the cast
• the "yes" answer must always be correct if given
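The first example can be sketched as reachability over a call graph. The code below is a minimal illustration, assuming the edge matrix over-approximates the real calls (for instance, an indirect call is given an edge to every function it might target); `MAX_FUNCS` and `may_be_called` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_FUNCS 8

/* calls[f][g] is true if the body of function f may call function g.
   On return, reached[g] is true if g may be called starting from entry. */
void may_be_called(bool calls[MAX_FUNCS][MAX_FUNCS], int n, int entry,
                   bool reached[MAX_FUNCS]) {
    int stack[MAX_FUNCS];  /* each function is pushed at most once */
    int top = 0;

    memset(reached, 0, sizeof(bool) * MAX_FUNCS);
    reached[entry] = true;
    stack[top++] = entry;

    while (top > 0) {
        int f = stack[--top];
        for (int g = 0; g < n; g++) {
            if (calls[f][g] && !reached[g]) {
                reached[g] = true;
                stack[top++] = g;
            }
        }
    }
}
```

If `reached[g]` is false, no chain of possible calls leads to g, so "no, never called" is a safe answer and g can be removed. If it is true, g might still never run in practice, which is why the "yes" answer triggers no action.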

The engineering challenge

A correct but trivial approximation algorithm may just give the useless answer every time

The engineering challenge is to give the useful answer often enough to fuel the client application

This is the hard (and fun) part of static analysis...

Engineering in practice (1/4)

Where do the pointers come from?

Engineering in practice (2/4)

The trivial answer: from somewhere!

Engineering in practice (3/4)

The hard answer: from a few places!

Engineering in practice (4/4)

Over the last 15 years: ≥ 500 publications, ≥ 50 PhD theses

Bug finding

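#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main() {
    char *p, *q;
    p = NULL;
    printf("%s", p);            /* null pointer passed to %s */
    q = (char *)malloc(100);
    p = q;
    free(q);
    *p = 'x';                   /* use after free: p aliases the freed block */
    free(p);                    /* double free */
    p = (char *)malloc(100);
    p = (char *)malloc(100);    /* memory leak: the previous block is lost */
    q = p;
    strcat(p, q);               /* uninitialized buffer; source and destination overlap */
}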
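For contrast, here is one possible repaired version of the same sequence of operations. It is only a sketch: the function name `run_repaired`, the strings "abc" and "def", and the choice to use two separate buffers are assumptions made for illustration, not part of the original slide.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the length of the concatenated string, or -1 if allocation fails. */
int run_repaired(void) {
    char *p = NULL;
    char *q;

    printf("%s", p ? p : "(null)");     /* never pass NULL to %s */

    q = malloc(100);
    if (q == NULL) return -1;           /* check every allocation */
    p = q;
    p[0] = 'x';                         /* write while the block is still live */
    p[1] = '\0';
    free(q);                            /* free the block exactly once */
    p = q = NULL;                       /* drop both dangling aliases */

    p = malloc(100);                    /* one allocation, so nothing leaks */
    if (p == NULL) return -1;
    strcpy(p, "abc");                   /* initialize before any string call */

    q = malloc(100);                    /* separate buffer: no overlap in strncat */
    if (q == NULL) { free(p); return -1; }
    strcpy(q, "def");

    strncat(p, q, 100 - strlen(p) - 1); /* bounded append */
    int len = (int)strlen(p);

    free(q);
    free(p);
    return len;
}
```

Each change corresponds to one defect class that a static analyzer would report on the original program: null dereference, use after free, double free, memory leak, and unsafe string handling.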

Industrialization

Fortify (2003, USA)
• Acquired by HP in 2010

Polyspace (1999, France)
• Abstract-interpretation-based tool

Sparrow (2007, Korea)

Buffer overrun, null dereference, use after free, double free, free non-heap variable, memory leaks, divide by zero, …

Q & A