(Static) Program Analysis
Dong-A University, Department of Computer Engineering, Jang-Wu Jo (조장우)
Motivation
What should a computer engineer do?
• The production of reliable software, its maintenance, and safe evolution year after year (up to 20 to 30 years)
Hardware and Software
Hardware
• Over the last 25 years, computer hardware has seen its performance multiplied by 10^6;
Software
• The size of programs executed by these computers has grown in similar proportions;
Example
• MS Windows
– > 30 000 000 lines
– > 30 000 known bugs
Software Development
Software development
• Large-scale computer programming is very difficult;
• Reasoning about large programs is very difficult;
• Errors are quite frequent.
Idea
• Use the computer to find programming errors
• How can computers be programmed so as to analyze the behavior of software before and without executing it?
• This is essential for safety-critical systems (e.g. planes, trains, launchers, nuclear plants, …)
Program Analysis
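A technique for automatically and safely estimating the run-time properties of a program before it is executed [Kwangkeun Yi (이광근) 04].
• Before execution: without running the program
• Automatically: a program analyzes the program
• Safely: so that every possible behavior is covered
• Estimating: behaviors that never actually occur may also be included
Without approximation, this is impossible.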
Program Analysis vs Testing
Testing
• To prove the presence of bugs relative to a specification;
• Some bugs may be missed;
• Nothing can be concluded about correctness when no bug is found.
Program analysis
• To prove the absence of bugs relative to a specification;
• No bug is ever missed;
• Inconclusive situations may exist (undecidability) -> bug or false alarm;
• Correctness follows when no bug is found.
Program Analysis Techniques
Data flow analysis
Constraint-based analysis
Type-based analysis
• Type and effect system
• Type qualifier
Abstract interpretation
Model checking
…
Questions about behavior of programs
• Is the variable x initialized before it is read?
• Will the value of x be read in the future?
• What is a lower and upper bound on the value of the integer variable x?
• At which program points could x be assigned its current value?
• Can the pointer p be null?
• Which variables can p point to?
• Do p and q point to disjoint structures in the heap?
…
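The bounds question can be sketched with an interval abstraction. This is only an illustrative sketch, not the method of any particular tool: the type `Interval` and the functions `interval_const`, `interval_join`, `interval_add`, and `analyze_example` are hypothetical names, and the bounds are assumed small enough that the additions do not overflow.

```c
/* Hypothetical abstract value: every run-time value of x lies in [lo, hi]. */
typedef struct {
    long lo;
    long hi;
} Interval;

/* The interval that contains exactly the constant c. */
Interval interval_const(long c) {
    Interval r = { c, c };
    return r;
}

/* Join: the smallest interval covering both arguments.
   Used where two control-flow branches merge. */
Interval interval_join(Interval a, Interval b) {
    Interval r;
    r.lo = a.lo < b.lo ? a.lo : b.lo;
    r.hi = a.hi > b.hi ? a.hi : b.hi;
    return r;
}

/* Abstract addition (assumption: no overflow of long). */
Interval interval_add(Interval a, Interval b) {
    Interval r = { a.lo + b.lo, a.hi + b.hi };
    return r;
}

/* Analyze, without knowing the condition c:
       if (c) x = 1; else x = 10;
       x = x + 5;                                   */
Interval analyze_example(void) {
    Interval then_x = interval_const(1);
    Interval else_x = interval_const(10);
    Interval merged = interval_join(then_x, else_x);  /* [1, 10] */
    return interval_add(merged, interval_const(5));   /* [6, 15] */
}
```

The result [6, 15] covers both real outcomes (6 and 15) and also values such as 7 that never occur: the answer is safe but approximate.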
Why are the answers interesting?
Ensure correctness
• Verify behavior
• Catch bugs early
Increase efficiency
• Compiler optimization
• Resource usage
Rice’s theorem, 1953
Any non-trivial property of the behavior of programs in a Turing-complete language is undecidable!
Can we decide if a variable has a constant value?
Approximation
Approximate answers may be decidable!
The approximation must be conservative:
• either “yes” or “no” must always be the correct answer
• which direction depends on the client application
• the useful answer must always be correct
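A conservative answer to "does this variable hold a constant value?" can be sketched with a three-level abstract value. This is a minimal sketch under stated assumptions: `CpValue` and the `cp_*` functions are hypothetical names, and the constants are assumed small enough that multiplication does not overflow `int`.

```c
/* Hypothetical abstract value for "is x a constant?" */
typedef enum {
    CP_UNDEF,    /* no assignment seen yet on this path        */
    CP_CONST,    /* x is certainly equal to `value`            */
    CP_UNKNOWN   /* x may hold different values: "don't know"  */
} CpKind;

typedef struct {
    CpKind kind;
    int value;   /* meaningful only when kind == CP_CONST */
} CpValue;

CpValue cp_undef(void)   { CpValue v = { CP_UNDEF, 0 };   return v; }
CpValue cp_unknown(void) { CpValue v = { CP_UNKNOWN, 0 }; return v; }
CpValue cp_const(int c)  { CpValue v = { CP_CONST, c };   return v; }

/* Merge the facts arriving from two control-flow paths. */
CpValue cp_join(CpValue a, CpValue b) {
    if (a.kind == CP_UNDEF) return b;
    if (b.kind == CP_UNDEF) return a;
    if (a.kind == CP_CONST && b.kind == CP_CONST && a.value == b.value)
        return a;
    return cp_unknown();
}

/* Abstract multiplication (assumption: no int overflow). */
CpValue cp_mul(CpValue a, CpValue b) {
    if (a.kind == CP_UNDEF || b.kind == CP_UNDEF) return cp_undef();
    if (a.kind == CP_CONST && b.kind == CP_CONST)
        return cp_const(a.value * b.value);
    return cp_unknown();
}

/* "Yes, constant" is the useful answer; it is given only when certain. */
int cp_is_constant(CpValue v) { return v.kind == CP_CONST; }
```

Note that `cp_mul(cp_unknown(), cp_const(0))` answers "don't know" although the product is always 0. The "no" answer may be imprecise; only the "yes" answer has to be correct.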
Example approximations
Decide if a function is ever called at runtime:
• if “no”, remove the function from the code
• if “yes”, don’t do anything
• the “no” answer must always be correct if given
Decide if a cast (A)x will always succeed:
• if “yes”, don’t generate a runtime check
• if “no”, generate code for the cast
• the “yes” answer must always be correct if given
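The first example can be sketched as reachability over a static call graph. This is an illustrative sketch, not how any specific compiler does it: `CallGraph`, `MAX_FUNCS`, `mark_reachable`, and `never_called` are hypothetical names, functions are identified by small indices, and calls through function pointers are assumed absent (a real tool would have to add an edge to every function such a pointer may refer to).

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_FUNCS 8   /* hypothetical upper bound on the number of functions */

/* calls[i][j] is true when the body of function i contains a call to j. */
typedef struct {
    size_t count;                       /* number of functions, <= MAX_FUNCS */
    bool calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

/* Mark every function reachable from `entry` by following call edges.
   Assumes entry < g->count. */
void mark_reachable(const CallGraph *g, size_t entry, bool reachable[MAX_FUNCS]) {
    size_t stack[MAX_FUNCS];
    size_t top = 0;

    for (size_t i = 0; i < MAX_FUNCS; i++) reachable[i] = false;
    reachable[entry] = true;
    stack[top++] = entry;

    while (top > 0) {
        size_t f = stack[--top];
        for (size_t callee = 0; callee < g->count; callee++) {
            if (g->calls[f][callee] && !reachable[callee]) {
                /* Each function is pushed at most once,
                   so the stack never holds more than MAX_FUNCS entries. */
                reachable[callee] = true;
                stack[top++] = callee;
            }
        }
    }
}

/* "No, never called" is given only when no chain of calls leads to f. */
bool never_called(const CallGraph *g, size_t entry, size_t f) {
    bool reachable[MAX_FUNCS];
    mark_reachable(g, entry, reachable);
    return !reachable[f];
}
```

A function that is reachable in the graph but guarded by a condition that is always false is still reported as possibly called. That "yes" may be wrong, but it is harmless: the function is simply kept.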
The engineering challenge
A correct but trivial approximation algorithm may just give the useless answer every time
The engineering challenge is to give the useful answer often enough to fuel the client application
This is the hard (and fun) part of static analysis...
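The gap between a trivial and a useful approximation can be sketched for the question "can the pointer p be null?". This is a deliberately tiny sketch: the statement kinds in `Stmt` and the functions `trivial_may_be_null` and `tracking_may_be_null` are hypothetical, and the analyzed program is assumed to be a straight-line sequence of assignments to a single pointer p.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statements of a straight-line program over one pointer p. */
typedef enum {
    ASSIGN_NULL,      /* p = NULL;                                  */
    ASSIGN_ADDRESS,   /* p = &some_variable;                        */
    ASSIGN_UNKNOWN    /* p = f();  result unknown to the analysis   */
} Stmt;

/* Trivial analysis: always answers "p may be null".
   Always correct, but never lets the client remove a null check. */
bool trivial_may_be_null(const Stmt *prog, size_t n) {
    (void)prog;
    (void)n;
    return true;
}

/* More useful analysis: remember what the last assignment stored.
   Before any assignment p is uninitialized, so "may be null" is the safe start. */
bool tracking_may_be_null(const Stmt *prog, size_t n) {
    bool may_be_null = true;
    for (size_t i = 0; i < n; i++) {
        switch (prog[i]) {
        case ASSIGN_NULL:    may_be_null = true;  break;
        case ASSIGN_ADDRESS: may_be_null = false; break;
        case ASSIGN_UNKNOWN: may_be_null = true;  break;
        }
    }
    return may_be_null;
}
```

For the program `p = NULL; p = &v;` the trivial analysis still answers "may be null", while the tracking analysis gives the useful answer "cannot be null". Both are conservative; only the second one helps the client.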
Engineering in practice (1/4)
Where do the pointers come from?
Engineering in practice (2/4)
The trivial answer: from somewhere!
Engineering in practice (3/4)
The hard answer: from a few places!
Engineering in practice (4/4)
Over the last 15 years: ≥ 500 publications, ≥ 50 PhD theses
Bug finding
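#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Deliberately buggy program: each marked line is a defect an analyzer should report. */
int main() {
    char *p, *q;
    p = NULL;
    printf("%s", p);            /* null dereference: p is NULL */
    q = (char *)malloc(100);
    p = q;
    free(q);
    *p = 'x';                   /* use after free: p aliases the freed q */
    free(p);                    /* double free */
    p = (char *)malloc(100);
    p = (char *)malloc(100);    /* memory leak: the previous block is lost */
    q = p;
    strcat(p, q);               /* buffer overrun: uninitialized, overlapping strings */
}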
Industrialization
Fortify (2003, USA)
• Merged into HP, 2010
Polyspace (1999, France)
• Abstract interpretation based tool
Sparrow (2007, Korea)
…
Buffer overrun, null dereference, use after free, double free, free non-heap variable, memory leaks, divide by zero, …
Q & A