A Memory Model for Static Analysis of C Programs [chapter]

Zhongxing Xu, Ted Kremenek, Jian Zhang
2010 Lecture Notes in Computer Science  
Automatic bug finding with static analysis requires precise tracking of the values of different memory objects. This paper describes a memory modeling method for static analysis of C programs. It is particularly suitable for precise path-sensitive analyses such as symbolic execution. It can handle almost all kinds of C expressions, including arbitrary levels of pointer dereference, pointer arithmetic, composite array and struct data types, arbitrary type casts, and dynamic memory allocation. It maps aliased lvalue expressions to the identical object without extra alias analysis. The model has been implemented in the Clang static analyzer, where it gives the analyzer precise value tracking.

Recently there has been a large body of work on bug finding with symbolic execution. In these works, tracking the values of different memory objects along a single path is a common requirement. Some works obtain the run-time addresses of memory objects by actually compiling and running the program [1][3]. These are dynamic techniques: programs being checked must be instrumented, linked with an auxiliary library, and run. Other works solve the problem of identifying memory objects statically, in various ways. The simplest approach is to track only simple named variables and to ignore multi-level pointers, array elements, and struct fields. This sacrifices much analysis power.

This paper proposes a memory modeling method that is particularly suitable for symbolic execution of C programs. It enables the symbolic execution to identify and track each memory object precisely. We give algorithms that map C lvalue expressions to memory objects during the analysis, so no separate alias analysis is required. A memory model is the way an analysis tool models the storage of the underlying machine on which the code runs. It is the basis for simulating language semantics and a key component of static code analysis tools.
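To make the aliasing problem concrete, the following is a minimal C sketch of the kind of code a memory model has to resolve: several syntactically different lvalue expressions that all denote one object. It only illustrates the problem and is not the paper's algorithm; `struct S` and `count_matching_aliases` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical composite type used only for this illustration. */
struct S {
    int a[4];
    int b;
};

/* Writes through one lvalue expression, then reads the same object
   through three differently spelled lvalue expressions.
   Returns how many of those reads observed the written value. */
int count_matching_aliases(void)
{
    struct S s = { {0, 0, 0, 0}, 0 };
    struct S *p = &s;         /* pointer to the whole struct */
    int *q = s.a;             /* pointer to the first array element */
    char *raw = (char *)&s;   /* byte-level view of the struct */

    s.a[2] = 42;              /* write via field access plus array index */

    int hits = 0;
    /* Pointer dereference, field access, array index. */
    hits += (p->a[2] == 42);
    /* Pointer arithmetic on an element pointer. */
    hits += (*(q + 2) == 42);
    /* Type cast plus byte offset back to the same int object. */
    hits += (*(int *)(raw + offsetof(struct S, a) + 2 * sizeof(int)) == 42);

    /* The neighbouring field is a different object and stays untouched. */
    assert(s.b == 0);
    return hits;
}
```

An analysis that tracks only named variables would lose the value `42` at all three reads, whereas a model that maps each of these expressions to the same memory object can see that they agree.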
doi:10.1007/978-3-642-16558-0_44 fatcat:f6bmycwjjbearmw55n2tu33xh4