TY - GEN
T1 - Learning a variable-clustering strategy for octagon from labeled data generated by a static analysis
AU - Heo, Kihong
AU - Oh, Hakjoo
AU - Yang, Hongseok
N1 - Publisher Copyright: © Springer-Verlag GmbH Germany 2016.
PY - 2016
Y1 - 2016
N2 - We present a method for automatically learning an effective strategy for clustering variables for the Octagon analysis from a given codebase. This learned strategy works as a preprocessor of Octagon. Given a program to be analyzed, the strategy is first applied to the program and clusters variables in it. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that although the method is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work and automatically labels variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open-source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100 KLOC and is 33x faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%.
UR - http://www.scopus.com/inward/record.url?scp=84988423856&partnerID=8YFLogxK
U2 - 10.1007/978-3-662-53413-7_12
DO - 10.1007/978-3-662-53413-7_12
M3 - Conference contribution
AN - SCOPUS:84988423856
SN - 9783662534120
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 237
EP - 256
BT - Static Analysis - 23rd International Symposium, SAS 2016, Proceedings
A2 - Rival, Xavier
PB - Springer Verlag
T2 - 23rd International Symposium on Static Analysis, SAS 2016
Y2 - 8 September 2016 through 10 September 2016
ER -