TY - JOUR
T1 - Adaptive static analysis via learning with Bayesian optimization
AU - Heo, Kihong
AU - Oh, Hakjoo
AU - Yang, Hongseok
AU - Yi, Kwangkeun
N1 - Funding Information:
This work was supported by Samsung Research Funding & Incubation Center of Samsung Electronics under Project Number SRFC-IT1701-09. This work was also supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (Grant No. 2017-0-00184, Self-Learning Cyber Immune Technology Development). This work was partly supported by Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (Grant No. B0717-16-0098). Authors' addresses: H. Oh (corresponding author), Room 616c, Science Library Bldg, College of Informatics, Korea University, Anam-dong 5-ga, Seongbuk-gu, Seoul 136-713, Korea; email: hakjoo_oh@korea.ac.kr.
Publisher Copyright:
© 2018 Association for Computing Machinery.
PY - 2018/11
Y1 - 2018/11
AB - Building a cost-effective static analyzer for real-world programs is still regarded as an art. One key contributor to this grim reputation is the difficulty of balancing the cost and the precision of an analyzer. An ideal analyzer should be adaptive to a given analysis task and avoid using techniques that unnecessarily improve precision and increase analysis cost. However, achieving this ideal is highly nontrivial and requires a large amount of engineering effort. In this article, we present a new learning-based approach for adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether or not to apply a precision-improving technique to that part. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization. The learnt strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic C static analyzer. The experimental results demonstrate that using Bayesian optimization is crucial for learning from an existing codebase. They also show that, among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75% of them, while increasing the analysis cost by only 3.3× relative to the baseline flow- and context-insensitive analysis, rather than by 40× or more as in the fully sensitive version.
KW - Bayesian optimization
KW - Data-driven program analysis
KW - Static program analysis
UR - http://www.scopus.com/inward/record.url?scp=85057210497&partnerID=8YFLogxK
U2 - 10.1145/3121135
DO - 10.1145/3121135
M3 - Article
AN - SCOPUS:85057210497
SN - 0164-0925
VL - 40
JO - ACM Transactions on Programming Languages and Systems
JF - ACM Transactions on Programming Languages and Systems
IS - 4
M1 - 14
ER -