Adaptive static analysis via learning with Bayesian optimization

Kihong Heo, Hakjoo Oh, Hongseok Yang, Kwangkeun Yi

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

Building a cost-effective static analyzer for real-world programs is still regarded as an art. One key contributor to this grim reputation is the difficulty of balancing the cost and the precision of an analyzer. An ideal analyzer should adapt to a given analysis task and avoid techniques that improve precision unnecessarily while increasing analysis cost. However, achieving this ideal is highly nontrivial and requires a large amount of engineering effort. In this article, we present a new learning-based approach for adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether or not to apply a precision-improving technique to that part. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization. The learnt strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic C static analyzer. The experimental results demonstrate that using Bayesian optimization is crucial for learning from an existing codebase. They also show that, among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75% of them, while increasing the analysis cost by only 3.3× over the baseline flow- and context-insensitive analysis, rather than by the 40× or more incurred by the fully sensitive version.
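The idea of a parameterized strategy learned from a codebase can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the article: the linear scoring of per-part feature vectors, the names `select_parts` and `learn_parameter`, the `fraction` cutoff, and the `evaluate` callback. In particular, the search loop is plain random search used as a stand-in; the article learns the parameter with Bayesian optimization and reports that this choice is crucial.

```python
import random


def select_parts(features, w, fraction=0.1):
    """Pick the program parts that receive the precision-improving technique.

    features: dict mapping a part name to its feature vector (hypothetical).
    w:        parameter vector of the strategy, same length as each vector.
    Returns the top `fraction` of parts ranked by the linear score w . f.
    """
    scores = {
        part: sum(wi * fi for wi, fi in zip(w, f))
        for part, f in features.items()
    }
    k = max(1, int(len(scores) * fraction))
    # Sort by descending score; break ties by name for determinism.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return set(ranked[:k])


def learn_parameter(codebase, evaluate, dim, trials=200, seed=0):
    """Search for a parameter that performs well on an existing codebase.

    Stand-in only: this is random search over [-1, 1]^dim, NOT Bayesian
    optimization. `evaluate(program, w)` is a hypothetical callback that
    returns how well the strategy with parameter w does on one program
    (for example, the number of queries the resulting analysis proves).
    """
    rng = random.Random(seed)
    best_w, best_score = None, float("-inf")
    for _ in range(trials):
        w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        score = sum(evaluate(program, w) for program in codebase)
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score
```

Once a parameter has been found on the training programs, `select_parts` would be applied unchanged to a new, unseen program, which mirrors the learn-then-reuse workflow the abstract describes.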

Original language: English
Article number: 14
Journal: ACM Transactions on Programming Languages and Systems
Volume: 40
Issue number: 4
DOIs: https://doi.org/10.1145/3121135
Publication status: Published - 2018 Nov 1

Fingerprint

Static analysis
Costs

Keywords

  • Bayesian optimization
  • Data-driven program analysis
  • Static program analysis

ASJC Scopus subject areas

  • Software

Cite this

Adaptive static analysis via learning with Bayesian optimization. / Heo, Kihong; Oh, Hakjoo; Yang, Hongseok; Yi, Kwangkeun.

In: ACM Transactions on Programming Languages and Systems, Vol. 40, No. 4, 14, 01.11.2018.

@article{66a00bc380174108974606920478358d,
title = "Adaptive static analysis via learning with Bayesian optimization",
abstract = "Building a cost-effective static analyzer for real-world programs is still regarded as an art. One key contributor to this grim reputation is the difficulty of balancing the cost and the precision of an analyzer. An ideal analyzer should adapt to a given analysis task and avoid techniques that improve precision unnecessarily while increasing analysis cost. However, achieving this ideal is highly nontrivial and requires a large amount of engineering effort. In this article, we present a new learning-based approach for adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether or not to apply a precision-improving technique to that part. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization. The learnt strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic C static analyzer. The experimental results demonstrate that using Bayesian optimization is crucial for learning from an existing codebase. They also show that, among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75{\%} of them, while increasing the analysis cost by only 3.3× over the baseline flow- and context-insensitive analysis, rather than by the 40× or more incurred by the fully sensitive version.",
keywords = "Bayesian optimization, Data-driven program analysis, Static program analysis",
author = "Kihong Heo and Hakjoo Oh and Hongseok Yang and Kwangkeun Yi",
year = "2018",
month = "11",
day = "1",
doi = "10.1145/3121135",
language = "English",
volume = "40",
journal = "ACM Transactions on Programming Languages and Systems",
issn = "0164-0925",
publisher = "Association for Computing Machinery (ACM)",
number = "4",

}

TY - JOUR

T1 - Adaptive static analysis via learning with Bayesian optimization

AU - Heo, Kihong

AU - Oh, Hakjoo

AU - Yang, Hongseok

AU - Yi, Kwangkeun

PY - 2018/11/1

Y1 - 2018/11/1

AB - Building a cost-effective static analyzer for real-world programs is still regarded as an art. One key contributor to this grim reputation is the difficulty of balancing the cost and the precision of an analyzer. An ideal analyzer should adapt to a given analysis task and avoid techniques that improve precision unnecessarily while increasing analysis cost. However, achieving this ideal is highly nontrivial and requires a large amount of engineering effort. In this article, we present a new learning-based approach for adaptive static analysis. In our approach, the analysis includes a sophisticated parameterized strategy that decides, for each part of a given program, whether or not to apply a precision-improving technique to that part. We present a method for learning a good parameter for such a strategy from an existing codebase via Bayesian optimization. The learnt strategy is then used for new, unseen programs. Using our approach, we developed partially flow- and context-sensitive variants of a realistic C static analyzer. The experimental results demonstrate that using Bayesian optimization is crucial for learning from an existing codebase. They also show that, among all program queries that require flow- or context-sensitivity, our partially flow- and context-sensitive analysis answers 75% of them, while increasing the analysis cost by only 3.3× over the baseline flow- and context-insensitive analysis, rather than by the 40× or more incurred by the fully sensitive version.

KW - Bayesian optimization

KW - Data-driven program analysis

KW - Static program analysis

UR - http://www.scopus.com/inward/record.url?scp=85057210497&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85057210497&partnerID=8YFLogxK

U2 - 10.1145/3121135

DO - 10.1145/3121135

M3 - Article

AN - SCOPUS:85057210497

VL - 40

JO - ACM Transactions on Programming Languages and Systems

JF - ACM Transactions on Programming Languages and Systems

SN - 0164-0925

IS - 4

M1 - 14

ER -