A machine-learning algorithm with disjunctive model for data-driven program analysis

Minseok Jeon, Sehun Jeong, Sungdeok Cha, Hakjoo Oh

Research output: Contribution to journal (Article)

Abstract

We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simpleminded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.

Original language: English
Article number: 13
Journal: ACM Transactions on Programming Languages and Systems
Volume: 41
Issue number: 2
DOIs: 10.1145/3293607
Publication status: Published - 2019 Jun

Fingerprint

Learning algorithms
Learning systems
Tuning

Keywords

  • Context-sensitivity
  • Data-driven program analysis
  • Flow-sensitivity
  • Static analysis

ASJC Scopus subject areas

  • Software

Cite this

A machine-learning algorithm with disjunctive model for data-driven program analysis. / Jeon, Minseok; Jeong, Sehun; Cha, Sungdeok; Oh, Hakjoo.

In: ACM Transactions on Programming Languages and Systems, Vol. 41, No. 2, 13, 06.2019.

@article{e9e9aaf9f8c745c0a9834800897d6cbe,
title = "A machine-learning algorithm with disjunctive model for data-driven program analysis",
abstract = "We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simpleminded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.",
keywords = "Context-sensitivity, Data-driven program analysis, Flow-sensitivity, Static analysis",
author = "Minseok Jeon and Sehun Jeong and Sungdeok Cha and Hakjoo Oh",
year = "2019",
month = "6",
doi = "10.1145/3293607",
language = "English",
volume = "41",
journal = "ACM Transactions on Programming Languages and Systems",
issn = "0164-0925",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

TY - JOUR

T1 - A machine-learning algorithm with disjunctive model for data-driven program analysis

AU - Jeon, Minseok

AU - Jeong, Sehun

AU - Cha, Sungdeok

AU - Oh, Hakjoo

PY - 2019/6

Y1 - 2019/6

N2 - We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simpleminded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.

AB - We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simpleminded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses Boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. A key technical challenge is to efficiently determine a set of good Boolean formulas, as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns Boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.

KW - Context-sensitivity

KW - Data-driven program analysis

KW - Flow-sensitivity

KW - Static analysis

UR - http://www.scopus.com/inward/record.url?scp=85075699476&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85075699476&partnerID=8YFLogxK

U2 - 10.1145/3293607

DO - 10.1145/3293607

M3 - Article

AN - SCOPUS:85075699476

VL - 41

JO - ACM Transactions on Programming Languages and Systems

JF - ACM Transactions on Programming Languages and Systems

SN - 0164-0925

IS - 2

M1 - 13

ER -