Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses

Kihong Heo, Hakjoo Oh, Hongseok Yang

Research output: Contribution to journal (Article)

Abstract

We present a method for automatically learning, from a given codebase, an effective strategy for clustering variables for the Octagon analysis. The learned strategy works as a preprocessor for Octagon: given a program to be analyzed, the strategy is first applied to the program and clusters its variables. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that, although it is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work to automatically label variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open-source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100 KLOC and is 33× faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%. The general idea behind our method is applicable to other types of static analyses as well. We demonstrate that our method is also effective for learning a context-sensitivity strategy for interval analysis.
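The pipeline in the abstract can be illustrated with a small Python sketch. It has three steps. First, learn a classifier for variable pairs from automatically labeled pairs. Second, cluster the variables of a new program. Third, let the partial Octagon analysis relate two variables only when they share a cluster.

Every concrete choice below is a hypothetical assumption made for illustration and is not taken from the paper:

- Each pair has three made-up 0/1 features.
- A perceptron stands in for whatever classifier is actually used.
- The labels are written down directly instead of being produced by the impact pre-analysis.
- Clusters are taken to be connected components of the pairs predicted positive.

```python
# Hypothetical sketch: the features, the perceptron, and the connected-component
# clustering are illustrative assumptions, not the paper's actual design.


def train_perceptron(examples, n_features, epochs=20):
    """Learn a linear pair classifier from (features, label) examples.

    Labels are +1 (track the pair) or -1 (do not track it). In the paper's
    setting such labels come from a static pre-analysis, not from a human;
    here they are simply given.
    """
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for feats, label in examples:
            score = sum(wi * fi for wi, fi in zip(w, feats)) + b
            if label * score <= 0:  # misclassified: move weights toward label
                w = [wi + label * fi for wi, fi in zip(w, feats)]
                b += label
    return w, b


def predict(model, feats):
    """Return True if the pair should be tracked according to the model."""
    w, b = model
    return sum(wi * fi for wi, fi in zip(w, feats)) + b > 0


def cluster(variables, positive_pairs):
    """Assumed clustering: connected components of the positive pairs (union-find)."""
    parent = {v: v for v in variables}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for x, y in positive_pairs:
        parent[find(x)] = find(y)
    groups = {}
    for v in variables:
        groups.setdefault(find(v), set()).add(v)
    # Singleton clusters carry no relations, so they are dropped.
    return [g for g in groups.values() if len(g) > 1]


def tracked(x, y, clusters):
    """A partial Octagon analysis relates x and y only if they share a cluster."""
    return any(x in c and y in c for c in clusters)


# Made-up training pairs (features, automatically generated label).
training = [
    ((1, 0, 1), +1),
    ((1, 1, 0), +1),
    ((0, 0, 1), -1),
    ((0, 1, 0), -1),
]
model = train_perceptron(training, n_features=3)

# Made-up feature vectors for variable pairs of a new program.
new_pairs = {
    ("a", "b"): (1, 0, 0),
    ("b", "c"): (1, 1, 1),
    ("c", "d"): (0, 0, 0),
    ("d", "e"): (1, 1, 0),
    ("a", "e"): (0, 1, 1),
}
positive = [p for p, f in new_pairs.items() if predict(model, f)]
clusters = cluster(["a", "b", "c", "d", "e"], positive)
print([sorted(c) for c in clusters])  # [['a', 'b', 'c'], ['d', 'e']]
```

Here a and c end up in the same cluster through the positive pairs (a, b) and (b, c), so their relationship is tracked even though the pair (a, c) was never classified directly. No relationship is tracked between c and d. Whether the actual method merges clusters transitively in this way is not stated in the abstract.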

Original language: English
Pages (from-to): 189-220
Number of pages: 32
Journal: Formal Methods in System Design
Volume: 53
Issue number: 2
DOIs: 10.1007/s10703-017-0306-7
Publication status: Published - 2018 Oct 1

Keywords

  • Context-sensitivity
  • Machine learning
  • Relational analysis
  • Static analysis

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture

Cite this

Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses. / Heo, Kihong; Oh, Hakjoo; Yang, Hongseok.

In: Formal Methods in System Design, Vol. 53, No. 2, 01.10.2018, p. 189-220.

@article{29da0ca8c4da4132ab4e731599770648,
title = "Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses",
abstract = "We present a method for automatically learning, from a given codebase, an effective strategy for clustering variables for the Octagon analysis. The learned strategy works as a preprocessor for Octagon: given a program to be analyzed, the strategy is first applied to the program and clusters its variables. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that, although it is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work to automatically label variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open-source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100 KLOC and is 33× faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2{\%}. The general idea behind our method is applicable to other types of static analyses as well. We demonstrate that our method is also effective for learning a context-sensitivity strategy for interval analysis.",
keywords = "Context-sensitivity, Machine learning, Relational analysis, Static analysis",
author = "Kihong Heo and Hakjoo Oh and Hongseok Yang",
year = "2018",
month = "10",
day = "1",
doi = "10.1007/s10703-017-0306-7",
language = "English",
volume = "53",
pages = "189--220",
journal = "Formal Methods in System Design",
issn = "0925-9856",
publisher = "Springer Netherlands",
number = "2",

}

TY  - JOUR

T1  - Learning analysis strategies for octagon and context sensitivity from labeled data generated by static analyses

AU  - Heo, Kihong

AU  - Oh, Hakjoo

AU  - Yang, Hongseok

PY  - 2018/10/1

Y1  - 2018/10/1

N2  - We present a method for automatically learning, from a given codebase, an effective strategy for clustering variables for the Octagon analysis. The learned strategy works as a preprocessor for Octagon: given a program to be analyzed, the strategy is first applied to the program and clusters its variables. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that, although it is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work to automatically label variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open-source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100 KLOC and is 33× faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%. The general idea behind our method is applicable to other types of static analyses as well. We demonstrate that our method is also effective for learning a context-sensitivity strategy for interval analysis.

AB  - We present a method for automatically learning, from a given codebase, an effective strategy for clustering variables for the Octagon analysis. The learned strategy works as a preprocessor for Octagon: given a program to be analyzed, the strategy is first applied to the program and clusters its variables. We then run a partial variant of the Octagon analysis that tracks relationships among variables within the same cluster, but not across different clusters. The notable aspect of our learning method is that, although it is based on supervised learning, it does not require manually labeled data. The method does not ask a human to indicate which pairs of program variables in the given codebase should be tracked. Instead, it uses the impact pre-analysis for Octagon from our previous work to automatically label variable pairs in the codebase as positive or negative. We implemented our method on top of a static buffer-overflow detector for C programs and tested it against open-source benchmarks. Our experiments show that the partial Octagon analysis with the learned strategy scales up to 100 KLOC and is 33× faster than the one with the impact pre-analysis (which itself is significantly faster than the original Octagon analysis), while increasing false alarms by only 2%. The general idea behind our method is applicable to other types of static analyses as well. We demonstrate that our method is also effective for learning a context-sensitivity strategy for interval analysis.

KW  - Context-sensitivity

KW  - Machine learning

KW  - Relational analysis

KW  - Static analysis

UR  - http://www.scopus.com/inward/record.url?scp=85034626614&partnerID=8YFLogxK

UR  - http://www.scopus.com/inward/citedby.url?scp=85034626614&partnerID=8YFLogxK

U2  - 10.1007/s10703-017-0306-7

DO  - 10.1007/s10703-017-0306-7

M3  - Article

AN  - SCOPUS:85034626614

VL  - 53

SP  - 189

EP  - 220

JO  - Formal Methods in System Design

JF  - Formal Methods in System Design

SN  - 0925-9856

IS  - 2

ER  - 