End-to-end prediction of buffer overruns from raw source code via neural memory networks

Min Je Choi, Sehun Jeong, Hakjoo Oh, Jaegul Choo

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Detecting buffer overruns in source code is one of the most common and yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features are limited in flexible applicability and robustness due to the diverse bug patterns and characteristics found in sophisticated real-world software programs. In this paper, we propose a novel, data-driven approach that is completely end-to-end without requiring any handcrafted features, and is thus free from any programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks, which have shown state-of-the-art performance mainly in question-answering tasks. Our experimental results using source code samples demonstrate that our proposed model is capable of accurately detecting different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and performing their quantitative comparisons.
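The abstract describes the model only at a high level. As a rough illustration of how a memory network can treat raw source lines as memory slots and a buffer access as the query, the following is a minimal, untrained, single-hop end-to-end memory network in pure Python. The tokenizer regex, the bag-of-words line encoding, the embedding size, the sigmoid "overrun" output, and the names `tokenize` and `MemoryNetworkSketch` are all assumptions made for illustration; this is not the authors' architecture, training setup, or hyperparameters.

```python
import math
import random
import re

# Assumed tokenizer: identifiers, integer literals, and single non-space symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split one raw source line into tokens (illustrative, not the paper's)."""
    return TOKEN_RE.findall(line)


class MemoryNetworkSketch:
    """Hypothetical single-hop end-to-end memory network with random weights.

    Each source line becomes one memory slot; the query is the line containing
    the buffer access. Weights are untrained, so outputs carry no meaning yet.
    """

    def __init__(self, dim=16, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.emb_a = {}  # token -> vector used to address memory (keys)
        self.emb_c = {}  # token -> vector used to read memory (values)
        self.emb_b = {}  # token -> vector used to encode the query
        self.w_out = [self.rng.gauss(0, 0.1) for _ in range(dim)]

    def _vec(self, table, token):
        # Lazily create a random embedding for tokens seen for the first time.
        if token not in table:
            table[token] = [self.rng.gauss(0, 0.1) for _ in range(self.dim)]
        return table[token]

    def _embed(self, table, tokens):
        # Bag-of-words encoding: sum of the token vectors in the line.
        out = [0.0] * self.dim
        for tok in tokens:
            for i, x in enumerate(self._vec(table, tok)):
                out[i] += x
        return out

    def forward(self, code_lines, query_line):
        """Return (probability of 'overrun', attention over code lines)."""
        token_lines = [tokenize(line) for line in code_lines]
        keys = [self._embed(self.emb_a, t) for t in token_lines]
        values = [self._embed(self.emb_c, t) for t in token_lines]
        u = self._embed(self.emb_b, tokenize(query_line))

        # Attention: softmax over dot products between query and memory keys.
        scores = [sum(k * q for k, q in zip(key, u)) for key in keys]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        attention = [e / total for e in exps]

        # Read: attention-weighted sum of memory values, added to the query.
        o = [sum(p * v[i] for p, v in zip(attention, values))
             for i in range(self.dim)]
        h = [oi + ui for oi, ui in zip(o, u)]

        # Binary output: sigmoid of a linear score (assumed safe/unsafe label).
        logit = sum(w * x for w, x in zip(self.w_out, h))
        return 1.0 / (1.0 + math.exp(-logit)), attention


if __name__ == "__main__":
    program = ["char buf[10];", "int i = 12;", "buf[i] = 'a';"]
    model = MemoryNetworkSketch()
    prob, attention = model.forward(program, "buf[i] = 'a';")
    print(f"P(overrun) = {prob:.3f}")
    print([round(a, 3) for a in attention])
```

Training would fit the embedding tables and output weights on labeled (program, query, safe/unsafe) examples; the attention weights are what an analysis of "tracing variables of interest" would inspect. Real implementations typically use several hops and a tensor library rather than nested Python lists.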

Original language: English
Title of host publication: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Editors: Carles Sierra
Publisher: International Joint Conferences on Artificial Intelligence
Pages: 1546-1553
Number of pages: 8
ISBN (Electronic): 9780999241103
Publication status: Published - 2017 Jan 1
Event: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 - Melbourne, Australia
Duration: 2017 Aug 19 → 2017 Aug 25

Publication series

Name: IJCAI International Joint Conference on Artificial Intelligence
ISSN (Print): 1045-0823

Other

Other: 26th International Joint Conference on Artificial Intelligence, IJCAI 2017
Country: Australia
City: Melbourne
Period: 17/8/19 → 17/8/25

Fingerprint

Data storage equipment
Computer programming languages
Semantics
Neural networks

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Choi, M. J., Jeong, S., Oh, H., & Choo, J. (2017). End-to-end prediction of buffer overruns from raw source code via neural memory networks. In C. Sierra (Ed.), 26th International Joint Conference on Artificial Intelligence, IJCAI 2017 (pp. 1546-1553). (IJCAI International Joint Conference on Artificial Intelligence). International Joint Conferences on Artificial Intelligence.

End-to-end prediction of buffer overruns from raw source code via neural memory networks. / Choi, Min Je; Jeong, Sehun; Oh, Hakjoo; Choo, Jaegul.

26th International Joint Conference on Artificial Intelligence, IJCAI 2017. ed. / Carles Sierra. International Joint Conferences on Artificial Intelligence, 2017. p. 1546-1553 (IJCAI International Joint Conference on Artificial Intelligence).


Choi, MJ, Jeong, S, Oh, H & Choo, J 2017, End-to-end prediction of buffer overruns from raw source code via neural memory networks. in C Sierra (ed.), 26th International Joint Conference on Artificial Intelligence, IJCAI 2017. IJCAI International Joint Conference on Artificial Intelligence, International Joint Conferences on Artificial Intelligence, pp. 1546-1553, 26th International Joint Conference on Artificial Intelligence, IJCAI 2017, Melbourne, Australia, 17/8/19.
Choi MJ, Jeong S, Oh H, Choo J. End-to-end prediction of buffer overruns from raw source code via neural memory networks. In Sierra C, editor, 26th International Joint Conference on Artificial Intelligence, IJCAI 2017. International Joint Conferences on Artificial Intelligence. 2017. p. 1546-1553. (IJCAI International Joint Conference on Artificial Intelligence).
Choi, Min Je ; Jeong, Sehun ; Oh, Hakjoo ; Choo, Jaegul. / End-to-end prediction of buffer overruns from raw source code via neural memory networks. 26th International Joint Conference on Artificial Intelligence, IJCAI 2017. editor / Carles Sierra. International Joint Conferences on Artificial Intelligence, 2017. pp. 1546-1553 (IJCAI International Joint Conference on Artificial Intelligence).
@inproceedings{b1a94d3ccc934166a5e89abfd3183aba,
title = "End-to-end prediction of buffer overruns from raw source code via neural memory networks",
abstract = "Detecting buffer overruns in source code is one of the most common and yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features are limited in flexible applicability and robustness due to the diverse bug patterns and characteristics found in sophisticated real-world software programs. In this paper, we propose a novel, data-driven approach that is completely end-to-end without requiring any handcrafted features, and is thus free from any programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks, which have shown state-of-the-art performance mainly in question-answering tasks. Our experimental results using source code samples demonstrate that our proposed model is capable of accurately detecting different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and performing their quantitative comparisons.",
author = "Choi, {Min Je} and Sehun Jeong and Hakjoo Oh and Jaegul Choo",
year = "2017",
month = "1",
day = "1",
language = "English",
series = "IJCAI International Joint Conference on Artificial Intelligence",
publisher = "International Joint Conferences on Artificial Intelligence",
pages = "1546--1553",
editor = "Carles Sierra",
booktitle = "26th International Joint Conference on Artificial Intelligence, IJCAI 2017",

}

TY - GEN

T1 - End-to-end prediction of buffer overruns from raw source code via neural memory networks

AU - Choi, Min Je

AU - Jeong, Sehun

AU - Oh, Hakjoo

AU - Choo, Jaegul

PY - 2017/1/1

Y1 - 2017/1/1

N2 - Detecting buffer overruns in source code is one of the most common and yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features are limited in flexible applicability and robustness due to the diverse bug patterns and characteristics found in sophisticated real-world software programs. In this paper, we propose a novel, data-driven approach that is completely end-to-end without requiring any handcrafted features, and is thus free from any programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks, which have shown state-of-the-art performance mainly in question-answering tasks. Our experimental results using source code samples demonstrate that our proposed model is capable of accurately detecting different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and performing their quantitative comparisons.

AB - Detecting buffer overruns in source code is one of the most common and yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features are limited in flexible applicability and robustness due to the diverse bug patterns and characteristics found in sophisticated real-world software programs. In this paper, we propose a novel, data-driven approach that is completely end-to-end without requiring any handcrafted features, and is thus free from any programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called memory networks, which have shown state-of-the-art performance mainly in question-answering tasks. Our experimental results using source code samples demonstrate that our proposed model is capable of accurately detecting different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and performing their quantitative comparisons.

UR - http://www.scopus.com/inward/record.url?scp=85031926400&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85031926400&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85031926400

T3 - IJCAI International Joint Conference on Artificial Intelligence

SP - 1546

EP - 1553

BT - 26th International Joint Conference on Artificial Intelligence, IJCAI 2017

A2 - Sierra, Carles

PB - International Joint Conferences on Artificial Intelligence

ER -