Data-mining based SQL injection attack detection using internal query trees

Mi Yeon Kim, Dong Hoon Lee

Research output: Contribution to journalArticle

23 Citations (Scopus)

Abstract

Detecting SQL injection attacks (SQLIAs) is becoming increasingly important in database-driven web sites. Until now, most of the studies on SQLIA detection have focused on the structured query language (SQL) structure at the application level. Unfortunately, this approach inevitably fails to detect those attacks that use already stored procedure and data within the database system. In this paper, we propose a framework to detect SQLIAs at database level by using SVM classification and various kernel functions. The key issue of SQLIA detection framework is how to represent the internal query tree collected from database log suitable for SVM classification algorithm in order to acquire good performance in detecting SQLIAs. To solve the issue, we first propose a novel method to convert the query tree into an n-dimensional feature vector by using a multi-dimensional sequence as an intermediate representation. The reason that it is difficult to directly convert the query tree into an n-dimensional feature vector is the complexity and variability of the query tree structure. Second, we propose a method to extract the syntactic features, as well as the semantic features when generating feature vector. Third, we propose a method to transform string feature values into numeric feature values, combining multiple statistical models. The combined model maps one string value to one numeric value by containing the multiple characteristic of each string value. In order to demonstrate the feasibility of our proposals in practical environments, we implement the SQLIA detection system based on PostgreSQL, a popular open source database system, and we perform experiments. The experimental results using the internal query trees of PostgreSQL validate that our proposal is effective in detecting SQLIAs, with at least 99.6% of the probability that the probability for malicious queries to be correctly predicted as SQLIA is greater than the probability for normal queries to be incorrectly predicted as SQLIA. Finally, we perform additional experiments to compare our proposal with syntax-focused feature extraction and single statistical model based on feature transformation. The experimental results show that our proposal significantly increases the probability of correctly detecting SQLIAs for various SQL statements, when compared to the previous methods.

Original languageEnglish
Pages (from-to)5416-5430
Number of pages15
JournalExpert Systems with Applications
Volume41
Issue number11
DOIs
Publication statusPublished - 2014 Sep 1

Fingerprint

Query languages
Data mining
Syntactics
Feature extraction
Websites
Experiments
Semantics
Statistical Models

Keywords

  • Data mining
  • Database
  • Intrusion detection
  • SQL injection attack
  • SVM

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Engineering(all)

Cite this

Data-mining based SQL injection attack detection using internal query trees. / Kim, Mi Yeon; Lee, Dong Hoon.

In: Expert Systems with Applications, Vol. 41, No. 11, 01.09.2014, p. 5416-5430.

Research output: Contribution to journalArticle

@article{73e5b66e762140e78199ffe7be9d6f24,
title = "Data-mining based SQL injection attack detection using internal query trees",
abstract = "Detecting SQL injection attacks (SQLIAs) is becoming increasingly important in database-driven web sites. Until now, most of the studies on SQLIA detection have focused on the structured query language (SQL) structure at the application level. Unfortunately, this approach inevitably fails to detect those attacks that use already stored procedure and data within the database system. In this paper, we propose a framework to detect SQLIAs at database level by using SVM classification and various kernel functions. The key issue of SQLIA detection framework is how to represent the internal query tree collected from database log suitable for SVM classification algorithm in order to acquire good performance in detecting SQLIAs. To solve the issue, we first propose a novel method to convert the query tree into an n-dimensional feature vector by using a multi-dimensional sequence as an intermediate representation. The reason that it is difficult to directly convert the query tree into an n-dimensional feature vector is the complexity and variability of the query tree structure. Second, we propose a method to extract the syntactic features, as well as the semantic features when generating feature vector. Third, we propose a method to transform string feature values into numeric feature values, combining multiple statistical models. The combined model maps one string value to one numeric value by containing the multiple characteristic of each string value. In order to demonstrate the feasibility of our proposals in practical environments, we implement the SQLIA detection system based on PostgreSQL, a popular open source database system, and we perform experiments. The experimental results using the internal query trees of PostgreSQL validate that our proposal is effective in detecting SQLIAs, with at least 99.6{\%} of the probability that the probability for malicious queries to be correctly predicted as SQLIA is greater than the probability for normal queries to be incorrectly predicted as SQLIA. Finally, we perform additional experiments to compare our proposal with syntax-focused feature extraction and single statistical model based on feature transformation. The experimental results show that our proposal significantly increases the probability of correctly detecting SQLIAs for various SQL statements, when compared to the previous methods.",
keywords = "Data mining, Database, Intrusion detection, SQL injection attack, SVM",
author = "Kim, {Mi Yeon} and Lee, {Dong Hoon}",
year = "2014",
month = "9",
day = "1",
doi = "10.1016/j.eswa.2014.02.041",
language = "English",
volume = "41",
pages = "5416--5430",
journal = "Expert Systems with Applications",
issn = "0957-4174",
publisher = "Elsevier Limited",
number = "11",

}

TY - JOUR

T1 - Data-mining based SQL injection attack detection using internal query trees

AU - Kim, Mi Yeon

AU - Lee, Dong Hoon

PY - 2014/9/1

Y1 - 2014/9/1

N2 - Detecting SQL injection attacks (SQLIAs) is becoming increasingly important in database-driven web sites. Until now, most of the studies on SQLIA detection have focused on the structured query language (SQL) structure at the application level. Unfortunately, this approach inevitably fails to detect those attacks that use already stored procedure and data within the database system. In this paper, we propose a framework to detect SQLIAs at database level by using SVM classification and various kernel functions. The key issue of SQLIA detection framework is how to represent the internal query tree collected from database log suitable for SVM classification algorithm in order to acquire good performance in detecting SQLIAs. To solve the issue, we first propose a novel method to convert the query tree into an n-dimensional feature vector by using a multi-dimensional sequence as an intermediate representation. The reason that it is difficult to directly convert the query tree into an n-dimensional feature vector is the complexity and variability of the query tree structure. Second, we propose a method to extract the syntactic features, as well as the semantic features when generating feature vector. Third, we propose a method to transform string feature values into numeric feature values, combining multiple statistical models. The combined model maps one string value to one numeric value by containing the multiple characteristic of each string value. In order to demonstrate the feasibility of our proposals in practical environments, we implement the SQLIA detection system based on PostgreSQL, a popular open source database system, and we perform experiments. The experimental results using the internal query trees of PostgreSQL validate that our proposal is effective in detecting SQLIAs, with at least 99.6% of the probability that the probability for malicious queries to be correctly predicted as SQLIA is greater than the probability for normal queries to be incorrectly predicted as SQLIA. Finally, we perform additional experiments to compare our proposal with syntax-focused feature extraction and single statistical model based on feature transformation. The experimental results show that our proposal significantly increases the probability of correctly detecting SQLIAs for various SQL statements, when compared to the previous methods.

AB - Detecting SQL injection attacks (SQLIAs) is becoming increasingly important in database-driven web sites. Until now, most of the studies on SQLIA detection have focused on the structured query language (SQL) structure at the application level. Unfortunately, this approach inevitably fails to detect those attacks that use already stored procedure and data within the database system. In this paper, we propose a framework to detect SQLIAs at database level by using SVM classification and various kernel functions. The key issue of SQLIA detection framework is how to represent the internal query tree collected from database log suitable for SVM classification algorithm in order to acquire good performance in detecting SQLIAs. To solve the issue, we first propose a novel method to convert the query tree into an n-dimensional feature vector by using a multi-dimensional sequence as an intermediate representation. The reason that it is difficult to directly convert the query tree into an n-dimensional feature vector is the complexity and variability of the query tree structure. Second, we propose a method to extract the syntactic features, as well as the semantic features when generating feature vector. Third, we propose a method to transform string feature values into numeric feature values, combining multiple statistical models. The combined model maps one string value to one numeric value by containing the multiple characteristic of each string value. In order to demonstrate the feasibility of our proposals in practical environments, we implement the SQLIA detection system based on PostgreSQL, a popular open source database system, and we perform experiments. The experimental results using the internal query trees of PostgreSQL validate that our proposal is effective in detecting SQLIAs, with at least 99.6% of the probability that the probability for malicious queries to be correctly predicted as SQLIA is greater than the probability for normal queries to be incorrectly predicted as SQLIA. Finally, we perform additional experiments to compare our proposal with syntax-focused feature extraction and single statistical model based on feature transformation. The experimental results show that our proposal significantly increases the probability of correctly detecting SQLIAs for various SQL statements, when compared to the previous methods.

KW - Data mining

KW - Database

KW - Intrusion detection

KW - SQL injection attack

KW - SVM

UR - http://www.scopus.com/inward/record.url?scp=84898424321&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84898424321&partnerID=8YFLogxK

U2 - 10.1016/j.eswa.2014.02.041

DO - 10.1016/j.eswa.2014.02.041

M3 - Article

AN - SCOPUS:84898424321

VL - 41

SP - 5416

EP - 5430

JO - Expert Systems with Applications

JF - Expert Systems with Applications

SN - 0957-4174

IS - 11

ER -