On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach

Junkyu Lee, Seongsoon Kim, Sunwon Lee, Kyubum Lee, Jaewoo Kang

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Background: Most previous Protein-Protein Interaction (PPI) studies evaluated their algorithms' performance based on «per-instance» precision and recall, in which the instances of an interaction relation were evaluated independently. However, we argue that this standard evaluation method should be revisited. In a large corpus, the same relation can be described in many different forms and, in practice, correctly identifying a small subset of them, rather than all, often suffices to detect the given interaction.

Methods: In this regard, we propose a more pragmatic «per-relation» basis performance evaluation method instead of the conventional per-instance basis method. In the per-relation basis method, only a subset of a relation's instances needs to be correctly identified to make the relation positive. In this work, we also introduce a new high-precision rule-based PPI extraction algorithm. While virtually all current PPI extraction studies focus on improving F-score, aiming to balance precision and recall, in many realistic scenarios involving large corpora one can benefit more from a high-precision algorithm than from a high-recall counterpart.

Results: We show that our algorithm not only achieves better per-relation performance than previous solutions but also serves as a good complement to existing PPI extraction tools. Our algorithm improves the performance of the existing tools through simple pipelining.

Conclusion: The significance of this research is that it brings a new perspective to the performance evaluation of PPI extraction studies, one that we believe is more important in practice than existing evaluation criteria. Given the new evaluation perspective, we also showed the importance of a high-precision extraction tool and validated the efficacy of our rule-based system as a high-precision tool candidate.
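The difference between the two evaluation bases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The instance format `(sentence_id, protein_a, protein_b)`, the function names, and the rule that a single correctly identified instance makes a relation positive are all assumptions made for illustration; the abstract only states that a subset of instances needs to be identified.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall), using 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def per_instance_scores(gold, predicted):
    """Score every (sentence_id, protein_a, protein_b) mention independently."""
    tp = len(gold & predicted)
    return precision_recall(tp, len(predicted - gold), len(gold - predicted))


def to_relations(instances):
    """Collapse instances into unordered protein pairs (assumed relation key)."""
    return {frozenset((a, b)) for _, a, b in instances}


def per_relation_scores(gold, predicted):
    """Score each distinct relation once.

    Assumption: a relation is a true positive if at least one of its
    gold instances was predicted correctly.
    """
    gold_rel = to_relations(gold)
    tp_rel = to_relations(gold & predicted)        # found via a correct instance
    fp_rel = to_relations(predicted) - gold_rel    # pairs that never interact in gold
    fn_rel = gold_rel - tp_rel                     # gold relations never found
    return precision_recall(len(tp_rel), len(fp_rel), len(fn_rel))


# Hypothetical toy data: relation A-B is mentioned three times, C-D once.
gold = {("s1", "A", "B"), ("s2", "A", "B"), ("s3", "B", "A"), ("s4", "C", "D")}
predicted = {("s1", "A", "B"), ("s5", "E", "F")}

print(per_instance_scores(gold, predicted))   # (0.5, 0.25)
print(per_relation_scores(gold, predicted))   # (0.5, 0.5)
```

In this toy data, finding one of the three mentions of A-B is enough to credit the relation, so recall rises from 0.25 per instance to 0.5 per relation while precision is unchanged.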

Original language: English
Article number: S7
Journal: BMC Medical Informatics and Decision Making
Volume: 13
Issue number: SUPPL1
DOIs: 10.1186/1472-6947-13-S1-S7
Publication status: Published - 2013 Apr 12

ASJC Scopus subject areas

  • Health Informatics
  • Health Policy

Cite this

On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach. / Lee, Junkyu; Kim, Seongsoon; Lee, Sunwon; Lee, Kyubum; Kang, Jaewoo.

In: BMC Medical Informatics and Decision Making, Vol. 13, No. SUPPL1, S7, 12.04.2013.

@article{24950483be0e409cbd884b938caafa7c,
title = "On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach",
abstract = "Background: Most previous Protein-Protein Interaction (PPI) studies evaluated their algorithms' performance based on «per-instance» precision and recall, in which the instances of an interaction relation were evaluated independently. However, we argue that this standard evaluation method should be revisited. In a large corpus, the same relation can be described in many different forms and, in practice, correctly identifying a small subset of them, rather than all, often suffices to detect the given interaction. Methods: In this regard, we propose a more pragmatic «per-relation» basis performance evaluation method instead of the conventional per-instance basis method. In the per-relation basis method, only a subset of a relation's instances needs to be correctly identified to make the relation positive. In this work, we also introduce a new high-precision rule-based PPI extraction algorithm. While virtually all current PPI extraction studies focus on improving F-score, aiming to balance precision and recall, in many realistic scenarios involving large corpora one can benefit more from a high-precision algorithm than from a high-recall counterpart. Results: We show that our algorithm not only achieves better per-relation performance than previous solutions but also serves as a good complement to existing PPI extraction tools. Our algorithm improves the performance of the existing tools through simple pipelining. Conclusion: The significance of this research is that it brings a new perspective to the performance evaluation of PPI extraction studies, one that we believe is more important in practice than existing evaluation criteria. Given the new evaluation perspective, we also showed the importance of a high-precision extraction tool and validated the efficacy of our rule-based system as a high-precision tool candidate.",
author = "Junkyu Lee and Seongsoon Kim and Sunwon Lee and Kyubum Lee and Jaewoo Kang",
year = "2013",
month = "4",
day = "12",
doi = "10.1186/1472-6947-13-S1-S7",
language = "English",
volume = "13",
journal = "BMC Medical Informatics and Decision Making",
issn = "1472-6947",
publisher = "BioMed Central",
number = "SUPPL1",
}

TY - JOUR

T1 - On the efficacy of per-relation basis performance evaluation for PPI extraction and a high-precision rule-based approach

AU - Lee, Junkyu

AU - Kim, Seongsoon

AU - Lee, Sunwon

AU - Lee, Kyubum

AU - Kang, Jaewoo

PY - 2013/4/12

Y1 - 2013/4/12

AB - Background: Most previous Protein-Protein Interaction (PPI) studies evaluated their algorithms' performance based on «per-instance» precision and recall, in which the instances of an interaction relation were evaluated independently. However, we argue that this standard evaluation method should be revisited. In a large corpus, the same relation can be described in many different forms and, in practice, correctly identifying a small subset of them, rather than all, often suffices to detect the given interaction. Methods: In this regard, we propose a more pragmatic «per-relation» basis performance evaluation method instead of the conventional per-instance basis method. In the per-relation basis method, only a subset of a relation's instances needs to be correctly identified to make the relation positive. In this work, we also introduce a new high-precision rule-based PPI extraction algorithm. While virtually all current PPI extraction studies focus on improving F-score, aiming to balance precision and recall, in many realistic scenarios involving large corpora one can benefit more from a high-precision algorithm than from a high-recall counterpart. Results: We show that our algorithm not only achieves better per-relation performance than previous solutions but also serves as a good complement to existing PPI extraction tools. Our algorithm improves the performance of the existing tools through simple pipelining. Conclusion: The significance of this research is that it brings a new perspective to the performance evaluation of PPI extraction studies, one that we believe is more important in practice than existing evaluation criteria. Given the new evaluation perspective, we also showed the importance of a high-precision extraction tool and validated the efficacy of our rule-based system as a high-precision tool candidate.

UR - http://www.scopus.com/inward/record.url?scp=84875921402&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875921402&partnerID=8YFLogxK

U2 - 10.1186/1472-6947-13-S1-S7

DO - 10.1186/1472-6947-13-S1-S7

M3 - Article

C2 - 23566263

AN - SCOPUS:84875921402

VL - 13

JO - BMC Medical Informatics and Decision Making

JF - BMC Medical Informatics and Decision Making

SN - 1472-6947

IS - SUPPL1

M1 - S7

ER -