Abstract
HTML is the widely used markup language behind countless web pages. Parallelizing an HTML parser would lead to a significant performance improvement and a better user experience. However, parallelizing the HTML parser is challenging because of a strong cyclic dependence in the parser model. In this paper, we propose a semantic-based parallel HTML parser design that splits the input HTML document at 'div' tags and processes the resulting independent partial inputs with multiple parser threads. We evaluated the proposed parallel HTML parser on benchmarks selected from the top 500 web pages and achieved a maximum speedup of 1.49x.
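The splitting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`split_by_top_level_div`, `parse_chunk`, `parallel_parse`), the regex-based pre-scan, and the toy `TagCollector` parser are all assumptions made for illustration. The sketch assumes well-nested 'div' tags, ignores 'div'-like text inside comments or scripts, and drops any content outside top-level 'div' blocks. In standard CPython the global interpreter lock means these threads are unlikely to produce a real speedup, so the code shows only the structure of the approach.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser

# Matches opening and closing div tags; group 1 is "/" for a closing tag.
DIV_TAG = re.compile(r"<(/?)div\b[^>]*>", re.IGNORECASE)


def split_by_top_level_div(html):
    """Return the substrings covering each top-level <div>...</div> block."""
    chunks, depth, start = [], 0, 0
    for match in DIV_TAG.finditer(html):
        if match.group(1) == "":      # opening <div ...>
            if depth == 0:
                start = match.start()
            depth += 1
        elif depth > 0:               # closing </div>
            depth -= 1
            if depth == 0:
                chunks.append(html[start:match.end()])
    return chunks


class TagCollector(HTMLParser):
    """Toy per-chunk parser (hypothetical): records start-tag names in order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_chunk(chunk):
    """Parse one independent chunk with its own parser instance."""
    parser = TagCollector()
    parser.feed(chunk)
    parser.close()
    return parser.tags


def parallel_parse(html, workers=4):
    """Split at top-level divs, then parse the chunks on a thread pool."""
    chunks = split_by_top_level_div(html)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves chunk order, so results follow document order.
        return list(pool.map(parse_chunk, chunks))
```

Because each chunk gets its own parser instance, the threads share no parser state. A real design would also need to merge the partial results back into a single document tree, which this sketch leaves out.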
Original language | English |
---|---|
Title of host publication | International Conference on Electronics, Information, and Communications, ICEIC 2016 |
Publisher | Institute of Electrical and Electronics Engineers Inc. |
ISBN (Electronic) | 9781467380164 |
DOIs | 10.1109/ELINFOCOM.2016.7563004 |
Publication status | Published - 2016 Sep 7 |
Event | 15th International Conference on Electronics, Information, and Communications, ICEIC 2016 - Danang, Viet Nam. Duration: 2016 Jan 27 → 2016 Jan 30 |
Other
Other | 15th International Conference on Electronics, Information, and Communications, ICEIC 2016 |
---|---|
Country | Viet Nam |
City | Danang |
Period | 2016 Jan 27 → 2016 Jan 30 |
Keywords
- HTML
- multithread
- parallelizing
ASJC Scopus subject areas
- Electrical and Electronic Engineering
- Control and Systems Engineering
Cite this
Design of HTML parallel parser with semantic-based input splitting. / Lee, Jihyun; Na, Yeoul; Kim, Seon Wook.
International Conference on Electronics, Information, and Communications, ICEIC 2016. Institute of Electrical and Electronics Engineers Inc., 2016. 7563004.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
TY - GEN
T1 - Design of HTML parallel parser with semantic-based input splitting
AU - Lee, Jihyun
AU - Na, Yeoul
AU - Kim, Seon Wook
PY - 2016/9/7
Y1 - 2016/9/7
N2 - HTML is the widely used markup language behind countless web pages. Parallelizing an HTML parser would lead to a significant performance improvement and a better user experience. However, parallelizing the HTML parser is challenging because of a strong cyclic dependence in the parser model. In this paper, we propose a semantic-based parallel HTML parser design that splits the input HTML document at 'div' tags and processes the resulting independent partial inputs with multiple parser threads. We evaluated the proposed parallel HTML parser on benchmarks selected from the top 500 web pages and achieved a maximum speedup of 1.49x.
KW - HTML
KW - multithread
KW - parallelizing
UR - http://www.scopus.com/inward/record.url?scp=84988884713&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84988884713&partnerID=8YFLogxK
U2 - 10.1109/ELINFOCOM.2016.7563004
DO - 10.1109/ELINFOCOM.2016.7563004
M3 - Conference contribution
AN - SCOPUS:84988884713
BT - International Conference on Electronics, Information, and Communications, ICEIC 2016
PB - Institute of Electrical and Electronics Engineers Inc.
ER -