Fractional pagerank crawler: Prioritizing URLs efficiently for crawling important pages early

Md Hijbul Alam, JongWoo Ha, Sang-Geun Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

Crawling important pages early is a well-studied problem. However, the availability of different types of framework for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance to prioritize the download of the web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Pages: 590-594
Number of pages: 5
Volume: 5463
DOIs: https://doi.org/10.1007/978-3-642-00887-0_52
Publication status: Published - 2009 Jul 15
Event: 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009 - Brisbane, QLD, Australia
Duration: 2009 Apr 21 to 2009 Apr 23

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5463
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Conference: 14th International Conference on Database Systems for Advanced Applications, DASFAA 2009
Country: Australia
City: Brisbane, QLD
Period: 09/4/21 to 09/4/23

Fingerprint

PageRank
Websites
Fractional
World Wide Web
Availability
Demonstrate
Experiment
Web crawler
Experiments

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Alam, M. H., Ha, J., & Lee, S.-G. (2009). Fractional pagerank crawler: Prioritizing URLs efficiently for crawling important pages early. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5463, pp. 590-594). https://doi.org/10.1007/978-3-642-00887-0_52

@inproceedings{4b9820078877434997caa13374112044,
title = "Fractional pagerank crawler: Prioritizing URLs efficiently for crawling important pages early",
abstract = "Crawling important pages early is a well-studied problem. However, the availability of different types of framework for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance to prioritize the download of the web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.",
author = "Alam, {Md Hijbul} and JongWoo Ha and Sang-Geun Lee",
year = "2009",
month = "7",
day = "15",
doi = "10.1007/978-3-642-00887-0_52",
language = "English",
isbn = "9783642008863",
volume = "5463",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
pages = "590--594",
booktitle = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

}

TY - GEN

T1 - Fractional pagerank crawler

T2 - Prioritizing URLs efficiently for crawling important pages early

AU - Alam, Md Hijbul

AU - Ha, JongWoo

AU - Lee, Sang-Geun

PY - 2009/7/15

Y1 - 2009/7/15

AB - Crawling important pages early is a well-studied problem. However, the availability of different types of framework for publishing web content greatly increases the number of web pages. Therefore, the crawler should be fast enough to prioritize and download the important pages. As the importance of a page is not known before or during its download, the crawler needs a great deal of time to approximate the importance to prioritize the download of the web pages. In this research, we propose Fractional PageRank crawlers that prioritize the downloaded pages for the purpose of discovering important URLs early during the crawl. Our experiments demonstrate that they improve the running time dramatically while crawling the important pages early.

UR - http://www.scopus.com/inward/record.url?scp=67650099403&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650099403&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-00887-0_52

DO - 10.1007/978-3-642-00887-0_52

M3 - Conference contribution

AN - SCOPUS:67650099403

SN - 9783642008863

VL - 5463

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 590

EP - 594

BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

ER -