dc.contributor.author[en]: Dikaiakos, Marios D.
dc.contributor.author[en]: Stassopoulou, Athena
dc.contributor.author[en]: Papageorgiou, Loizos
dc.creator[en]: Dikaiakos, Marios D.
dc.creator[en]: Stassopoulou, Athena
dc.creator[en]: Papageorgiou, Loizos
dc.description.abstract[en]: In this paper, we present a characterization study of search-engine crawlers. For the purposes of our work, we use Web-server access logs from five academic sites in three different countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of the general World-Wide Web traffic and to general characterization studies. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. We propose a set of simple metrics that describe qualitative characteristics of crawler behavior, vis-à-vis a crawler's preference on resources of a particular format, its frequency of visits on a Web site, and the pervasiveness of its visits to a particular site. To the best of our knowledge, this is the first extensive and in-depth characterization of search-engine crawlers. Our results and observations provide useful insights into crawler behavior and serve as a basis for our ongoing work on the automatic detection of Web crawlers. © 2005 Elsevier B.V. All rights reserved.
dc.source[en]: Computer Communications
dc.subject[en]: World Wide Web
dc.subject[en]: Search engines
dc.subject[en]: Resource allocation
dc.subject[en]: Java programming language
dc.subject[en]: Web crawlers
dc.subject[en]: Local area networks
dc.subject[en]: Web characterization
dc.subject[en]: Web servers
dc.subject[en]: Web traffic
dc.title[en]: An investigation of web crawler behavior: Characterization and metrics
dc.description.endingpage: 897
Faculty of Pure and Applied Sciences / Department of Computer Science
dc.description.notes[en]: Cited By: 45
dc.contributor.orcid: Dikaiakos, Marios D. [0000-0002-4350-6058]