Characterizing crawler behavior from web server access logs
Date: 2003
ISSN: 0302-9743
Source: 4th International Conference on E-Commerce and Web Technology, EC-Web 2003
Volume: 2738
Pages: 369-378

Abstract
In this paper, we present a study of crawler behavior based on Web-server access logs. We use logs from five academic sites in three countries to analyze the activity of crawlers belonging to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic and to general characterization studies based on Web-server access logs. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. Our results and observations serve as a basis for our ongoing work on the automatic detection of WWW robots. © Springer-Verlag Berlin Heidelberg 2003.