IPMicra: An IP-address based location aware distributed web crawler
Samaras, George S.
Source: Proceedings of the International Conference on Internet Computing, IC'04
Distributed crawling can overcome important limitations of traditional single-sourced web crawling systems. However, the optimal benefit of distributed crawling is usually limited to the sites hosting the crawlers; the rest of the URLs are by and large distributed randomly among the various crawlers. In this work, we propose a location-aware method, called IPMicra, that utilizes an IP address hierarchy and allows links to be crawled in a near-optimal, location-aware manner. Our proposal outperforms earlier distributed crawling schemes, requiring one order of magnitude less time to crawl the same set of sites.
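As a rough illustration of how an IP address hierarchy might be used to delegate hosts to crawlers, the sketch below assigns an IP address to the crawler registered for the most specific enclosing subnet. It is not the IPMicra algorithm itself, which the abstract does not detail; the subnet table, the crawler names, and the function `assign_crawler` are all hypothetical, and the addresses come from the reserved documentation ranges.

```python
import ipaddress

# Hypothetical table: each subnet is mapped to the crawler assumed to be
# "closest" to it. These entries are illustrative, not from the paper.
SUBNET_TO_CRAWLER = {
    ipaddress.ip_network("192.0.2.0/24"): "crawler-a",
    ipaddress.ip_network("198.51.100.0/24"): "crawler-b",
    ipaddress.ip_network("198.51.0.0/16"): "crawler-c",
}


def assign_crawler(ip, default="crawler-default"):
    """Return the crawler for the most specific subnet containing `ip`.

    Falls back to `default` when no registered subnet contains the address.
    """
    addr = ipaddress.ip_address(ip)
    # Collect every registered subnet that contains the address.
    matches = [net for net in SUBNET_TO_CRAWLER if addr in net]
    if not matches:
        return default
    # The longest prefix is the deepest node in the subnet hierarchy.
    best = max(matches, key=lambda net: net.prefixlen)
    return SUBNET_TO_CRAWLER[best]
```

For example, `assign_crawler("198.51.100.5")` selects the /24 entry over the enclosing /16, so a host is delegated at the finest level of the hierarchy for which a crawler is known.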