Listing of web crawlers that do not support compression

If you are the author of any of these spiders, then please add support for content compression when you crawl the web. This saves bandwidth both on your crawling system and on the servers that you crawl.

Adding compression support can be very simple -- if your spider is coded in Perl using LWP::UserAgent, then a single line of code enables it:

$ua->default_header('Accept-Encoding' => 'gzip');
Then make sure that you always use 'decoded_content' (rather than 'content') when reading the body from the response object, since 'decoded_content' undoes the gzip encoding for you.

For other languages, all you need to do is to add

Accept-Encoding: gzip
to the headers of the HTTP request that you send, and then be prepared to handle a 'Content-Encoding: gzip' header in the response by decompressing the body.
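As a rough illustration for Python, here is a minimal sketch using only the standard library (urllib.request and gzip). The function names make_request, decode_body and fetch are hypothetical, chosen for this example. urllib does not decompress responses on its own, so the sketch checks the Content-Encoding header itself; it assumes that only gzip was advertised, so any other coding is treated as an error.

```python
import gzip
import urllib.request
from typing import Optional


def make_request(url: str) -> urllib.request.Request:
    # Advertise that this client can accept a gzip-compressed response body.
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(content_encoding: Optional[str], body: bytes) -> bytes:
    # The server may still send an uncompressed body, so check the header.
    coding = (content_encoding or "identity").strip().lower()
    if coding == "identity":
        return body
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    # Only gzip was advertised (an assumption of this sketch), so anything else is unexpected.
    raise ValueError(f"unexpected Content-Encoding: {content_encoding!r}")


def fetch(url: str) -> bytes:
    # Hypothetical helper: send one request and return the body uncompressed.
    with urllib.request.urlopen(make_request(url)) as resp:
        return decode_body(resp.headers.get("Content-Encoding"), resp.read())
```

Treating 'x-gzip' as gzip follows the HTTP specification's advice for that legacy alias.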

Happily, some of the large spiders do support compression -- Googlebot and Yahoo Slurp do (to name but two). Since I started prodding crawler implementors, a couple have added compression support (one within hours), and another reported that the missing support was a bug that would be fixed shortly.

Crawlers which do more than 5% of the total (uncompressed) crawling activity are marked in bold below.
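The bold marking reduces to a simple share calculation. The page does not say how crawling activity is measured, so this sketch assumes a hypothetical mapping from each crawler name to the number of uncompressed bytes served to it; the function name heavy_crawlers is likewise made up for illustration.

```python
from typing import Dict, List


def heavy_crawlers(bytes_by_crawler: Dict[str, int], threshold: float = 0.05) -> List[str]:
    # Total uncompressed bytes served to all crawlers in the listing.
    total = sum(bytes_by_crawler.values())
    if total == 0:
        return []
    # Keep crawlers whose share of the total is strictly above the threshold.
    return sorted(
        name for name, nbytes in bytes_by_crawler.items() if nbytes / total > threshold
    )
```

With byte counts of 60, 35 and 5 for crawlers A, B and C, only A and B qualify: C's share is exactly 5%, which is not more than 5%.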

Crawler | Host requested | Last IP used
Aranea Web-Crawled Corpora Project (+http://aranea.juls.savba.sk/guest (English 2024 Spring Crawl)) | blog.gladstonefamily.net | 147.213.138.57
Aranea Web-Crawled Corpora Project (+http://aranea.juls.savba.sk/guest (English 2024 Spring Crawl)) | pond1.gladstonefamily.net | 147.213.138.57
curl/7.54.0 | c-73-227-75-114.hsd1.ma.comcast.net | 139.144.52.241
curl/7.54.0 | c-73-227-75-114.hsd1.ma.comcast.net:8080 | 139.144.52.241
DomainStatsBot/1.0 (https://domainstats.com/pages/our-bot) | gladstonefamily.net | 148.251.121.91
Magellan | gladstonefamily.net | 172.91.101.96
masscan/1.0 (https://github.com/robertdavidgraham/masscan) | - | 87.98.241.210
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) | blog1.gladstonefamily.net | 216.244.66.194
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) | gladstonefamily.net | 216.244.66.194
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) | pond.gladstonefamily.net | 216.244.66.194
Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com) | www.gladstonefamily.net | 216.244.66.194
Mozilla/5.0 (compatible; SeekportBot; +https://bot.seekport.com) | pond.gladstonefamily.net | 65.108.74.120
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.134 Safari/537.36 |  | 138.246.253.24
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | blog.gladstonefamily.net | 3.143.244.83
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | blog1.gladstonefamily.net | 3.17.174.239
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | gladstonefamily.net | 18.219.236.62
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | pond.gladstonefamily.net | 3.138.204.208
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | pond1.gladstonefamily.net | 3.143.9.115
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com) | www.gladstonefamily.net | 18.116.36.192
VsuSearchSpider/1.0 |  | 87.153.109.113

Comments, problems, etc. to
Philip Gladstone

Last modified Sunday, 19 November 2006