Channel: web-hosting – Ringing Liberty

IP address of spiders and “official” web bots


Quintin Par asked:

Is there an official API for iplists.com from which I can get the list of spiders?

My intention is to whitelist these IPs for site scraping.

My answer:


There’s no list of IP addresses for “good” search engine bots that I know of, and if there were, it would be horribly out of date pretty quickly, as you’ve already discovered.

One thing you can do is to create a bot trap. This is simple in theory: create a page that is linked from your web site but hidden from normal users (e.g. via CSS tricks), and then Disallow it in robots.txt. Then wait a week, since legitimate search engines may cache robots.txt for that long, and start banning anything that hits the trap page (e.g. with fail2ban).
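A minimal Python sketch of the two moving parts, assuming a hypothetical trap path `/bot-trap/` and access logs in the Common/Combined Log Format. It first uses the standard library's `urllib.robotparser` to confirm that the trap really is disallowed for all user agents, then collects the IPs that requested the trap after the one-week grace period. The resulting set is what you would feed to a firewall or ban tool; this is not how fail2ban itself works (fail2ban matches log lines with its own filter regexes), just an illustration of the same logic.

```python
import re
from datetime import datetime, timedelta, timezone
from urllib.robotparser import RobotFileParser

# Hypothetical trap path: any URL that normal visitors never see will do.
TRAP_PATH = "/bot-trap/"

# robots.txt telling every well-behaved crawler to stay out of the trap.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {TRAP_PATH}
"""

# Grace period: crawlers may still be using a cached robots.txt this long.
GRACE = timedelta(days=7)

# Matches the start of a Common/Combined Log Format line, e.g.
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /bot-trap/ HTTP/1.1" ...
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)'
)


def trap_is_disallowed(robots_txt: str, path: str) -> bool:
    """Return True if robots.txt forbids all user agents from fetching path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", path)


def trap_hits(log_lines, trap_path: str, not_before: datetime) -> set:
    """Return client IPs that requested the trap at or after not_before."""
    offenders = set()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or not m.group("path").startswith(trap_path):
            continue  # unparseable line or an ordinary page request
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if ts >= not_before:
            offenders.add(m.group("ip"))
    return offenders


# Hypothetical deployment date and sample log lines for illustration.
deployed = datetime(2023, 10, 1, tzinfo=timezone.utc)
sample = [
    # Inside the grace window: may be a crawler with a stale robots.txt.
    '198.51.100.4 - - [03/Oct/2023:08:00:00 +0000] "GET /bot-trap/ HTTP/1.1" 200 512',
    # After the grace window: ignored robots.txt, so it is a ban candidate.
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /bot-trap/ HTTP/1.1" 200 512',
    # Ordinary page request: never flagged.
    '192.0.2.1 - - [10/Oct/2023:14:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
]

if __name__ == "__main__":
    print(trap_is_disallowed(ROBOTS_TXT, TRAP_PATH))  # True
    print(sorted(trap_hits(sample, TRAP_PATH, deployed + GRACE)))  # ['203.0.113.7']
```

Only hits after `deployed + GRACE` are counted, so a search engine that fetched the trap while still working from an older cached robots.txt does not end up on the ban list.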


View the full question and any other answers on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

The post IP address of spiders and “official” web bots appeared first on Ringing Liberty.

