Quote:
Originally Posted by DaGrip
I think there's a bit of confusion here about banning and what that word means in this context.
If you have a script on your website which is doing the scraping, there's a risk of the site being deindexed or 'banned'. Of course there are legitimate ways of doing it. Look into the Google API - they may not want you scraping their results but they do provide you with ways of using their data.
|
Google does not provide search API keys anymore.
Quote:
|
If you have a script on a separate domain doing the scraping and, e.g., saving the results to a file or DB for the main site to access, the domain doing the scraping may be deindexed. But since that domain is only used for scraping and running other scripts, who cares? Since there's no connection, from the SE's viewpoint, between the scraping domain and your site, there's no danger to your site.
|
Do it and Google will stop responding to your domain after several queries.
Quote:
|
If a search engine receives too many requests in a given period from the same IP address, it is usual for the SE to place an automated temporary ban (usually 24 hours or less) on that IP address, blocking it from making requests and receiving data. This has nothing to do with search engine indexing; it's just a limit on the number of requests an IP can make.
|
Yes, but the topic was "pulling content from search engine results", not indexation of the site.
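To make the quoted point about per-IP limits concrete, here is a rough sketch of how such a limit could work on the server side. This is only an illustration: the class name `IpRateLimiter` and all three thresholds are made up, and no search engine publishes its real rules.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Hypothetical per-IP limiter: too many hits in a window triggers a temporary ban."""

    def __init__(self, max_requests=40, window_seconds=600.0, ban_seconds=86400.0):
        # All three thresholds are assumptions chosen for illustration only.
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.ban_seconds = ban_seconds
        self._hits = defaultdict(deque)   # ip -> timestamps of recent allowed requests
        self._banned_until = {}           # ip -> time at which the ban expires

    def allow(self, ip, now):
        """Return True if the request is served, False if the IP is (or becomes) banned."""
        until = self._banned_until.get(ip)
        if until is not None:
            if now < until:
                return False
            del self._banned_until[ip]    # ban expired, start fresh

        hits = self._hits[ip]
        # Forget requests that have fallen out of the sliding window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()

        if len(hits) >= self.max_requests:
            # Limit exceeded: ban this IP only. Indexing is not involved at all.
            self._banned_until[ip] = now + self.ban_seconds
            hits.clear()
            return False

        hits.append(now)
        return True
```

Note that the limiter keys on the requesting IP and nothing else, which is why this kind of ban is separate from whether a site gets deindexed.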
Quote:
|
If you are running a tool from your desktop, it is your own IP that may be banned. If you're running a script on a server, then it's the IP address of that domain which may be banned. Either way, using proxies gets around it.
|
You'll need a lot of proxies.
Quote:
|
It takes a very high volume of requests to get an IP ban. Simply putting a few seconds delay between requests is usually enough to avoid it.
|
My tests show that 40 queries in a row with a random 5-20 second delay between them cause a ban. Did you run your tests on Google or on some other search engine?
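For anyone who wants to compare numbers, the kind of test loop I mean looks roughly like this. It is a minimal sketch, not my exact script: `run_throttled` and the `fetch` callable are hypothetical names, and `fetch` is assumed to return False once the engine stops answering normally.

```python
import random
import time


def run_throttled(queries, fetch, min_delay=5.0, max_delay=20.0,
                  sleep=time.sleep, rng=random):
    """Run queries one by one with a random pause between them.

    `fetch` is a hypothetical callable taking a query string and returning
    True if a normal result page came back, False if the request was blocked.
    Stops at the first blocked request and returns the (query, ok) pairs seen.
    """
    results = []
    for i, query in enumerate(queries):
        if i > 0:
            # Random delay in [min_delay, max_delay] seconds between requests.
            sleep(rng.uniform(min_delay, max_delay))
        ok = fetch(query)
        results.append((query, ok))
        if not ok:
            break  # blocked: no point in sending more requests
    return results
```

The length of the returned list tells you how many queries went out before the block, which is the figure worth comparing across search engines and delay ranges.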