Webmaster Forum


Coding Forum Problems with your code? Let's hear about it.

Old 02-15-2008, 01:23 AM   #21 (permalink)
nasty.web (Contributing Member)

Quote:
Originally Posted by DaGrip
I think there's a bit of confusion here about banning and what that word means in this context.

If you have a script on your website which is doing the scraping, there's a risk of the site being deindexed or 'banned'. Of course there are legitimate ways of doing it. Look into the Google API - they may not want you scraping their results but they do provide you with ways of using their data.
Google does not provide search API keys anymore.

Quote:
Alternatively, you can run the scraping script on a separate domain and have it save the results to a file or database for the main site to access. The domain doing the scraping may be deindexed, but since that domain is only used for scraping and running other scripts, who cares? From the SEs' viewpoint there's no connection between the scraping domain and your site, so there's no danger to your site.
Try it, and Google will stop responding to requests from that domain after several queries.

Quote:
If a search engine receives too many requests in a given period from the same IP address, it is usual for the SE to place an automated temporary ban (usually 24 hours or less) on that IP address, blocking it from making requests and receiving data. This has nothing to do with search engine indexing; it's just a limit on the number of requests an IP can make.
Yes, but the topic was "pulling content from search engine results", not indexing of the site.

Quote:
If you are running a tool from your desktop, it is your own IP that may be banned. If you're running a script on a server, then it's the IP address of that server which may be banned. Either way, using proxies gets around it.
You'll need a lot of proxies.

Quote:
It takes a very high volume of requests to get an IP ban. Simply putting a few seconds delay between requests is usually enough to avoid it.
My tests show that 40 queries in a row with a random 5-20 second delay between them cause a ban. Did you run your tests on Google or on another search engine?
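
For anyone who wants to try the "delay between requests" idea discussed above, here is a minimal sketch in Python (the thread itself points at PHP; Python is only a stand-in). The helper name `throttled` and its parameters are made up for illustration. The 5-20 second range is the one from the test described above, and since that test still ended in a ban after 40 queries, the pause should not be read as a guarantee.

```python
import random
import time


def throttled(items, min_delay=5.0, max_delay=20.0,
              sleep=time.sleep, pick_delay=random.uniform):
    """Yield items one by one, pausing a random interval between them.

    `sleep` and `pick_delay` are injectable so the pacing logic can be
    checked without actually waiting.
    """
    for index, item in enumerate(items):
        if index > 0:
            # No pause before the first item, one pause before each later item.
            sleep(pick_delay(min_delay, max_delay))
        yield item
```

Typical use would be `for url in throttled(urls): ...`, with the download step inside the loop, so every request after the first is preceded by a random wait.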
Old 02-15-2008, 01:27 AM   #22 (permalink)
nasty.web (Contributing Member)

Also, a quick note on proxies: you'll need anonymous (high-anonymity) proxies, because transparent ones still put an "X-Forwarded-For" header in the request, so the SEs will know who the original scraper is.
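
One way to see whether a given proxy adds such a header is to send a request through it to a page you control that records the request headers it receives, then inspect them. The sketch below covers only the inspection step. The function name `revealing_headers` is hypothetical, and the list of header names is an assumption that is not exhaustive.

```python
# Header names a forwarding proxy commonly adds (assumed, not exhaustive).
FORWARDING_HEADERS = ("x-forwarded-for", "forwarded", "via")


def revealing_headers(received_headers):
    """Return the forwarding headers found in a mapping of request headers.

    `received_headers` is whatever the receiving server saw, as a dict of
    header name to value. Header names are compared case-insensitively.
    """
    found = {}
    for name, value in received_headers.items():
        if name.lower() in FORWARDING_HEADERS:
            found[name.lower()] = value
    return found
```

An empty result means none of the listed headers arrived. It does not prove the proxy hides the origin in every other respect.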
Old 02-15-2008, 08:37 AM   #23 (permalink)
JoeTMuse (Junior Member, Mount Dora, FL)

Nasty... No, I still haven't built the script yet. When it's all done, I'm going to be testing it on Yahoo Yellow Pages. For testing purposes, it'll be easier to clarify the data I want.
Old 02-15-2008, 09:12 AM   #24 (permalink)
nasty.web (Contributing Member)

Quote:
Originally Posted by JoeTMuse
Nasty... No, I still haven't built the script yet. When it's all done, I'm going to be testing it on Yahoo Yellow Pages. For testing purposes, it'll be easier to clarify the data I want.
My question was for DaGrip. He stated that a few seconds' delay is OK, so I'd like to know the details. I do a lot of scraping, so they could be useful for me too.
Old 02-15-2008, 09:24 AM   #25 (permalink)
JoeTMuse (Junior Member, Mount Dora, FL)

I don't really know where to even begin. I know I'm going to end up using cURL, but at this point, this is ground I haven't covered yet.
Old 02-15-2008, 10:01 AM   #26 (permalink)
nasty.web (Contributing Member)

See the link to the PHP documentation I gave you. There is an example there of how to download a URL. Build the search engine URLs, download them, and parse the text you get back.
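
The three steps (build the URL, download it, parse the result) might look roughly like this. It is a sketch in Python's standard library, not the PHP example from the linked documentation. The endpoint `https://search.example.com/search`, the `q` and `page` parameter names, and the `example-fetcher/0.1` user agent are all placeholders. A real search engine has its own URL format, and its result markup needs its own parsing rules.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint, used only to show the URL-building step.
SEARCH_ENDPOINT = "https://search.example.com/search"


def build_search_url(query, page=1):
    """Step 1: turn a query string into a properly encoded URL."""
    return SEARCH_ENDPOINT + "?" + urlencode({"q": query, "page": page})


def download(url, timeout=10):
    """Step 2: fetch the page and return its body as text."""
    request = Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class LinkCollector(HTMLParser):
    """Step 3: collect the href of every <a> tag in the page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    collector = LinkCollector()
    collector.feed(html)
    return collector.links
```

Chained together, that would be `extract_links(download(build_search_url("some query")))`. Collecting every link is only a starting point, since picking out the actual result entries depends on the markup of the page being fetched.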
Old 02-15-2008, 10:20 AM   #27 (permalink)
JoeTMuse (Junior Member, Mount Dora, FL)

Cool, I think this is going to be my weekend project. I'll let you know how it turns out, and I'll probably be asking a boatload of questions about it too. Like I said, this is ground I've not yet covered or, to be honest, even experimented with. Luckily, I like tech manuals! lol
Old 02-16-2008, 01:05 AM   #28 (permalink)
DaGrip (Junior Member)

That's more of a field observation than anything you could call a quantified test. All I can say is I do a lot of different types of scraping, both desktop and serverside, on the top 3 SEs and IP banning is hardly ever an issue.

As far as I'm aware I'm not doing anything extraordinary.
All times are GMT -7.