02-14-2008, 11:53 AM
#1 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Pulling content from search engine results...
Hello all! New to the forum, and I could use some help... I need to compile a list from search engine results. Is there any way to do this? Thanks for the help...

02-14-2008, 01:26 PM
#2 (permalink)
Contributing Member
Join Date: 05-18-04
Location: Florida
Posts: 966

You mean like your current positions, gains, losses, etc.?
Webposition pro
__________________
Just because you're paranoid doesn't necessarily mean people aren't out to get you

02-14-2008, 01:46 PM
#3 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

No, more along the lines of pulling actual search results and compiling a list of those results in an easy-on-the-eyes form. Example: searching for Bars/Night Clubs in Michigan on Yahoo Yellow Pages, compiled into a list of these without all the other clutter. I don't know if this is even possible... Thanks for any help though...

02-14-2008, 02:19 PM
#4 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

Use the Seo4FireFox extension by Aaron Wall. It can save search results to a CSV file.

02-14-2008, 02:36 PM
#5 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Thanks, nasty. Is there any way to make it into an automated system? Instead of saving them for database purposes, I need them to be "live." In essence, I choose the search engine, tell it what to search for, and it fetches the results, including only the info I want from each link, and including all the following pages of said results. Whew... That was a mouthful. I know it's gotta be a pretty complicated process, and I can't quit scratching my head over it. Thanks again.

02-14-2008, 02:40 PM
#6 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

As far as I know, scraping is against Google's TOS. You (your IP) can get banned (I know this from my own experience). Anyway, it's pretty easy to build a scraper with some programming skills. If you know PHP, look at the cURL functions.
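
For anyone wondering what the "fetch a page, keep only the bits you want" part might look like, here is a rough sketch. It is in Python with the standard library rather than the PHP cURL functions mentioned above, and it is only an illustration: the `<a class="result">` markup, the sample HTML, the User-Agent string, and the names `ResultLinkParser`, `extract_results` and `fetch` are all made up for the example and do not describe any real search engine's pages.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class ResultLinkParser(HTMLParser):
    """Collects (title, url) pairs from <a class="result"> tags (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # href of the result link currently open, if any
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "result" in classes and attrs.get("href"):
            self._href = attrs["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so titles come out on one line.
            title = " ".join("".join(self._text).split())
            self.results.append((title, self._href))
            self._href = None


def extract_results(html):
    """Return a list of (title, url) tuples found in an HTML string."""
    parser = ResultLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


def fetch(url, timeout=10):
    """Download a page as text; the User-Agent value is an arbitrary placeholder."""
    request = Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Made-up sample page, so the example runs without any network access.
SAMPLE_HTML = """
<div><a class="result" href="http://example.com/bar-one">Bar <b>One</b></a></div>
<div><a class="nav" href="/next">Next page</a></div>
<div><a class="result" href="http://example.com/club-two">Club Two</a></div>
"""

if __name__ == "__main__":
    for title, url in extract_results(SAMPLE_HTML):
        print(title, "->", url)
```

In real use you would pass the output of `fetch(...)` to `extract_results(...)` and adjust the tag and class test to whatever the target page actually uses.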

02-14-2008, 02:48 PM
#7 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Thanks, nasty, I'll take a look at the cURL functions. As far as scraping goes, what if it's for private use? Nothing published or redistributed across the web...

02-14-2008, 03:01 PM
#9 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Well, that sucks! I guess, unless I wish to be banned from Google, my little project is over. Although, after reading the link you posted, I think what I'm trying to do is different. Maybe I should just ask Google for their point of view? lol Thanks for the help either way.

02-14-2008, 03:05 PM
#10 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

You can try to imitate human behavior. Don't scrape too much at a time (think of a user with a few browser tabs open) and leave reasonable breaks between requests.
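
A rough sketch of the "reasonable breaks between requests" idea, in Python. The delay range (3 to 8 seconds) and the name `paced` are assumptions picked for illustration; nothing in this thread says what values are actually safe.

```python
import random
import time


def paced(items, min_delay=3.0, max_delay=8.0, sleep=time.sleep, rng=random.uniform):
    """Yield items one at a time, pausing a random interval between them.

    No pause happens before the first item. `sleep` and `rng` are injectable
    so the pacing can be tested without actually waiting.
    """
    if min_delay < 0 or max_delay < min_delay:
        raise ValueError("need 0 <= min_delay <= max_delay")
    for index, item in enumerate(items):
        if index > 0:
            sleep(rng(min_delay, max_delay))
        yield item


if __name__ == "__main__":
    # Hypothetical small batch of queries; replace the print with a real fetch.
    queries = ["bars in michigan", "night clubs in michigan"]
    for query in paced(queries, min_delay=0.1, max_delay=0.3):
        print("would fetch results for:", query)
```

Randomizing the pause instead of using a fixed interval is a design choice here, on the guess that evenly spaced requests look more mechanical than irregular ones.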

02-14-2008, 03:11 PM
#11 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

What about an applet that would run from my end, as opposed to server side, pulling and placing the data in a spreadsheet? Would that be considered scraping, or would that essentially be what Seo4FireFox does?

02-14-2008, 03:15 PM
#12 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

What's the difference between your computer and a so-called server? The IP address? Google has many visitors from different addresses, and it bans for suspicious behavior, not for the technology used.

02-14-2008, 03:22 PM
#13 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Ahhh... "I see," says the blind man. That makes a lot of sense. Thanks for all the help, you've saved me months of headache now that I know it's not allowed.

02-14-2008, 03:28 PM
#14 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

As I said before, you can try to simulate human behavior. If you only need small amounts of data, it will work.

02-14-2008, 03:33 PM
#15 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

I could do it in small batches. I'll give that a try. So you think cURL's the way to go?

02-14-2008, 03:40 PM
#16 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

Quote:
Originally Posted by JoeTMuse
I could do it in small batches. I'll give that a try. So you think cURL's the way to go?

Yes, see: http://php.net/manual/en/ref.curl.php

02-14-2008, 03:48 PM
#17 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Sweet! Thanks nasty, 'preciate it!

02-14-2008, 03:53 PM
#18 (permalink)
v7n Mentor
Join Date: 07-24-06
Posts: 642

Quote:
Originally Posted by JoeTMuse
Sweet! Thanks nasty, 'preciate it!

My pleasure

02-14-2008, 04:51 PM
#19 (permalink)
Junior Member
Join Date: 01-20-08
Posts: 16

I think there's a bit of confusion here about banning and what that word means in this context.
If you have a script on your website which is doing the scraping, there's a risk of the site being deindexed or 'banned'. Of course, there are legitimate ways of doing it. Look into the Google API: they may not want you scraping their results, but they do provide you with ways of using their data.
If you have a script on a separate domain which is doing the scraping and, e.g., saving the results to a file or DB for the main site to access, the domain doing the scraping may be deindexed. But since that domain is only being used for scraping and running other scripts, who cares? Since there's no connection, from the search engines' viewpoint, between the scraping domain and your site, there's no danger to your site.
If a search engine receives too many requests in a given period from the same IP address, it is usual for the SE to place an automated temporary ban (usually 24 hours or less) on that IP address, blocking it from making requests and receiving data. This has nothing to do with search engine indexing; it's just a limit on the number of requests an IP can make.
If you are running a tool from your desktop, it is your own IP that may be banned. If you're running a script on a server, then it's the IP address of that domain which may be banned. Either way, using proxies gets around it.
It takes a very high volume of requests to get an IP ban. Simply putting a few seconds' delay between requests is usually enough to avoid it.
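
To make the "separate script saves the results to a file for the main site to access" setup a bit more concrete, here is a small Python sketch of the handoff through a CSV file. The column names (`title`, `url`), the file name, and the functions `save_results` and `load_results` are invented for the example; a database would work just as well.

```python
import csv
import os
import tempfile

FIELDS = ["title", "url"]  # hypothetical columns for one search result


def save_results(path, rows):
    """Write result rows (dicts keyed by FIELDS) to a CSV file.

    The data goes to a temporary file first and is then moved into place,
    so a reader never sees a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(rows)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong, then re-raise.
        os.unlink(tmp_path)
        raise


def load_results(path):
    """Read the rows back as a list of dicts (what the main site would do)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


if __name__ == "__main__":
    target = os.path.join(tempfile.mkdtemp(), "results.csv")
    save_results(target, [{"title": "Bar One", "url": "http://example.com/bar-one"}])
    print(load_results(target))
```

The scraping side would call `save_results` after each batch, and the site would only ever call `load_results`, so the two never need to talk to each other directly.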

02-14-2008, 05:21 PM
#20 (permalink)
Junior Member
Join Date: 02-14-08
Location: Mount Dora, FL
Posts: 13

Thanks for the info. That's a definite help.
All times are GMT -7.
© Copyright 2008 V7 Inc