06-04-2005, 09:01 AM
#1
v7n Mentor
Join Date: 10-13-03
Location: Central Ohio (Dublin)
Posts: 1,519
Robots.txt question
It's been a while since I've had to write one, but if I put Disallow: ?o=, it disallows everything that starts with ?o=, correct?
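Worth checking before relying on it: under the original robots exclusion standard, rules are matched as plain prefixes of the URL path, and that path always begins with "/". A rule written as ?o= with no leading slash therefore should not match anything in a standards-following parser. The sketch below uses Python's standard urllib.robotparser module, which does plain prefix matching with no wildcard support, to compare the two spellings. The blocked helper and the example.com URLs are hypothetical. Google documents extra wildcard support, so for Googlebot a rule like Disallow: /*?o= would also catch ?o= on deeper pages, but simpler parsers like this one will not.

```python
from urllib.robotparser import RobotFileParser


def blocked(rules: str, url: str, agent: str = "*") -> bool:
    """Return True if the robots.txt text in `rules` forbids `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, url)


# The rule exactly as written in the question: no leading slash.
no_slash = "User-agent: *\nDisallow: ?o=\n"
# The same rule anchored at the site root.
with_slash = "User-agent: *\nDisallow: /?o=\n"

url = "http://www.example.com/?o=5"
print(blocked(no_slash, url))    # False: a rule without the leading "/" never matches a path
print(blocked(with_slash, url))  # True: "/?o=5" starts with "/?o="
print(blocked(with_slash, "http://www.example.com/page.php?o=5"))  # False: prefix only, no wildcards
```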
05-30-2006, 05:43 AM
#3
Individualist
Join Date: 09-27-03
Location: Japan, mostly
Posts: 42,521
I have never done a robots.txt for a subdomain. Am I correct in assuming that the subdomain gets its own robots.txt?
Say I wanted to exclude http://directory.v7n.com/cgi-bin/.
How would I write that up?
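For reference, robots.txt is looked up per host: a crawler takes the scheme and host (plus the port, if any) of the page it wants and requests /robots.txt from that root. So directory.v7n.com is governed only by http://directory.v7n.com/robots.txt, not by the file on any other host. A minimal sketch of that lookup using Python's urllib.parse; the robots_url helper is hypothetical, and the second page URL is a hypothetical page on another host, included only for contrast.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    The file is looked up at the root of the exact scheme + host (+ port)
    serving the page, so each subdomain is covered only by its own file.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("http://directory.v7n.com/cgi-bin/"))
# http://directory.v7n.com/robots.txt
print(robots_url("http://www.v7n.com/forums/"))  # hypothetical page on another host
# http://www.v7n.com/robots.txt
```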
05-30-2006, 06:03 AM
#4
v7n Mentor
Join Date: 01-12-04
Location: Gatineau, QC, Canada
Posts: 6,219
What exactly is the purpose of a robots.txt? I have an idea of what it does, but I'm not grasping the bigger picture.
What would or could it do for my domain, for example?
05-30-2006, 06:15 AM
#5
Individualist
Join Date: 09-27-03
Location: Japan, mostly
Posts: 42,521
The robots.txt file is the first file a well-behaved spider requests from a site, and in it you tell the spider where not to go: which directories or pages it should not crawl, and so on.
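To make that concrete: a compliant spider downloads /robots.txt first and then drops every queued URL that the rules forbid for its user-agent. The sketch below mimics that step with Python's standard urllib.robotparser. The crawlable helper, the ExampleBot agent name, the /admin/ and /print/ rules and the example.com URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def crawlable(robots_txt: str, agent: str, urls: list[str]) -> list[str]:
    """Keep only the URLs that robots_txt allows `agent` to fetch (hypothetical helper)."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [u for u in urls if rules.can_fetch(agent, u)]


# Hypothetical robots.txt: keep every spider out of /admin/ and /print/.
robots_txt = """\
User-agent: *
Disallow: /admin/
Disallow: /print/
"""

queue = [
    "http://www.example.com/",
    "http://www.example.com/articles/robots-txt.html",
    "http://www.example.com/admin/login.php",
    "http://www.example.com/print/articles/robots-txt.html",
]
print(crawlable(robots_txt, "ExampleBot", queue))
# ['http://www.example.com/', 'http://www.example.com/articles/robots-txt.html']
```

Only spiders that choose to honor the file behave this way; it keeps polite crawlers out of areas you don't want crawled, but it does not lock anything.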
05-30-2006, 06:20 AM
#6
v7n Mentor
Join Date: 01-12-04
Location: Gatineau, QC, Canada
Posts: 6,219
Yes, but what can it bring me?
05-30-2006, 06:21 AM
#7
Individualist
Join Date: 09-27-03
Location: Japan, mostly
Posts: 42,521
Fame and riches?
05-30-2006, 06:25 AM
#8
v7n Mentor
Join Date: 01-12-04
Location: Gatineau, QC, Canada
Posts: 6,219
Hmmm, is that how you did it, John?
05-30-2006, 06:28 AM
#9
Individualist
Join Date: 09-27-03
Location: Japan, mostly
Posts: 42,521
LOL.
I wish it were that easy. 
05-30-2006, 06:34 AM
#10
aka Colleen
Join Date: 03-25-04
Location: Canada
Posts: 5,925
I think John just finds a rich girlfriend. 
05-30-2006, 07:06 AM
#11
v7n Mentor
Join Date: 12-04-05
Location: UK
Posts: 845
Quote:
I think John just finds a rich girlfriend.

Where? / Who? / What? / When?
--
Is there a way to know if robots.txt works? I use one on most of my websites.
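One way to check is to run your file through a parser that follows the robots exclusion standard and compare the result with what you intended. A minimal sketch with Python's standard urllib.robotparser; the check_rules helper, the /private rule and the example.com URLs are hypothetical. For a live site you could load the real file with set_url() and read() instead of parse(). This only tells you what the file says to a compliant crawler; whether a particular engine actually obeyed it shows up in your server access logs.

```python
from urllib.robotparser import RobotFileParser


def check_rules(parser: RobotFileParser, agent: str,
                expected: dict[str, bool]) -> list[str]:
    """Report every URL whose permission differs from what you intended.

    `expected` maps a URL to True (should be crawlable) or False (should be blocked).
    """
    problems = []
    for url, should_allow in expected.items():
        allowed = parser.can_fetch(agent, url)
        if allowed != should_allow:
            state = "allowed" if allowed else "blocked"
            problems.append(f"{url} is {state} for {agent}")
    return problems


parser = RobotFileParser()
# For a live site: parser.set_url("http://www.example.com/robots.txt"); parser.read()
parser.parse("User-agent: *\nDisallow: /private\n".splitlines())

expected = {
    "http://www.example.com/": True,
    "http://www.example.com/private/notes.html": False,
    "http://www.example.com/private-offers.html": True,  # intended to stay crawlable
}
print(check_rules(parser, "Googlebot", expected))
# ['http://www.example.com/private-offers.html is blocked for Googlebot']
```

The reported mismatch is a common surprise: Disallow: /private also blocks /private-offers.html, because the rule is a plain prefix. Writing Disallow: /private/ with the trailing slash limits it to the directory.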
05-30-2006, 07:33 AM
#12
Individualist
Join Date: 09-27-03
Location: Japan, mostly
Posts: 42,521
Quote:
Originally Posted by Colleen
I think John just finds a rich girlfriend.

LOL. I wish. *Dreaming*
05-30-2006, 07:52 PM
#13
Contributing Member
Join Date: 05-30-06
Location: Canada
Posts: 466
Hi all,
JohnScott, I think you need a robots.txt file in the document root of the virtual host, in this case directory.v7n.com, so that crawlers find it at http://directory.v7n.com/robots.txt.
e.g., in /home/v7n/www/directory/robots.txt:

User-agent: *
Disallow: /cgi-bin/

Svet
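Svet's two-line file can be sanity-checked with Python's standard urllib.robotparser. The parser itself ignores the host name; the rules cover directory.v7n.com only because the file is served from that host's root. The add_url.cgi script name is hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Svet's file, as it would be served at http://directory.v7n.com/robots.txt
svet_rules = "User-agent: *\nDisallow: /cgi-bin/\n"

parser = RobotFileParser()
parser.parse(svet_rules.splitlines())

for url in (
    "http://directory.v7n.com/",
    "http://directory.v7n.com/cgi-bin/",
    "http://directory.v7n.com/cgi-bin/add_url.cgi",  # hypothetical script name
):
    print(url, "->", "allowed" if parser.can_fetch("*", url) else "blocked")
# http://directory.v7n.com/ -> allowed
# http://directory.v7n.com/cgi-bin/ -> blocked
# http://directory.v7n.com/cgi-bin/add_url.cgi -> blocked
```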
05-30-2006, 08:01 PM
#14
v7n Mentor
Join Date: 10-16-03
Location: USA
Posts: 1,559
I'd also like to block a search engine, and it's been a while since I've set one up.
How would I block Google from a subdirectory?
05-30-2006, 09:25 PM
#15
Inactive
Join Date: 05-28-06
Location: Canada
Posts: 21
Quote:
How would I block google from a subdirectory?

The Google spider uses the user-agent ID Googlebot. It will comply with the instructions in the robots.txt if you follow the standard for robot exclusion:

User-agent: Googlebot
Disallow: /your-directory/

Detailed control instructions for the Google spiders are here:
http://www.google.ca/support/webmast...y?answer=35303
The Google spiders are well behaved if the site does not have a lot of crap and script-generated links. AFAIK Googlebot is the only one that will obey rel="nofollow" on a link, and that gives you very detailed control over what they index.
Cd&
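A sketch of how a Googlebot-only group behaves, using Python's standard urllib.robotparser as a stand-in for a compliant crawler. The /your-directory/ path, the OtherBot agent and the example.com URLs are hypothetical. A crawler follows the group whose User-agent line matches it and falls back to the * group otherwise; an empty Disallow: line blocks nothing.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: block only Googlebot from /your-directory/, let every other bot in.
rules = """\
User-agent: Googlebot
Disallow: /your-directory/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

page = "http://www.example.com/your-directory/page.html"
print(parser.can_fetch("Googlebot", page))      # False: Googlebot's own group applies
print(parser.can_fetch("Googlebot/2.1", page))  # False: the "/2.1" version suffix is ignored
print(parser.can_fetch("OtherBot", page))       # True: falls back to "*", where empty Disallow blocks nothing
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))  # True: outside the directory
```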
05-31-2006, 12:37 AM
#16
Individualist
Join Date: 09-27-03
Location: Japan, mostly
Posts: 42,521
Quote:
Originally Posted by lordspace
Hi all,
JohnScott, I think that you have to have a robots.txt file in the document root of virtual domain in this case: directory.
e.g.:
in /home/v7n/www/directory/robots.txt
User-agent: *
Disallow: /cgi-bin/
Svet

Thanks Svet
05-31-2006, 12:51 AM
#17
Contributing Member
Join Date: 03-22-06
Location: Costa Rica
Posts: 365
Even though Googlebot behaves most of the time, I've had a page listed even though I excluded it in the robots.txt file (it has been there since the website's inception). Here's the case:
Exclusion: http://www.paidx.com/robots.txt
Webpage listed: http://www.paidx.com/affiliate/index.asp
It even has a PR2?!
I have only one inbound link to that page, from a PR3 page.
So does that mean that inbound links force robots to ignore exclusions in robots.txt?
05-31-2006, 07:15 AM
#18
Contributing Member
Join Date: 05-30-06
Location: Canada
Posts: 466
Hi WagerX,
Quote:
So does that mean that inbound links force robots to ignore exclusions in robots.txt??

It seems it is possible for the page to have PR even though it is not shown in the results.
If you search for your domain in Google, only the main page is shown, so r