Webmaster Forum


05-24-2007, 11:32 PM   #1   mitra (Contributing Member)
Need help with robots.txt

I have created my robots.txt and I need to stop some bad URLs from getting indexed in Google. I can't work out where Google is finding those URLs. The pages are on a subdomain, and the URLs look something like 'http://xxx.mysite.com/1/2/3/4.php'.

Suppose I want to block the /1/ directory and all its contents. How should I block them through robots.txt?

Thanks
mitra

05-24-2007, 11:44 PM   #2   Costin Trifan (v7n Mentor)
User-Agent: *
Disallow: /1/

This will disallow access to that directory for all bots that honor robots.txt.
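
The rule above can be sanity-checked offline before it is uploaded. Here is a minimal sketch using Python's standard `urllib.robotparser`. The host `xxx.mysite.com` is the placeholder from the question, and `/other/page.php` is a made-up path used only for contrast. It assumes the file is served from the same host as the URLs being tested.

```python
from urllib.robotparser import RobotFileParser

# The two-line rule set from the reply above.
RULES = """\
User-agent: *
Disallow: /1/
"""

def is_allowed(url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)

# URL pattern from the question: everything under /1/ should be blocked.
blocked = is_allowed("http://xxx.mysite.com/1/2/3/4.php")
# Hypothetical path outside /1/: should stay crawlable.
open_page = is_allowed("http://xxx.mysite.com/other/page.php")
print(blocked, open_page)
```

The match is a plain prefix match on the path, so `/1/` covers every file and sub-folder beneath it.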

05-24-2007, 11:47 PM   #3   Costin Trifan (v7n Mentor)
For a detailed view, see my robots.txt file: http://optimizaremaster.org/robots.txt

05-25-2007, 12:13 AM   #4   mitra (Contributing Member)
Thanks, Costin.

But I think I need another robots.txt for the subdomain. Where should I place it?

One more question: I have a Google Sitemaps account for the main site. Do I need to add another sitemap for my subdomain? I ask because, as far as I can tell, I don't get the URL removal option (a new feature in Google Sitemaps) for the subdomain.

mitra

05-25-2007, 12:49 AM   #5   Costin Trifan (v7n Mentor)
Quote:
But I think I need another robots.txt for the subdomain. Where should I place it?

You don't need a new robots.txt file for your subdomain.

Look at this part of my robots.txt file:

Quote:
#
# FOLDERS
#
User-agent: *
Disallow: /App_Data/
User-agent: *
Disallow: /App_Code/
User-agent: *
Disallow: /costin/
Here, costin is my subdomain.

See my point here?

05-25-2007, 01:14 AM   #6   mitra (Contributing Member)
I've been going through http://www.webmasterworld.com/forum93/628.htm and found that we need a separate robots.txt for each subdomain.

05-25-2007, 01:43 AM   #7   Costin Trifan (v7n Mentor)
...wherever...

IMHO: I'm sure you don't need two different robots.txt files.

A subdomain is part of the main domain (that is, the subdomain is considered to be just a folder of the main domain), and it can be blocked from the main robots.txt.

05-25-2007, 02:48 AM   #8   mitra (Contributing Member)
OK, let me check once again with the robots.txt analysis tool in Google Sitemaps to see if it works.

My subdomain is like 'http://xxx.mysite.com' and I'm disallowing /1/.

05-26-2007, 11:16 AM   #9   mitra (Contributing Member)
I've checked with Google's robots.txt analysis tool and found that it is not working for me.

Here is the URL I tested:

http://subdomain.mysite.com/sys/

I've put Disallow: /sys/

Could you please shed more light on it?
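
One likely reason for a result like this: under the Robots Exclusion Protocol, a crawler reads robots.txt from the root of the exact host it is crawling, so rules in the main site's file are not consulted for URLs on a subdomain. Here is a minimal sketch of that lookup in Python. The helper name `robots_url_for` is made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL a crawler would consult for page_url."""
    parts = urlsplit(page_url)
    # Same scheme and host (including any port), fixed path, no query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# The tested URL from the post, and the same path on the main host.
sub_robots = robots_url_for("http://subdomain.mysite.com/sys/")
main_robots = robots_url_for("http://www.mysite.com/sys/")
print(sub_robots)
print(main_robots)
```

If that reading is right, `Disallow: /sys/` has to appear in the file served at http://subdomain.mysite.com/robots.txt, not only in the main site's file.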

05-28-2007, 10:19 AM   #10   mitra (Contributing Member)
I can't find any info on how to deal with a subdomain in Google Webmaster Tools. Can anybody please help me?

05-28-2007, 04:28 PM   #11   Costin Trifan (v7n Mentor)

Well, Mitra, I really don't know an easy way to say it, but I'll try to explain it as well as I can.
Let's say you have a web site called "MyWebSite", whose URL is, of course, "www.MyWebSite.com". Now, in this web site you create two subdomains: one called "MySubdomainOne" and the other called "MySubdomainTwo".
Your web site structure should look like this:
MyWebSite (this is the root folder)
> index.php
> page1.html
> page2.html
> robots.txt
> MySubdomainOne (this is a folder, but also the root folder for the subdomain 1 and contains the following files)
-- index.php
-- page1.html
-- page2.html
> MySubdomainTwo (this is another folder, but also the root folder for the subdomain 2 and contains the following files)
-- index.php
-- page1.html
-- page2.html

Now, as you said, you don't want the search engines' robots to access your subdomains' files, so your robots.txt file should look like this:

# This allows any spider to access all the files within your domain:
User-Agent: *
Disallow:


# This blocks all spiders from accessing any file within this subdomain:
User-Agent: *
Disallow: /MySubdomainOne/


# This blocks all spiders from accessing any file within this subdomain:
User-Agent: *
Disallow: /MySubdomainTwo/


Because you're blocking access to your subdomains, the spiders will be able to access only these files:
> index.php
> page1.html
> page2.html
> robots.txt
located in the root folder (see above).
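
Parsers differ in how they treat several `User-Agent: *` groups in one file. RFC 9309 says rules for the same user agent should be combined, but some parsers honor only the first matching group, which in the file above is the empty `Disallow:`. A more portable sketch keeps every rule in a single group. It is checked here with Python's standard `urllib.robotparser`. The folder names are the hypothetical ones from this post, and the rules only affect URLs requested through www.MyWebSite.com itself.

```python
from urllib.robotparser import RobotFileParser

# One group for all crawlers, one Disallow line per folder.
SINGLE_GROUP = """\
User-agent: *
Disallow: /MySubdomainOne/
Disallow: /MySubdomainTwo/
"""

parser = RobotFileParser()
parser.parse(SINGLE_GROUP.splitlines())

def allowed(path: str) -> bool:
    """True if any crawler may fetch `path` on www.MyWebSite.com."""
    return parser.can_fetch("*", "http://www.MyWebSite.com" + path)

# Check one root file and one file in each blocked folder.
results = {p: allowed(p) for p in (
    "/index.php",
    "/MySubdomainOne/page1.html",
    "/MySubdomainTwo/index.php",
)}
print(results)
```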

Note:
You have to specify the "noindex,nofollow" directives in the head section of every existing page within your blocked subdomain:
<head>
<meta name="robots" content="noindex,nofollow" />
.....
</head>

This way, you don't need an individual robots.txt file for each subdomain.
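
One caveat on the meta tag: a crawler can only obey "noindex,nofollow" if it is allowed to fetch the page, so a page that is also blocked in robots.txt may never have its tag read. To confirm the tag is really present in a page's markup, a small check with Python's standard `html.parser` might look like this. The class name `RobotsMetaFinder` and the sample HTML are made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are routed here as well.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )

# Hypothetical page containing the tag from the note above.
SAMPLE = """<html><head>
<meta name="robots" content="noindex,nofollow" />
<title>page1</title>
</head><body></body></html>"""

finder = RobotsMetaFinder()
finder.feed(SAMPLE)
directives = sorted(finder.directives)
print(directives)
```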

I hope this helps you.
All times are GMT -7.