 |
|
06-06-2005, 09:12 AM
|
#2 (permalink)
|
|
Senior Member
Join Date: 11-01-04
Location: USA! USA! USA!
Posts: 1,260
Latest Blog: None
|
That's new.
I've never seen that before. Thanks!
|
|
|
06-06-2005, 10:07 AM
|
#3 (permalink)
|
|
Senior Member
Join Date: 06-09-04
Location: Toronto Area
Posts: 175
Latest Blog: None
|
It can only help.
I know for a fact that the Googlebot would get spider trapped on a number of sites that used relative linking.
This way Googlebot does not have to be smart enough to figure out the linking structure of every single site on the internet and they also do not have to waste valuable time and resources in spider traps or indexing what the site owner does not want indexed.
This way they leverage the willingness of site owners who want to get their sites indexed in a certain way, preparing the data for the googlebot to read.
It is more efficient.
|
|
|
06-07-2005, 10:42 AM
|
#4 (permalink)
|
|
Member
Join Date: 11-12-04
Posts: 44
Latest Blog: None
|
Has anyone actually used this tool? If so, have you noticed any improvements or differences in rankings, or new pages being found more quickly by Google?
|
|
|
06-07-2005, 04:51 PM
|
#5 (permalink)
|
|
Contributing Member
Join Date: 05-02-04
Posts: 103
|
I understand that Google Sitemaps is an attempt by Google to include pages more quickly and to let you say when your site was updated and which parts changed.
(Each sitemap file is limited to 10 MB, and you can have multiple sitemap files.)
It may not be worth much in the first few months, as it is a beta, but I would recommend that people start using it within a year.
I can see it becoming very important in the future.
If you are a web CMS developer (working on the likes of PHPNuke), I would say it is time to start supporting it BEFORE it becomes important.
Site owners may not notice ranking or indexing changes for some time, as I suspect Google is collecting more "Sitemap" files to test with before folding the results into the main Google index (that is my guess, anyway).
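For CMS developers wondering what support might involve, here is a minimal Python sketch of writing such a file. Treat it as an illustration under assumptions: the namespace URI is the one from the later sitemaps.org protocol (the 2005 beta used a Google-hosted schema URI), and `build_sitemap`, `MAX_BYTES` and the example.com URLs are made-up names for this example.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the later sitemaps.org protocol, not the 2005 beta.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_BYTES = 10 * 1024 * 1024  # the 10 MB per-file limit mentioned above


def build_sitemap(pages):
    """Return sitemap XML (as bytes) for an iterable of (url, lastmod) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url          # escaped automatically
        ET.SubElement(entry, "lastmod").text = lastmod  # e.g. "2005-06-07"
    data = ET.tostring(urlset, encoding="utf-8", xml_declaration=True)
    if len(data) > MAX_BYTES:
        raise ValueError("sitemap too large; split it into several files")
    return data


# Hypothetical pages, purely for illustration.
sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2005-06-07"),
    ("http://www.example.com/news.php?id=42", "2005-06-06"),
])
print(sitemap_xml.decode("utf-8"))
```

A real CMS would feed in its own page list and modification dates, and start a second file once the size check fails.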
|
|
|
06-07-2005, 11:02 PM
|
#6 (permalink)
|
|
Senior Member
Join Date: 06-28-04
Location: Ottawa, Canada
Posts: 385
|
Quote:
|
Originally Posted by thing2b
I understand that Google Sitemaps is an attempt by Google to include pages more quickly and to let you say when your site was updated and which parts changed.
(Each sitemap file is limited to 10 MB, and you can have multiple sitemap files.)
It may not be worth much in the first few months, as it is a beta, but I would recommend that people start using it within a year.
I can see it becoming very important in the future.
|
Possibly for very large sites where conventional sitemaps (limited to the Google suggested maximum of 100 links) may be unwieldy.
I don't see it as being terribly useful for the average website, unless I'm missing something. A small to medium sized website or an average forum which Googlebot is already spidering daily isn't likely to see much difference, AFAICT.
It's also possible that it may be a kind of super-submit-site button, at least initially, for new sites... that remains to be seen though.
|
|
|
06-07-2005, 11:09 PM
|
#7 (permalink)
|
|
Senior Member
Join Date: 06-28-04
Location: Ottawa, Canada
Posts: 385
|
Quote:
|
Originally Posted by arius
It can only help.
|
I'm not sure about that. It probably can't hurt but it may not do anything at all for a site that is already being spidered regularly. I have one that is already being Googlebotted 1000-2000 times per week -- how much more spidering do I want or need?
Quote:
|
I know for a fact that the Googlebot would get spider trapped on a number of sites that used relative linking.
|
The site in question uses relative linking... I've never seen a problem as a result of that.
Quote:
|
This way Googlebot does not have to be smart enough to figure out the linking structure of every single site on the internet and they also do not have to waste valuable time and resources in spider traps or indexing what the site owner does not want indexed.
|
???
Googlebot, like other spiders, follows links: that's its job. Provide it with the links and it will follow them. If you don't want something spidered, exclude it in a robots.txt file or with the "noindex" meta tag, and Googlebot will honor that. And as for getting caught in "spider traps", I've never seen a need to have one in the first place -- if you do and you find it's confusing spiders, why not remove or disable the "spider trap"? Problem solved.
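As a rough illustration of the robots.txt side of this, the sketch below uses Python's standard `urllib.robotparser` to ask whether a given crawler may fetch a URL. The rules, the example.com URLs and the `may_crawl` helper are hypothetical; the "noindex" meta tag is a separate mechanism that lives in the page's HTML and is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every crawler out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_crawl(url, agent="Googlebot"):
    """True if the parsed robots.txt rules allow this agent to fetch url."""
    return parser.can_fetch(agent, url)


print(may_crawl("http://www.example.com/index.html"))
print(may_crawl("http://www.example.com/private/notes.html"))
```

With these rules the first URL is allowed and the second is refused, which is the same check a well-behaved spider makes before following a link.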
Quote:
This way they leverage the willingness of site owners who want to get their sites indexed in a certain way, preparing the data for the googlebot to read.
It is more efficient.
|
Again, how? There are already mechanisms to do this. Huge sites may have hit the limits of more standard or conventional means but how will it benefit the average site?
|
|
|
06-07-2005, 11:28 PM
|
#8 (permalink)
|
|
Contributing Member
Join Date: 05-02-04
Posts: 103
|
I think the aim of "sitemaps" is to be an aid to Googlebot.
At the moment it seems to me that Googlebot tries to guess how often a site is updated and in some cases just crawls it daily to get all updates.
I can see that with "sitemaps" Google could have two types of crawl in the future:
- Full Crawl (monthly?): the typical Googlebot crawl over the whole site, looking at the links and doing the PageRank thing.
- Quick Crawl (daily to monthly?): Google fetches the new or updated pages but does not run all its algorithms, saving processor time and bandwidth while staying up to date.
I guess to answer the original question of "Is it worth it, do you think it will improve anything?":
It will be worth it once it is fully tested and implemented, but do not expect to see any changes in crawling in the first 6 months.
When it is fully implemented, it should help Google pounce on all new site updates while not wasting too much of your bandwidth.
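To make the "quick crawl" guess concrete, here is a small Python sketch of how a crawler could use the dates in a sitemap to pick only the pages that changed since its last visit. This is purely an illustration of the idea above, not a description of how Googlebot works; `pages_to_recrawl` and the sample entries are invented for the example.

```python
from datetime import date


def pages_to_recrawl(entries, last_crawl):
    """Pick the URLs whose lastmod date is after the previous crawl."""
    return [url for url, lastmod in entries
            if date.fromisoformat(lastmod) > last_crawl]


# Hypothetical sitemap entries as (url, lastmod) pairs.
entries = [
    ("http://www.example.com/", "2005-06-07"),
    ("http://www.example.com/about.html", "2005-01-15"),
]
changed = pages_to_recrawl(entries, last_crawl=date(2005, 6, 1))
print(changed)
```

Only the home page is newer than the assumed last crawl date, so it is the only URL that would be fetched again.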
|
|
|
06-08-2005, 12:42 AM
|
#9 (permalink)
|
|
Senior Member
Join Date: 06-28-04
Location: Ottawa, Canada
Posts: 385
|
Quote:
|
At the moment it seems to me that Googlebot tries to guess how often a site is updated and in some cases just crawls it daily to get all updates.
|
Why? Why wouldn't it access the "last-modified" header? No guesswork at all in that.
|
|
|
06-08-2005, 10:13 AM
|
#10 (permalink)
|
|
Senior Member
Join Date: 06-09-04
Location: Toronto Area
Posts: 175
Latest Blog: None
|
Please help us test a sitemap.xml.gz file generator tool currently under development.
It has most of the functionality you would need to create the file for your site. Its features:
- index your site
- obey keyword filters
- generate a sitemap.xml file
- generate a sitemap.xml.gz file
- FTP a sitemap.xml.gz file to your server
- ping Googlebot to come and read your sitemap.xml.gz file
Visit the link below for details:
Beta test the sitemap.xml.gz Generator for use with Google Sitemaps
We're using a forum topic to capture the feedback.
Please register to participate.
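For readers curious about the .gz step, this is a minimal sketch of compressing sitemap XML with Python's standard `gzip` module. It is not the code of the tool described above; `compress_sitemap` and the placeholder XML are assumptions made for the example.

```python
import gzip


def compress_sitemap(xml_bytes):
    """Gzip sitemap XML in memory, ready to be saved as sitemap.xml.gz."""
    return gzip.compress(xml_bytes)


# Tiny hand-written placeholder standing in for a generated sitemap.xml.
sample = b"<urlset><url><loc>http://www.example.com/</loc></url></urlset>"
packed = compress_sitemap(sample)

# Round-trip check: decompressing must give back the original bytes.
assert gzip.decompress(packed) == sample
print(len(sample), "bytes before,", len(packed), "bytes after gzip")
```

On a file this small the gzip header can outweigh the savings; the benefit shows up on sitemaps with thousands of similar-looking URLs.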
|
|
|
06-08-2005, 07:57 PM
|
#11 (permalink)
|
|
Contributing Member
Join Date: 05-02-04
Posts: 103
|
Quote:
|
Originally Posted by minstrel
Why? Why wouldn't it access the "last-modified" header? No guesswork at all in that.
|
Pages that involve dynamic (on-the-fly) processing (PHP / ASP / JSP etc.) do not send the "last-modified" header unless the developer has specifically told them to.
To get all the last-modified dates, Googlebot needs to crawl your whole site. I am guessing that there are many cases where Googlebot crawls a large site only to find that nothing has changed and that it learnt nothing new since the last crawl.
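As a sketch of what "telling it to" could look like, the Python below shows the decision a dynamic page has to make: send a Last-Modified header, and answer 304 Not Modified when the client's If-Modified-Since date is not older than the content. The `respond` function and its arguments are hypothetical, and a real page would hook this into whatever its framework provides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_changed, if_modified_since=None):
    """Return (status, headers) for content that changed at last_changed.

    last_changed must be a datetime with tzinfo=timezone.utc;
    if_modified_since is the raw request header value, or None if absent.
    """
    last_changed = last_changed.replace(microsecond=0)  # HTTP dates are per second
    headers = {"Last-Modified": format_datetime(last_changed, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: send the full page
        if since is not None and since.tzinfo is not None and last_changed <= since:
            return 304, headers  # nothing new, no body needed
    return 200, headers


# Hypothetical modification time for one page.
changed = datetime(2005, 6, 7, 16, 51, tzinfo=timezone.utc)
status, headers = respond(changed)
print(status, headers["Last-Modified"])
print(respond(changed, headers["Last-Modified"])[0])
```

The first call models a crawler's first visit; the second models a revisit that echoes the date back and can be answered without regenerating the page.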
Last edited by thing2b; 06-08-2005 at 08:10 PM..
|
|
|
06-08-2005, 08:22 PM
|
#12 (permalink)
|
|
Senior Member
Join Date: 06-09-04
Location: Toronto Area
Posts: 175
Latest Blog: None
|
Quote:
|
Originally Posted by thing2b
Pages that involve dynamic (on-the-fly) processing (PHP / ASP / JSP etc.) do not send the "last-modified" header unless the developer has specifically told them to.
To get all the last-modified dates, Googlebot needs to crawl your whole site. I am guessing that there are many cases where Googlebot crawls a large site only to find that nothing has changed and that it learnt nothing new since the last crawl.
|
Who is this minstrel guy you are quoting?
You are right about dynamic pages (.php, .asp, .jsp) not having a Last Modified date.
|
|
|
06-08-2005, 08:30 PM
|
#13 (permalink)
|
|
Contributing Member
Join Date: 05-02-04
Posts: 103
|
Quote:
|
Originally Posted by arius
Who is this minstrel guy you are quoting?
|
Minstrel's post was two posts above mine in this thread.
|
|
|
06-08-2005, 09:05 PM
|
#14 (permalink)
|
|
Senior Member
Join Date: 06-09-04
Location: Toronto Area
Posts: 175
Latest Blog: None
|
Silly Me. 
|
|
|
06-09-2005, 05:49 AM
|
#16 (permalink)
|
|
Senior Member
Join Date: 10-15-03
Posts: 1,785
Latest Blog: None
|
I don’t see why we have to create RSS based on a Google layout. There are already RSS standards out there?
|
|
|
06-09-2005, 06:59 AM
|
#17 (permalink)
|
|
v7n Mentor
Join Date: 12-22-03
Location: UK
Posts: 916
Latest Blog: None
|
Quote:
|
Originally Posted by thing2b
Pages that involve dynamic (on-the-fly) processing (PHP / ASP / JSP etc.) do not send the "last-modified" header unless the developer has specifically told them to.
To get all the last-modified dates, Googlebot needs to crawl your whole site. I am guessing that there are many cases where Googlebot crawls a large site only to find that nothing has changed and that it learnt nothing new since the last crawl.
|
That is interesting and makes a lot of sense to me. Maybe that is why two of my sites get crawled so much (among other factors).
|
|
|
06-09-2005, 09:05 AM
|
#18 (permalink)
|
|
Senior Member
Join Date: 10-15-03
Posts: 1,785
Latest Blog: None
|
Does the "last-modified" header work on rewritten URLs? (Natural query-string URLs should be fine.)
|
|
|
06-09-2005, 02:15 PM
|
#19 (permalink)
|
|
Contributing Member
Join Date: 05-02-04
Posts: 103
|
Quote:
|
Originally Posted by Johan007
Does the "last-modified" header work on rewritten URLs? (Natural query-string URLs should be fine.)
|
Although we are now getting a bit off topic, I would think that when URLs are rewritten, the headers that are sent back are determined by the resulting URL (after the rewrite).
|
|
|
06-09-2005, 02:23 PM
|
#20 (permalink)
|
|
Contributing Member
Join Date: 05-02-04
Posts: 103
|
Quote:
|
Originally Posted by Johan007
I don’t see why we have to create RSS based on a Google layout. There are already RSS standards out there?
|
I know what you mean. The only real difference I see between RSS and Google's sitemap protocol is that RSS can link to many different items all on one page, whereas the sitemap protocol is meant to be per URL.
There might be ownership reasons behind the choice not to use RSS as well. Google might not like the licence that RSS is under (a guess), or may have a larger plan or upcoming upgrades that could not be achieved with RSS. This way gives Google more flexibility to change its standard later.
|
|
|