Webmaster Forum



SEO Forum Search engine optimization discussions.

Old 11-02-2004, 02:32 AM   #1 (permalink)
Inactive
 
Join Date: 08-06-04
Posts: 61
losangel27 is liked by many
Duplicate content

If I have 2 sites with several pages that are the same:

1) do both get indexed or is 1 tossed out?
2) do both count as internal links?
3) is there a penalty or reduction in PR against 1 or both sites?
4) if pages are identical - only 1 will be displayed in search engine results?

thanks all
Old 11-02-2004, 08:55 AM   #2 (permalink)
v7n Mentor
 
jg_v7n's Avatar
 
Join Date: 08-26-04
Location: Rio de Janeiro
Posts: 1,289
jg_v7n is a highly respected web pro
If the page design/ layout is different but the content is the same then you should be fine, and the two pages will be treated as two totally different pages: no penalties, both listed in result pages.

I would avoid publishing them if they are identical in all aspects, as you may trip up on duplicate content filters (I do not know what the exact penalties for these are).
Old 11-02-2004, 01:01 PM   #3 (permalink)
Inactive
 
Papadoc's Avatar
 
Join Date: 10-20-04
Posts: 175
Papadoc is liked by somebody
I've seen pages get tossed with the same text and different layouts. Like anything else, there is some algorithm in place that decides how similar a page must be to trip the filter. What's even worse is that I've seen the real author be the one that gets bumped when his material is copied to another site that just happens to get indexed first.

The latest test that I've seen indicates that if there are 5 or more 10 word strings that are within 90% similarity of 10 word strings on another page, it trips the filter. It could be less than that but that's as tight as the test got significant results. It is suspected that there is another part of it too that determines the amount of suspect text based upon the relative size of the page. This is to avoid tripping the filter on "fair use" commentary on other pages.
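
A rough sketch of what such a test might look like, assuming the "10 word strings" are sliding windows of ten consecutive words and that "90% similarity" is a character-level ratio. Both are guesses; the function names and the use of Python's difflib are illustrative only, not the actual test and certainly not any search engine's filter.

```python
from difflib import SequenceMatcher


def word_windows(text, size=10):
    """Return every run of `size` consecutive words in `text`."""
    words = text.lower().split()
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


def count_similar_windows(page_a, page_b, size=10, threshold=0.90):
    """Count windows of page_a that have a >= threshold match in page_b."""
    windows_b = word_windows(page_b, size)
    hits = 0
    for window in word_windows(page_a, size):
        if any(SequenceMatcher(None, window, other).ratio() >= threshold
               for other in windows_b):
            hits += 1
    return hits


def trips_filter(page_a, page_b, min_hits=5):
    """Hypothetical rule of thumb: 5 or more similar 10-word strings."""
    return count_similar_windows(page_a, page_b) >= min_hits
```

Under these assumptions an exact copy trips the rule once the shared text reaches 14 words, since that already yields five overlapping windows. Every window is compared with every other, so this is only practical for a handful of pages.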

The whole purpose of this is to avoid those people who would make multiple sites that are exactly the same and score high and take over the SERPS.
Old 11-03-2004, 03:51 AM   #4 (permalink)
v7n Mentor
 
Johan007's Avatar
 
Join Date: 10-15-03
Posts: 1,932
Johan007 is a name known to all
In my opinion the replies cannot be correct and are not backed up by any evidence so consider them bs until they are and then call me rude names.

Quote:
Originally Posted by jg v7n
If the page design/ layout is different but the content is the same then you should be fine
There is no way a SE can know this unless it is manually submitted and google just tends to ignore these submissions.

Quote:
Originally Posted by Papadoc
The latest test that I've seen indicates that if there are 5 or more 10 word strings that are within 90% similarity of 10 word strings on another page, it trips the filter. It could be less than that but that's as tight as the test got significant results.
Technically possible... but what tests? This quote cannot be correct because many sites have a text-only version of their HTML pages for accessibility and they are all indexed fine, including one of my own sites.

Last edited by Johan007 : 11-03-2004 at 03:59 AM.
Old 11-03-2004, 07:56 AM   #5 (permalink)
v7n Mentor
 
Buskerdoo's Avatar
 
Join Date: 10-16-03
Location: USA
Posts: 1,559
Buskerdoo is a highly respected web pro
I too would like to see some actual proof or at least a few tested cases. Anyone have one?
Old 11-03-2004, 09:49 AM   #6 (permalink)
Inactive
 
Join Date: 05-05-04
Location: america
Posts: 653
realestate is liked by somebody
I think answers are: yes, yes, yes, no..
Old 11-03-2004, 09:50 AM   #7 (permalink)
v7n Mentor
 
jg_v7n's Avatar
 
Join Date: 08-26-04
Location: Rio de Janeiro
Posts: 1,289
jg_v7n is a highly respected web pro
Quote:
Originally Posted by Johan007
There is no way a SE can know this unless it is manually submitted and google just tends to ignore these submissions.
Why can't it know this?

1. It's easy to tell that two pages are identical (Google has a patented system of "fingerprinting") - which will then trip the filter.
2. If there are some differences in a page (design or otherwise) then it's easy to assume that it is syndicated copy.
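
For illustration only, here is one common way to do exact and near-duplicate detection: hash the normalized text for exact matches, and compare sets of overlapping word sequences (shingles) for near matches. This is a generic textbook approach, not the patented method mentioned here; the function names and the 4-word shingle size are assumptions.

```python
import hashlib
import re


def normalize(text):
    """Lowercase and collapse whitespace so trivial edits do not matter."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def fingerprint(text):
    """Exact-duplicate check: identical normalized text gives an identical hash."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text, size=4):
    """Set of overlapping `size`-word sequences in the text."""
    words = normalize(text).split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def resemblance(text_a, text_b, size=4):
    """Jaccard overlap of the two shingle sets, from 0.0 to 1.0."""
    a, b = shingles(text_a, size), shingles(text_b, size)
    if not a and not b:
        return 1.0  # two texts too short to shingle are treated as the same
    return len(a & b) / len(a | b)
```

Two pages with the same fingerprint are identical after normalization. A resemblance close to 1.0 suggests the same copy with small changes, and where to draw the line is a judgment call.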

"These near-duplicate detection techniques are robust" - http://www.seoguide.org/google-patent-6658423.htm

"In the context of a search engine for example, these techniques can be used during a crawling operation to speed-up the crawling and to save bandwidth by not crawling near-duplicate Web pages or sites, as determined from documents uncovered in a previous crawl" - http://www.seoguide.org/google-patent-6658423.htm

"These techniques can instead be used later, in response to a query, in which case a user is not annoyed with near-duplicate search results." - http://www.seoguide.org/google-patent-6658423.htm

Here's a load of other forum threads about the subject: Google duplicate content filter

Quote:
Originally Posted by Johan007
In my opinion the replies cannot be correct and are not backed up by any evidence so consider them bs until they are and then call me rude names.
If you come to a forum and ask for advice, you should always check out information that you are given (if someone gives out free advice then it's not up to them to back everything up with tons of research, it's just not practical). Replies are supposed to point you in the right direction, not totally solve all your problems, but if you are going to assume that the answers you are given are bs then what's the point of asking in the first place?
Old 11-03-2004, 10:50 AM   #8 (permalink)
Inactive
 
Papadoc's Avatar
 
Join Date: 10-20-04
Posts: 175
Papadoc is liked by somebody
Wow Johan - I guess I did not know that I needed to prove it to YOU. I am so sorry. Yes, everything is BS unless proven to you. I have learned my lesson.

We did work this out about a year ago in putting together some content for licensing purposes and found the 90% rule. It could have been 85%, 88%, etc. It's hard to decipher any rule that search engines have. It could have changed slightly by now but I don't think too much.

If you want further proof, go get it yourself and don't expect others to do your work for you. Do some research on syndicated copy and find that very often it never gets PR. Where there is other data on the page, it might sneak through, but having 1000 UPI or AP stories show up at the top of the SERPS for a given query is not in the SE's best interest. It must be filtered out.

Since I haven't seen your stuff (er, I mean proof by which you label all else in the world BS), I have no idea what you are comparing in content, which search engines you are referring to, or if it's static or dynamic content, headers, etc. All factors which could be considered a part of a formula. Maybe you should offer us up some stats as proof???

No, not really... demanding such is just downright silly, isn't it? Take it or leave it. Say you disagree, but watch where you go with labeling other posts BS. You've proven nothing to me either.

Last edited by Papadoc : 11-03-2004 at 11:18 AM.
Old 11-03-2004, 11:11 AM   #9 (permalink)
Inactive
 
Papadoc's Avatar
 
Join Date: 10-20-04
Posts: 175
Papadoc is liked by somebody
Quote:
Originally Posted by Johan007
In my opinion the replies cannot be correct and are not backed up by any evidence so consider them bs until they are and then call me rude names.
Since this is now your standard, we shall now be expecting all of your posts to be backed up with research, whitepapers, stats, examples and links to verify each so that we can all determine what you say is correct. This should not take you any more than a day or so per post.

As far as the rude names part? Not my style. I shall leave that up to you and the ones that you've evidently considered others would give to you for your writing style.
Old 11-03-2004, 01:54 PM   #10 (permalink)
v7n Mentor
 
Johan007's Avatar
 
Join Date: 10-15-03
Posts: 1,932
Johan007 is a name known to all
Only search results are proof. Here goes.

Style Filter:

Erm why would you want to do the same design and the same text? You can have this one and I hope Google does ban those people.

Same text filter:

On:
http://www.google.co.uk/search?num=2...ice+text&meta=

These two pages:
http://www.futuremovies.co.uk/friend....asp?movie=239 (My Site)
http://www.channel4.com/film/reviews....jsp?id=137332

Same as:

On:
http://www.google.co.uk/search?num=2...UK%7CcountryGB

These two pages:
www.channel4.com/film/reviews/film.jsp?id=137332
www.futuremovies.co.uk/review.asp?ID=239 (My Site)

There are many more examples of text only versions of sites.


...that's my proof... it's not much but it's the only "evidence" on this thread.

Last edited by Johan007 : 11-03-2004 at 02:19 PM.
Old 11-03-2004, 04:39 PM   #11 (permalink)
v7n Mentor
 
jg_v7n's Avatar
 
Join Date: 08-26-04
Location: Rio de Janeiro
Posts: 1,289
jg_v7n is a highly respected web pro
Ok, I'm not sure about Papadoc's argument - but as far as I can see that's syndicated content - same content/ different site (design). So clearly that's okay. Agreed?

If these reviews were on the same site (had the same design) - then you would incur a penalty...

There are two sources of proof for this: one is that you cannot find two reviews of "Bride and Prejudice" on the same site. Secondly, Google's patented "fingerprinting" technology is specified here: http://www.seoguide.org/google-patent-6658423.htm, which clearly explains that duplicate/near-duplicate content is penalised.

Following this evidence:

1) do both get indexed or is 1 tossed out? - 1 is tossed out, if it is on the same site, but okay if syndicated to another site.

2) do both count as internal links? - Only if on different sites.

3) is there a penalty or reduction in PR against 1 or both sites? - Yes, if they are on the same site...

4) if pages are identical - only 1 will be displayed in search engine results? - Correct.

Last edited by jg_v7n : 11-03-2004 at 04:48 PM.
Old 11-03-2004, 08:04 PM   #12 (permalink)
Inactive
 
Papadoc's Avatar
 
Join Date: 10-20-04
Posts: 175
Papadoc is liked by somebody
LOL... You are right... not much! This isn't proof. It's barely circumstantial evidence.

Much of the same text on these two pages. But factor in different H1 tags, all the text related to navigation and ancillary things, different page titles and maybe you are at 90%, 85% whatever... maybe you are not. It’s irrelevant because you got to those results by targeting the title, not the text or a natural search string. I wonder how many people would search for "Bride and Prejudice text". I know I always throw some irrelevant word in at the end of my searches. You remind me of a "search expert" that I met who proved it by showing number one results for the search term "schlingamabob".
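
To see how the surrounding title, headings and navigation pull the overall figure down, here is a minimal sketch that compares all the words on two pages. The helper names are made up for the example, and a word-level difflib ratio is only a stand-in for whatever measure a search engine might use.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect every piece of text on a page: title, headings, navigation, body."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def page_words(html):
    """Return the page's text as a list of lowercase words."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).lower().split()


def page_similarity(html_a, html_b):
    """Word-level similarity ratio between two pages, from 0.0 to 1.0."""
    matcher = SequenceMatcher(None, page_words(html_a), page_words(html_b),
                              autojunk=False)
    return matcher.ratio()
```

Two pages carrying the same review but different titles and headings score below 1.0 here, and the shorter the shared text is relative to the rest of the page, the lower the score falls.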

Of course you can get both to show up with an artificially contrived search. You can find all instances of an AP or UPI story by typing in a specific string too, but not all stories will show up as being relevant to a real keyword search. Now when I merely search on the name of the movie, your main page shows up on page 4, whereas the text one doesn’t show in the top 300. So you don’t think it got a SERPS penalty? Also, it happens to be on the same domain so chances are it got a different penalty than if it was on a different one. That is also a factor where you can retain some PR but not be listed on the same natural search. The same search does not equal the same results for your site and you had to include the word "text" in order for it to come up. Again, you had to target the title.

Nevertheless, we conducted our experiments over 4 months, toying with percentages of similarity on hundreds of pages on the same domain, on different domains but the same C-Block and on different C-Blocks, making adjustments and such to come up with our results. If we were going to license content, we had to know what the ramifications to our data would be as well as what we could promise. We found some incredible information about where the lines break, levels of similarity, how the order of words on the page affects results, effect of titles and tags, etc.

There is a very strong correlation with different levels of similar text. Some are not indexed, some are indexed but penalized, some get no PR, and a few manage to slip through. At some point, there is no penalty and there are various triggers that make that penalty kick in, including similarity of word strings, and different levels of penalty. When you have (5) 10-word strings that are 90% the same, on average, (I've forgotten the math but) there is about a 1 in a billion chance that it isn't duplicate content. I'm not going to get into where those lines are as those results are the product of many hours of work. The above however, is a fact. Use it if you want, ignore it if you want. No difference to me either way.

Anyone who has ever syndicated material knows that not all instances of the copy are equal in the SERPS: most won't show up, and it's a rarity to have more than one copy of the same material end up with any PR. If that were not the case, all you'd have to do is come up with one number one SERP and then duplicate it across dozens of sites in order to own the SERPS.

Last edited by Papadoc : 11-03-2004 at 08:09 PM. Reason: Spelling
Old 11-04-2004, 03:19 AM   #13 (permalink)
v7n Mentor
 
Johan007's Avatar
 
Join Date: 10-15-03
Posts: 1,932
Johan007 is a name known to all
Fair points, I guess I was barking up the wrong tree. So on my site I am not doing anything wrong, am I? There is this new film career site, www.actingfaces.com, that uses Future Movies reviews for their users, so a review can be copied about 3 times (1 text, original and 1 external site). All these duplications are legitimate and not for SEO, and they don't seem to have affected my ranking.

Oh, stats show that people do use the search string with “text”; however, the film industry sites in the UK all have text-only versions so it's probably learnt.
Old 11-05-2004, 08:39 AM   #14 (permalink)
Inactive
 
Papadoc's Avatar
 
Join Date: 10-20-04
Posts: 175
Papadoc is liked by somebody
If people in your industry do use the word "text" as part of a search string, then it has simply become an understanding in the industry, and you are matching your title to that standard. It's very smart marketing and you are for all practical purposes not SE marketing, but using the SE as a conduit through an offline understanding between people in the same field. Kind of like all the members of a family knowing where the hidden spare key for the front door is kept.

IMO, you are not doing anything wrong at all. There is no site penalty for having more than one version. One just doesn't show up in the SERPS unless you target the title.

I would be very careful, however, about other sites that are using the exact same info. If it's on a different domain and a different C-block, you should be fine with regard to penalties on your site. It is just syndicated info. But if the same text is on multiple domains and hosted on the same C-block, I wouldn't be surprised if all sites receive a severe penalty. That's just the data that I have. I'd suspect that the reasoning is that C-blocks are small enough, and chances are that multiple unassociated people all running the same text are not going to be on that same block unless they are conspiring.
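
For anyone wanting to check this on their own sites, a small sketch, assuming "C-block" means the first three octets of an IPv4 address (a /24 network). The function names are hypothetical and the addresses in the usage note are reserved documentation addresses.

```python
import ipaddress


def c_block(ip):
    """Return the /24 network (the first three octets) an IPv4 address sits in."""
    return ipaddress.ip_network(f"{ip}/24", strict=False)


def same_c_block(ip_a, ip_b):
    """True when both addresses share their first three octets."""
    return c_block(ip_a) == c_block(ip_b)
```

For example, same_c_block("192.0.2.10", "192.0.2.200") is true, while same_c_block("192.0.2.10", "198.51.100.10") is false. You would first need to look up the IP address each domain resolves to.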

Again, this is only based on strong evidence, it is not proof, but from what I have observed with SERPS completely disappearing, I wouldn't take the chance.
All times are GMT -7. The time now is 04:26 AM.
© Copyright 2008 V7 Inc