If you have an interest in how Google addresses duplicate content on the Web, today’s been an interesting day.
Google was granted a patent this morning that describes how Google might identify duplicate or near-duplicate web pages and decide which version to show in search results and which to filter out. The process may be close to one Google has been using for a while.
Identifying the original source of content can be a pretty hard problem to solve on the Web.
What if Google had a smaller search vertical, where they carefully screened and identified all of the web publishers involved, and could convince them to help identify which content is original, and which is copied or duplicated?
To do that, Google introduced a new set of Source Attribution Metatags for Google News articles, which allow the publisher of a breaking story to tag it with an “original-source” metatag, and publishers who are syndicating that story to use a “syndication-source” metatag. Google controls which sources show up in Google News results, and they describe how the tags should be used on their page about the source attribution metatags.
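As a rough sketch of how these tags look in practice (the URLs here are placeholders, not from Google’s documentation), a publisher would add one of the following inside the page’s `<head>`:

```html
<head>
  <!-- On the page that broke the story: point to the story's own URL -->
  <meta name="original-source" content="http://example.com/breaking-story">

  <!-- On a syndicated copy: point back to the source being syndicated -->
  <meta name="syndication-source" content="http://example.com/breaking-story">
</head>
```

A page would normally carry only one of the two tags, depending on whether it is the original report or a syndicated copy.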
Bill Slawski, November 16, 2010