 |
|

08-25-2004, 06:05 AM
|
 |
Individualist
|
|
Join Date: 09-27-03
Location: Japan, mostly
Posts: 27,716
|
|
|
Engineering a SE Algorithm
We will be developing a search engine - trying at least - for SevenSeek. I'm looking to hear what you guys believe will make for a great algorithm. Weight distribution, you know. What factors to put a lot of weight on (anchor text, etc) and what factors to not consider/ignore (meta keywords tag), etc.
__________________
My Facebook - Add Me
“It is no measure of health to be well adjusted to a profoundly sick society.”
JSES
|

08-25-2004, 07:03 AM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-26-03
Posts: 1,911
|
|
Link pop over on-page definitely an important concern, as links are ultimately harder to manipulate.
If I were developing a search engine, I'd also reference the meta keywords and check that each one is listed on the page. Those same keywords must also be present in the meta description in the form of a sentence, not simply stuffed in as keywords.
The fact that meta-tags are so little regarded these days means that they are possibly ripe for referencing in a relevancy algo.
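A rough sketch of the meta-keyword cross-check, in Python. The function name and the plain substring matching are assumptions made for illustration, and it does not try to judge whether the description reads as a real sentence:

```python
def meta_keyword_support(keywords, description, body_text):
    """Fraction of meta keywords that appear in BOTH the meta description
    and the visible page text (case-insensitive substring match)."""
    if not keywords:
        return 0.0
    desc = description.lower()
    body = body_text.lower()
    confirmed = sum(
        1 for kw in keywords
        if kw.lower() in desc and kw.lower() in body
    )
    return confirmed / len(keywords)


keywords = ["internet marketing", "seo", "casino"]
description = "We offer internet marketing and SEO services."
body = "Britecorp provides internet marketing, SEO and link building."
support = meta_keyword_support(keywords, description, body)  # 2 of 3 confirmed
```

A score like this could feed into relevancy as a trust signal: keywords the page never mentions count for nothing.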
If you could get it to read CSS and/or JavaScript, not only would it offer more useful results, but the technology would almost certainly be a good investment for other search engines.  (Presuming they haven't already got the tech, but simply cannot allocate resources to use fully.)
Title tag as having a set character limit - weight keywords as a percentage factor of their presence in the title, ie:
1 <title>Britecorp: Internet Marketing</title>
2 <title>Britecorp: Internet marketing, SEO, link building, PPC, etc</title>
1 as being more relevant for "internet" "marketing" & "internet marketing", because more focussed on keywords - title keyword density, I guess.
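To put numbers on the two titles above, here is a minimal Python sketch of "title keyword density". The 60-character cap, the function name, and the simple word matching are all assumptions for illustration, not a known engine's method:

```python
import re


def title_keyword_score(title, query, max_chars=60):
    """Share of the words in the (truncated) title that match a query term."""
    # Only the first max_chars characters count, so padding a title gains nothing.
    words = re.findall(r"[a-z0-9]+", title[:max_chars].lower())
    terms = set(re.findall(r"[a-z0-9]+", query.lower()))
    if not words:
        return 0.0
    return sum(1 for w in words if w in terms) / len(words)


focused = title_keyword_score(
    "Britecorp: Internet Marketing", "internet marketing")
diluted = title_keyword_score(
    "Britecorp: Internet marketing, SEO, link building, PPC, etc",
    "internet marketing")
# focused is 2 matching words out of 3; diluted is 2 out of 8
```

Title 1 scores higher because a larger share of its words match the query.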
Also, h tags score in the same manner. Only first h1 tag of a page as worth anything, and a character limit to it as above.
Maybe a small score for an h2 tag, and maybe a small score for keyword density of <p> tags - with anything higher than around 5% flagging filters for overstuffing (ie, negative scoring).
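The density idea with a stuffing filter could look something like this in Python. The 5% limit comes from the post; the shape of the penalty (the excess density, negated) and the single-word keyword handling are assumptions:

```python
import re


def density_score(text, keyword, stuffing_limit=0.05):
    """Small positive score for modest keyword density, negative once the
    density passes the stuffing limit. Handles single-word keywords only."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    density = words.count(keyword.lower()) / len(words)
    if density <= stuffing_limit:
        return density
    # Hypothetical penalty: the further over the limit, the more negative.
    return -(density - stuffing_limit)


natural = " ".join(["hosting"] * 3 + ["word"] * 97)   # 3% density
stuffed = "hosting hosting hosting cheap"              # 75% density
natural_score = density_score(natural, "hosting")
stuffed_score = density_score(stuffed, "hosting")
```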
Devalue links after first 20 from any single IP block.
Devalue links that all contain the exact same link text and/or keywords.
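Both link devaluations can be sketched in a few lines of Python. The cap of 20 comes from the post; the 0.1 discount, the function names, and treating the first three octets as the "IP block" are assumptions:

```python
from collections import Counter


def c_block(ip):
    """First three octets of an IPv4 address, e.g. '10.0.0.7' -> '10.0.0'."""
    return ".".join(ip.split(".")[:3])


def link_value(links, full_credit=20, discount=0.1):
    """links is a list of (source_ip, anchor_text) pairs. The first
    full_credit links from each block count fully, later ones are discounted."""
    seen = Counter()
    total = 0.0
    for ip, _anchor in links:
        block = c_block(ip)
        seen[block] += 1
        total += 1.0 if seen[block] <= full_credit else discount
    return total


def anchor_diversity(links):
    """Share of distinct anchor texts; near 0 means every link says the same thing."""
    anchors = [anchor.strip().lower() for _ip, anchor in links]
    return len(set(anchors)) / len(anchors) if anchors else 0.0


links = [("10.0.0.%d" % i, "cheap hosting") for i in range(25)]
value = link_value(links)            # 20 full links + 5 discounted
diversity = anchor_diversity(links)  # one distinct anchor out of 25
```

Multiplying the two would be one way to combine them, though how to weight them is an open question.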
Do topic clusters and related topic relationships, if possible.
Ah - link has most value if: anchor text includes keywords that relate to subject of page linked from as well as the page linked to. So a link from a webmaster site to another webmaster site, has more value than a link from a ****** site to a webmaster site.
Spot FFA lists and *ignore* all links - blacklist entire domain for link value (but not ranking value)
*Randomise* the ranking results somewhat. Older sites naturally have better linkage, but not necessarily better content. Help mix the list by making ranks *approximate* rather than definite.
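One simple way to make ranks approximate is to jitter each score by a few percent before sorting. This Python sketch assumes a plain dict of URL to score and a 5% jitter, both made up for illustration:

```python
import random


def jittered_ranking(scores, jitter=0.05, seed=None):
    """Sort URLs by score after multiplying each score by a random factor
    in [1 - jitter, 1 + jitter]. Close scores can swap; distant ones cannot."""
    rng = random.Random(seed)
    noisy = {url: s * (1 + rng.uniform(-jitter, jitter))
             for url, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)


scores = {"old-site.example": 1.00, "new-site.example": 0.98,
          "weak-site.example": 0.10}
ranking = jittered_ranking(scores)
```

With these numbers the two strong sites may trade places between queries, while the weak one always stays last.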
Devalue *pages* containing any form of affiliate link or modified link (ie, /link.php?ID=2323232).
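Spotting links like /link.php?ID=2323232 could start with a crude heuristic such as the Python sketch below. The parameter list and the ".php plus tracking parameter" rule are assumptions, and a real filter would need far more care to avoid false positives:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical list of query parameters that often carry affiliate IDs.
TRACKING_PARAMS = {"id", "aff", "affid", "ref"}


def looks_like_affiliate_link(href):
    """True if the link points at a script and carries a tracking-style parameter."""
    parsed = urlparse(href)
    params = {name.lower() for name in parse_qs(parsed.query)}
    return parsed.path.lower().endswith(".php") and bool(params & TRACKING_PARAMS)


flagged = looks_like_affiliate_link("/link.php?ID=2323232")
clean = looks_like_affiliate_link("/about.html")
```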
Make sure the results are entirely non-biased - if sevenseek as a search engine calls up only v7 pages for a search of "web hosting" you risk credibility, and your investment in sevenseek.
Have some degree of human editing - but *only* for clearly marked offences - hidden text, definite cloaking, doorways, etc.
2c for now.
|

08-25-2004, 07:12 AM
|
 |
Individualist
|
|
Join Date: 09-27-03
Location: Japan, mostly
Posts: 27,716
|
|
Quote:
|
Also, h tags score in the same manner. Only first h1 tag of a page as worth anything, and a character limit to it as above.
|
You really think H* tags are worth considering? A lot of sites - maybe the majority - don't use them.  :
|

08-25-2004, 07:26 AM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-26-03
Posts: 1,911
|
|
|
It's an accessibility/usability option - and I'm under the impression that it's pretty standard web design to use them.
Not something to score highly in itself, but if page elements are going to figure then some limited value to h tags may make sense.
Additional comment - limited character allowance for alt tags in graphic links, to prevent crediting keyword stuffing.
Also - in code elements as having no effect - if it's not in a tag then it's not worth anything - so <!-- comments --> and css="keyword" values as worth nothing. A basic matter that, really, but possibly easy to overlook.
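Both points (capped alt text, nothing scored from comments or attribute values) fall out fairly naturally from Python's standard html.parser. The class name and the 50-character alt limit are assumptions for illustration:

```python
from html.parser import HTMLParser


class ScorableText(HTMLParser):
    """Collects only text an engine might score: visible text plus
    truncated alt text. Comments and other attribute values are ignored."""

    def __init__(self, alt_limit=50):
        super().__init__()
        self.alt_limit = alt_limit
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "img":
            alt = dict(attrs).get("alt") or ""
            if alt:
                self.parts.append(alt[: self.alt_limit])

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def scorable_text(html, alt_limit=50):
    parser = ScorableText(alt_limit)
    parser.feed(html)
    parser.close()
    return parser.parts


page = ('<p class="keyword">Real text</p><!-- keyword keyword -->'
        '<style>.keyword{color:red}</style>'
        '<img src="a.png" alt="' + "x" * 80 + '">')
parts = scorable_text(page)
```

Comments never reach handle_data, so the stuffed comment above contributes nothing, and the 80-character alt text is cut to 50.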
|

08-25-2004, 07:43 AM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-26-03
Posts: 1,911
|
|
And ensure any sponsored listings are kept clearly marked from organic listings, to help with consumer usability.
|

08-25-2004, 08:02 AM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-12-03
Location: Minnesota, USA
Posts: 939
|
|
Very exciting
I suppose now we have to wait forever like we did with Bluefind, eh?
Well, I guess what's important to score isn't as important as what not to score...
For example, maybe dock a few spaces for clear spam of keywords.
What to include as far as ranking high:
-I like Brian's idea about the titles, focus on their main first part, and not let them use the titles to cheat their way to the top.
-IBL's: There is such a focus over them already, I would almost think you'd have to include them here.
-Content is very important, content should be held high, but over-using one particular word should get them knocked down.
-A PR-esque feature where sites can gain PR in more ways than JUST getting links... Not really sure how, but we could probably think of something
|

08-25-2004, 10:00 AM
|
 |
v7n Mentor
Latest Blog: None
|
|
Join Date: 02-18-04
Location: We Are Penn State!
Posts: 3,118
|
|
|
as far as a branding / relevancy issue if I were to make a search engine I would intertwine it with the ideas behind prominent web designers and usability experts to reward well structured content and have those people do a big hunk of my marketing for me.
|

08-25-2004, 10:02 AM
|
 |
v7n Mentor
Latest Blog: None
|
|
Join Date: 02-18-04
Location: We Are Penn State!
Posts: 3,118
|
|
Quote:
|
Originally Posted by JohnScott
Quote:
|
Also, h tags score in the same manner. Only first h1 tag of a page as worth anything, and a character limit to it as above.
|
You really think H* tags are worth considering? A lot of sites - maybe the majority - don't use them.  :
|
most sites may not use the stuff, but most of the sites that structure their documents to be semantically well coded probably have a good user experience and good content quality.
there are not going to be static weightings in any engine worth a lick of salt; the headings are something useful to consider, though.
|

08-25-2004, 10:43 AM
|
 |
Contributing Member
Latest Blog: None
|
|
Join Date: 10-13-03
Location: USA
Posts: 662
|
|
|
IMO, giving H* tags value only encourages decreasing content value. Think of the spam/auto-generated sites we have now and the techniques they use. It could only lead to irrelevant headings to pages that have some "value" whatsoever.
|

08-25-2004, 11:04 AM
|
|
v7n Mentor
|
|
Join Date: 01-26-04
Location: Austin, Texas
Posts: 447
|
|
|
I really like the concept of contextual analysis of links, a la Hilltop. It makes sense that if a site about photography has numerous links from other sites on photography, it should score higher for those sorts of terms.
I can't imagine such a thing would be easy to implement, however.
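For what it's worth, a toy version is easy enough to sketch in Python. To be clear, this is not Hilltop itself, just a plain term-overlap (Jaccard) stand-in, and the term sets, floor value, and function names are all assumptions:

```python
def topic_overlap(source_terms, target_terms):
    """Jaccard similarity between the term sets of two pages."""
    union = source_terms | target_terms
    return len(source_terms & target_terms) / len(union) if union else 0.0


def contextual_link_weight(source_terms, target_terms, base=1.0, floor=0.1):
    """Scale a link's value by topical overlap, never dropping below floor."""
    return base * max(floor, topic_overlap(source_terms, target_terms))


photo_blog = {"photography", "camera", "lens"}
photo_shop = {"photography", "camera", "tripod"}
poker_site = {"casino", "poker"}

on_topic = contextual_link_weight(photo_blog, photo_shop)   # 2 shared of 4 terms
off_topic = contextual_link_weight(poker_site, photo_shop)  # falls back to floor
```

The hard part at scale is working out the term sets in the first place, which this sketch simply takes as given.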
Brian
|

08-25-2004, 11:07 AM
|
|
Member
Latest Blog: None
|
|
Join Date: 05-12-04
Location: Philly
Posts: 107
|
|
|
I'd like to see something where links count more if they were contextual.
I like the idea that if you can measure usability somehow and add that to the algo, that would be interesting.
|

08-25-2004, 11:16 AM
|
 |
Contributing Member
Latest Blog: None
|
|
Join Date: 05-11-04
Location: www.rescuemyreputation.com
Posts: 279
|
|
|
>>>Link pop over on-page definitely an important concern, as links are ultimately harder to manipulate<<<
I have to disagree with that links are easy to manipulate especially if you own tons of domains. Also make it easier for the competition to "google bomb" for a lack of a better term.
Say you own a hosting company you could just put your link at the bottom of all your clients pages, for example if i owned a 'concrete' co and owned a hosting co then i could put that at the bottom of all my hosted pages no?!
|

08-25-2004, 11:21 AM
|
 |
Member
Latest Blog: None
|
|
Join Date: 07-29-04
Posts: 38
|
|
Quote:
You really think H* tags are worth considering? A lot of sites - maybe the majority - don't use them. :
|
I agree with John, don't put any weight on H* tags.
They are too ugly. I put an H1 tag on my site at the beginning, just because I thought it was worth something SEO-wise. I took it out because they are so BIG. (Use the first sentence of a page instead.)
|

08-25-2004, 11:28 AM
|
|
v7n Mentor
|
|
Join Date: 01-26-04
Location: Austin, Texas
Posts: 447
|
|
|
Sorry for going off topic, but you might consider using H1's, and make them look nice in CSS.
Brian
|

08-25-2004, 02:13 PM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-26-03
Posts: 1,911
|
|
Quote:
|
Originally Posted by OptWizard
>>>Link pop over on-page definitely an important concern, as links are ultimately harder to manipulate<<<
I have to disagree with that links are easy to manipulate especially if you own tons of domains. Also make it easier for the competition to "google bomb" for a lack of a better term.
Say you own a hosting company you could just put your link at the bottom of all your clients pages, for example if i owned a 'concrete' co and owned a hosting co then i could put that at the bottom of all my hosted pages no?!
|
On that issue - how many webmasters own tons of working domains on different C classes? How many webmasters have a large number of clients on different C classes?
That's the point about links over content for ranking purposes - any schoolboy can overstuff their page with keywords, whilst far far fewer people actually have access to a network of domains across a high number of C class IPs.
Pedantic point over, though.
|

08-25-2004, 02:18 PM
|
|
v7n Mentor
|
|
Join Date: 01-26-04
Location: Austin, Texas
Posts: 447
|
|
|
Add a contextual piece to the link valuation, and it becomes much harder to game...
Brian
|

08-25-2004, 02:26 PM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-26-03
Posts: 1,911
|
|
|
To return to the original question, though - why engineer a complete new algo anyway?
More specifically: how are you going to make relevancy pay?
Where's your USP, and why can it compete with Google? What can a new algo running on sevenseek offer clients that will actually create real returns? What business model can work for sevenseek when so many other search engines are out there and effectively failing?
If we're talking about spidering sites and then ranking them in a relevant manner, then I personally think the point's been missed. Google may have started with $1 million, but they soon required another $25-30 million of venture capital to expand with.
Sevenseek may be able to pull in good funding at first - but what about expanding - how do you fund the resources necessary for the massive amount of data storage and processing involved?
If the original intention is to create a completely new search engine then my personal opinion would be that there's nothing that can be done that hasn't been done before, or already purchased. The great minds of Google and Microsoft I would wager are more intelligent, experienced, and knowledgeable on these matters than anyone in this forum. So trying to compete with them with a completely new and re-engineered algo for a completely new bot would be, IMO, a big mistake.
However, I figure there could be a very real market in Meta-Search. Google has been a great ride, but it's getting a bit tired. Google is fighting so hard against so-called "spamming" that it's increasingly becoming less relevant, IMO.
So better if you could have a search engine that does a meta-search of Google, Yahoo, MSN, and Teoma - if you could set up something to process those results in a specific way then you would be able to choose the best from the best - Google's linkpop analysis, Yahoo's on-page elements, and Teoma's clustering technology - roll it into one and you could have a real power of an engine.
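As a sketch of how the merging step might work, here is a reciprocal-rank style fusion in Python. The choice of fusion method, the constant k=60, the per-engine weights, and the hard-coded result lists are all assumptions; actually fetching results from each engine is left out entirely:

```python
from collections import defaultdict


def merge_rankings(results_by_engine, weights=None, k=60):
    """Merge ranked URL lists: each engine contributes weight / (k + rank)
    per URL, so pages ranked well by several engines float to the top."""
    weights = weights or {}
    scores = defaultdict(float)
    for engine, urls in results_by_engine.items():
        w = weights.get(engine, 1.0)
        for rank, url in enumerate(urls, start=1):
            scores[url] += w / (k + rank)
    # Ties are broken alphabetically to keep the output stable.
    return sorted(scores, key=lambda u: (-scores[u], u))


results = {
    "google": ["a.example", "b.example", "c.example"],
    "yahoo": ["b.example", "a.example", "d.example"],
    "teoma": ["b.example", "c.example", "a.example"],
}
merged = merge_rankings(results)
```

Here b.example wins because two engines rank it first, even though one engine prefers a.example. The weights argument is where trusting one engine more than another would go.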
To make it a success would require marketing and timing. Google is surely running tired, but other meta-search engines simply haven't pushed themselves in marketing terms on the webmastering community - you would need an engine that really pushed on relevancy - that effectively said "Sure, Google, Yahoo, and MSN search all have their good points - but we can make their results more relevant". Really, that's how Google won - by being simple and relevant.
Search Engines currently offer interesting choices, but if you could put money into creaming the best results from each, and processing it all that way, then not only would it be cheaper (no data storage or bots), but you could dress it all up in better relevancy.
So forget what I said above - let the big four with the big cash do the horsework for you - while you do the riding and get the best from each, as the profitable middleman.
2c.
|

08-25-2004, 02:30 PM
|
|
v7n Mentor
|
|
Join Date: 01-26-04
Location: Austin, Texas
Posts: 447
|
|
To attempt to sum up Brian's post: running a search engine is about marketing, not technology.
If that summary is what you meant, Brian, then I completely agree.
(The Other) Brian
|

08-25-2004, 02:36 PM
|
 |
Senior Member
Latest Blog: None
|
|
Join Date: 10-26-03
Posts: 1,911
|
|
Perhaps a better summary would be - don't try to stand against giants - just climb on their shoulders.
|

08-25-2004, 02:39 PM
|
|
v7n Mentor
|
|
Join Date: 01-26-04
Location: Austin, Texas
Posts: 447
|
|
|
Ahhh, ok, then. I agree with both summaries.
Brian
|
|
All times are GMT -7.