01-28-2008, 12:42 AM
Contributing Member
Join Date: 01-22-08
Posts: 55
Quote:
Originally Posted by coolguy27
LSI is a methodology involving statistical probability and correlation that helps deduce the semantic distance between words. It’s obviously a complex methodology, but it can easily be applied to understand the relationship between certain words in a paragraph or in a document. This methodology is used when indexing a page in the search engine’s database.
Latent semantic indexing adds an important step to the document indexing process. In addition to recording which keywords a document contains, the method examines the document collection as a whole, to see which other documents contain some of those same words. LSI considers documents that have many words in common to be semantically close, and ones with few words in common to be semantically distant. This simple method correlates surprisingly well with how a human being, looking at content, might classify a document collection. Although the LSI algorithm doesn't understand anything about what the words mean, the patterns it notices can make it seem astonishingly intelligent.
Quoted from source......
Nice digging dude...
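For anyone who wants to see the "many words in common" idea from the quote in action, here is a rough sketch in Python. It is not real LSI: the actual method runs a singular value decomposition on the term-document matrix, and this only covers the word-overlap comparison the quote describes, using cosine similarity on plain word counts. The documents and the function names (`term_counts`, `cosine_similarity`) are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    # Only words present in both documents contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document shares nothing with anything
    return dot / (norm_a * norm_b)


# Made-up documents, purely for illustration.
docs = {
    "cats": "the cat sat on the mat with another cat",
    "pets": "a cat and a dog sat on the mat",
    "cars": "the engine of the car needs new oil",
}
vectors = {name: term_counts(text) for name, text in docs.items()}

sim_cats_pets = cosine_similarity(vectors["cats"], vectors["pets"])
sim_cats_cars = cosine_similarity(vectors["cats"], vectors["cars"])
print(f"cats vs pets: {sim_cats_pets:.2f}")
print(f"cats vs cars: {sim_cats_cars:.2f}")
```

The two documents about animals share more words, so they should score as closer than the cat and car documents. Note that common words like "the" still count as overlap here; a real setup would probably drop stop words and weight terms before doing anything else.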