My Thoughts on Search Engines in 2011
I was studying Google's PageRank and Beyond by Amy N. Langville and Carl D. Meyer, and here are the thoughts that came to mind.
* To combat unethical SEO, a search engine must make its spider and indexer smarter.
* Most search engines generate revenue by selling profile data to interested parties.
* Users search with one of two goals:
1. Authoritative pages, for deep search or research.
2. Hub pages, for broad search.
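One well-known way to tell these two kinds of pages apart is Kleinberg's HITS algorithm, which gives every page a hub score and an authority score. Below is only a minimal sketch of it, not how any particular search engine does it. The function name `hits`, the `links` dictionary, and the sample page names are all made up for illustration.

```python
import math


def hits(links, iterations=50):
    """Rough HITS sketch.

    links maps each page to the list of pages it links to (outbound links).
    Returns two dicts: hub scores and authority scores.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority of a page = sum of hub scores of pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]

        # Hub score of a page = sum of authority scores of pages it links to.
        new_hub = {
            p: sum(new_auth[dst] for dst in links.get(p, []))
            for p in pages
        }

        # Normalize so the scores do not grow without bound.
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}

    return hub, auth


# Hypothetical tiny web: "portal" links widely, "a" is linked to the most.
sample_links = {
    "portal": ["a", "b", "c"],
    "blog": ["a"],
    "a": [],
    "b": [],
    "c": [],
}
hub_scores, auth_scores = hits(sample_links)
print(max(hub_scores, key=hub_scores.get))    # best hub
print(max(auth_scores, key=auth_scores.get))  # best authority
```

A broad search would favor pages with a high hub score, and a deep or research search would favor pages with a high authority score.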
My idea: a ranking method that extracts only interesting pages.
The definition of "interesting" is rather broad, but here I mean sites that are buried in the deep web yet high in quality. In a word: RARE sites.
Friday, March 4, 2011
Wednesday, January 12, 2011
Some Basics and Applications You Should Know When Making a Search Engine
Some basic keywords I would like to introduce to you:
Outbound Link
Inbound Link
Keyword Density
One simple algorithm I am sharing with you is the keyword density calculation. To put it simply, count how many times the keyword you care about appears, and express that as a percentage of all the keywords on the page.
density = k / ak * 100 (%), where k is the number of occurrences of the keyword and ak is the total number of keywords in the whole set.
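Here is a minimal sketch of that calculation. It assumes a "keyword" is simply a lowercase run of letters, digits, or apostrophes. The function name `keyword_density` and the optional `stop_words` parameter are my own choices for illustration, not a standard API.

```python
import re
from collections import Counter


def keyword_density(text, keyword, stop_words=frozenset()):
    """Return the density of `keyword` in `text` as a percentage.

    Words listed in `stop_words` are left out of the count entirely
    (an assumption; some tools keep them in the total).
    """
    words = [
        w for w in re.findall(r"[a-z0-9']+", text.lower())
        if w not in stop_words
    ]
    if not words:
        return 0.0
    k = Counter(words)[keyword.lower()]  # occurrences of the keyword
    ak = len(words)                      # size of the whole keyword set
    return 100.0 * k / ak


print(keyword_density("search engine ranks search results", "search"))  # 40.0
```

In this example the keyword appears 2 times out of 5 words, so the density is 40%.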
Stop Word - Very common words such as "the", "a", or "of" that search engines usually ignore. You should check this out.
Score - Very important in ranking. I would say this is the most basic, but the hardest, part of a search engine.
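To make the idea concrete, here is a small sketch of one scoring convention: give the most relevant document the maximum score and scale every other document relative to it. How the raw scores are computed is the hard part and is not shown. The names `relative_scores` and `rank`, and the sample numbers, are assumptions for illustration only.

```python
def relative_scores(raw_scores):
    """Scale raw scores so the best document gets 1.0.

    raw_scores maps a document id to a non-negative raw relevance score.
    """
    if not raw_scores:
        return {}
    top = max(raw_scores.values())
    if top <= 0:
        # Nothing is relevant; avoid dividing by zero.
        return {doc: 0.0 for doc in raw_scores}
    return {doc: score / top for doc, score in raw_scores.items()}


def rank(raw_scores):
    """Return (document, relative score) pairs, most relevant first."""
    scaled = relative_scores(raw_scores)
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for three pages.
print(rank({"a": 2.0, "b": 8.0, "c": 4.0}))
# [('b', 1.0), ('c', 0.5), ('a', 0.25)]
```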
Definition of "score" according to SearchEngineDictionary.com
"Search engines usually arrange search results from the most relevant to the least relevant (as determined by the search engine's algorithm). In order to rank documents, the search engine assigns a score to each page and those with the highest scores are listed first. Most search engines simply give the maximum score to the most relevant document and score all other relevant documents relative to that document. Others compare all documents to a theoretically perfect document. The score of a web page therefore refers to its relevance as perceived by a specific search engine."
Definition of "scored keyword phrase" according to SearchEngineDictionary.com
"Name given to phrases that searchers use that are tracked by a system the records the number of times the phrase was used in a search, also known as the score."
One interesting technology I found is Lucandra, a Cassandra-based Lucene backend.
Making a search engine requires months of testing and improvement; it is almost a never-ending cycle. BUT it is well worth it. Monetization may be a fine goal, but the main goal should be to make the search experience superb.
Adding lots of optional search methods or lots of data sources (such as Twitter) is a good idea.
Competition is harsh, too, as small search engines attract only a tiny percentage of overall search engine traffic. Entering a niche is a very good idea.
I think starting from a different approach than a traditional search engine is crucial to achieving success in this world.
Monday, October 25, 2010
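Tyrant of the Web
"Tyrant of the web"
Probably only a few people have heard this phrase.
I first heard it today, and not by stumbling upon it: I came across it while reading ネット検索革命, the Japanese edition of Search Engine Society by Alexander Halavais.
The web is said to be democratic, but is it really?
I think there is a fact that everyone who builds a search engine has to face.
Are we tyrants of the web?
That is what it comes down to.
If we truly wanted to make the web democratic, the ideal would be to let ordinary citizens take part in ranking.
But in most cases, today's search engines, which are run for profit, could not adopt such a system so easily.
I myself am struggling a little with the ranking part of the search engine I am developing now, and I keep asking: is the web really democratic?
I will now go back to thinking until a new ranking idea comes to me.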
-stingraze