Wednesday, November 23, 2011

Thursday, October 20, 2011

WIREDoo

MC Hammer made a search engine that does "deep search".

WIREDoo is the name.

Interesting.

via CNET

Tuesday, October 18, 2011

Video to Watch: Anthropologist for Intel describes Big Data as a person

Found this interesting video from BigData48.com.

Video to watch: Google PageRank In Five Lines Of Ruby

Okay, I'm definitely trying this out.

Monday, October 17, 2011

Strata New York 2011 Video

To watch later.

Quote of the day.

This is a quote I found today from Arnab Gupta, CEO of Opera Solutions.

"...behavior - human or otherwise can be mathematically expressed."

Tuesday, September 13, 2011

Ameba and Solr/Lucene

This slideshow about Ameba using Solr/Lucene is helpful. Although it is in Japanese, it is not too hard to translate. http://www.slideshare.net/giddyupyasu/3solrsolr

Tuesday, August 30, 2011

To read later.

http://rikunabi-next.yahoo.co.jp/tech/docs/ct_s03600.jsp?p=001940

Sunday, August 7, 2011

Topics to Study.

This article, called "Is Data The Next Oil?" is worth a read.

The key to a successful search engine is the ability to monetize, turn innovative ideas into reality, and dominate the market. Making such a search engine is difficult, but it can be made possible through collaboration, mash-ups, and teaming up with bright programmers.

I believe finding people who are interested in making an impact on the search engine world is not very hard. Many people will be interested in building a search engine from scratch and operating it.

The real problem lies in monetizing the search engine.

First, I think it is all right to start off with AdSense and other affiliate advertising. Ultimately, the goal should be to sell advertising directly to interested parties. Beyond advertising revenue, two other sources are crucial: selling licenses for the search engine, and compiling and selling the data that comes in from search queries and logs.

Marketing is an important issue for a newcomer to the market. When there are many alternative search engines, differentiation is essential.

I believe I should focus on machine translation, because honestly, online machine translation is still at a primitive stage.

Once the search engine becomes well known and widely used, its queries will provide a barometer of people's interests, and that data can be used to build sites. For example, I am sure many third parties will be interested in getting query and click logs with private data removed.

This article is probably worth a read too.

Saturday, July 30, 2011

"Advertising Engines" lecture by Yury Lifshits

Definitely worth watching.

Tuesday, July 19, 2011

University of Technology Sydney - Data Sciences and Knowledge Discovery

http://datamining.it.uts.edu.au/group/

Interesting. Will read later.

Friday, July 15, 2011

Interesting article on how reddit's ranking algorithm works.

I will be studying it from now on.

Wednesday, July 13, 2011

People I should research.

The following are people I am thinking of researching.

They are the founders of Kaltix, acquired by Google in September 2003.

Glen Jeh http://sites.google.com/site/glenjeh/

Sepandar Kamvar http://kamvar.org/

Taher Haveliwala http://taherh.org/

Interesting link about social information mining on Sepandar Kamvar's site: http://kamvar.org/social_text_mining

After reading the above link about social information mining, I have this idea of mining SNS such as Facebook and Google+. Eventually I am going to put the data in a database and correlate it with many variables such as location, punctuation, shared links, number of likes and +1s, and weather.

Mining social information is essential to understanding human relationships and friendship. Of course, online activities do not constitute everything an individual does, but they are still essential for learning about the social lives of many individuals.

I think it will be interesting to compare and contrast how a user behaves in the online world and in real life.

The data will be enormous, and I am hoping it will be big enough to fill up my 1TB hard drive all by itself. If the data gets too big, I should implement compression to save space.
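As a rough idea of what saving space could look like, here is a minimal sketch that stores records as gzip-compressed JSON lines using only the Python standard library. The record fields, the file name, and the helper names `write_posts` and `read_posts` are made up for illustration; nothing here comes from a real SNS API.

```python
import gzip
import json
import os
import tempfile


def write_posts(path, posts):
    """Write one JSON object per line into a gzip-compressed text file."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for post in posts:
            f.write(json.dumps(post) + "\n")


def read_posts(path):
    """Read the records back, decompressing on the fly."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [json.loads(line) for line in f]


# Made-up records; the field names are assumptions for this sketch.
posts = [
    {"location": "Tokyo", "likes": i % 7,
     "shared_link": "http://example.com/", "weather": "sunny"}
    for i in range(1000)
]

path = os.path.join(tempfile.mkdtemp(), "posts.jsonl.gz")
write_posts(path, posts)

# Size the same records would take as plain, uncompressed JSON lines.
raw_size = sum(len(json.dumps(p)) + 1 for p in posts)
print(raw_size, "bytes raw ->", os.path.getsize(path), "bytes compressed")
```

Repetitive records like these compress very well; real social data will be less regular, so the savings will probably be smaller.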

Since Google recently launched Google+, I think it will be interesting to see who dominates the market in the coming years. I think the key to Google+'s success is the ability to add third-party apps.

This will create a whole new market for Google, and I am sure they are going to do it in the near future.

Friday, June 10, 2011

30nm Samsung DDR3 to cut power consumption and boost speed.

Read this article by Engadget.

Might be a useful way to speed up RAM and cut power consumption.

Friday, June 3, 2011

Sunday, May 29, 2011

Motherboard with bolted-on SSD.

A motherboard with a bolted-on SSD is going on sale. A hard disk connected to the motherboard automatically becomes a hybrid drive!

This can give search engine performance a boost, I believe.

via Engadget

Thursday, May 26, 2011

CyberAgent @ Akiba

http://journal.mycom.co.jp/news/2011/02/17/019/index.html

Tuesday, May 17, 2011

Tuesday, May 10, 2011

Interesting Google App Engine application - Evite

http://googlecode.blogspot.com/2011/05/evites-use-of-google-app-engine.html

The most interesting part is the data synchronization between Oracle RAC and the App Engine datastore.

I want to know how to use MySQL and the App Engine datastore together in some way.

Thursday, April 21, 2011

QDF means ... ?

QDF is an acronym for "Query Deserves Freshness".

It is one of the algorithms used by Google: for queries about topics that have suddenly become hot, it favors fresher pages.

Tuesday, April 12, 2011

Link of the Day: Open Compute

This is the link of the day. It tells you about Facebook's efficient server design. http://opencompute.org/

Thursday, March 31, 2011

How do I become a data scientist - from Quora

http://www.quora.com/How-do-I-become-a-data-scientist

This question on Quora was answered very well, and it gave me some ideas.

First, I should really learn Hadoop, study statistics, learn more about MapReduce, take the plunge into R, and so on.

Data science is probably going to be a very cool thing to do in the next 10 years.

Wednesday, March 30, 2011

Quote from O'Reilly's article I found interesting.

"Entrepreneurship is another piece of the puzzle. Patil's first flippant answer to "what kind of person are you looking for when you hire a data scientist?" was "someone you would start a company with." That's an important insight: we're entering the era of products that are built on data."

Source: http://radar.oreilly.com/2010/06/what-is-data-science.html

Data scientists are becoming increasingly popular and are in high demand. I really want to find a person I can partner with in the near future, and I hope to find a partner internationally.

You know, start-up founders usually had a good partner to begin with (Microsoft, Google, etc.).

Data Scraping - Screen Scraping

A new word I found out today: Screen Scraping.

Monday, March 28, 2011

Blekko Video



I found this Blekko Video interesting.

Sunday, March 27, 2011

Can you benefit from content scraped from your site? - Cool video Part 2



Cool video I found on YouTube



Cool video I found.

"How would you run your own online marketing company?"

Friday, March 18, 2011

A useful link

http://d.hatena.ne.jp/mjmania/touch/20090205/1233766538

Monday, March 7, 2011

A quote from Daniel Dennett

A quote from Daniel Dennett (which I read in Japanese translation) was inspiring to me.

"A scholar is just a library's way of making another library."

Link of Interest.

http://www.infed.org/thinkers/et-lewin.htm

About Kurt Lewin

Friday, March 4, 2011

Thoughts about Search Engines in 2011

My thoughts about search engines in the year 2011.

I was studying Google's PageRank and Beyond by Amy N. Langville and Carl D. Meyer, and here are the thoughts that popped into my head.

* To combat unethical SEO, a search engine must make its spider and indexer smarter.

* Most search engines generate revenue by selling profile data to interested parties.

* Users search for one of two things:

1. Authoritative pages, for deep search or research.
2. Hub pages, for broad search.

My idea: make a ranking method that extracts only interesting pages.
The definition of "interesting" is rather broad, but in this case it means sites that are buried in the deep web but are high in quality. In one word: RARE sites.
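The split between authoritative pages and hub pages is the one used by Kleinberg's HITS algorithm, which the Langville and Meyer book covers alongside PageRank. Below is a minimal sketch of HITS on a tiny made-up link graph. The page names and the `hits` function are my own for illustration, and this is not the "rare sites" ranking idea itself, only the hub/authority scoring it would start from.

```python
import math


def hits(links, iterations=50):
    """Return (hub, authority) score dicts for {page: [pages it links to]}."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # Hub score: sum of the authority scores of the pages it links to.
        hub = {p: sum(auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize both vectors to unit length so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Made-up graph: "portal" links to both papers, "blog" links to one.
links = {
    "portal": ["paper1", "paper2"],
    "blog": ["paper1"],
    "paper1": [],
    "paper2": [],
}
hub, auth = hits(links)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

In this graph "portal" comes out as the best hub because it points at both papers, and "paper1" as the best authority because both linking pages point at it.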

Wednesday, February 16, 2011

Made a list of sites talking about search engine ranking.

I made a list of sites in Japanese talking about search engine ranking. Hope it will be of some use to you.

Best Regards,
Tsubasa K.

http://d.hatena.ne.jp/stingraze/20110216/1297833710

Friday, February 4, 2011

Some keywords I will study.

Some keywords I found, which I must study.

LSI: Latent Semantic Indexing

Singular value decomposition

Web information retrieval

Wednesday, January 26, 2011

Getting 1400MB/s throughput on an SSD.

This SSD has a throughput of 1400MB/s and I want it so much. ;) It uses a PCI Express slot instead of SATA.

But the thing is, it's ULTRA expensive. It costs 998,000 yen incl. tax!

I guess I will wait for some time until the price drops to 1/10 of that....

Well, if somebody with an unlimited budget is thinking of scaling up storage with a 300GB SSD that does 1400MB/s, it might be worth the cost.

Or if somebody is kind enough to donate one of these to me, I will be eternally grateful.

Friday, January 14, 2011

Thursday, January 13, 2011

Wednesday, January 12, 2011

Some Basics and Applications You Should Know When Making a Search Engine.

Some basic keywords I would like to introduce to you:

Outbound Link
Inbound Link
Keyword Density

One simple algorithm I am sharing with you is the keyword density calculation. To put it simply, count how many times the keyword you care about appears, and express that as a percentage of all the keywords on the page.

k / ak × 100, where k is the number of times the keyword appears and ak is the total number of keywords.
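Here is a minimal sketch of that calculation in Python. It assumes that "keywords" simply means the words on the page, split on anything that is not a letter, digit, or apostrophe, and it only handles single-word keywords. The function name `keyword_density` and the sample text are made up for illustration.

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Return how often `keyword` occurs in `text`, as a percentage of all words."""
    # Lowercase first so that "Search" and "search" count as the same keyword.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    # k / ak * 100: occurrences of the keyword over the total number of words.
    return 100 * counts[keyword.lower()] / len(words)


page = "Search engines rank pages. A search engine scores every page."
print(f"{keyword_density(page, 'search'):.1f}%")
```

The sample text has 10 words and "search" appears twice, so this prints 20.0%. A real indexer would probably drop stop words before counting.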


Stop Word - You should check this out.
Score - Very important in ranking. I would say this is the most basic, but also the hardest, part of a search engine.

Definition of "score" according to SearchEngineDictionary.com

"Search engines usually arrange search results from the most relevant to the least relevant (as determined by the search engine's algorithm). In order to rank documents, the search engine assigns a score to each page and those with the highest scores are listed first. Most search engines simply give the maximum score to the most relevant document and score all other relevant documents relative to that document. Others compare all documents to a theoretically perfect document. The score of a web page therefore refers to its relevance as perceived by a specific search engine."
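The "maximum score to the most relevant document" approach in the definition above can be sketched in a few lines. This assumes some raw relevance score per document already exists; how to compute that raw score is the hard part and is not shown. The function names, the file names, and the 0 to 100 scale are my own choices for illustration.

```python
def relative_scores(raw_scores, max_score=100.0):
    """Rescale raw relevance scores so that the best document gets `max_score`."""
    if not raw_scores:
        return {}
    best = max(raw_scores.values())
    if best <= 0:
        # Nothing is relevant, so avoid dividing by zero.
        return {doc: 0.0 for doc in raw_scores}
    return {doc: max_score * score / best for doc, score in raw_scores.items()}


def rank(raw_scores):
    """Return (document, relative score) pairs, most relevant first."""
    scaled = relative_scores(raw_scores)
    return sorted(scaled.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores, for example from a term-matching function.
raw = {"a.html": 2.0, "b.html": 8.0, "c.html": 4.0}
for doc, score in rank(raw):
    print(f"{doc}: {score:.0f}")
```

This lists b.html first with 100, then c.html with 50, then a.html with 25. The other approach in the definition, comparing against a theoretically perfect document, would divide by a fixed ideal score instead of the best score found.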


Definition of "scored keyword phrase" according to SearchEngineDictionary.com

"Name given to phrases that searchers use that are tracked by a system the records the number of times the phrase was used in a search, also known as the score."

One interesting technology I found is Lucandra, a Cassandra-based Lucene backend.

Making a search engine will require months of testing and improvement; it's almost like a never-ending cycle. BUT, it is well worth it. Monetizing might be a good goal, but it is important to make the search experience superb, and to make that the main goal.

Offering lots of optional search methods or adding lots of data sources (such as Twitter) is a good idea.

Competition is harsh, too, as small search engines attract only a tiny percentage of total search engine traffic. Entering a niche is a very good idea.
I think starting from a different approach than a traditional search engine is crucial to achieving success in this world.

Hadoop O'Reilly Webcast

Cluster Computing and MapReduce Lecture 4



Another interesting video worth watching.

Hadoop Visualization


Hadoop Visualization - interesting video.

Thursday, January 6, 2011

Keywords I'm interested in.

Here are some keywords I am interested in learning.

MPI (MPICH-SCore, YAMPI)

SCASH

SCore

Note to self: interesting server node: https://supcom.hgc.jp/japanese/sys_const/000012.html

Supercomputer (with GPGPU) for rent: http://itpro.nikkeibp.co.jp/article/NEWS/20101102/353730/

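Saturday, January 1, 2011

Something inspirational.

"Fortune favors the bold." - Desiderius Erasmus

It's already 2011.

This year I want to live more dynamically than last year. Boldly, energetically, and with fun.

I managed to build my own server machine, and Mohawk is almost ready for release. I need to get a dedicated line installed...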