Wednesday, November 23, 2011
Wednesday, October 26, 2011
Sunday, October 23, 2011
Thursday, October 20, 2011
Tuesday, October 18, 2011
Video to Watch: Anthropologist for Intel describes Big Data as a person
I found this interesting video on BigData48.com.
Video to watch: Google PageRank In Five Lines Of Ruby
Okay, I'm definitely trying this out.
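The five-line Ruby version from the video is not reproduced here. As a rough stand-in, this is a minimal Python sketch of PageRank computed by power iteration. It assumes the commonly used damping factor of 0.85 and that pages with no out-links spread their rank evenly over all pages; the `pagerank` function and the toy link graph are illustrative only.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> rank; the ranks sum to 1.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by dangling pages (no out-links) is spread over every page.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                # Each page splits its rank evenly among its out-links.
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank


if __name__ == "__main__":
    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

Page "c" receives links from three pages and comes out on top, while "d", which nobody links to, keeps only the baseline share (1 - 0.85) / 4.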
Monday, October 17, 2011
Quote of the day.
This is a quote from Arnab Gupta, CEO of Opera Solutions, that I found today.
"...behavior - human or otherwise can be mathematically expressed. "
Wednesday, October 12, 2011
Tuesday, September 13, 2011
Ameba and Solr/Lucene
This slideshow about how Ameba uses Solr/Lucene is helpful.
Although it is in Japanese, it is not too hard to translate.
http://www.slideshare.net/giddyupyasu/3solrsolr
Tuesday, August 30, 2011
Monday, August 29, 2011
Development Idea
A link with ideas for future development:
http://www-igm.univ-mlv.fr/~lecroq/string/node6.html
Sunday, August 7, 2011
Topics to Study.
This article, "Is Data The Next Oil?", is worth a read.
The key to a successful search engine is the ability to monetize, to turn innovative ideas into reality, and to dominate the market. Building such a search engine is difficult, but it can be made possible through collaboration, mash-ups, and teaming up with bright programmers.
I believe finding people who want to make an impact on the search engine world is not very hard. Many people would be interested in building a search engine from scratch and operating it.
The real problem lies in monetizing the search engine.
First, I think it is fine to start with AdSense and other affiliate advertising. Ultimately, the goal should be to sell advertising directly to interested parties. Beyond advertising revenue, licensing the search engine and compiling and selling the data from search queries and logs are crucial.
Marketing is an important issue for a newcomer to the market. When there are many alternative search engines, differentiation is essential.
I believe I should focus on machine translation, because, honestly, online machine translation is still at a primitive stage.
Once the search engine becomes well known and widely used, its queries will serve as a barometer of people's interests, and that data can be used to build sites. For example, I am sure many third parties will be interested in obtaining query and click logs with private data removed.
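To make the idea of selling query logs "with private data removed" a little more concrete, here is a minimal Python sketch. It assumes a hypothetical log format of `(user_id, query)` pairs, drops the user IDs, and only releases queries that at least `min_users` distinct users issued (a simple threshold in the spirit of k-anonymity), because a rare query such as a home address can identify a person on its own. The record layout and the threshold are assumptions, and this is not a complete anonymization scheme.

```python
from collections import defaultdict


def anonymized_query_counts(log, min_users=3):
    """Aggregate a (user_id, query) log into query -> count with user IDs removed.

    Queries issued by fewer than `min_users` distinct users are dropped,
    because rare queries (names, addresses) can identify whoever typed them.
    """
    users_per_query = defaultdict(set)
    counts = defaultdict(int)
    for user_id, query in log:
        normalized = " ".join(query.lower().split())  # fold case and whitespace
        users_per_query[normalized].add(user_id)
        counts[normalized] += 1
    return {q: counts[q] for q, users in users_per_query.items() if len(users) >= min_users}


# Hypothetical log entries
log = [("u1", "Solr tutorial"), ("u2", "solr  tutorial"), ("u3", "Solr Tutorial"),
       ("u1", "Solr tutorial"), ("u4", "123 Main Street Tokyo")]
print(anonymized_query_counts(log))  # {'solr tutorial': 4}
```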
This article is probably worth a read too.
Saturday, July 30, 2011
"Advertising Engines" lecture by Yury Lifshits
Definitely worth watching.
Thursday, July 28, 2011
Tuesday, July 19, 2011
University of Technology Sydney - Data Sciences and Knowledge Discovery
http://datamining.it.uts.edu.au/group/
Interesting. I will read it later.
Monday, July 18, 2011
Wednesday, July 13, 2011
People I should research.
The following are people I am thinking of researching.
They are the founders of Kaltix, which Google acquired in September 2003.
Glen Jeh http://sites.google.com/site/glenjeh/
Sepandar Kamvar http://kamvar.org/
Taher Haveliwala http://taherh.org/
An interesting link about social information mining on Sepandar Kamvar's site: http://kamvar.org/social_text_mining
After reading the link above, I have an idea: mining social networking services (SNS) such as Facebook and Google+. Eventually I am going to put the data into a database and correlate it with many variables, such as location, punctuation, shared links, the number of Likes and +1s, and the weather.
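As a first step toward correlating posts with variables like Likes or shared links, here is a minimal Python sketch of the Pearson correlation coefficient between two numeric columns. The sample columns are made-up illustrative numbers, not real SNS data. Pearson's r only measures linear relationships, and it is undefined when either column is constant.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical columns: shared links per post vs. Likes per post
shared_links = [0, 1, 2, 3, 4]
likes = [2, 4, 6, 8, 10]
print(pearson(shared_links, likes))  # 1.0 (perfect positive linear correlation)
```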
Mining social information is essential to understanding human relationships and friendships. Of course, online activity is not everything an individual does, but it still reveals a great deal about many people's social lives.
I think it will be interesting to compare how users behave in the online world with how they behave in real life.
The data will be enormous; I am hoping there will be enough of it to fill my 1TB hard drive by itself. If it gets too big, I should implement compression to save space.
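If the data does outgrow the drive, Python's standard `gzip` and `json` modules are enough for a first experiment. This minimal sketch assumes records are stored as JSON Lines (one JSON object per line); the field names in the demo are hypothetical.

```python
import gzip
import json
import os
import tempfile


def write_records(path, records):
    """Write records as gzip-compressed JSON Lines (one JSON object per line)."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_records(path):
    """Stream records back one at a time, so the file never has to fit in memory."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)


if __name__ == "__main__":
    # Hypothetical, highly repetitive records, similar in shape to crawled SNS posts.
    posts = [{"user": f"user{i % 50}", "likes": i % 7, "location": "Tokyo"}
             for i in range(10_000)]
    path = os.path.join(tempfile.mkdtemp(), "posts.jsonl.gz")
    write_records(path, posts)
    raw_bytes = sum(len(json.dumps(p, ensure_ascii=False).encode("utf-8")) + 1 for p in posts)
    print(raw_bytes, "bytes raw vs", os.path.getsize(path), "bytes compressed")
    assert list(read_records(path)) == posts  # the round trip is lossless
```

Repetitive text like this compresses very well; how much real posts shrink depends on the data.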
Since Google recently launched Google+, it will be interesting to see who dominates the market in the coming years. I think the key to Google+'s success is the ability to add third-party apps.
This will create a whole new market for Google, and I am sure they are going to do it in the near future.
Friday, June 10, 2011
30nm Samsung DDR3 to cut power consumption and boost speed.
Read this article on Engadget.
This might be a useful way to speed up RAM and cut its power consumption.
Friday, June 3, 2011
Google, Bing, and Yahoo collaborating.
http://www.webpronews.com/schemas-google-bing-yahoo-2011-06
Google, Bing, and Yahoo are collaborating on shared structured-data schemas (schema.org). Check out the link above!
Sunday, May 29, 2011
Motherboard with a bolted-on SSD.
A motherboard with a bolted-on SSD is going on sale. Any hard disk connected to this motherboard automatically becomes a hybrid drive!
I believe this could give search engine performance a boost.
via Engadget
Thursday, May 26, 2011
Tuesday, May 17, 2011
Found a very interesting blog about a fictional AdSense staff member
This blog post about a fictional AdSense staff member was really interesting to me. It's in Japanese, though.
http://colo-ri.jp/develop/2011/01/google-adsense-for-seo.html
Tuesday, May 10, 2011
Interesting Google App Engine application - Evite
http://googlecode.blogspot.com/2011/05/evites-use-of-google-app-engine.html
The most interesting part is the data synchronization between Oracle RAC and the App Engine datastore.
I want to learn how to use MySQL and the App Engine datastore together in some way.
Thursday, April 21, 2011
QDF means ... ?
QDF is an acronym for "Query Deserves Freshness".
It is one of the algorithms Google uses.
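Google has not published how QDF actually works, so the following is only a hypothetical illustration of the idea: when a query is "hot", blend a relevance score with a recency factor that decays exponentially with document age. The `is_hot_query` flag, the `half_life_hours` parameter, and the blending weight are all assumptions made for this sketch.

```python
def freshness_boosted_score(relevance, age_hours, is_hot_query,
                            half_life_hours=24.0, freshness_weight=0.5):
    """Hypothetical QDF-style score (not Google's actual algorithm).

    For queries flagged as "hot", blend relevance (0..1) with a recency factor
    that is 1.0 for a brand-new document and halves every `half_life_hours`.
    Other queries are ranked on relevance alone.
    """
    if not is_hot_query:
        return relevance
    recency = 0.5 ** (age_hours / half_life_hours)
    return (1.0 - freshness_weight) * relevance + freshness_weight * recency


# For a hot query, a fresh but slightly less relevant page can overtake an old one.
month_old = freshness_boosted_score(0.9, age_hours=24 * 30, is_hot_query=True)
two_hours_old = freshness_boosted_score(0.7, age_hours=2, is_hot_query=True)
print(two_hours_old > month_old)  # True
```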
Tuesday, April 12, 2011
Link of the Day: Open Compute
This is the link of the day. It tells you about Facebook's efficient servers: http://opencompute.org/
Saturday, April 2, 2011
Important News.
Make sure to check this out.
http://www.pcworld.com/businesscenter/article/223856/googles_app_engine_gets_new_search_feature.html
Thursday, March 31, 2011
How do I become a data scientist - from Quora
http://www.quora.com/How-do-I-become-a-data-scientist
This question on Quora was answered very well, and it gave me some ideas.
First, I should really learn Hadoop, study statistics, learn more about MapReduce, and take the plunge with R, among other things.
Data science is probably going to be a very cool thing to do over the next 10 years.
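To get a feel for the MapReduce model before installing Hadoop, the classic word-count example can be simulated in a single Python process. This is only a sketch of the map, shuffle, and reduce phases; it does not use Hadoop's API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle_phase(mapped).items())


print(word_count(["big data big ideas", "data science"]))
# {'big': 2, 'data': 2, 'ideas': 1, 'science': 1}
```

In a real cluster, the map calls run in parallel on different machines and the shuffle moves data over the network, but the contract of each phase is the same.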
Wednesday, March 30, 2011
A quote I found interesting from an O'Reilly article.
"Entrepreneurship is another piece of the puzzle. Patil's first flippant answer to "what kind of person are you looking for when you hire a data scientist?" was "someone you would start a company with." That's an important insight: we're entering the era of products that are built on data."
Source: http://radar.oreilly.com/2010/06/what-is-data-science.html
Data scientists are increasingly popular and in great demand. I really want to find someone I can partner with in the near future, and I hope to find a partner internationally.
You know, startup founders usually had a good partner to begin with (Microsoft, Google, etc.).
Monday, March 28, 2011
Sunday, March 27, 2011
Can you benefit from content scraped from your site? - Cool video Part 2
A cool video I found on YouTube.
Cool video I found.
"How would you run your own online marketing company?"
Friday, March 18, 2011
Monday, March 7, 2011
A quote from Daniel Dennett
I found this quote from Daniel Dennett (in Japanese translation) inspiring.
「学者は図書館をもうひとつ作るための手段である。」 ("A scholar is just a library's way of making another library.")
Friday, March 4, 2011
Thoughts about Search Engines in 2011
My thoughts about search engines in 2011.
I was studying Google's PageRank and Beyond by Amy N. Langville and Carl D. Meyer, and here are the thoughts that came to mind.
* To combat unethical SEO, a search engine must make its spider and indexer smarter.
* Most search engines generate revenue by selling profile data to interested parties.
* Users search because they want:
1. Authoritative pages, for searching deeply or for research.
2. Hub pages, for broad searches.
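The authority/hub distinction above is the idea behind Kleinberg's HITS algorithm, which the Langville and Meyer book covers alongside PageRank. Below is a minimal Python sketch, assuming a small link graph given as a dict of page to out-links and a fixed number of iterations: a page's authority score is the sum of the hub scores of pages linking to it, its hub score is the sum of the authority scores of pages it links to, and both are normalized each round. The toy graph is hypothetical.

```python
import math


def hits(links, iterations=50):
    """Kleinberg's HITS. Returns (authority, hub) dicts, each scaled to unit L2 norm."""
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the pages that link TO this page.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                auth[dst] += hub[src]
        # Hub: sum of the authority scores of the pages this page links TO.
        hub = {p: sum(auth[dst] for dst in links.get(p, [])) for p in pages}
        # Normalize both vectors so the scores do not grow without bound.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return auth, hub


# Two "directory" pages (hubs) linking to two "reference" pages (authorities)
web = {"dir1": ["ref1", "ref2"], "dir2": ["ref1", "ref2"], "ref2": ["ref1"]}
authority, hub = hits(web)
print(max(authority, key=authority.get), max(hub, key=hub.get))  # ref1 dir1
```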
My idea: a ranking method that extracts only interesting pages.
The definition of "interesting" is rather broad, but here it means sites that are buried in the deep web yet high in quality. In one word: RARE sites.
Monday, February 21, 2011
Wednesday, February 16, 2011
Made a list of sites talking about search engine ranking.
I made a list of Japanese sites that talk about search engine ranking. I hope it will be of some use to you.
Best Regards,
Tsubasa K.
http://d.hatena.ne.jp/stingraze/20110216/1297833710
Monday, February 7, 2011
Friday, February 4, 2011
Some keywords I will study.
Some keywords I found that I must study:
LSI: Latent Semantic Indexing
Singular value decomposition
Web information retrieval
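A minimal, dependency-free Python sketch of LSI on a hypothetical toy term-document matrix. In practice one would call a library SVD such as NumPy's `numpy.linalg.svd`; here, to stay self-contained, the top k singular values and right singular vectors are taken from the eigenvectors of AᵀA using power iteration with deflation, which is only adequate for tiny, well-separated examples like this one. Documents are then compared by cosine similarity in the k-dimensional latent space.

```python
import math


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_eigenpair(m, iterations=500):
    """Power iteration for the dominant eigenvalue/eigenvector of a symmetric PSD matrix."""
    v = normalize([1.0] * len(m))
    for _ in range(iterations):
        v = normalize(matvec(m, v))
    eigenvalue = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return eigenvalue, v


def lsi_document_vectors(a, k):
    """Coordinates of each document (column of a) in a k-dimensional latent space.

    The eigenvalues of A^T A are the squared singular values of A and its
    eigenvectors are the right singular vectors, so row d of V_k * Sigma_k is
    document d's latent representation.
    """
    n_terms, n_docs = len(a), len(a[0])
    b = [[sum(a[t][i] * a[t][j] for t in range(n_terms)) for j in range(n_docs)]
         for i in range(n_docs)]
    coords = [[] for _ in range(n_docs)]
    for _ in range(k):
        eigenvalue, v = top_eigenpair(b)
        sigma = math.sqrt(max(eigenvalue, 0.0))
        for d in range(n_docs):
            coords[d].append(sigma * v[d])
        # Deflation: subtract the found component so the next pass finds the next one.
        b = [[b[i][j] - eigenvalue * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return coords


def cosine(x, y):
    return sum(p * q for p, q in zip(x, y)) / (
        math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))


# Rows are terms, columns are documents d0..d3 (hypothetical toy data).
#     d0 d1 d2 d3
a = [[1, 0, 0, 0],   # car
     [0, 1, 0, 0],   # auto
     [1, 1, 0, 0],   # engine
     [0, 0, 1, 1],   # flower
     [0, 0, 1, 0]]   # petal

raw_d0, raw_d1 = [row[0] for row in a], [row[1] for row in a]
print(round(cosine(raw_d0, raw_d1), 3))    # 0.5: only "engine" is shared
docs = lsi_document_vectors(a, k=2)
print(round(cosine(docs[0], docs[1]), 3))  # 1.0: the "car" and "auto" documents merge
```

The two car documents share only the word "engine", yet in the 2-dimensional latent space they point in the same direction, while the flower documents stay orthogonal to them. Grouping related terms like this is what LSI is meant to achieve.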
Wednesday, January 26, 2011
Getting 1400MB/s throughput on an SSD.
This SSD has a throughput of 1400MB/s, and I want it so much. ;) It uses a PCI Express slot instead of SATA.
But the thing is, it's ULTRA expensive. It costs 998,000 yen including tax!
I guess I will wait until the price drops to a tenth....
Well, if somebody with an unlimited budget is thinking of scaling up storage with a 300GB SSD that delivers 1400MB/s, it might be worth the cost.
Or if somebody is kind enough to donate one to me, I will be eternally grateful.
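Some back-of-the-envelope arithmetic with the numbers above (300GB, 1400MB/s, 998,000 yen). The sketch assumes decimal units (1GB = 1000MB), as drive vendors usually quote, and ignores real-world overheads, so the read time is a best-case estimate.

```python
def full_read_seconds(capacity_gb, throughput_mb_s):
    """Time to read the whole drive sequentially, assuming 1 GB = 1000 MB."""
    return capacity_gb * 1000 / throughput_mb_s


def yen_per_gb(price_yen, capacity_gb):
    return price_yen / capacity_gb


seconds = full_read_seconds(300, 1400)
print(f"{seconds:.0f} s (about {seconds / 60:.1f} minutes) to read all 300GB")
# 214 s (about 3.6 minutes) to read all 300GB
print(f"{yen_per_gb(998_000, 300):,.0f} yen per GB")       # 3,327 yen per GB
print(f"{yen_per_gb(998_000 / 10, 300):,.0f} yen per GB")  # 333 yen per GB at 1/10 the price
```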
Friday, January 14, 2011
Thursday, January 13, 2011
http://headlines.yahoo.co.jp/hl?a=20110113-00000025-rbb-sci&utm_source=twitterfeed&utm_medium=twitter
NTT Communications has upgraded its Japan-USA backbone to 400Gbps.
Good news.
Wednesday, January 12, 2011
Some Basics and Applications You Should Know When Making a Search Engine.
Some basic keywords I would like to introduce to you:
Outbound Link
Inbound Link
Keyword Density
One simple algorithm I am sharing with you is the keyword density calculation. Put simply, count all the keywords (words) in a document, then express the number of occurrences of the keyword you are interested in as a percentage of that total.
Density = k / ak × 100%, where k is the number of occurrences of the keyword and ak is the total number of keywords in the document.
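A minimal Python sketch of the k / ak keyword density formula. It assumes simple lowercase alphanumeric tokenization, single-word keywords only, and a tiny hypothetical stop-word list (real lists are much longer); it also shows how dropping stop words (the next entry) changes ak and therefore the density.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def tokenize(text, remove_stop_words=True):
    """Lowercase, split into alphanumeric words, and optionally drop stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    return words


def keyword_density(text, keyword, remove_stop_words=True):
    """Density in percent: k / ak * 100, where k counts occurrences of a single-word
    keyword and ak counts all keywords (words) left after tokenization."""
    words = tokenize(text, remove_stop_words)
    if not words:
        return 0.0
    return 100.0 * words.count(keyword.lower()) / len(words)


page = "The search engine ranks the pages of the web."
print(keyword_density(page, "search"))                           # 20.0
print(keyword_density(page, "search", remove_stop_words=False))  # about 11.1
```

With stop words removed, only five words remain (search, engine, ranks, pages, web), so one occurrence of "search" is 20% of the document instead of 1 in 9.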
Stop Word - You should check this out.
Score - Very important in ranking. I would say this is the most basic, but also the hardest, part of a search engine.
Definition of "score" according to SearchEngineDictionary.com
"Search engines usually arrange search results from the most relevant to the least relevant (as determined by the search engine's algorithm). In order to rank documents, the search engine assigns a score to each page and those with the highest scores are listed first. Most search engines simply give the maximum score to the most relevant document and score all other relevant documents relative to that document. Others compare all documents to a theoretically perfect document. The score of a web page therefore refers to its relevance as perceived by a specific search engine."
Definition of "scored keyword phrase" according to SearchEngineDictionary.com
"Name given to phrases that searchers use that are tracked by a system the records the number of times the phrase was used in a search, also known as the score."
One interesting technology I found is Lucandra, a Cassandra-based Lucene backend.
Making a search engine requires months of testing and improvement; it's almost a never-ending cycle. BUT it is well worth it. Monetization may be a fine goal, but the main goal should be to make the search experience superb.
Adding many optional search methods or many data sources (such as Twitter) is a good idea.
Competition is harsh, too, as small search engines attract only a tiny percentage of total search traffic. Entering a niche is a very good idea.
I think taking a different approach from traditional search engines is crucial to succeeding in this field.
Friday, January 7, 2011
Thursday, January 6, 2011
Keywords I'm interested in.
Here are some keywords I am interested in learning.
MPI (MPICH-SCore, YAMPI)
SCASH
SCore
Note to self: interesting server node: https://supcom.hgc.jp/japanese/sys_const/000012.html
Supercomputer (with GPGPU) for rent: http://itpro.nikkeibp.co.jp/article/NEWS/20101102/353730/
Saturday, January 1, 2011
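Something inspirational.
"Fortune favors the bold." - Desiderius Erasmus
It's already 2011.
This year I want to live more dynamically than last year: boldly, energetically, and joyfully.
I managed to build my own server machine, and Mohawk is almost ready for release. I need to get a dedicated line...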