Yandex Architecture
Sunday, February 24, 2008 at 1:53PM
Todd Hoff in C++, Django, Example, Java, Perl, freebsd, search
Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it.
Yandex is a Russian search engine with 3.5 billion pages in its search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought what we do know would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:
3.5 billion pages in the search index.
Several thousand servers.
35 million searches a day.
Several data centers around Russia.
Two-layer architecture.
The database is split into pieces. When a search is requested, it pulls the bits from the different database servers and brings them together for the user.
Languages used: C++, Perl, some Java.
FreeBSD is used as their server OS.
$72 million in revenue in 2006.
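The "split in pieces, then brought together" point above is commonly called scatter-gather. The interview gives no implementation details, so what follows is only a minimal sketch of the general idea, not Yandex's actual design. The shard contents, the scores, and the names `SHARDS`, `search_shard`, and `search` are all made up for illustration, and in-memory dicts stand in for what would really be separate index servers.

```python
from concurrent.futures import ThreadPoolExecutor
import heapq

# Hypothetical in-memory "shards": each holds one slice of the index and maps
# a term to (score, doc_id) pairs. Real shards would be remote servers.
SHARDS = [
    {"moscow": [(0.9, "doc-1"), (0.4, "doc-7")]},
    {"moscow": [(0.7, "doc-12")], "volga": [(0.8, "doc-15")]},
    {"volga": [(0.6, "doc-21")], "moscow": [(0.95, "doc-30")]},
]


def search_shard(shard, term, k):
    """Return this shard's best k hits for the term (empty list if none)."""
    return heapq.nlargest(k, shard.get(term, []))


def search(term, k=3, shards=SHARDS):
    """Scatter the query to every shard in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: search_shard(s, term, k), shards))
    # Gather: keep the k highest-scoring hits across all shards.
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _score, doc_id in merged]


if __name__ == "__main__":
    print(search("moscow"))
```

Each shard only has to return its own top k hits, so the merging step stays cheap no matter how big the full index is. For a sense of scale, 35 million searches a day averages out to roughly 405 searches per second (35,000,000 / 86,400), and every one of them would fan out across the shards this way.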
Article originally appeared on High Scalability (http://highscalability.com/).