Sunday, Feb 24, 2008
Yandex Architecture

Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it.
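To make the failure mode concrete, here is a minimal sketch of how assigning to a session on every request can turn into a database write on every request. It is not Yandex's code and not Django's actual session class: `SketchSession`, `handle_request`, and `count_writes` are hypothetical names invented for illustration, and the "database" is just a counter. The only behavior assumed from Django is that a session is saved when it has been marked as modified.

```python
class SketchSession(dict):
    """Hypothetical stand-in for a DB-backed session (not Django's real class)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.modified = False

    def __setitem__(self, key, value):
        # Any assignment marks the session dirty, even if the value is unchanged.
        super().__setitem__(key, value)
        self.modified = True


def handle_request(session, careless):
    """Simulate one request; return True if the session would be saved."""
    session.modified = False
    if careless:
        # Unconditional write: dirties the session on every request.
        session["last_seen"] = "now"
    elif session.get("last_seen") != "now":
        # Guarded write: only touch the session when the value really changes.
        session["last_seen"] = "now"
    return session.modified


def count_writes(careless, requests=5):
    """Count how many simulated requests would trigger a session save."""
    session = SketchSession()
    return sum(handle_request(session, careless) for _ in range(requests))


if __name__ == "__main__":
    print("careless writes:", count_writes(careless=True))
    print("guarded writes:", count_writes(careless=False))
```

With the unconditional assignment every request counts as a save; with the guard only the first one does. The slow index maintenance described in the article is a separate cost that this sketch does not model.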
Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:
Reader Comments (13)
Yandex is more than a search engine. It's a portal. I have a friend with a yandex.ru email address.
Wow.
I never heard of Yandex, but they obviously do some serious business over there. It's always cool to see successful things that make it out of Russia / Eastern Europe: the top devs over there are actually really excellent, very bright and ultra-hardcore.
Yes, Yandex isn't just a search engine; it also provides a fairly long list of services, e-mail included.
But from my point of view, their services are not really competitive in quality, especially compared with Google's: sometimes I think the Yandex search engine chooses results not by relevance but with a random number generator, mailboxes are constantly flooded with spam, and so on.
As for me, I prefer using international service providers even though I live in Russia. Still, Yandex remains the largest Russian provider of internet services, and a really huge number of Russian people use it for several strange reasons: for example, they are just "used to searching in Yandex", or maybe it's too time-consuming to type google.com instead of ya.ru...
Google does not search well in Russian, as Russian has many forms of each word. So to make a good search in Russian you should know Russian very well. Right now Google searches in Russian as if it were English.
+ Maybe Russia supports its search engines in order to control the Russian part of the internet.
There are the Russian search engines Yandex and Rambler, the Russian money transfer services WebMoney and Yandex.Money, there are Russian social networks and of course RuTube.
There are also Russian domains that you can't type in your browser if you do not have a Russian keyboard.
(Russian):
http://www.seotools.ru/biblioteka-optimizatora/yandeks/arhitektura.html - Yandex architecture (2000)
http://www.searchrank.ru/arxitektura-yandeks-poiska/ (2007)
Slides from the HighLoad 2007 conference (Russian):
http://www.google.ru/url?sa=t&ct=res&cd=7&url=http%3A%2F%2Fwww.jug.ru%2Fservlets%2Fimages%2Fmeeting_2007_10_13%2Fyandex-search-arch-posthighload.ppt&ei=guy6R46gB6T8wwH88oHGCg&usg=AFQjCNF7f_AsyqvkjUGje6aPU0q-IGy-aA&sig2=vhyz3MV66M6p7Dgsk2SfCQ
Google does do stemming for Cyrillic languages like Russian or Bulgarian (my native tongue). Perhaps Yandex does it better.
Damn, this captcha is case sensitive! Not a good choice, really!
Perhaps I'm picking a nit, but there are a couple of things off in the summary of the 'Anatomy of a crash' article.
They weren't writing to any magic variable. They were modifying the session data on every request (unnecessarily), which caused it to be saved to the DB. Nothing magic about that.
Lengthy index rebuilds were caused by the use of non-sequential primary keys (MD5 hashes) -- with InnoDB, that meant it would rebuild the whole thing on every request... thus poor performance.
To me it's magic because the consequences are hard to deduce from the code. And the choice of key values having such a tremendous negative impact is another bit of wild magic. Would one expect such an effect by looking at that line? Not me, which is why magic popped to mind.
I always use Yandex when searching in Russian for a single reason: Yandex speaks Russian and Google does not.
However, Google gets all of my English searches, because Yandex just does not do a good job on the "English Web". But hey, they have a big brother to learn from, so maybe one day... ;)
Yandex looks really good and I'm sure it will come in useful for me in the future
thanks
Pretty good summary of Yandex. To be honest, I never knew such a website existed before today.