Entries in search (7)

Monday
Oct 11, 2021

Scaling indexing and search - Algolia New Search Architecture Part 2

What would a totally new search engine architecture look like? Who better than Julien Lemoine, Co-founder & CTO of Algolia, to describe what the future of search will look like? This is the second article in a series. Here's Part 1.

Search engines need to support fast scaling for both Read and Write operations. Rapid scaling is essential in most use cases: for example, adding a vendor in a marketplace generates a spike of indexing operations (Write), while a marketing campaign generates a spike of queries (Read). Both Read and Write operations scale, but rarely at exactly the same moment, so the architecture needs to handle all of these situations efficiently as the balance between them varies over time.

Until now, search engines have scaled with Read and Write operations colocated on the same VMs. This scaling method brings drawbacks, such as Write operations unnecessarily hurting Read performance and a significant amount of CPU being duplicated at indexing time. This article explains those drawbacks and introduces a new way to scale more quickly and efficiently by splitting Read and Write operations.
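The rest of the post works through that split. As a very rough sketch of the idea only (not Algolia's actual design; every name below is invented for illustration), a write tier can build immutable index segments and publish them to shared storage, while stateless search workers only read those segments, so each tier scales on its own:

```python
# Hypothetical sketch of a read/write split: an indexer tier builds immutable
# segments and publishes them to shared storage; stateless searchers only read.
# Names (SharedStorage, Indexer, Searcher) are illustrative, not Algolia's API.
from collections import defaultdict


class SharedStorage:
    """Stands in for object storage holding published index segments."""
    def __init__(self):
        self.segments = []          # list of immutable {term: [doc_ids]} maps

    def publish(self, segment):
        self.segments.append(segment)


class Indexer:
    """Write tier: batches documents into a segment, then publishes it."""
    def __init__(self, storage):
        self.storage = storage
        self.buffer = []

    def add(self, doc_id, text):
        self.buffer.append((doc_id, text))

    def flush(self):
        segment = defaultdict(list)
        for doc_id, text in self.buffer:
            for term in text.lower().split():
                segment[term].append(doc_id)
        self.storage.publish(dict(segment))
        self.buffer = []


class Searcher:
    """Read tier: stateless, answers queries from whatever segments exist."""
    def __init__(self, storage):
        self.storage = storage

    def search(self, term):
        hits = []
        for segment in self.storage.segments:   # scale Reads by adding Searchers
            hits.extend(segment.get(term.lower(), []))
        return hits


storage = SharedStorage()
indexer = Indexer(storage)
indexer.add(1, "marketplace vendor onboarding")
indexer.add(2, "marketing campaign traffic")
indexer.flush()
print(Searcher(storage).search("vendor"))   # -> [1]
```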

1. Anatomy of an index


Monday
Aug 02, 2021

Evolution of search engines architecture - Algolia New Search Architecture Part 1

What would a totally new search engine architecture look like? Who better than Julien Lemoine, Co-founder & CTO of Algolia, to describe what the future of search will look like? This is the first article in a series.


Search engines, and more generally information retrieval systems, play a central role in almost all of today's technical stacks. Information retrieval dates back to the beginning of computer science, and research accelerated in the early 90s with the introduction of the Text REtrieval Conference (TREC). After more than 30 years of evolution since TREC, search engines continue to grow and evolve, leading to new challenges.

In this article, we look at some key milestones in the evolution of search engine architecture and describe the challenges those architectures face today. As you'll see, we grouped the engines into four architecture categories. This is a simplification, as in reality there are many different engines with various mixes of architectures, but it lets us focus on the most important characteristics of each architecture.

1. The Inverted Index —  the early days of search
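The series picks up with the inverted index itself. For readers who haven't built one, here is a minimal sketch of the structure in Python (a toy illustration, not the article's implementation): each term maps to the documents that contain it, and a query intersects those posting lists.

```python
# Toy inverted index: maps each term to the set of documents containing it.
# Purely illustrative; real engines add tokenization, compression, and ranking.
from collections import defaultdict

docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs are rare",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def search(*terms):
    """Return IDs of documents containing all query terms."""
    result = None
    for term in terms:
        postings = index.get(term.lower(), set())
        result = postings if result is None else result & postings
    return sorted(result or [])

print(search("quick", "brown"))   # -> [1, 3]
```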


Thursday
Nov 26, 2009

Kngine Snippet Search New Indexing Technology

While Kngine has just announced some improvements and new features, I would like to take you on a small trip through the Snippet Search research project at Kngine.


Friday
May 01, 2009

FastBit: An Efficient Compressed Bitmap Index Technology

Data mining and fast queries are always in that bin of hard-to-do things where doing something smarter can yield big results. Bloom Filters are one such do-it-smarter strategy, and compressed bitmap indexes are another. In one application "FastBit outruns other search indexes by a factor of 10 to 100 and doesn’t require much more room than the original data size." The data size is an interesting metric: our old standard b-trees can be two to four times larger than the original data. In a test searching an Enron email database, FastBit outran MySQL by 10 to 1,000 times.

FastBit is a software tool for searching large read-only datasets. It organizes user data in a column-oriented structure that is efficient for on-line analytical processing (OLAP), and it uses compressed bitmap indices to further speed up query processing. Analyses have proven the compressed bitmap index used in FastBit to be theoretically optimal for one-dimensional queries. Compared with other optimal indexing methods, bitmap indices are superior because they can be efficiently combined to answer multi-dimensional queries, whereas other optimal methods cannot.
It's not all just map-reduce and add more servers until your attic is full.
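As a rough illustration of why bitmap indices combine so cheaply for multi-dimensional queries, the sketch below builds one bitmap per (column, value) pair and ANDs them to answer a two-column predicate. It is deliberately uncompressed and uses made-up data; FastBit's real advantage comes from its Word-Aligned Hybrid (WAH) compression, which is not reproduced here.

```python
# Toy bitmap index: one bit per row for each (column, value) pair.
# Multi-dimensional predicates reduce to bitwise AND/OR over the bitmaps.
# Uncompressed for clarity; FastBit's WAH-compressed bitmaps are the real trick.
rows = [
    {"sender": "ken", "year": 2001},
    {"sender": "jeff", "year": 2001},
    {"sender": "ken", "year": 2000},
    {"sender": "ken", "year": 2001},
]

def build_bitmap(column, value):
    """Bitmap with bit i set when rows[i][column] == value."""
    bits = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bits |= 1 << i
    return bits

# WHERE sender = 'ken' AND year = 2001 becomes a single AND of two bitmaps.
matches = build_bitmap("sender", "ken") & build_bitmap("year", 2001)
print([i for i in range(len(rows)) if matches >> i & 1])   # -> [0, 3]
```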

Related Articles

  • FastBit: Digging through databases faster. An excellent description of how FastBit works, especially compared to b-trees.


Sunday
Jun 08, 2008

Search fast in million rows

I have a table. This table has many columns, but search is performed on one column, and the table can have more than a million rows. The data in this column is something like "funny, new york, hollywood". A user can search with parameters such as "funny hollywood". I need to take these two words and then check whether that column contains them and how many times. It is not possible to index here. If the search returns, say, 1200 results, then I can't determine the number of results without comparing each and every row; I need to compare every row. This query is very frequent. How can I approach this problem? What type of architecture and tools would be helpful? I know that this can be accomplished with a distributed system, but how can I build such a system? I also see on this website that LinkedIn uses Lucene for search. Is Lucene helpful in my case? My table also has lots of insertions, though updates are not very frequent.
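A term-level inverted index is the usual answer to this shape of query, and it is the structure Lucene maintains under the hood. The toy Python sketch below (illustrative only; it is not Lucene and not tuned for a million rows) shows the idea of mapping each word to per-row occurrence counts so a query like "funny hollywood" never scans every row.

```python
# Toy term-frequency inverted index: maps each word to {row_id: count}, so a
# multi-word query is answered by intersecting posting lists instead of
# scanning the whole table. Illustrative only; a production setup would use
# Lucene/Solr or similar rather than this sketch.
from collections import defaultdict, Counter

index = defaultdict(Counter)     # term -> Counter({row_id: occurrences})

def add_row(row_id, text):
    for term in text.lower().replace(",", " ").split():
        index[term][row_id] += 1

def search(query):
    """Rows containing every query term, with total occurrence counts."""
    terms = query.lower().split()
    row_ids = set(index[terms[0]])
    for term in terms[1:]:
        row_ids &= set(index[term])
    return {r: sum(index[t][r] for t in terms) for r in row_ids}

add_row(1, "funny,new york,hollywood")
add_row(2, "funny,funny,hollywood gossip")
add_row(3, "new york news")
print(search("funny hollywood"))   # row 1 -> 2 hits, row 2 -> 3 hits
```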


Wednesday
May 28, 2008

Job queue and search engine

Hi, I want to implement a search engine with Lucene. To be scalable, I would like to execute search jobs asynchronously (with a job queuing system), but I don't know if it is a good design... Why? Search results can be large! (e.g., 100+ pages with 25 documents per page). With an asynchronous system, I need to store the results for each search job. I can set a short expiration time (~5 min) for each search result, but it's still large. What do you think about it? Which design would you use for that? Thanks, Mat
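The design described in the question can be sketched in a few lines: a submitted query goes on a queue, a worker runs the search and stores the hits under a job id with a short expiration, and the client fetches one page at a time. This is only a toy illustration of the question's own proposal, with invented names and an assumed 5-minute TTL, not a recommendation.

```python
# Sketch of the asynchronous design described above: queue a search job, have a
# worker store the result under a job id with a short expiration, and let the
# client poll for pages. All names here are invented for illustration.
import time
import uuid
from queue import Queue

RESULT_TTL = 5 * 60                      # ~5 minute expiration, as in the question

jobs = Queue()                           # pending search jobs
results = {}                             # job_id -> (expires_at, hits)

def submit(query):
    job_id = str(uuid.uuid4())
    jobs.put((job_id, query))
    return job_id

def worker(run_search):
    """Drain the queue, run each search, keep results only until they expire."""
    while not jobs.empty():
        job_id, query = jobs.get()
        results[job_id] = (time.time() + RESULT_TTL, run_search(query))

def fetch(job_id, page, per_page=25):
    """Return one page of a finished job's results, or None if missing/expired."""
    entry = results.get(job_id)
    if entry is None or entry[0] < time.time():
        results.pop(job_id, None)
        return None
    return entry[1][page * per_page:(page + 1) * per_page]

job = submit("scalability")
worker(lambda q: [f"doc-{i}" for i in range(2500)])   # stand-in for the Lucene search
print(fetch(job, page=0)[:3])                         # -> ['doc-0', 'doc-1', 'doc-2']
```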


Sunday
Feb 24, 2008

Yandex Architecture

Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it.

Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:

  • 3.5 billion pages in the search index.
  • Over several thousand servers.
  • 35 million searches a day.
  • Several data centers around Russia.
  • Two-layer architecture.
  • The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings them together for the user (a generic scatter-gather sketch follows this list).
  • Languages used: C++, Perl, some Java.
  • FreeBSD is used as their server OS.
  • $72 million in revenue in 2006.
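That "split in pieces" bullet is the scatter-gather pattern. The sketch below shows its shape with made-up data (a generic illustration, not Yandex's code): the query fans out to every shard holding a piece of the index, and the partial hits are merged before being returned.

```python
# Generic scatter-gather sketch (not Yandex's code): send the query to every
# shard, collect the partial (doc_id, score) hits, and merge them for the user.
shards = [
    {"yandex": [(1, 0.9)], "search": [(1, 0.4), (7, 0.2)]},   # shard 0 postings
    {"search": [(12, 0.8)], "russia": [(15, 0.6)]},           # shard 1 postings
]

def query_shard(shard, term):
    """One shard returns its own (doc_id, score) hits for the term."""
    return shard.get(term, [])

def scatter_gather(term, top_k=3):
    hits = []
    for shard in shards:                 # in production this fan-out is a parallel RPC
        hits.extend(query_shard(shard, term))
    return sorted(hits, key=lambda h: h[1], reverse=True)[:top_k]

print(scatter_gather("search"))   # -> [(12, 0.8), (1, 0.4), (7, 0.2)]
```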
