« FaceStat's Rousing Tale of Scaling Woe and Wisdom Won | Main | GigaOm Structure 08 Conference on June 25th in San Francisco »
Sunday
Jun082008

Search fast in million rows

I have a table .This table has many columns but search performed based on 1 columns ,this table can have more than million rows.
The data in these columns is something like funny,new york,hollywood
User can search with parameters as funny hollywood .I need to take this 2 words and then search on column whether that column contain this words and how many times .It is not possible to index here .If the results return say 1200 results then without comparing each and every column i can't determine no of results.I need to compare for each and every column.This query is very frequent .How can i approach for this problem.What type of architecture,tools is helpful.
I just know that this can be accomplished with distributed system but how can i make this system. I also see in this website that LinkedIn uses Lucene for search .Is Lucene is helpful in my case.My table has also lots of insertion ,however updation in not very frequent.

Reader Comments (6)

I don't understand why you cannot use an index. To be blunt, without the use of indexes you cannot search through millions of rows quickly. How do you expect a database engine to do that unless everything is laid out in a structured logical order i.e. accessed using indexes?
If funny or hollywood is a category, then you would normally represent these as categoryid's from a related lookup table. Then you would search the database with the where predicate based on a selected categoryid. With a clustered index on this column this would normally be very fast. In Sql Server 2005, I have a huge indexed view with a similar amount of columns and perform such queries in under a second. Without the use of indexes on the source tables and the view however, the query time can go up to 10 seconds! You cannot query millions of rows without indexes and expect high performance, especially if you have anything other than narrow tables containing only ints.

December 31, 1999 | Unregistered CommenterAnonymous

Yes, this would be a good application of an inverted index like Lucene.

December 31, 1999 | Unregistered Commenterquellish

Thanks for your reply
In column ,data is like "funny ;hollywood ;fast ;youtube;".
For every column i need to compare each and every word contained in that column .If i use substring kind of function then in that case database don't use index.I want to store tags in column and use it for searching.

December 31, 1999 | Unregistered Commenterd17may

you can use "innodb" for mysql or oracle to billion rows but you need some information to use or download from web site innodb

December 31, 1999 | Unregistered Commenterبرامج

Is there any other solution like Lucene or Hadoop.I heard Lucene is very good in searching but not sure about its scalability.Is we can use Hadoop for online search or it is only suitable for indexing our data offline.

December 31, 1999 | Unregistered Commenterd17may

Check out Sphinx search. I think if you can find a way of pushing your keyword values from that column into attribute fields you'll be able to get it to do what you want.

http://www.sphinxsearch.com/

December 31, 1999 | Unregistered CommenterJames

PostPost a New Comment

Enter your information below to add a new comment.
Author Email (optional):
Author URL (optional):
Post:
 
Some HTML allowed: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <code> <em> <i> <strike> <strong>