Entries in hibernate (5)

Saturday, July 26, 2008

Sharding the Hibernate Way

Update: A very nice JavaWorld podcast interview with Google engineer Max Ross on Hibernate Shards. Max covers what Hibernate Shards is (horizontal partitioning), how it works (pretty well), virtual shards (don't ask), what they still need to build (query support, replication, operational tools), and how it relates to Google AppEngine (not much).

To scale you are supposed to partition your data. Sounds good, but how do you do it? When you actually sit down to work out all the details it's not that easy. Hibernate Shards to the rescue!

Hibernate Shards is "an extension to the core Hibernate product that adds facilities for horizontal partitioning." If you know the core Hibernate API you know the Shards API. No learning curve at all.

Here is what a few members of the core group had to say about the Hibernate Shards open source project. Although there are some limitations, from the sound of it they are doing useful stuff in the right way, and it's very much worth looking at, especially if you use Hibernate or some other ORM layer.

Information Sources

  • Google Developer Podcast Episode Six: The Hibernate Shards Open Source Project. This is the document summarized here.
  • Hibernate Shards Project Page
  • Hibernate Shards Dev Discussion Group.
  • Ryan Barrett’s Scaling on the Cheap presentation. Many of the lessons from here are in Hibernate Shards.
  • JavaWorld podcast interview: Sharding with Max Ross - Hibernate Shards - Max Ross is the Google engineer who spends his days working on the Google App Engine data store. On the side he works on Hibernate Shards, another scalability-obsessed project that is open source.

    What is Hibernate Shards?

  • Shard: a piece of a split-up data set. If data doesn't fit on one machine, split it into pieces; each piece is called a shard.
  • Sharding: the process of splitting up data. For example, putting employees 1-10,000 on shard1 and employees 10,001-20,000 on shard2.
  • Sharding is used when you have too much data to fit in one single relational database. If your database has a JDBC adapter, Hibernate can talk to it, and if Hibernate can talk to it, Hibernate Shards can talk to it.
  • Most people don't want to shard because it makes everything complex. But when you have too much data, when you fill your database up, you need another solution, which can be to shard the data across multiple relational databases. The complexity arises because your application has to have the smarts to access multiple databases and that's where Hibernate Shards tries to help.
  • Structure of the data is identical from server to server. The same schema is used across all databases (MySQL, etc).
  • Hibernate was chosen because it's a good ORM tool used internally at Google, but at Google scale (really, really big) sharding had to be added, because Hibernate didn't support that kind of scale out of the box.
  • The learning curve for a Hibernate user is zero because the Hibernate API is the same. The shard implementation hasn't violated the API (yet). Sharded versions of Session, Criteria, and SessionFactory are available, so the programmer doesn't need to change code (see the sketch after this list). Query isn't implemented yet because features like aggregation and grouping are very difficult to implement across databases.
  • How does it compare to MySQL's horizontal partitioning? Shards is for situations where you have too much data to fit in a single database. MySQL partitioning may allow you to delay when you need to shard, but it is still a single database and you’ll eventually run into limits.
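
Here's a minimal sketch of what "no learning curve" means in practice: plain Hibernate calls against a session that happens to be sharded. The ShardedSessionFactory construction is shown in the strategies section below; the WeatherReport entity is hypothetical, used only to keep the sketch self-contained.

```java
import java.util.List;

import org.hibernate.Session;
import org.hibernate.SessionFactory;
import org.hibernate.criterion.Restrictions;

public class ShardsUsageSketch {

    // Hypothetical mapped entity, defined here only so the sketch compiles;
    // in real code it would have a Hibernate mapping.
    public static class WeatherReport {
        private Long id;
        private String continent;
        private int temperature;
        public Long getId() { return id; }
        public void setId(Long id) { this.id = id; }
        public String getContinent() { return continent; }
        public void setContinent(String continent) { this.continent = continent; }
        public int getTemperature() { return temperature; }
        public void setTemperature(int temperature) { this.temperature = temperature; }
    }

    // factory is assumed to be a ShardedSessionFactory built as in the
    // configuration sketch below; the calling code can't tell the
    // difference, which is the whole point.
    public static List<?> run(SessionFactory factory) {
        Session session = factory.openSession();
        try {
            session.beginTransaction();
            WeatherReport report = new WeatherReport();
            report.setContinent("North America");
            report.setTemperature(22);
            session.save(report); // the shard selection strategy picks the shard
            session.getTransaction().commit();

            // Criteria queries fan out across the shards behind the same API.
            return session.createCriteria(WeatherReport.class)
                .add(Restrictions.eq("continent", "North America"))
                .list();
        } finally {
            session.close();
        }
    }
}
```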

    Schema Design for Shards

  • When sharding you have to consider the general issues of distributed data design for high data volumes. These aren’t Hibernate Shards specific issues, but are general to the problem space.
  • Schema design is the most important part of the sharding process, and you'll have to do it up front.
  • You need to pick a dimension, a root level entity, that is easily sharded. Users and customers are common examples.
  • Accept the fact that those entities and all the entities that hang off those entities will be stored in separate physical spaces. Querying across different shards will be difficult. As will management and just about anything else you take for granted.
  • Control over how data are distributed is determined by a pluggable strategies layer.
  • Plan for the future by picking a strategy that will last you a long time. Repartitioning/resharding the data is operationally very difficult. No management tools for this yet.
  • Build simpler models that don't contain as many relationships, because you can't have cross-shard relationships. Your object graphs should be contained on one shard as much as possible (see the entity sketch after this list).
  • Lots and lots of objects pointing to each other may not be a good candidate for sharding.
  • Because the shards design doesn’t modify Hibernate core, you can design using shards from the start, even though you only have one database. Then when you need to start scaling it will be easier to grow.
  • Existing systems with shardable tables shouldn’t take very long to get up and running.
  • Policy decisions can drive sharding. For example, let's say customers don't want their data intermingling, so each customer would get their own database. In this case the application would shard on the customer as a matter of policy, not simply scaling concerns.
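
To make the "root level entity" idea concrete, here's a minimal sketch of a shard-friendly model with hypothetical Customer and PurchaseOrder entities: the whole object graph hangs off the customer root, and a reference that may live on another shard is stored as a plain ID rather than a mapped association, so Hibernate never tries to traverse it.

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

// Root entity: everything reachable from a Customer lives on the same shard.
@Entity
public class Customer {
    @Id @GeneratedValue
    private Long id; // NB: IDs must be unique across shards; Hibernate Shards
                     // ships shard-aware generators for this, per its docs.

    private String country; // could drive an attribute-based selection strategy

    // Contained in the customer's graph, so safe to map normally.
    @OneToMany(cascade = CascadeType.ALL)
    private List<PurchaseOrder> orders = new ArrayList<PurchaseOrder>();

    // A referrer might live on a different shard, so keep only the raw ID
    // instead of a mapped association: no lazy loading, no FK check; the
    // application resolves it explicitly when needed.
    private Long referredByCustomerId;
}

@Entity
class PurchaseOrder {
    @Id @GeneratedValue
    private Long id;
    private String sku;
}
```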

    The Sharding Code’s Relationship to Hibernate

  • Hibernate Shards encapsulates knowledge of all the shards. This knowledge is not in the database or the application. It's at the Hibernate persistence layer which provides a unified view of all the databases so the application doesn't have to know.
  • Shards doesn't have full support for Hibernate's query interfaces. Hibernate has both a Criteria interface and a query interface. The Criteria interface is robust, but it isn't a good fit for JPA (the Java Persistence API), which is query based.
  • Sharding should work across all databases Hibernate works with, since Shards is a layer on top of Hibernate Core, beneath the standard Hibernate interfaces. Programmers aren't even aware of it.
  • What they are doing is figuring out how to do standard things like save objects, update, and query objects across multiple databases using standard Hibernate interfaces. If Hibernate can talk to it they can talk to it.
  • A sharded session wraps the individual Hibernate sessions, so Hibernate's capabilities are preserved.
  • Cannot manage cross-shard foreign relationships (yet). There are runtime checks to detect when cross-shard relations are used accidentally. There's no foreign key constraint checking and no Hibernate lazy loading across shards. From a programming perspective you can have IDs that reference objects on other shards; it's just that Hibernate won't know about these relationships.
  • Now that the base software is done, these more advanced features can be considered. They may require changes in Hibernate Core.

    Pluggable Strategies Determine How Data Are Split Across Shards

  • A Strategy dictates how data are spread across the shards. It's an interface you need to implement. There are three strategies:
    - Shard Resolution Strategy - how you will retrieve your objects.
    - Shard Selection Strategy - defines where objects are saved to.
    - Shard Access Strategy - once you figure out which shards you are talking to, how do you want to access them (serially, 2 at a time, in parallel, etc.)?
  • Goal is to have Strategies as flexible as possible so you can decide how your data are sharded.
  • A couple of implementations are provided out of the box:
    - Round Robin - the first object goes to the first shard, the second to the second shard, and then it loops back around.
    - Attribute Based - look at attributes in the data to determine the shard. You can shard users by country, for example.
  • Configuration is set up by creating a prototype configuration for all shards (remember, same schema). Then you specify what's different from shard to shard, like URL, user name and password, and dialect (MySQL, Postgres, etc.). From that a sharded session factory is created for Hibernate, so developers use the standard interfaces (see the configuration sketch below).
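
Putting it together, here's a configuration sketch patterned on the Hibernate Shards documentation: a prototype Configuration, per-shard overrides, and a strategy factory wiring up the three strategies with the out-of-the-box round-robin implementations. The shard0/shard1 config files and the weather mapping are hypothetical, and the class names are my recollection of the Shards API rather than gospel.

```java
import java.util.ArrayList;
import java.util.List;

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;
import org.hibernate.shards.ShardId;
import org.hibernate.shards.ShardedConfiguration;
import org.hibernate.shards.cfg.ConfigurationToShardConfigurationAdapter;
import org.hibernate.shards.cfg.ShardConfiguration;
import org.hibernate.shards.loadbalance.RoundRobinShardLoadBalancer;
import org.hibernate.shards.strategy.ShardStrategy;
import org.hibernate.shards.strategy.ShardStrategyFactory;
import org.hibernate.shards.strategy.ShardStrategyImpl;
import org.hibernate.shards.strategy.access.SequentialShardAccessStrategy;
import org.hibernate.shards.strategy.access.ShardAccessStrategy;
import org.hibernate.shards.strategy.resolution.AllShardsShardResolutionStrategy;
import org.hibernate.shards.strategy.resolution.ShardResolutionStrategy;
import org.hibernate.shards.strategy.selection.RoundRobinShardSelectionStrategy;
import org.hibernate.shards.strategy.selection.ShardSelectionStrategy;

public class ShardedFactorySketch {

    public SessionFactory createSessionFactory() {
        // Prototype config: everything the shards have in common (same schema).
        Configuration prototype = new Configuration().configure("shard0.hibernate.cfg.xml");
        prototype.addResource("weather.hbm.xml"); // hypothetical mapping

        // Per-shard configs override only what differs: URL, user, password, dialect.
        List<ShardConfiguration> shardConfigs = new ArrayList<ShardConfiguration>();
        shardConfigs.add(shardConfig("shard0.hibernate.cfg.xml"));
        shardConfigs.add(shardConfig("shard1.hibernate.cfg.xml"));

        ShardedConfiguration shardedConfig =
            new ShardedConfiguration(prototype, shardConfigs, buildStrategyFactory());
        return shardedConfig.buildShardedSessionFactory();
    }

    private ShardConfiguration shardConfig(String resource) {
        return new ConfigurationToShardConfigurationAdapter(
            new Configuration().configure(resource));
    }

    private ShardStrategyFactory buildStrategyFactory() {
        return new ShardStrategyFactory() {
            public ShardStrategy newShardStrategy(List<ShardId> shardIds) {
                RoundRobinShardLoadBalancer lb = new RoundRobinShardLoadBalancer(shardIds);
                ShardSelectionStrategy selection = new RoundRobinShardSelectionStrategy(lb);   // where saves go
                ShardResolutionStrategy resolution = new AllShardsShardResolutionStrategy(shardIds); // how lookups find objects
                ShardAccessStrategy access = new SequentialShardAccessStrategy();              // one shard at a time
                return new ShardStrategyImpl(selection, resolution, access);
            }
        };
    }
}
```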

    Some Limitations

  • Full Hibernate HQL is not yet supported (maybe it is now, but I couldn’t tell).
  • Distributed queries are handled by applying a standard HQL query to each shard, merging the results, and applying the filters. This all happens in the application server, so using very large data sets could be a problem. It's left to the intelligence of the developers to do the right thing to manage performance (a sketch of the merge step follows this list).
  • No mirroring or data replication. Replication is having common tables, like zip codes, available on all shards.
  • No clean way to manage read only data you want on every shard for performance and referential integrity reasons. Say you have country data. It makes sense to replicate that data on each shard so all queries using that data can stay on the shard.
  • No handling of fail over situations, which is just like Hibernate. You could handle it in your connection pool or some other layer. It’s not considered part of the shard/OR mapping layer.
  • There’s a need for management tools that work across shards. For example, repartition data on a live system.
  • It's possible to shard across different databases as long as you keep the same schema in each database.
  • The number of shards you can have is somewhat limited because each shard is backed by a connection pool, which means a lot of database connections. And ORDER BY operations across databases must be done in memory, so a lot of memory could be used on large data sets.
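
To make the memory cost concrete, here's a rough illustration of what a cross-shard query has to do in the application tier. This is a sketch of the mechanism described above, not the actual Shards internals: run the query against each shard, concatenate, then sort and trim in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

import org.hibernate.Session;

public class CrossShardMergeSketch {

    // Illustration only: fan a query out to each shard's session, then apply
    // ORDER BY and LIMIT in application memory. Every shard's full result set
    // is held at once, which is why large data sets are a problem.
    public static <T> List<T> queryAllShards(List<Session> shardSessions,
                                             String hql,
                                             Comparator<T> orderBy,
                                             int limit) {
        List<T> merged = new ArrayList<T>();
        for (Session s : shardSessions) {
            @SuppressWarnings("unchecked")
            List<T> partial = s.createQuery(hql).list();
            merged.addAll(partial);
        }
        Collections.sort(merged, orderBy);                         // ORDER BY, in memory
        return merged.subList(0, Math.min(limit, merged.size()));  // LIMIT
    }
}
```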

    Related Articles

  • An Unorthodox Approach to Database Design: The Coming of the Shard.


    Saturday, February 2, 2008

    The case against ORM Frameworks in High Scalability Architectures

    Let me begin by saying that I have used and continue to use various ORM frameworks such as Hibernate, iBATIS, Propel and ActiveRecord in applications and websites that have a user base ranging from a couple hundred to 500k users. Especially for projects that have to be up and running in a short duration of time, ORM frameworks significantly reduce the effort required to manipulate and persist OOP objects by providing time saving facilities such as automatically generated model objects, integrated unit testing, secure variable substitution, etc. Hibernate even supports horizontal data partitioning via Hibernate Shards. However, the lay of the land is significantly different in the rarefied space occupied by applications needing to support millions of users. Profiling an application at this level, and paying particular attention to the operations needed to move data to and from the database, it becomes evident that a significant portion of the operations are API related, whereby the ORM framework is traversing the abstraction layer built between the application logic and the native methods that ultimately interact with the database. I see a couple of problems with this level of abstraction, and for the purpose of this discussion I will purposely ignore caching for the sake of keeping the scope succinct.

    1. The process of optimizing database queries is as much an art as it is a science, and I have yet to see an ORM framework that does this well. In the case of MySQL, optimization involves using facilities such as EXPLAIN, BENCHMARK, ANALYZE TABLE, SHOW INDEX, and the slow query log to identify non-performing queries and tweak them to extract the leanest performance. These optimizations necessarily work best when applied as close as possible to the bare metal, so to speak, and the abstraction of an ORM framework negates to an extent the benefits of optimization. The devil remains in the details, and the further away you are from the details, the lesser a chance you have to find and square with the devil.

    2. At the end of the day, an ORM framework is essentially middleware. My reading of some of the real life architectures presented on this site seems to reinforce the assessment that middleware will only take you so far, beyond which you have to roll your own. This makes perfect sense. ORM frameworks are built to serve as wide an audience as possible, and while their success is unquestionable in the commodity/middle market, they are not and cannot possibly be tooled to accommodate the atypical demands of high scalability architecture. That would be akin to running with the hares and hunting with the hounds. Building a framework for high scalability would also require that the builders have a front and center seat in an enterprise where they are exposed to the machinery and day to day operations of a high scalability site, a situation for which you would be hard pressed to find another installation bearing similar characteristics or with similar requirements. Additionally, and without putting down the developers who contribute to these frameworks, a majority of them would not have the exposure to a bona fide high scalability architecture to be able to bring their experience to bear on the framework code base.

    3. Just as with kernel developers, I have a significant amount of faith in the folks that spend their every waking hour coding database engines such as MySQL, Postgres, Oracle, MS SQL, etc. Consequently, when the main goal is ultimate performance and scalability, I generally frown upon efforts to introduce a middle man between the wicked fast database and the application logic. And having invested the time and effort over many years to learn the intricacies of a database engine, I am more apt to cast my lot with the devil that I know than abdicate control to a framework, however versatile.

    One could argue that it makes sense to start off with an ORM framework and, as the demands of the site begin to eclipse what the framework can provide, gradually transition to a custom built solution. In my experience, refactoring on the database tier for a site that has a significant amount of data and needs to be operational 24x7 is pure hell. So much so that a more feasible option would be to build a parallel site, then migrate and switch over. Of course this could be mitigated by using a service oriented architecture, thereby giving yourself some degree of maneuverability, but at the end of the day there will be thousands of operations trying to read and write to the db every second. You are had, whichever way you turn.

    Taking a look at the MediaWiki source code that powers the Wikimedia sites, including Wikipedia, there are two classes, DatabaseMySQL and DatabasePostgres, which encapsulate the native PHP functions that talk to MySQL or PostgreSQL respectively. The other main classes, such as the Article class, then use these database classes to interact with the db. Simple and straightforward, and in my opinion the best way to get maximum performance and throughput (a minimal sketch of this pattern follows).
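
    For flavor, here's a minimal sketch of the MediaWiki-style pattern in Java (the post's example is PHP, but it translates directly): a thin class that encapsulates raw JDBC against MySQL, with the hand-tuned SQL sitting in plain sight where it can be EXPLAINed. The table and column names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Thin, MediaWiki-style database wrapper: hand-written SQL, nothing between
// the application and the JDBC driver. One class per engine if needed.
public class DatabaseMySql {

    private final Connection conn;

    public DatabaseMySql(String url, String user, String password) throws SQLException {
        this.conn = DriverManager.getConnection(url, user, password);
    }

    // The exact SQL is right here where it can be profiled and tuned.
    public String fetchArticleText(int articleId) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(
            "SELECT body FROM article WHERE article_id = ?");
        try {
            ps.setInt(1, articleId);
            ResultSet rs = ps.executeQuery();
            return rs.next() ? rs.getString("body") : null;
        } finally {
            ps.close();
        }
    }
}
```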


    Thursday, November 15, 2007

    Lessons from Yahoo, eBay, Orbitz, LinkedIn architecture

    In the Architectures You've Always Wondered About track at the QCon conference, Second Life, eBay, Yahoo, LinkedIn and Orbitz presented how they dealt with different aspects of their applications, such as scalability. There were quite a few lessons I learned that day that I thought were worth sharing. The details are provided in Lessons from Yahoo, eBay, Orbitz, LinkedIn architecture.


    Tuesday, October 2, 2007

    Secrets to Fotolog's Scaling Success

    Fotolog, a social blogging site centered around photos, grew from about 300 thousand users in 2004 to over 11 million users in 2007. Though they initially experienced the inevitable pains of rapid growth, they overcame their problems and now manage over 300 million photos, with 800,000 new photos added each day. Generating all that fabulous content are 20 million unique monthly visitors and a volunteer army of 30,000 new users each day. They did so well that a very impressed suitor bought them out for a cool $90 million. That's scale meets success by anyone's standards. How did they do it? Site: http://www.fotolog.com/

    Information Sources

  • Scaling the World's Largest Photo Blogging Community
  • Congrats to Fotolog on $90mm sale to Hi-Media
  • Fotolog overtaking Flickr?
  • Fotolog Hits 11 Million Members and 300 Million Photos Posted
  • Site of the Week: Fotolog.com by PC Magazine
  • CEO John Borthwick's Blog.
  • DBA Frank Mash's Blog
  • Fotolog, lessons learnt by John Borthwick.

    The Platform

  • Java
  • PHP
  • Sun
  • Solaris 10
  • MySQL
  • Apache
  • Hibernate
  • Memcached
  • 3PAR (a simple, efficient and scalable tiered-storage array for utility computing)
  • IBRIX (a single namespace parallel file system, a scalable volume manager, high availability feature)
  • StrongMail
  • CDN: Akamai/Panther

    The Stats

  • Started in 2002. In 2004 they had around 300k or 400k members, 3 employees, no scalable infrastructure, and no revenue model.
  • Due to the rapid growth the site had frequent technical problems, and in 2005 they had to limit new free members to 1,000 a day.
  • In 2007 they had over 11 million users and were sold for $90 million to Hi-Media.
  • Members are from over 200 countries with a majority in South America. Over 20% of page views are from Europe. They rejected a US centric strategy, developing a global and engaged audience.
  • Generates over 3.5 billion page views and receives over 20 million unique visitors each month and has earned a top 20 Alexa ranking.
  • Manages over 300 Million photos and over 500,000 photos are uploaded each day.
  • Over 30,000 new members are added each day, and the site attracts more than 4.6 million daily users. It has expanded with no marketing or member incentives.
  • Over 500 user-generated communities.
  • 20% of members visit the site daily and spend an average of 24 minutes.
  • 32 MySQL servers and a 30-server memcached cluster.

    The Architecture

  • Site originally written in PHP.
    - Their new "Fotolog memberpage" feature is written in Java, with significant performance improvement. The page is cleaner with an improved response time.
    - They are now serving the site on less than half the boxes they were using.
    - Daily registrations are up over 35%, given the improved performance and a requirement to register to post a guest book message.
    - The new code base allows them to innovate much more on the member experience.
  • They have surpassed Flickr in popularity while being a firmly Web 1.0 application.
    - There are no tags, no APIs, no JavaScript widgets, no Ajax.
    - They have a Spanish language option which extends the site to a broad user base.
    - They use very little text. It's mostly visual, so it's usable by a broad range of users.
    - Their interface is customizable and many people like to express their individual identities.
    - Their unique visitors are 1MM less than Yahoo's, yet total minutes on the site are twice Yahoo's and page views are 3x.
  • Revenue model:
    - Gold camera membership, at about $5/month, lets you upload 6 photos a day instead of 1, have 200 comments per photo instead of 20, a custom title image for your profile, a mini-thumbnail of your most recent photo displayed next to your name in guest books, plus the possibility of having your photo featured on the front page.
    - AdSense. Revenue lift from Google is trending up approximately 15%, given additional contextual data from guest books.
    - Will move to peer-to-peer advertising among their members.
    - Members will have the ability to buy and sell real and virtual items using a micro-payment service.
  • They have a one-post-per-day rule where users can only post one photo a day. Rather than inhibit growth, this rule ensures quality and generates exceptional usage by increasing the chance of a photo being seen and by attracting positive comments. Whereas people usually run out of things to say on a blog, people can always find a picture to take, upload, and talk about.
  • Only photos less than 2,000 KB in size can be uploaded. These are automatically resized to a 500x500 format. Pages look cleaner and load faster.
  • Model is browsing over searching. Opportunistic serendipitous treasure hunting is encouraged.
  • Friends are added automatically without needing permission. This generates an audience for your photos.
  • Supports a browse by groups feature, which have categories like "Colors" and "Emotions."
  • The site is intentionally simple.
    - They have resisted the temptation to add feature after feature. Instead their vision is to offer a handful of features, similar to Craigslist, with the focus on content and the conversations.
    - Pages need to be social.
    - Pages need to include not only your images, but also images from across the network, providing a visual navigation that today drives much of the time their members spend on the site: a self-formed, organic distribution system, letting members see and be seen.
    - Complementing this social network of images are comments and guest book entries, making the experience one where media intersects with communications; day in, day out, millions of images collide with billions of conversations.
  • Photobucket vs. Fotolog:
    - Photobucket stores image-based media, then distributes it to your page on social networking sites such as MySpace, Bebo, Piczo, Friendster, etc.
    - Fotolog is a destination.
    - The first generation of social networking sites stressed self-publishing over connections (from GeoCities, to Tripod, to Blogger). The next generation focused mostly on connections (SixDegrees and Friendster are the classic examples here: tools to gather friends and connections, as social capital accrues, in theory, to the people with the most connections). The third and current generation of sites blends media with connections, each with a different emphasis.
  • Backup: Sun 6540 disk array
  • Their 32 SQL servers are divided into four clusters: user, GB (guest book), PH (photos), FF (friends and favorites lists).
    - Uses non-persistent connections.
    - Connection pooling on the Java side.
    - InnoDB.
    - Partitioning is handled by the application layer.
  • Each cluster:
    - Is fronted by a set of application servers.
    - Is divided into a set of shards.
    - Each shard has a MySQL write-only master-master configuration feeding a few read-only slaves.
    - Application servers send their read requests to the slaves and their write requests to the masters.
    - Data are assigned to shards based on a cluster-specific partitioning key. Naive partitioning algorithms can lead to very uneven shard loads; you want a more balanced load on each shard.
  • MySQL is used to store image metadata only. This seems pretty standard. Almost nobody seems to store important blobs in the database because it slows down database operations.
  • Photo storage uses 3PAR and IBRIX. A CDN is used for hot content.
  • The virtual storage system, though expensive, has worked very well.
  • As more selects are used, lock contention for auto-incremented keys grows.
  • Through database optimizations they've been able to grow from 4 million members to 11 million members on the same 32 database servers. This is also due to the efficiency of MySQL running on Solaris 10, increasing the memcached cluster, porting to Java, and increasing RAM.
  • Happy with memcached.
    - Created a distributed cluster of 50 memcached servers with a total cache size of approximately 150 gigabytes, supporting around 4 billion page views/month. Peak load times dropped from 10 seconds to 2 seconds. (A sketch of this caching pattern follows the quote below.)
    - Quote from the CTO:
    "I have a new memcached user to add to your list: we here at Fotolog, the world's largest photo blogging community, now use it and we love it. I just rolled our first code to use it into production today and it has been a lifesaver. I can't wait to start using it in places where we had been relying on Berkeley databases to offload some database work. We are not some wimpy million page a day site, either. Fotolog is a billion+ pages/month site (35 to 40 million views/day is pretty typical for us). We had recently overcome some significant DB-related performance issues which allowed our site traffic to explode, and it started to bog down again under the heavy traffic load (getting back up towards 10 seconds for a page to load sometimes during the peak periods). The servers were churning away each recreating a list every time when it could easily be shared in the same form for at least 5 or 10 minutes. So we introduced memcache, creating a distributed 30-server cluster with 4 gigs available in total and made a very minor code mod to use memcache, and our peak period load times dropped back down to the 2 second or so range. It has allowed for continued growth and incredible efficiency. I can't say when I've ever been so pleased with something that worked so simply."
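
    Here's a rough sketch of the pattern the CTO describes: cache an expensively generated list for a few minutes so each request isn't recreating it. The spymemcached client and the key scheme are my assumptions; the post doesn't say what Fotolog actually used.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

import net.spy.memcached.MemcachedClient;

public class ListCacheSketch {

    private final MemcachedClient cache;

    public ListCacheSketch(String host) throws IOException {
        this.cache = new MemcachedClient(new InetSocketAddress(host, 11211));
    }

    // Serve the generated list from memcached; rebuild it at most once every
    // five minutes instead of on every page view.
    @SuppressWarnings("unchecked")
    public List<String> recentPhotoList(String memberId) {
        String key = "recent-photos:" + memberId; // hypothetical key scheme
        List<String> cached = (List<String>) cache.get(key);
        if (cached != null) {
            return cached;
        }
        List<String> fresh = buildListFromDatabase(memberId); // the expensive part
        cache.set(key, 300, fresh); // 300s TTL, matching the "5 or 10 minutes" in the quote
        return fresh;
    }

    private List<String> buildListFromDatabase(String memberId) {
        // Placeholder for the real DB query.
        return Arrays.asList("photo1", "photo2");
    }
}
```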

    Lessons Learned

  • Popularity is driven by a base of active users, not a rich set of cool features.
  • The web is global and its tail is very long. By courting users outside the US with language and culturally specific design you can compete with the big boys. Some of the hardest competition for Google, Yahoo, etc. comes from local startups with an ear to what the locals want.
  • If you want to get a lot of buzz, do whatever the alpha geeks want you to do. If you want a lot of happy users, do what they want you to do.
  • Constraints in web sites can, like in poetry, make something unexpectedly better. The rule that users are only allowed to post one photo per day creates an environment where people comment more on each other's photos, which creates a more engaged community. Who knew?
  • Protect your website with limits. Limit the size of pictures, comments, etc so your resource usage doesn't grow outrageously.
  • Have a vision. Have a strong sense of what your site is supposed to be and why, then use that vision to decide what you should build and how you should build it. Their vision of a social site built around daily photographs led to a very different site than one whose goal is to store all your photos.
  • Revenue generation features can be added without destroying the integrity of your site. I really like how they give people a reasonable set of features for free and then charge for the resources they need to have more. Those features also serve to extend and reinforce the social vision of their site. It will be interesting to see how their new monetization strategies play out.
  • Don't be afraid to scale up and out. By adding more cache, more RAM, more CPUs, and more efficient CPUs you handle dramatically more load with the same number of machines. And that's a good thing from a datacenter space and power POV.
  • Making MySQL perform:
    - Find the source of the problem.
    - Mature systems are mostly disk bound.
    - The query cache may be hurting you.
    - Add RAM to help dodge the bullet.
    - Stripe your disks.
    - Restructure tables for optimal performance.
    - Use libumem.so to find memory leaks.
  • Things to remember:
    - Know the problem.
    - Know your application.
    - Know your storage engine.
    - Know your requirements.
    - Know your budget.
    - Use all this information to decide what parts of your system really require the investment of time, money, and testing to be highly available.

    Related Articles

  • Flickr Architecture
  • An Unorthodox Approach to Database Design: The Coming of the Shard


    Tuesday, July 24, 2007

    Product: Hibernate Shards

    If you want to adopt a shard architecture but don't want to start from scratch, you may want to consider Hibernate's sharding system. Hibernate Shards is a framework designed to encapsulate and minimize this complexity by adding support for horizontal partitioning to Hibernate Core. Hibernate Shards key features:
    - Standard Hibernate programming model - Hibernate Shards allows you to continue using the Hibernate APIs you know and love: SessionFactory, Session, Criteria, Query. If you already know how to use Hibernate, you already know how to use Hibernate Shards.
    - Flexible sharding strategies - Distribute data across your shards any way you want. Use one of the default strategies provided or plug in your own application-specific logic.
    - Support for virtual shards - Think your sharding strategy is never going to change? Think again. Adding new shards and redistributing your data is one of the toughest operational challenges you will face once you've deployed your shard-aware application. Hibernate Shards supports virtual shards, a feature designed to simplify the process of resharding your data (a configuration sketch follows).
    - Free/open source - Hibernate Shards is licensed under the LGPL (Lesser GNU Public License).
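
    The virtual shards idea: map a large, fixed number of virtual shards onto a few physical shards, so resharding later means remapping virtual shards (and moving the data) rather than rewriting your shard selection logic. A sketch, assuming the four-argument ShardedConfiguration constructor from the Shards docs; the shard counts are made up.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.hibernate.cfg.Configuration;
import org.hibernate.shards.ShardedConfiguration;
import org.hibernate.shards.cfg.ShardConfiguration;
import org.hibernate.shards.strategy.ShardStrategyFactory;

public class VirtualShardsSketch {

    // 32 virtual shards spread over 2 physical shards today. Strategies are
    // written against the 32 virtual shard IDs; when you add hardware you
    // change this map (and move the data), not the application logic.
    public static ShardedConfiguration shardedConfig(Configuration prototype,
                                                     List<ShardConfiguration> shardConfigs,
                                                     ShardStrategyFactory strategyFactory) {
        Map<Integer, Integer> virtualToPhysical = new HashMap<Integer, Integer>();
        for (int virtualId = 0; virtualId < 32; virtualId++) {
            virtualToPhysical.put(virtualId, virtualId % 2); // physical shard 0 or 1
        }
        return new ShardedConfiguration(prototype, shardConfigs, strategyFactory, virtualToPhysical);
    }
}
```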
