Monday, October 19, 2009

Drupal's Scalability Makeover - You give up some control and you get back scalability

Drupal 7 is having a scalability makeover. Karoly Negyesi, Drupal Core Developer and Public Development Team Lead, explains the process in this video: Drupal 7 APIs, scalability mindset. Karoly states the general theme of the changes as: You give up some control and you get back scalability. An interesting comment on the politics of scalability?

Makeover may not be quite the right word though. A makeover implies a cosmetic change, looking better by changing the surface. Drupal's changes will go deeper than that, right to Drupal's core. It's a genuine and authentic change that will hopefully allow one of the Internet's most venerable Content Management Systems (CMSs) to compete with a constant stream of younger and sexier models.

Drupal is based on an older LAMP stack approach where PHP modules are scooped up and merged together each time a request is made to Drupal. Drupal's most intriguing idea is how it is built, expands, and changes by weaving together a single system out of individual components called modules. Built-in modules include comments, RSS, contact forms, forums, and Clean URLs. Add-on modules include things like CSE, which adds Google's Custom Search Engine, and modules for AdSense, CAPTCHA, and Sitemaps. Drupal establishes AOP-style extension points that allow modules to work remarkably well together, creating a site that feels like one single site even though it has been constructed from dozens of modules hunted and gathered from all over the digital world.

The problem is that the PHP code can directly access the database and directly render to the UI; there is little required layering. Part of Drupal's amazing configurability and extensibility has been how easy it is for everything to work together by changing the database. But when there's no layering it's almost impossible to optimize the system. If you have 20 different modules, they can make 20 separate calls to the database when what we really want is one call. And because of the direct SQL access, when the number of writes increases there's no systematic way to distribute the writes across multiple servers. So we see that as Drupal sites grow in the number of modules and the number of users, both performance and scalability tank.
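To make the "20 calls versus one call" point concrete, here is a minimal sketch in Python rather than PHP. It is not Drupal code: the `SettingsStore` class, its `load_one` and `load_many` methods, and the `settings` table are all hypothetical names invented for illustration. The idea is only that once modules go through a shared layer, that layer is free to batch their reads into a single query.

```python
import sqlite3


class SettingsStore:
    """Hypothetical data-access layer that counts the queries it issues."""

    def __init__(self):
        # In-memory database with one row per (made-up) module.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE settings (name TEXT PRIMARY KEY, value TEXT)"
        )
        self.conn.executemany(
            "INSERT INTO settings VALUES (?, ?)",
            [(f"module_{i}", f"value_{i}") for i in range(20)],
        )
        self.query_count = 0

    def load_one(self, name):
        """What a module does when it talks to the database directly."""
        self.query_count += 1
        row = self.conn.execute(
            "SELECT value FROM settings WHERE name = ?", (name,)
        ).fetchone()
        return row[0] if row else None

    def load_many(self, names):
        """What a shared layer can do: one query for every caller."""
        names = list(names)
        if not names:
            return {}
        self.query_count += 1
        placeholders = ",".join("?" for _ in names)
        rows = self.conn.execute(
            f"SELECT name, value FROM settings WHERE name IN ({placeholders})",
            names,
        )
        return dict(rows)


module_names = [f"module_{i}" for i in range(20)]

# Without layering: each module issues its own query.
direct = SettingsStore()
per_module = {name: direct.load_one(name) for name in module_names}

# With a layer in between: the same data arrives in a single query.
layered = SettingsStore()
batched = layered.load_many(module_names)

print(direct.query_count, layered.query_count)
```

Both paths return the same data. The difference is that only the layered version gives the system a single place where batching, caching, or routing to another server could be added later.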

The younger models architect their systems differently. Sites like Google, Amazon, and Facebook are written in terms of an API and a framework, a service-based approach. Using a service-based approach, the web tier can be programmed in terms of services that are themselves scalable, so the entire system is scalable. When the API is skipped there are no leverage points that can be made to scale. It becomes a big ball of mud.

More layering and more APIs is exactly the direction Drupal is taking. Exactly how is Drupal changing?

  1. Forget SQL, use APIs. Delegate control over what's happening to the API. This allows your site to scale. APIs in Drupal have historically been thought of as an inconvenience to be bypassed. You could just write a database query and dump something to the screen. Not with Drupal 7. The UI is secondary. Drupal 7 can be run without the UI because everything is now done through APIs. Previously some operations could only be done through the UI.
  2. New Database Layer. Modifications now go through a new query builder, which gives the layer room to apply tricks that let a site handle more database writes.
  3. Queue. The Queue API allows jobs to be queued for execution on a grid. For example, the aggregator run from cron fails when handling several hundred feeds: it never finishes. In the new version the RSS feeds are put in the queue and processed when there's time. This type of asynchronous processing is at the heart of many of today's largest systems.
  4. Tests. Extensive unit tests are being developed to catch bugs. Previously testing was largely through UIs. Developers of new modules are encouraged to write tests. 
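The queue idea in item 3 can be sketched in a few lines. This is an illustration in Python, not Drupal's actual Queue API: `FeedQueue`, `cron_run`, `fetch_feed`, the example URLs, and the budget of 50 jobs per run are all assumptions made up for the example. The point is that a cron run no longer has to finish every feed in one go. It takes what it can handle and leaves the rest in the queue.

```python
from collections import deque


def fetch_feed(url):
    """Stand-in for a real network fetch (hypothetical)."""
    return f"items from {url}"


class FeedQueue:
    """Hypothetical in-memory job queue; a real one would be persistent."""

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, url):
        self._jobs.append(url)

    def claim(self):
        """Return the next job, or None when the queue is empty."""
        return self._jobs.popleft() if self._jobs else None

    def __len__(self):
        return len(self._jobs)


def cron_run(queue, budget):
    """Process at most `budget` jobs, leaving the rest for the next run."""
    done = []
    for _ in range(budget):
        url = queue.claim()
        if url is None:
            break
        done.append(fetch_feed(url))
    return done


# Several hundred feeds are queued instead of being fetched inline.
queue = FeedQueue()
for i in range(300):
    queue.enqueue(f"http://example.com/feed/{i}")

# Each cron run works through a bounded slice of the backlog.
runs = 0
processed = 0
while len(queue) > 0:
    processed += len(cron_run(queue, budget=50))
    runs += 1

print(processed, runs)
```

Because each run is bounded, no single run has to complete all 300 feeds, and independent workers could in principle drain the same queue in parallel if it lived in shared storage.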

Will this work? Will this be enough? It's a promising start using best practices that have worked for other sites: queues, APIs, and abstraction layers. The move to unit testing is also smart. Given that Drupal sites are built from community contributions the new emphasis on unit tests should really help product quality going forward.

What Drupal has going against it is an incredible installed base of software that will be hard to upgrade to new ways of doing things. As a Drupal user, few things are more frustrating than the module upgrade dance. And since the coding practices for modules have changed so much, it will be quite a challenge to get all those modules moved to the new way of doing things. Without these modules Drupal isn't as attractive an option.

I'm really hoping that Drupal works it out. The idea behind Drupal is compelling and unique. Making a single functional system from components is a dream we still have not fully realized, but of everything out there Drupal comes the closest. Nature works on these principles too: composition, customization, growth through accretion. Parts keep being added on to existing systems rather than being thrown away and redesigned from scratch. In your brain you'll still find the brain of the lizard, the mammal, and the primate. In your gut you'll find billions and billions of bacteria without which you could not process food. In Drupal we see a similar process happening in building software.

Compare this approach against all the widgets now available on the Internets. In comparison, widgets are like impermanent tattoos. It's easy to embed widgets on your site precisely because they have nothing to do with your site. Their data is kept elsewhere. They don't integrate with your user and log-in system, your template system, your search system, or your backup system, and they can't be composed together or work together. Drupal's modules can do all those things. Modules share the same templating system, they can work together, they can be configured in the UI, they can be searched, and their data can be backed up.

The great thing about Drupal is how easy it is to make a functional website. It's just been hard to make a great, well performing, and scalable website. Hopefully that will change.

Reader Comments (7)

In fact, I'd argue against Drupal's approach: too much centralization will decrease security. A cloud system makes more sense anyway, doesn't it?

October 19, 2009 | Unregistered Commenter Bilal

I started switching to Wordpress on smaller to mid-size projects because Drupal was a resource hog on shared servers. Wordpress is empowered by hundreds of plugins (ie modules) so would I be right in assuming that WP is using an API service-based framework?

October 22, 2009 | Unregistered Commenter Ross

Drupal and scalability in the same sentence!

Jokes apart, at least Drupal 5.0 was a nightmare: my personal server could barely handle one user, and all the modules I needed created an incredible number of MySQL queries for every single page.

And when you read about huge sites using Drupal, they always rewrite big parts of it to make it perform much better, which makes me wonder why they chose Drupal.

October 24, 2009 | Unregistered Commenter Lorenzo

At my workplace we use Drupal, but we have never used it "out of the box" as we have always wanted greater control over certain things. That having been said, we also make use of a lot of user-contributed modules; there is some fantastic stuff out there, and why reinvent the wheel?

improvementandinnovation.com if anyone wants to see our Drupal site.

October 26, 2009 | Unregistered Commenter Jamie

Lorenzo: That's not true. Maybe it was a long time ago but not anymore. The solution isn't spelt "forking Drupal" (that's what a 'rewrite' really is) but knowing how MySQL performs and applying cache solutions where needed. A common approach is to use Pressflow, a Drupal distribution that makes it easier to make Drupal scale by supporting CDNs and reverse proxy caching. Read a couple of case studies if you want to learn more about scaling Drupal. Following David Strauss's blog at fourkitchens.com is also recommended.

October 26, 2009 | Unregistered Commenter Jakob Persson

As someone who works closely with a couple of very large Drupal sites, I can say the supposition that people who build very large sites with Drupal have to rewrite large parts of it is wrong. We keep as absolutely few core hacks as possible. It is true that there are a few patches out there that are often needed to improve a couple of places where Drupal performs poorly, but you absolutely do not need to rewrite parts of it.

What you do need to do is completely understand your architecture, know where the pain points are likely to be, and plan accordingly. It means that Drupal doesn't scale for novices, but it does scale for people who've spent significant time learning and understanding it.

October 26, 2009 | Unregistered Commenter merlinofchaos

+1 merlinofchaos.

There are a number of well-documented and common core patches (e.g. the ones from PressFlow) as well as common techniques (using memcache, APC, optimizing your servers, using Solr for search, using a front-end reverse proxy for caching), but it's simply not true to state that Drupal cannot scale without "rewriting big parts".

Furthermore, if you compare scalability characteristics of Drupal to an average Java/J2EE-built website (commonly considered "enterprise-grade" technology), you will see huge gains out-of-the-box.

October 26, 2009 | Unregistered Commenter Irakli Nadareishvili
