Entries by HighScalability Team (1576)

Monday, Sep 24, 2012

Google Spanner's Most Surprising Revelation: NoSQL is Out and NewSQL is In

Google recently released a paper on Spanner, their planet-enveloping tool for organizing the world’s monetizable information. Reading the Spanner paper, I felt it had that chiseled-in-stone feel that all of Google’s best papers have. An instant classic. Jeff Dean foreshadowed Spanner’s humongousness as early as 2009. Now Spanner seems fully online, just waiting to handle “millions of machines across hundreds of datacenters and trillions of database rows.” Wow.

The Wise have yet to weigh in on Spanner en masse. I look forward to more insightful commentary. There’s a lot to make sense of. What struck me most in the paper was a deeply buried section essentially describing Google’s motivation for shifting away from NoSQL and toward NewSQL. The money quote:

We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.

This reads as ironic given Bigtable helped kickstart the NoSQL/eventual consistency/key-value revolution.

Most of the criticisms leveled against NoSQL turned out to be problems for Google too. Only Google solved them in a typically Googlish way, through the fruitful melding of advanced theory and technology. The result: programmers get the real transactions, schemas, and query languages many crave, along with the scalability and high availability they require.
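To make the quoted trade-off concrete, here is a toy sketch of what "coding around the lack of transactions" means in practice. It is not Spanner's implementation or API: `CasStore` and `TxStore` are hypothetical in-memory stores. The first offers only single-key compare-and-set, so a two-key transfer has a window where the money is in neither account. The second offers multi-key transactions, serialized here by one global lock, which is exactly the kind of performance bottleneck the quote expects programmers to deal with later, as it arises.

```python
import threading
from contextlib import contextmanager


class CasStore:
    """Hypothetical KV store whose only atomic primitive is single-key compare-and-set."""

    def __init__(self, data):
        self._data = dict(data)
        self._lock = threading.Lock()  # makes each single call atomic, nothing more

    def get(self, key):
        with self._lock:
            return self._data[key]

    def compare_and_set(self, key, expected, new):
        with self._lock:
            if self._data[key] != expected:
                return False
            self._data[key] = new
            return True


def transfer_without_transactions(store, src, dst, amount):
    # Retry loop on src: debit only if nobody changed the balance meanwhile.
    while True:
        balance = store.get(src)
        if balance < amount:
            raise ValueError("insufficient funds")
        if store.compare_and_set(src, balance, balance - amount):
            break
    # Between here and the credit below, readers see the money missing, and a
    # crash at this point loses it unless the application adds its own intent
    # log and recovery code: that is the "coding around" the quote refers to.
    while True:
        balance = store.get(dst)
        if store.compare_and_set(dst, balance, balance + amount):
            break


class TxStore:
    """Hypothetical store with multi-key transactions, serialized by one global lock."""

    def __init__(self, data):
        self._data = dict(data)
        self._lock = threading.Lock()

    @contextmanager
    def transaction(self):
        with self._lock:           # every transaction waits here: a bottleneck to fix later
            staged = dict(self._data)
            yield staged           # the caller mutates a private copy
            self._data = staged    # reached only if the block raised no exception

    def snapshot(self):
        with self._lock:
            return dict(self._data)


def transfer_with_transaction(store, src, dst, amount):
    with store.transaction() as tx:
        if tx[src] < amount:
            raise ValueError("insufficient funds")  # nothing is published
        tx[src] -= amount
        tx[dst] += amount


store = TxStore({"alice": 100, "bob": 0})
transfer_with_transaction(store, "alice", "bob", 30)
print(store.snapshot())  # {'alice': 70, 'bob': 30}
```

With the transactional store, the application code states the invariant once and either all of it happens or none of it does; making the lock finer-grained is then a performance problem, not a correctness one.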

The full quote:

Click to read more ...

Friday, Sep 21, 2012

Stuff The Internet Says On Scalability For September 21, 2012

It's HighScalability Time:

  • @5h15h: Walmart took 40 years to get their data warehouse at 400 terabytes. Facebook probably generates that every 4 days
  • Should your database fail over automatically or wait for the guiding hands of a helpful human? Jeremy Zawodny, in Handling Database Failover at Craigslist, says Craigslist and Yahoo! handle failovers manually. Knowing when a failure has actually happened is so error-prone that it's better to put a human circuit breaker in the loop. Others think this could be an SLA buster, since write requests can't be processed while the decision is being made. The underlying issue is that knowing anything is true in a distributed system is hard.
  • Review of a paper about scalable things, MPI, and granularity. If you like to read informed critiques that begin with phrases like "this is simply not true" or "utter garbage" then you might find this post by Sébastien Boisvert to be entertaining.
  • The Big Switch: How We Rebuilt Wanelo from Scratch and Lived to Tell About It. Complete rewrites can work...sometimes. In two months they switched from Java/XML, MySQL, and cowboy coding to RoR, PostgreSQL, and TDD. Good description of the hand-off process. It sounds like a huge shot of redesign helped fix previous mistakes, which is the holy grail of rewrites. It's a story of hope...but don't all stories of temptation begin with hope too?
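The failover debate above comes down to what a monitor should do once the primary goes quiet. Here is a minimal sketch, assuming a simple heartbeat timeout; `FailoverMonitor`, its fields, and the action names are hypothetical, not Craigslist's or Yahoo!'s tooling. The same detection logic either promotes a replica automatically or pages a human and waits.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Toy heartbeat-based failure detector; all names and thresholds are illustrative."""

    timeout_s: float        # silence longer than this makes the primary "suspected"
    auto_failover: bool     # False = keep a human in the loop before promoting
    last_heartbeat: float = 0.0
    actions: list = field(default_factory=list)

    def heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check(self, now: float) -> str:
        if now - self.last_heartbeat <= self.timeout_s:
            return "healthy"
        # Silence only proves we stopped hearing from the primary. It may still be
        # alive behind a network partition or a long pause, so promoting a replica
        # now risks two primaries accepting writes.
        if self.auto_failover:
            self.actions.append("promote replica")
            return "failed_over"
        # Manual mode: no promotion happens, so until an operator decides,
        # writes cannot be served (the SLA cost critics point to).
        self.actions.append("page operator")
        return "awaiting_human"


manual = FailoverMonitor(timeout_s=5.0, auto_failover=False)
manual.heartbeat(now=0.0)
print(manual.check(now=3.0))   # healthy
print(manual.check(now=10.0))  # awaiting_human
```

The detector is identical in both modes; the only choice is who acts on an uncertain signal, which is why the argument is about risk tolerance rather than technology.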
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Thursday, Sep 20, 2012

How Vimeo Saves 50% on EC2 by Playing a Smarter Game

Nothing shows how much software architectures have changed better than the intelligent scheduling of computation over differently priced compute resources. This isn't a false economy, either: Vimeo saves up to 50% on their video transcoding bill by intelligently playing the spot, reserved, and on-demand markets. If you are ready for some advanced reindeer games, take a look at Vimeo EC2 transcoding, where they explain their thinking. Even if you don't like their rules, it's the strategy that matters. This presentation is from 2011, so it would be interesting to see whether the new reserved instance market has changed their strategy.
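To illustrate the general idea of playing the three EC2 markets, here is a hypothetical placement policy for a single transcoding job. It is not Vimeo's actual rule set; `Job`, `choose_market`, and the prices are made up. It uses idle reserved capacity first (already paid for), then spot when it is cheaper and the job has enough slack to be rerun after an interruption, and falls back to on-demand otherwise.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    est_runtime_h: float   # expected transcoding time
    deadline_h: float      # how long until the result is needed


def choose_market(job: Job, idle_reserved: int, spot_price: float,
                  on_demand_price: float) -> str:
    """Hypothetical placement policy for one job; not Vimeo's actual rules."""
    # Reserved capacity is paid for up front, so an idle reserved instance is
    # effectively free at the margin: use it first.
    if idle_reserved > 0:
        return "reserved"
    # Spot instances can be reclaimed at any time. Only use them when spot is
    # cheaper and there is enough slack to rerun the whole job once.
    slack_h = job.deadline_h - job.est_runtime_h
    if spot_price < on_demand_price and slack_h >= job.est_runtime_h:
        return "spot"
    # Otherwise pay full price for capacity that won't disappear.
    return "on-demand"


# Made-up hourly prices, for illustration only.
print(choose_market(Job("clip-a", 1.0, 4.0), 0, 0.05, 0.20))  # spot
print(choose_market(Job("clip-b", 1.0, 1.5), 0, 0.05, 0.20))  # on-demand
```

The "rerun once" slack rule is one simple way to bound the cost of a spot interruption; a real scheduler would also weigh checkpointing and how often spot capacity is reclaimed.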

Here's Vimeo's approach for minimizing costs using spot, reserved, and on-demand instances:

Click to read more ...

Tuesday, Sep 18, 2012

Sponsored Post: NY Times, CouchConf, Surge, FiftyThree, ROBLOX, Percona, ElasticHosts, Atlantic.Net, ScaleOut, New Relic, NetDNA, GigaSpaces, AiCache, Logic Monitor, AppDynamics, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • FiftyThree, the company behind the award-winning iPad app Paper, is looking for a {Backend || DevOps} Engineer to help us build our next great product: a service to "bring ideas together". http://www.fiftythree.com/jobs
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
  • Join the team at ROBLOX as a Senior Database Administrator and help us advance our rapidly growing gaming platform with over 30K web hits/sec, 75K+ database requests/sec, and over 1 petabyte of monthly CDN traffic. Sound cool? Apply here.

Fun and Informative Events

  • CouchConf is a one-day, three-track event for any developer who wants to take a deeper dive into Couchbase NoSQL technology, learn where it’s headed, and build really cool stuff.
  • Surge - A Scalability & Performance Conference, presented by OmniTI, is happening on Sept. 27th - 28th. Special High Scalability reader rate: 25% off registration, now through September 21!
  • Percona announces MySQL training for busy professionals: Developer Training for MySQL. Percona is offering savings of over 35% for this course in the month of August.

Cool Products and Services

  • ElasticHosts launches white-label cloud reseller program offering 30% revenue share on fully rebranded cloud hosting.
  • Atlantic.Net offers industry-leading cloud servers backed by ultra-fast 40 Gigabit 4x Quad Rate InfiniBand, with high throughput, low latency, and the newest RDMA technology. Free Trial Offer!
  • ScaleOut Software. In-memory Data Grids for the Enterprise. Download a Free Trial.
  • Follow the Cloudify blog to learn more about our open source PaaS stack – latest integration recipes, builds, features, and other cool stuff.  Visit the GigaSpaces blog to learn how to take your application to the next level of scalability and performance.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility style high performance cloud servers in the US and Europe delivered on all 10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Saturday, Sep 15, 2012

4 Reasons Facebook Dumped HTML5 and Went Native

Facebook made quite a splash when they released their native iOS app, not because of the app per se, but because of their conclusion that their biggest mistake had been betting on HTML5, which forced them to go native.

As you might imagine, this was a bit like telling a Great White Shark that its bark is worse than its bite. A common refrain was that Facebook had simply made a bad HTML5 site, not that HTML5 itself is bad, as plenty of other vendors have made slick, well-performing mobile sites.

An interesting and relevant conversation given the rising butt kickery of mobile. But we were lacking details. Now we aren't. If you were wondering just why Facebook ditched HTML5, Tobie Langel, in Perf Feedback - What's slowing down Mobile Facebook, lists the reasons:

  1. Tooling / Developer APIs. Most importantly, the lack of tooling to track down memory problems. 
  2. Scrolling performance. Scrolling must be fast and smooth and full featured. It's not.
  3. GPU. A clunky API and black box approach make it an unreliable accelerator.
  4. Other. Would like better touch tracking support, smoother animations, and better caching.

Click to read more ...

Friday, Sep 14, 2012

Stuff The Internet Says On Scalability For September 14, 2012

It's HighScalability Time:

  • Serves 4 billion hours of video each month, has 425M Gmail users, and has 100PB of active data: Google; 340,000+ cores across 300 data centers serving >10k scientists, archiving 15PB / yr: Open Science Grid
  • Quotable Quotes:
    • Chris Travers: MySQL is what you get when application developers build an RDBMS. PostgreSQL is what you get when database developers build an application development platform.
    • Hasen: Node.JS is a terrible platform. Its terribleness stems from a very simple aspect of it, and this aspect happens to be central to how it works: callback-based I/O
    • @cgul: @github nooooooooo say it aint so. But I read all your articles on high scalability!
    • @brianfcoope: Can't we all just get along? MT @otrajman "Biggest problem with NoSQL guys is none of them know anything about databases..." - Stonebraker
  • Thank you CIO for including HighScalability as a top Cloud blog.
  • Listen to your mom: just because everyone is doing it doesn't mean you should too. Timo Zimmermann, in My Stack Is Bigger Than Yours - Ranting About Web Applications And Scalability, says the same about frameworks. Don't go full stack; instead: keep your stack as small as possible; always keep scaling in mind; only scale when you need it; use what you know; "it works" is good enough most of the time.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday, Sep 12, 2012

Using Varnish for Paywalls: Moving Logic to the Edge

This is a guest post from Per Buer, founder and CEO of Varnish Software, provider of Varnish Cache, an open source web application accelerator freely available at varnish-cache.org. Varnish powers a lot of really big websites worldwide.

We at Varnish Software are all about speed. Varnish Cache is built for speed. It executes its policy code roughly a thousand times faster than your typical Java- or PHP-based application server, mostly because the configuration is compiled into machine code that makes no system calls.

System calls require expensive context switches, stall the CPU, and wreak havoc in the CPU cache, so avoiding them makes the code fly. There are strong limitations on what kind of logic you can move into Varnish Cache, but the logic you do move there will run very fast.

One example is using Varnish for access control, serving access-controlled content directly from the caching edge layer.
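One way to keep a paywall check at the edge cheap is to make it pure computation: the application signs a subscriber token at login, and the edge only verifies the signature and expiry, with no backend call per request. The sketch below shows that signed-token technique in Python purely to illustrate the logic; it is an assumption, not necessarily the approach this post goes on to describe, and real Varnish policy is written in VCL, not Python. `SECRET`, `make_token`, `edge_allows`, and `serve` are hypothetical names.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical key shared by the application (which issues tokens at login)
# and the edge (which only verifies them).
SECRET = b"example-secret-not-for-production"


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def make_token(user_id: str, expires_at: int) -> str:
    payload = f"{user_id}:{expires_at}"
    return f"{payload}:{_sign(payload)}"


def edge_allows(token: str, now: float) -> bool:
    """Pure computation: no backend call, no database lookup."""
    try:
        user_id, expires_at, sig = token.rsplit(":", 2)
        expiry = int(expires_at)
    except ValueError:
        return False  # malformed token
    expected = _sign(f"{user_id}:{expires_at}")
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(sig.encode(), expected.encode()) and now < expiry


def serve(path: str, token: Optional[str], cache: dict, now: float):
    """Serve premium pages from cache only to holders of a valid token."""
    if path.startswith("/premium/") and not (token and edge_allows(token, now)):
        return 403, None
    return 200, cache.get(path)


cache = {"/premium/story": "<html>...</html>"}
token = make_token("alice", expires_at=2_000)
print(serve("/premium/story", token, cache, now=1_000)[0])  # 200
print(serve("/premium/story", None, cache, now=1_000)[0])   # 403
```

The trade-off is revocation: a signed token stays valid until it expires, so short expiry times keep the edge fast without letting cancelled subscriptions linger long.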

The Varnish Paywall

Click to read more ...

Tuesday, Sep 11, 2012

How big is a Petabyte, Exabyte, Zettabyte, or a Yottabyte?

This is an intuitive look at large data sizes by Julian Bunn, from Globally Interconnected Object Databases.

Byte (8 bits)

Kilobyte (1000 bytes)
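The list uses decimal (SI) prefixes, so each step up multiplies by 1,000: a petabyte is 10^15 bytes and a yottabyte is 10^24 bytes. Sizes are also often quoted in powers of 1,024 (formally the IEC kibibyte, mebibyte, and so on), and the gap between the two readings grows with every prefix. The short Python sketch below tabulates both; `PREFIXES` and `unit_table` are just illustrative names.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def unit_table():
    """Return (name, decimal size, binary size) in bytes for each prefix."""
    return [
        (prefix + "byte", 1000 ** power, 1024 ** power)
        for power, prefix in enumerate(PREFIXES, start=1)
    ]


for power, (name, decimal, binary) in enumerate(unit_table(), start=1):
    # How much bigger the unit is if someone means powers of 1024 instead.
    gap_pct = (binary / decimal - 1) * 100
    print(f"{name:>10} = 10^{3 * power:<2} bytes "
          f"(read as 2^{10 * power}, it is {gap_pct:.1f}% larger)")
```

The discrepancy is 2.4% for a kilobyte but about 20.9% for a yottabyte, which matters when comparing disk vendors' capacities with what an operating system reports.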

Click to read more ...

Monday, Sep 10, 2012

Russ’ 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware

My name is Russell Sullivan, and I am the author of AlchemyDB: a highly flexible NoSQL/SQL/DocumentStore/GraphDB-datastore built on top of Redis. I have spent the last several years trying to find a way to sanely house multiple datastore-genres under one roof while (almost paradoxically) pushing performance to its limits.

I recently joined the NoSQL company Aerospike (formerly Citrusleaf) with the goal of incrementally grafting AlchemyDB’s flexible data-modeling capabilities onto Aerospike’s high-velocity horizontally-scalable key-value data-fabric. We recently completed a peak-performance TPS optimization project: starting at 200K TPS, pushing to the recent community edition launch at 500K TPS, and finally arriving at our 2012 goal: 1M TPS on $5K hardware.

Getting to one million over-the-wire client-server database requests per second on a single machine costing $5K is a balance between trimming overhead on many axes and using a shared-nothing architecture to isolate the paths taken by unique requests.

Even if you aren't building a database server, the techniques described in this post might be interesting, as they are not specific to database servers. They could be applied to an FTP server, a static web server, or even a dynamic web server.
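A shared-nothing design isolates request paths by giving each worker exclusive ownership of a slice of the data, so requests for different keys never contend on a lock. Here is a toy Python sketch of that structure, assuming a stable hash (`zlib.crc32`) to route keys to per-worker queues; `ShardedStore` and its methods are hypothetical, not Aerospike's or AlchemyDB's code. Because of CPython's GIL it shows the shape of the design, not the speedup: a real server would pin one thread or process per core in a systems language.

```python
import queue
import threading
import zlib


class ShardedStore:
    """Toy shared-nothing store: each worker thread exclusively owns one partition."""

    def __init__(self, n_shards: int):
        self.inboxes = [queue.Queue() for _ in range(n_shards)]
        for inbox in self.inboxes:
            threading.Thread(target=self._worker, args=(inbox,), daemon=True).start()

    def shard_for(self, key: str) -> int:
        # Stable hash, so a given key is always handled by the same worker.
        return zlib.crc32(key.encode()) % len(self.inboxes)

    @staticmethod
    def _worker(inbox: queue.Queue) -> None:
        data = {}  # touched only by this thread, so it needs no lock
        while True:
            op, key, value, reply = inbox.get()
            if op == "put":
                data[key] = value
                reply.put("ok")
            else:
                reply.put(data.get(key))

    def request(self, op: str, key: str, value=None):
        # The per-shard queue is the only structure shared between threads.
        reply = queue.Queue(maxsize=1)
        self.inboxes[self.shard_for(key)].put((op, key, value, reply))
        return reply.get()


store = ShardedStore(n_shards=4)
store.request("put", "user:42", "Russ")
print(store.request("get", "user:42"))  # Russ
```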

Here is my personal recipe for getting to this TPS per dollar...

Click to read more ...

Friday, Sep 7, 2012

Stuff The Internet Says On Scalability For September 7, 2012

It's HighScalability Time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...