Entries by HighScalability Team (1576)

Thursday
Apr252013

Paper: Making reliable distributed systems in the presence of software errors

Joe Armstrong is a co-inventor of Erlang and an all-around renaissance software tinkerer, as shown by his excellent work on writing a C Compiler and his voluminous work on GitHub.

Given the success of Erlang, it's probably no surprise that he wrote his thesis on the groundbreaking ideas behind Erlang: Making reliable distributed systems in the presence of software errors.

Even if you have yet to join the cult of Erlang, the principles behind Erlang are universal and well worth exploring for your own designs. Highly recommended.

Introduction:

Click to read more ...

Wednesday
Apr242013

Strategy: Using Lots of RAM Often Cheaper than Using a Hadoop Cluster

Solving problems while saving money is always a challenge. In Nobody ever got fired for using Hadoop on a cluster, the authors give some counter-intuitive advice by showing that a big-memory server may provide better performance per dollar than a cluster:

  1. For jobs where the input data is multi-terabyte or larger, a Hadoop cluster is the right solution.
  2. For smaller problems, memory has reached a GB/$ ratio where it is technically and financially feasible to use a single server with 100s of GB of DRAM rather than a cluster. Given that the majority of analytics jobs do not process huge data sets, a cluster doesn't need to be your first option. Scaling up RAM saves programmer time, reduces programmer effort, improves accuracy, and reduces hardware costs.
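As a rough illustration of the rule of thumb above, here is a minimal sketch that picks a platform by checking whether a job's input fits in one server's RAM. The function name, the 50% headroom factor, and the example sizes are assumptions made for illustration; none of them come from the paper.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def choose_platform(input_bytes: int, server_ram_bytes: int, headroom: float = 0.5) -> str:
    """Return 'single-server' if the input fits in usable RAM, else 'cluster'.

    headroom is a hypothetical safety factor: only this fraction of RAM is
    assumed to be available for the input data itself.
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_bytes = server_ram_bytes * headroom
    return "single-server" if input_bytes <= usable_bytes else "cluster"


if __name__ == "__main__":
    # A 100 GB job on a 512 GB server fits comfortably in memory.
    print(choose_platform(100 * GB, 512 * GB))
    # A multi-terabyte job does not, so the cluster remains the right tool.
    print(choose_platform(4 * TB, 512 * GB))
```

The point is only that the decision can start from a size check; a real comparison would also weigh hardware prices and job run times.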

 

Tuesday
Apr232013

Facebook Secrets of Web Performance

This is a repost of part 1 of an interview I did for the Boundary blog.

Boundary: What is Facebook’s secret sauce for managing what’s got to be the biggest Big Data project, if you will, on the Web?

Hoff: From several presentations we’ve learned what Facebook insiders like Aditya Agarwal and Robert Johnson, both former Directors of Engineering, consider their secret sauce:

Click to read more ...

Friday
Apr192013

Stuff The Internet Says On Scalability For April 19, 2013

Hey, it's HighScalability time:


(Ukrainian daredevil scaling buildings)
  • Two Trillion Objects, 1.1 Million Requests / Second: S3; 1.4TB/s: Titan supercomputer has world’s fastest storage; four billion hours: Netflix streaming in last 3 months; $1.2B: Google's Q1 infrastructure spend
  • Quotable Quotes:
    • Google: We'll track EVERY task on EVERY data center server
    • Stacey Higginbotham: All in all in the last five years the world has gained 54 Tbps of new capacity.
    • @seveas: Scalability 103: Hardware sucks. Software sucks. Everything *will* break, prepare for failure of any component of your system.
    • bloodredsun: The long and short of it is that Cassandra is a fantastic system for write heavy situations. What it is not good at are read heavy situations where deterministic low latency is required, which is pretty much what the pinterest guys were dealing with.
    • @viktorklang: "The e-mail message could not be delivered because the user's mailfolder is full." <-- EMAIL HAS BACKPRESSURE OMG
  • Interesting Behind the Scenes: Airbnb Neighborhoods. Includes a description of their workflow and a detailed breakdown of their stack: Rails, PostgreSQL/PostGIS, Memcached, CoffeeScript, Sass, jQuery, Handlebars, Backbone, Underscore, Sinatra, Clojure, Java, Hadoop, Cascalog. Highlight: "You don't need a database, you need a [expletive deleted] cache." So that's what we did, we traded our database for a cache.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Apr172013

Tachyon - Fault Tolerant Distributed File System with 300 Times Higher Throughput than HDFS

Tachyon (github) is an interesting new filesystem brought to you by the folks at the UC Berkeley AMP Lab:

Tachyon is a fault tolerant distributed file system enabling reliable file sharing at memory-speed across cluster frameworks, such as Spark and MapReduce. It offers up to 300 times higher throughput than HDFS, by leveraging lineage information and using memory aggressively. Tachyon caches working set files in memory, and enables different jobs/queries and frameworks to access cached files at memory speed. Thus, Tachyon avoids going to disk to load datasets that are frequently read.
It has a Java-like File API, native support for raw tables, a pluggable file system, and it works with Hadoop with no modifications.
 
It might work well for streaming media too as you wouldn't have to wait for the complete file to hit the disk before rendering.
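
To make the lineage idea concrete, here is a toy sketch of a store that keeps derived data in memory and, if an entry is lost, recomputes it from a recorded recipe instead of reading a replicated copy. This is not Tachyon's API: the class name LineageStore, its methods, and the dictionary standing in for durable storage are all hypothetical.

```python
from typing import Callable, Dict, Tuple


class LineageStore:
    """Toy in-memory store that rebuilds lost entries from recorded lineage."""

    def __init__(self, durable: Dict[str, bytes]) -> None:
        self._durable = dict(durable)  # stands in for a disk-backed source of truth
        self._memory: Dict[str, bytes] = {}
        # name -> (function, names of the inputs it was derived from)
        self._lineage: Dict[str, Tuple[Callable[..., bytes], Tuple[str, ...]]] = {}
        self.recomputations = 0

    def derive(self, name: str, func: Callable[..., bytes], *inputs: str) -> None:
        """Compute a dataset from its inputs and remember how it was made."""
        self._lineage[name] = (func, inputs)
        self._memory[name] = func(*(self.read(i) for i in inputs))

    def read(self, name: str) -> bytes:
        """Serve from memory when possible; otherwise reload or recompute."""
        if name in self._memory:
            return self._memory[name]
        if name in self._durable:
            data = self._durable[name]
        else:
            func, inputs = self._lineage[name]
            data = func(*(self.read(i) for i in inputs))
            self.recomputations += 1
        self._memory[name] = data
        return data

    def evict(self, name: str) -> None:
        """Simulate losing the in-memory copy, e.g. after a node failure."""
        self._memory.pop(name, None)


if __name__ == "__main__":
    store = LineageStore({"raw": b"a b a"})
    store.derive("upper", lambda d: d.upper(), "raw")
    store.evict("upper")
    print(store.read("upper"), store.recomputations)
```

The design choice being illustrated is the trade of storage for computation: derived data is not written out multiple times, so writes can run at memory speed, and the cost is paid only when something has to be rebuilt.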
Tuesday
Apr162013

Sponsored Post: Surge, Rackspace, Simple, Fitbit, Amazon, Booking, aiCache, Aerospike, Percona, ScaleOut, New Relic, LogicMonitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • LogicMonitor is looking for a Front End developer to have a huge impact, be valued, realize their dreams, and help us realize ours. We are looking for someone to own the code that delivers the design and usability of LogicMonitor's enterprise SaaS application(s). Please apply online.
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Please apply online.
  • Help build the platform that powers a better, fairer banking experience at Simple. Join a talented team that chooses its own tools; works across web, Android, iOS, and Ruby/Scala/Clojure backend apps; and develops a secure and scalable banking service on AWS. Learn more at careers.
  • Fitbit is hiring a Site Operations Lead to help us on our mission to make the world a healthier place! Fitbit's wearable fitness devices are worn by people across the world, each syncing with the web site, wirelessly and automatically, every 15 minutes. Join our mission here!
  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs
  • Aerospike is Hiring! You dream in C - and like it? Then join us as a Senior Distributed Systems Engineer or Client / Application Engineer. People covet your bag of tricks for troubleshooting systems and network issues? Join our Operations and QA team. See if these positions are a fit for you!

Fun and Informative Events

  • Surge - The Scalability & Performance Conference, presented by OmniTI, is happening on Sept. 12th-13th. Special High Scalability Reader Rate: $50 off registration, now through September.
  • It's back! Join the MySQL Community at the annual Percona Live MySQL Conference and Expo in Santa Clara, April 22-25. This year's conference features an outstanding lineup of 92 speakers delivering 112 breakout sessions over three days! 

Cool Products and Services

  • The Rackspace Cloud Application Programming Interface (API) has changed the game, allowing customers to easily modify their cloud configuration with just a few lines of code. Read about three of the most popular things that customers do with the Rackspace API.
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your web-site. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • New Benchmark shows Aerospike nearly 10x Faster than the Competition. Thumbtack Technology YCSB Benchmark shows Aerospike nearly 10x faster than Cassandra, Couchbase and MongoDB. Read it now!
  • ScaleOut Software. In-Memory Data Grids for the Enterprise. Download a Free Trial.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Apr152013

Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years

Pinterest has been riding an exponential growth curve, doubling every month and a half. They’ve gone from 0 to 10s of billions of page views a month in two years, from 2 founders and one engineer to over 40 engineers, from one little MySQL server to 180 Web Engines, 240 API Engines, 88 MySQL DBs (cc2.8xlarge) + 1 slave each, 110 Redis Instances, and 200 Memcache Instances.

Stunning growth. So what’s Pinterest's story? To tell their story we have our bards, Pinterest’s Yashwanth Nelapati and Marty Weiner, who tell the dramatic story of Pinterest’s architecture evolution in a talk titled Scaling Pinterest. This is the talk they would have liked to hear a year and a half ago, when they were scaling fast and there were a lot of options to choose from. And they made a lot of incorrect choices.

This is a great talk. It’s full of amazing details. It’s also very practical, down to earth, and it contains strategies adoptable by nearly anyone. Highly recommended.

Two of my favorite lessons from the talk:

  1. Architecture is doing the right thing when growth can be handled by adding more of the same stuff. You want to be able to scale by throwing money at a problem, which means throwing more boxes at a problem as you need them. If your architecture can do that, then you’re golden.
  2. When you push something to the limit, all technologies fail in their own special way. This led them to evaluate tool choices with a preference for tools that are: mature; really good and simple; well known and liked; well supported; consistently good performers; as failure free as possible; free. Using these criteria they selected: MySQL, Solr, Memcache, and Redis. Cassandra and Mongo were dropped.

These two lessons are interrelated. Tools following the principles in (2) can scale by adding more boxes. And as load increases mature products should have fewer problems. When you do hit problems you’ll at least have a community to help fix them.  It’s when your tools are too tricky and too finicky that you hit walls so high you can’t climb over.

It’s in what I think is the best part of the entire talk, the discussion of why sharding is better than clustering, that you see the themes of growing by adding resources, few failure modes, maturity, simplicity, and good support come to full fruition. Notice all the tools they chose grow by adding shards, not through clustering. The discussion of why they prefer sharding and how they shard is truly interesting and will probably cover ground you’ve never considered before.
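
To give a feel for what "growing by adding shards" can look like, here is a generic sketch of application-level sharding: keys hash to a fixed set of logical shards, and capacity is added by moving whole shards to new hosts. It is not Pinterest's actual scheme, which the talk covers in detail; the ShardMap class, its method names, and the host names are hypothetical.

```python
import hashlib
from typing import Dict, List


class ShardMap:
    """Maps keys to a fixed set of logical shards, and shards to hosts."""

    def __init__(self, num_shards: int, hosts: List[str]) -> None:
        if num_shards <= 0 or not hosts:
            raise ValueError("need at least one shard and one host")
        self.num_shards = num_shards
        # Spread logical shards round-robin across the initial hosts.
        self.shard_to_host: Dict[int, str] = {
            shard: hosts[shard % len(hosts)] for shard in range(num_shards)
        }

    def shard_for(self, key: str) -> int:
        """Stable key -> shard mapping; it never changes when hosts are added."""
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.num_shards

    def host_for(self, key: str) -> str:
        return self.shard_to_host[self.shard_for(key)]

    def move_shard(self, shard: int, new_host: str) -> None:
        """Add capacity by relocating one whole shard to another box."""
        self.shard_to_host[shard] = new_host


if __name__ == "__main__":
    shards = ShardMap(num_shards=8, hosts=["db1", "db2"])
    shard = shards.shard_for("user:42")
    shards.move_shard(shard, "db3")
    print(shard, shards.host_for("user:42"))
```

Because the key-to-shard mapping is fixed, adding a box only changes the small shard-to-host table. No data is rebalanced automatically behind your back, which is one way to keep the number of failure modes low.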

Now, let’s see how Pinterest scales:

Click to read more ...

Friday
Apr122013

Stuff The Internet Says On Scalability For April 12, 2013

Hey, it's HighScalability time:


(Ukrainian daredevil scaling buildings)

 

  • 877,000 TPS: Erlang and VoltDB. 
  • Quotable Quotes:
    • Hendrik Volkmer: Complexity + Scale => Reduced Reliability + Increased Chance of catastrophic failures
    • @TheRealHirsty: This coffee could use some "scalability"
    • @billcurtis_: Angular.js with Magento + S3 json file caching = wicked scalability
    • Dan Milstein: Screw you Joel Spolsky, We're Rewriting It From Scratch!
    • Anil Dash: Terms of Service and IP trump the Constitution
    • Jeremy Zawodny: Yeah, seek time matters. A lot.
    • @joeweinman: @adrianco proves why auto scaling is better than curated capacity management. < 50% + Cost Saving
    • @ascendantlogic: Any "framework" naturally follows this progression. Something is complex so someone does something to make it easier. Everyone rushes to it but needs one or two things from the technologies they left behind so they introduce that into the "new" framework. Over the years everyone's edge cases are accounted for with frameworks on top of frameworks and suddenly everyone is looking for the next big simplification.
  • Imagine if you had a beowulf cluster of tiny antennas? You could build a TV rebroadcasting service that has old media running for the Galt's Gulch of pay TV.
  • As a technologically advanced nation, why haven't we done this yet? Nationwide Google Fiber would cost $11B over five years, probably will never happen. I say this while using my nationwide power/telephone/road/defense system.
  • Great list of technical talks. I'm partial to Big Ball of Mud.
  • Making Black Swans work for you: Stick to simple rules; Decentralize; Develop layered systems; Build in redundancy and overcompensation; Resist the urge to suppress randomness; Ensure everyone has skin in the game; Give higher status to practitioners rather than theoreticians.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Apr102013

Check Yourself Before You Wreck Yourself - Avocado's 5 Early Stages of Architecture Evolution

In Don’t panic! Here’s how to quickly scale your mobile apps, Mike Maelzer paints a wonderful picture of how Avocado, a mobile app for connecting couples, evolved to handle 30x traffic within a few weeks. If you are just getting started then this is a great example to learn from.

What I liked: it's well written, packing a lot of useful information in a little space; it's failure driven, showing the process of incremental change driven by purposeful testing and production experience; it shows awareness of what's important, in their case, user signup; a replica setup was used for testing, a nice cloud benefit. 

Their biggest lesson learned is a good one:

It would have been great to start the scaling process much earlier. Due to time pressure we had to make compromises, like dropping four of our media resizer boxes. While throwing more hardware at some scaling problems does work, it’s less than ideal.

Here's my gloss on the article:

Evolution One - Make it Work

Click to read more ...

Friday
Apr052013

Stuff The Internet Says On Scalability For April 5, 2013

Hey, it's HighScalability time:


(Dr. Who Scaling Up the Shard click for cool animated gif)

 

  • 50 sextillion: # of earth-like planets in universe; 100,000: stars
  • Quotable Quotes:
    • @petdance: "I wish I had enough money to run Oracle instead of Postgres." "Why do you want to do that?" "I don't, I just wish I had enough money to."
    • @JBossMike: Java is old. Java is verbose. Java is boring. Java is dead… Java is FAST. 
    • @old_sound: We need a "shrink conf" for when scaling is not what we actually need.
    • Carsten Puls: At first, customers want to get going. Understanding what's going on under the hood isn't that important. As grows, want more control and go under the hood. Managing that balance through lifecycle is important.
    • @rbranson: What does almost every memcache library do during a multi-get when 1 out of 10 boxes times out? F*cking whole thing fails. < Reminded me of this
    • @heyavie: I wonder if King Kong's creators ever talked scalability?
    • arkitaip: It seems very risky to base your core business on a language that's only been around for two years. Sometimes web development seems to be more volatile than the fashion industry.
    • @_Mblueberries: I hate scaling and cleaning the fish.
    • @jcoglan: Reminder that 'scalability' is a property of system architecture and data layouts, not language runtimes (mostly)
    • @SuperLuckyHappy: Latest Headlines:  CIA: “Collect everything and hang on to it forever.” CIA Chief Technology Officer Big Data and Cloud Computing Pre
    • joelgrus: Finally, a way to combine the elegance of functional programming with the unwieldy, verbose syntax of Java!
    • The Archimedes Codex: The transition from the roll to the codex—the book format we know today—was a revolution in the history of data storage. The genius of the codex is that it contains knowledge not in two dimensions, like a roll, but in three. The roll has height and width; the codex has height, width, and depth. Because it has depth, it doesn’t need to be nearly as wide. A codex with 200 folios (400 pages), 6 inches wide, has the same potential data-storage area as a roll of the same height that is 200 feet long. To access data in a codex, you only have to travel through the depth dimension, which is just a couple of inches thick. 

  • Custom Silicon + ZFS + Andy Bechtolsheim = DSSD, a chip startup to "improve the performance and reliability of flash memory for high performance computing, newer data analytics and networking." Pushing compute down to the edge, on to the disk, was long ago predicted by Jim Gray, who declared "locality is king" and "processors are going to migrate to where the transducers are." Sounds like DSSD is taking a shot at fulfilling Jim's vision. They are minimizing OS overhead, excellent. ZFS probably comes in because there's a long-standing holy grail of implementing small fine grained objects in the file system, which is a performance nightmare, but extends the promise of doing away with the database layer. The gotcha with all these plays is commodity hardware. It's hard to maintain orders of magnitude performance gains over commodity players who are riding much cheaper cost curves and ever improving performance curves. Initial leads fade quickly and the capital costs keep rising. Sounds like there is a real software systems aspect here, so maybe that will decapitate those curves. In any case, it's great to see people developing hardware rather than staying mediocre with software.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...