Wednesday, Jan 16, 2013

What if Cars Were Rented Like We Hire Programmers?

Imagine if you will that car rental agencies rented cars like programmers are hired at many software companies...

Agency: So sorry you had to wait in the reception area for an hour. Nobody knew you were coming today. I finally found 8 people to interview you before we can rent you a car. If we like you, you may have to come in for another round of interviews tomorrow, because our manager isn't in today. I didn't have a chance to read your application, so I'll just start with a question: what car do you drive today?
Applicant: I drive a 2008 Subaru.
Agency: That's a shame. We don't have a Subaru to rent you.
Applicant: That's OK. Any car will do.
Agency: No, we can only take on clients who know how to drive the cars we stock. We find it's safer that way. There are so many little differences between cars; we just don't want to take a chance.
Applicant: I have a driver's license. I know how to drive. I've been driving all kinds of cars for 15 years. I'm sure I can adapt.
Agency: We appreciate your position, but we can only take exact matches. Otherwise, how could we ever know whether you could drive one of our cars?

Click below to see how the story ends, but you probably already know the ending...

Click to read more ...

Tuesday, Jan 15, 2013

More Numbers Every Awesome Programmer Must Know

Colin Scott, a Berkeley researcher, has updated Jeff Dean’s famous Numbers Everyone Should Know with his interactive graphic, Latency Numbers Every Programmer Should Know. The interactive aspect is cool: a slider lets you see the numbers from as early as 1990 out to the far, far future of 2020.

Colin explained his motivation for updating the numbers:

The other day, a friend mentioned a latency number to me, and I realized that it was an order of magnitude smaller than what I had memorized from Jeff’s talk. The problem, of course, is that hardware performance increases exponentially! After some digging, I actually found that the numbers Jeff quotes are over a decade old.

Since numbers without interpretation are simply data, take a look at Google Pro Tip: Use Back-Of-The-Envelope Calculations To Choose The Best Design. The idea is that back-of-the-envelope calculations are estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements.
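As a quick sketch of the technique, here's a back-of-the-envelope comparison of two hypothetical designs for serving 30 image thumbnails: one random disk read per thumbnail versus a single sequential read of a contiguous blob. The latency figures are illustrative round numbers in the spirit of Jeff Dean's list, not measurements:

```python
# Back-of-the-envelope comparison of two thumbnail-serving designs,
# using roughly 2013-era latency numbers (values are illustrative).
DISK_SEEK_S = 10e-3          # ~10 ms per random seek
DISK_SEQ_READ_BPS = 50e6     # ~50 MB/s sequential read

THUMB_BYTES = 256 * 1024
NUM_THUMBS = 30

# Design A: one random seek + read per thumbnail
design_a = NUM_THUMBS * (DISK_SEEK_S + THUMB_BYTES / DISK_SEQ_READ_BPS)

# Design B: thumbnails stored contiguously -- one seek, one big read
design_b = DISK_SEEK_S + (NUM_THUMBS * THUMB_BYTES) / DISK_SEQ_READ_BPS

print(f"A (random reads): {design_a * 1000:.0f} ms")   # ~457 ms
print(f"B (one seq read): {design_b * 1000:.0f} ms")   # ~167 ms
```

Two minutes of arithmetic says the contiguous layout is almost 3x faster before anyone writes a line of production code, which is exactly the point of the exercise.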

And given that most of these measures are in nanoseconds, to better understand the nanosecond you can do no better than Grace Hopper To Programmers: Mind Your Nanoseconds! 11.8 inches is the length of wire light travels in a nanosecond, a billionth of a second.
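That 11.8-inch figure checks out with a one-line calculation:

```python
# How far does light travel in one nanosecond?
C = 299_792_458            # speed of light in m/s
meters = C * 1e-9          # ~0.2998 m per nanosecond
inches = meters / 0.0254   # ~11.8 inches -- the length of Hopper's wire
print(f"{inches:.1f} inches")
```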

Colin's post inspired some great threads on Reddit and on Hacker News. Here are some comments I found particularly juicy:

To the idea that these numbers are inaccurate, Beckneard counters:

Click to read more ...

Monday, Jan 14, 2013

MongoDB and GridFS for Inter- and Intra-Datacenter Data Replication

This is a guest post by Jeff Behl, VP Ops @ LogicMonitor. Jeff has been a bit herder for the last 20 years, architecting and overseeing the infrastructure for a number of SaaS-based companies.

Data Replication for Disaster Recovery

An inevitable part of disaster recovery planning is making sure customer data exists in multiple locations.  In the case of LogicMonitor, a SaaS-based monitoring solution for physical, virtual, and cloud environments, we wanted copies of customer data files both within a data center and outside of it.  The former was to protect against the loss of individual servers within a facility, and the latter for recovery in the event of the complete loss of a data center.
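Since GridFS is in the title: its core idea is simply to split a file into fixed-size chunk documents that MongoDB can then replicate within and across data centers like any other data. Here's a standalone sketch of that chunking scheme (no MongoDB required; the filename is made up, and the 255 KB chunk size matches the common driver default):

```python
# Sketch of the GridFS idea: a file becomes one `files` document plus
# fixed-size chunk documents, each independently replicable by MongoDB.
# A real deployment uses a driver such as PyMongo against a replica set.
import hashlib

CHUNK_SIZE = 255 * 1024  # common driver default

def to_gridfs_docs(filename: str, data: bytes):
    """Split `data` into numbered chunk documents plus a metadata doc."""
    chunks = [
        {"files_id": filename, "n": i, "data": data[off:off + CHUNK_SIZE]}
        for i, off in enumerate(range(0, len(data), CHUNK_SIZE))
    ]
    file_doc = {
        "_id": filename,
        "length": len(data),
        "chunkSize": CHUNK_SIZE,
        "md5": hashlib.md5(data).hexdigest(),
    }
    return file_doc, chunks

def from_gridfs_docs(chunks):
    """Reassemble the original bytes from chunk documents."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))

# Hypothetical customer data file, ~600 KB -> 3 chunks
file_doc, chunks = to_gridfs_docs("customer-42.dat", b"x" * 600_000)
assert from_gridfs_docs(chunks) == b"x" * 600_000
print(len(chunks), "chunks")
```

Because each chunk is an ordinary document, MongoDB's replica-set machinery handles both the intra- and inter-datacenter copies for free.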

Where we were:  Rsync

Click to read more ...

Friday, Jan 11, 2013

Stuff The Internet Says On Scalability For January 11, 2013

Hey, it's HighScalability time:

  • 240,000,000,000 URLs: the Wayback Machine; 743 billion: the number of words Google analyzed to find that etaoin srhldcu are the most used letters in the English language
  • Quotable Quotes:
    • @actuallyshayne : Building cloud scalability is a lot like playing a tower defense game.
    • @traviskaufman : lesson of the day: there is a major difference between "scalability" and "overcomplication"
    • @deathmtn : Sometimes it seems like that storm of discussion about massive scalability has boiled down to "avoid JOINs and other multi-table queries."
    • @rbranson : we use c1.xlarges for user caching since they need to push so many req/sec.
    • @rbranson : we've got memcache instances in an AZ with no app instances and they can push 100K PPS across AZs fine.
  • Gabriel Weinberg of DuckDuckGo on Orders of magnitude: I find framing things in orders of magnitude is a really useful way to measure progress and think about the future. Not much changes structurally if you grow by a factor of two; usually your technical and non-technical infrastructure can handle that kind of growth pretty easily. But when you grow by a factor of ten (an order of magnitude) something usually breaks. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday, Jan 9, 2013

The Story of How Turning Disk Into a Service Led to a Deluge of Density

 

We usually think of the wonderful advantages of service oriented architectures as a software thing, but they apply to hardware too. In Security Now 385, that Doyen of Disk, Steve Gibson, tells the fascinating story (at about 41:30) of how moving to a service oriented architecture in hard drives, modeling a drive as a linear stream of sectors, helped create the amazing high density disk drives we enjoy today.

When drives switched to the IDE (Integrated Drive Electronics) interface, the controller function moved into the drive instead of the computer. No longer were low-level drive signals carried across cables and into the motherboard. Now we just ask the drive for the desired sector and the drive takes care of it.

This allowed manufacturers to do anything they wanted behind the IDE interface. The drive stopped being dumb and became smart, providing a sort of sector service. Density skyrocketed because there was no dependency on the computer: all the internals could change completely as needed to support higher and higher densities. Steve of course gives a detailed evolution of hard drive internals, but that's the gist of it.
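That "sector service" contract can be sketched in a few lines: the host addresses logical sectors, and the classic cylinder/head/sector geometry (and everything below it) stays the drive's private business. The class below is an illustration, not real driver code:

```python
# Sketch of the contract IDE introduced: the host asks for logical
# sector N; physical geometry stays hidden behind the drive's own
# controller and can change freely between drive generations.
class Drive:
    def __init__(self, cylinders, heads, sectors_per_track):
        self.cylinders = cylinders
        self.heads = heads
        self.spt = sectors_per_track

    def lba(self, cyl, head, sect):
        # Classic CHS -> LBA translation (sector numbers are 1-based)
        return (cyl * self.heads + head) * self.spt + (sect - 1)

    def read_sector(self, lba):
        # Internally the drive maps the LBA onto whatever physical
        # layout it likes (zoned recording, remapped bad sectors, ...)
        return b"\x00" * 512  # placeholder payload

d = Drive(cylinders=1024, heads=16, sectors_per_track=63)
print(d.lba(0, 0, 1))   # first sector -> LBA 0
print(d.lba(1, 0, 1))   # one full cylinder in: 16 * 63 = LBA 1008
```

Once the host only ever sees the flat LBA namespace, the manufacturer can rip out and replace everything behind `read_sector` without breaking a single computer.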

Now we have multi-terabyte drives for next to nothing. Separate your concerns and good things can happen.

A cool and unexpected story. Thanks Steve.

Tuesday, Jan 8, 2013

Sponsored Post: Flurry, Rumble Games, Booking, aiCache, Teradata Aster, Aerospike, Percona, ScaleOut, New Relic, NetDNA, GigaSpaces, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Flurry has built large-scale app measurement and advertising services that are used by more than 80,000 media companies and independent developers to monetize mobile and related platforms. If you're interested in joining a thriving, growing team, please check us out.
  • Rumble Games is looking for a Senior Platform Engineer to build massively scalable and shared services for the next generation of online games. We have the best team this industry has seen, and we will transform the way people play together. Join us.
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html
  • Teradata Aster is looking for Distributed Systems, Analytic Applications, and Performance Architects. As a member of the Architecture Group you will help define the technical roadmap for the product.
  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • Aerospike: Two Trillion Transactions per month...100 million stored user profiles...25% of all video ads processed on the internet - mere realities of success for Aerospike customers. Industry leaders reveal their secrets
  • ScaleOut Software. In-Memory Data Grids for the Enterprise. Download a Free Trial.
  • Follow the Cloudify blog to learn more about our open source PaaS stack – latest integration recipes, builds, features, and other cool stuff.  Visit the GigaSpaces blog to learn how to take your application to the next level of scalability and performance.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy that allows companies to run a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager: Monitor physical, virtual, and cloud applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday, Jan 7, 2013

Analyzing billions of credit card transactions and serving low-latency insights in the cloud

This is a guest post by Ivan de Prado and Pere Ferrera, founders of Datasalt, the company behind Pangool and Splout SQL Big Data open-source projects.

The volume of payments made with credit cards is huge, and there is clear value in the data that can be derived from analyzing all the transactions. Customer loyalty, demographics, heat maps of activity, shop recommendations, and many other statistics are useful to both customers and shops for improving their relationship with the market. At Datasalt we have developed a system, in collaboration with the BBVA bank, that is able to analyze years of data and serve insights and statistics to different low-latency web and mobile applications.

The main challenge we faced, besides processing Big Data input, was that the output was also Big Data, even bigger than the input. And this output needed to be served quickly, under high load.

The solution we developed has an infrastructure cost of just a few thousand dollars per month, thanks to the use of the cloud (AWS), Hadoop, and Voldemort. In the following sections we explain the main characteristics of the proposed architecture.
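The serving side of such a design can be sketched in miniature: all the heavy aggregation happens offline in the batch layer, so the online layer is reduced to a plain key-value lookup with no computation under load (Voldemort plays that role in their stack). The keys and records below are made up for illustration:

```python
# Sketch of the batch-then-serve pattern: precompute every answer
# offline, then serve it as a read-only key-value lookup.
from collections import defaultdict

# Offline "batch job": aggregate raw transactions per (shop, month).
# In the real system this is a Hadoop job over years of card data.
transactions = [
    ("shop-1", "2012-11", 25.0),
    ("shop-1", "2012-11", 40.0),
    ("shop-2", "2012-12", 10.0),
]
store = defaultdict(lambda: {"count": 0, "total": 0.0})
for shop, month, amount in transactions:
    key = f"{shop}:{month}"
    store[key]["count"] += 1
    store[key]["total"] += amount

# Online serving: an O(1) lookup -- the query does no aggregation
print(store["shop-1:2012-11"])
```

The trade-off is that the "output" store can be larger than the input, since every pre-answered query takes space, which is exactly the challenge the post describes.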

Data, goals and first decisions

Click to read more ...

Friday, Jan 4, 2013

Stuff The Internet Says On Scalability For January 4, 2013

It's HighScalability time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday, Jan 2, 2013

Why Pinterest Uses the Cloud Instead of Going Solo - To Be Or Not To Be

I love it when a somewhat older article like Pinterest Cut Costs From $54 To $20 Per Hour By Automatically Shutting Down Systems hits Hacker News and generates a good conversation. One common sentiment about the cloud was raised: why doesn't Pinterest save a lot of money by running its own hardware instead of using the cloud?
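Taking the headline figures at face value (and assuming the hourly averages hold around the clock, which the article doesn't promise), the implied savings are easy to ballpark:

```python
# Rough annual savings implied by the headline $54/hr -> $20/hr drop,
# assuming those averages hold 24/7 (an assumption on my part).
before, after = 54, 20          # $/hour, from the article title
hours_per_year = 24 * 365
savings = (before - after) * hours_per_year
print(f"${savings:,} per year")  # $297,840 per year
```

Roughly $300K a year from auto-shutdown alone goes a long way toward explaining why the "just buy your own hardware" math isn't as obvious as it looks.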

Ryan Park, Operations Engineer at Pinterest, responds in what I think is the perfect modern response to that ultimate existential design dilemma:

Click to read more ...

Monday, Dec 31, 2012

Designing for Resiliency will be so 2013

A big part of engineering for a quality experience is bringing in the long tail. An improbable severe failure can ruin your experience of a site, even if your average experience is quite good. That's where building for resilience comes in. Resiliency used to be outside the realm of possibility for the common system. It was simply too complex and too expensive.

An evolution has been underway, making 2013 possibly the first time resiliency is truly on the table as a standard part of system architectures. We are getting the clouds, we are getting the tools, and prices are almost low enough.

Even Netflix, a real leader in the resiliency architecture game, took some heat for relying completely on Amazon's ELB and not having a backup load balancing system, leading to a prolonged Christmas Eve failure. Adrian Cockcroft, Cloud Architect at Netflix, said they've investigated creating their own load balancing service, but that "we try not to invest in undifferentiated heavy lifting."

So resiliency is still not part of the standard package; there's an ROI calculation that has to be made. The path Netflix would have to take in creating a hybrid architecture is fairly clear, yet Netflix prefers to concentrate on features rather than long-tail events. That's a big difference. At one time designing for resiliency would have been unthinkable; now it's a choice.

A good New Year's resolution might be to learn more about resilience. It's a new way of thinking compared to straightforward high availability: a full-stack, full-team, full-system, environment-centric mode of thought.

Fortunately, Dr. Richard Cook, Professor of Healthcare Systems Safety and Chairman of the Department of Patient Safety at the Kungliga Tekniska Högskolan, has been thinking about resilience for a long time. And he gave a fascinating talk on the subject, How Complex Systems Fail, that is just detailed enough to be practical and high-level enough to inspire new directions.

Here's a gloss of the essentials from his talk:

Why Don’t Systems Fail More Often?

Click to read more ...