Entries by HighScalability Team (1576)

Tuesday
Jan 22, 2013

Sponsored Post: Amazon, Zoosk, Booking, aiCache, Teradata Aster, Aerospike, Percona, ScaleOut, New Relic, NetDNA, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • Hiring! Director of Site Operations at Zoosk.  We’re looking for an innovator. Someone who wants to take site operations along with a smart team of Sys Admins to the next level. This is a very hands-on leadership role in a high-availability production environment. Full details here. 
  • We need awesome people @ Booking.com - We want YOU! Come design next-generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html
  • Teradata Aster is looking for Distributed Systems, Analytic Applications,  and Performance Architects. As a member of the Architecture Group you will help define the technical roadmap for the product.
  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • New Benchmark shows Aerospike nearly 10x Faster than the Competition. Thumbtack Technology YCSB Benchmark shows Aerospike nearly 20x faster than Cassandra, Couchbase and MongoDB. Read it now!
  • ScaleOut Software. In-memory Data Grids for the Enterprise. Download a Free Trial.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you, there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Jan 21, 2013

Processing 100 Million Pixels a Day - Small Amounts of Contention Cause Big Problems at Scale

This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases.

Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state...

Click to read more ...

Friday
Jan 18, 2013

Stuff The Internet Says On Scalability For January 18, 2013

Hey, it's HighScalability time:

  • 1 trillion nodes : The Near Future; 1 trillion connections : Facebook Now; 1 billion celestial objects observed : Gaia mission
  • Quotable Quotes:
    • Van Jacobson : IP started as an overlay on the phone system; today the phone system is an overlay on IP.
    • @MarkDurbin104 : Unit of Logic: a Fathom?
    • @somic : virtual infra with API is a cloud as much as a bunch of shell scripts are infra as code
    • @xaprb : I'm going to settle the argument about linear scalability once and for all. Pianos are linearly scalable. Fish are not. End of story.
    • @gigastacey : Facebook's cold storage is 1 exabyte per room with 1.5MW per room power requirement with no redundant power #ocpsummit
    • dgb75 : PHP, not my first choice but the right choice
    • @iaboyeji : Scalability is a rich man's problem
  • Joe Stump on Sprintly on the impact of reducing ORM overhead: On a primed cache, your @sprintly experience should be 100x faster now. On an unprimed cache a mere 10-15x faster.
  • Not a lot of technological detail on Facebook's new Graph Search, but here's the broad story of how it came about. Facebook has a lot of structured data about people, so search needed to take advantage of that. In 2011 they decided to build a unified search. A quick prototype served as a proof of concept. Next they built a substring parser that could generate and rank all the potential page titles matching a query. To answer queries with privacy filters applied, they leveraged an existing search engine within Facebook called Unicorn. What's missing is an index of all posts and comments shared on Facebook.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Jan 16, 2013

What if Cars Were Rented Like We Hire Programmers?

Imagine if you will that car rental agencies rented cars like programmers are hired at many software companies...

Agency : So sorry you had to wait in the reception area for an hour. Nobody knew you were coming in today. I finally found 8 people to interview you before we can rent you a car. If we like you, you may have to come in for another round of interviews tomorrow because our manager isn't in today. I didn't have a chance to read your application, so I'll just start with a question. What car do you drive today?
Applicant : I drive a 2008 Subaru.
Agency : That's a shame. We don't have a Subaru to rent you.
Applicant : That's OK. Any car will do.
Agency : No, we can only take on clients who know how to drive the cars we stock. We find it's safer that way. There are so many little differences between cars, we just don't want to take a chance.
Applicant : I have a driver's license. I know how to drive. I've been driving all kinds of cars for 15 years. I am sure I can adapt.
Agency : We appreciate your position, but we can only take exact matches. Otherwise, how could we ever know if you could drive one of our cars?

Click below to see how the story ends, but you probably already know the ending...

Click to read more ...

Tuesday
Jan 15, 2013

More Numbers Every Awesome Programmer Must Know

Colin Scott, a Berkeley researcher, updated Jeff Dean’s famous Numbers Everyone Should Know with his Latency Numbers Every Programmer Should Know interactive graphic. The interactive aspect is cool because it has a slider that lets you see the numbers from as far back as 1990 to the far future of 2020.

Colin explained his motivation for updating the numbers:

The other day, a friend mentioned a latency number to me, and I realized that it was an order of magnitude smaller than what I had memorized from Jeff’s talk. The problem, of course, is that hardware performance increases exponentially! After some digging, I actually found that the numbers Jeff quotes are over a decade old

Since numbers without interpretation are simply data, take a look at Google Pro Tip: Use Back-Of-The-Envelope-Calculations To Choose The Best Design. The idea is that back-of-the-envelope calculations are estimates you create using a combination of thought experiments and common performance numbers to get a good feel for which designs will meet your requirements.
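To make that concrete, here is a minimal sketch of such an estimate. It assumes round illustrative figures (a 10 ms disk seek and 30 MB/s sequential disk reads) and a hypothetical page that shows 30 thumbnails of 256 KB each; none of these values come from this post, and the function names are made up for the example. It compares reading the images one after another with issuing all the reads in parallel.

```python
# Illustrative figures only: assumed round numbers, not measurements.
DISK_SEEK_S = 0.010             # assume ~10 ms per disk seek
DISK_READ_BYTES_PER_S = 30e6    # assume ~30 MB/s sequential disk read
IMAGE_BYTES = 256 * 1024        # hypothetical 256 KB thumbnail
IMAGE_COUNT = 30                # hypothetical number of thumbnails per page


def one_image_seconds() -> float:
    """Cost of fetching a single image: one seek plus the transfer time."""
    return DISK_SEEK_S + IMAGE_BYTES / DISK_READ_BYTES_PER_S


def serial_read_seconds(count: int = IMAGE_COUNT) -> float:
    """Design A: read the images one after another from a single disk."""
    return count * one_image_seconds()


def parallel_read_seconds() -> float:
    """Design B: every image sits on its own disk and all reads start at once,
    so the page waits only for the slowest (here: any) single read."""
    return one_image_seconds()


if __name__ == "__main__":
    print(f"serial:   {serial_read_seconds() * 1000:.0f} ms")
    print(f"parallel: {parallel_read_seconds() * 1000:.0f} ms")
```

Under these assumptions the serial design lands above half a second while the parallel one stays under 20 ms, which is the kind of gap a rough estimate is meant to expose before anything is built.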

And since most of these measures are in nanoseconds, you can do no better for understanding the nanosecond than Grace Hopper To Programmers: Mind Your Nanoseconds! A wire 11.8 inches long spans the distance light travels in a nanosecond, a billionth of a second.
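The 11.8 inch figure is easy to check. The short sketch below multiplies the speed of light in a vacuum by one nanosecond and converts the result to inches. The `velocity_factor` parameter is an addition for illustration only: signals in real cables travel at some fraction of the vacuum speed of light, so a value below 1.0 gives a shorter distance.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # exact, by definition of the metre
METERS_PER_INCH = 0.0254               # exact, by definition of the inch
NANOSECOND_S = 1e-9


def inches_per_nanosecond(velocity_factor: float = 1.0) -> float:
    """Distance covered in one nanosecond, in inches.

    velocity_factor is a hypothetical knob: 1.0 means light in a vacuum,
    smaller values model a slower signal in a physical medium.
    """
    meters = SPEED_OF_LIGHT_M_PER_S * velocity_factor * NANOSECOND_S
    return meters / METERS_PER_INCH


if __name__ == "__main__":
    print(f"{inches_per_nanosecond():.1f} inches per nanosecond")
```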

Colin's post inspired some great threads on Reddit and on Hacker News. Here are some I found particularly juicy:

To the idea that these numbers are inaccurate Beckneard counters:

Click to read more ...

Monday
Jan 14, 2013

MongoDB and GridFS for Inter and Intra Datacenter Data Replication 

This is a guest post by Jeff Behl, VP Ops @ LogicMonitor. Jeff has been a bit herder for the last 20 years, architecting and overseeing the infrastructure for a number of SaaS-based companies.

Data Replication for Disaster Recovery

An inevitable part of disaster recovery planning is making sure customer data exists in multiple locations.  In the case of LogicMonitor, a SaaS-based monitoring solution for physical, virtual, and cloud environments, we wanted copies of customer data files both within a data center and outside of it.  The former was to protect against the loss of individual servers within a facility, and the latter for recovery in the event of the complete loss of a data center.

Where we were:  Rsync

Click to read more ...

Friday
Jan 11, 2013

Stuff The Internet Says On Scalability For January 11, 2013

Hey, it's HighScalability time:

  • 240,000,000,000 URLs : Wayback Machine; 743 billion : number of words Google analyzed to find etaoin srhldcu were the most used letters in the English language
  • Quotable Quotes:
    • @actuallyshayne : Building cloud scalability is a lot like playing a tower defense game.
    • @traviskaufman : lesson of the day: there is a major difference between "scalability" and "overcomplication"
    • @deathmtn : Sometimes it seems like that storm of discussion about massive scalability has boiled down to "avoid JOINs and other multi-table queries."
    • @rbranson : we use c1.xlarges for user caching since they need to push so many req/sec.
    • @rbranson : we've got memcache instances in an AZ with no app instances and they can push 100K PPS across AZs fine.
  • Gabriel Weinberg of DuckDuckGo on Orders of magnitude: I find framing things in orders of magnitude is a really useful way to measure progress and think about the future. Not much changes structurally if you grow by a factor of two; usually your technical and non-technical infrastructure can handle that kind of growth pretty easily. But when you grow by a factor of ten (an order of magnitude) something usually breaks. 

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Jan 9, 2013

The Story of How Turning Disk Into a Service Led to a Deluge of Density

 

We usually think of the wonderful advantages of service-oriented architectures as a software thing, but they also apply to hardware. In Security Now 385, that Doyen of Disk, Steve Gibson, tells the fascinating story (@ about 41:30) of how moving to a service-oriented architecture in hard drives, modeling a drive as a linear stream of sectors, helped create the amazing high-density disk drives we enjoy today.

When drives switched to the IDE (Integrated Drive Electronics) interface, the controller function moved out of the computer and into the drive. Low-level drive signals no longer had to travel across cables and into the motherboard. Now we just ask the drive for the desired sector and the drive takes care of it.

This allowed manufacturers to do anything they wanted behind the IDE interface. The drive stopped being dumb and became smart, providing a sort of sector service. Density skyrocketed because there was no dependency on the computer. All the internals could completely change as needed to support higher and higher densities. Steve of course gives a detailed evolution of hard drive internals, but that's the gist of it.
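The same idea can be sketched in software terms. The example below is only an analogy with made-up names (`SectorService`, `SimpleDrive`, `RemappingDrive`, `copy_sector`) and an assumed 512-byte sector size; it does not model real IDE commands. The host code depends only on "read sector N, write sector N", so two drives with completely different internal layouts are interchangeable.

```python
from typing import Protocol

SECTOR_SIZE = 512  # bytes; assumed here purely for illustration


class SectorService(Protocol):
    """All the host sees: a linear run of numbered sectors."""

    def read_sector(self, index: int) -> bytes: ...

    def write_sector(self, index: int, data: bytes) -> None: ...


class SimpleDrive:
    """Hypothetical drive that stores logical sectors in order."""

    def __init__(self, sector_count: int) -> None:
        self._sectors = [bytes(SECTOR_SIZE) for _ in range(sector_count)]

    def read_sector(self, index: int) -> bytes:
        return self._sectors[index]

    def write_sector(self, index: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("a sector must be exactly SECTOR_SIZE bytes")
        self._sectors[index] = data


class RemappingDrive:
    """Hypothetical drive with a private layout: logical sector i is stored
    at a different physical slot. The host never learns about the mapping."""

    def __init__(self, sector_count: int) -> None:
        self._physical = [bytes(SECTOR_SIZE) for _ in range(sector_count)]
        self._map = list(reversed(range(sector_count)))  # logical -> physical

    def read_sector(self, index: int) -> bytes:
        return self._physical[self._map[index]]

    def write_sector(self, index: int, data: bytes) -> None:
        if len(data) != SECTOR_SIZE:
            raise ValueError("a sector must be exactly SECTOR_SIZE bytes")
        self._physical[self._map[index]] = data


def copy_sector(src: SectorService, dst: SectorService, index: int) -> None:
    """Host-side code: works with any drive that offers the sector service."""
    dst.write_sector(index, src.read_sector(index))
```

Because `copy_sector` only touches the two interface methods, the internals of either class can change freely without the caller noticing, which is the separation the story credits for letting drive internals evolve.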

Now we have multi-terabyte drives for next to nothing. Separate your concerns and good things can happen.

A cool and unexpected story. Thanks Steve.

Tuesday
Jan 8, 2013

Sponsored Post: Flurry, Rumble Games, Booking, aiCache, Teradata Aster, Aerospike, Percona, ScaleOut, New Relic, NetDNA, GigaSpaces, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • Flurry has built large-scale app measurement and advertising services that are used by more than 80,000 media companies and independent developers to monetize mobile and related platforms. If you're interested in joining a thriving, growing team, please check us out.
  • Rumble Games is looking for a Senior Platform Engineer to build massively scalable and shared services for the next generation of online games. We have the best team this industry has seen, and we will transform the way people play together. Join us.
  • We need awesome people @ Booking.com - We want YOU! Come design next-generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html
  • Teradata Aster is looking for Distributed Systems, Analytic Applications,  and Performance Architects. As a member of the Architecture Group you will help define the technical roadmap for the product.
  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • Aerospike: Two Trillion Transactions per month...100 million stored user profiles...25% of all video ads processed on the internet - mere realities of success for Aerospike customers. Industry leaders reveal their secrets
  • ScaleOut Software. In-memory Data Grids for the Enterprise. Download a Free Trial.
  • Follow the Cloudify blog to learn more about our open source PaaS stack – latest integration recipes, builds, features, and other cool stuff.  Visit the GigaSpaces blog to learn how to take your application to the next level of scalability and performance.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you, there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Jan 7, 2013

Analyzing billions of credit card transactions and serving low-latency insights in the cloud

This is a guest post by Ivan de Prado and Pere Ferrera, founders of Datasalt, the company behind Pangool and Splout SQL Big Data open-source projects.

The number of payments made with credit cards is huge, and there is clear inherent value in the data that can be derived from analyzing all those transactions. Client fidelity, demographics, heat maps of activity, shop recommendations, and many other statistics are useful to both clients and shops for improving their relationship with the market. At Datasalt we have developed a system in collaboration with the BBVA bank that is able to analyze years of data and serve insights and statistics to different low-latency web and mobile applications.

The main challenge we faced, besides processing Big Data input, was that the output was also Big Data, and even bigger than the input. And this output needed to be served quickly, under high load.

The solution we developed has an infrastructure cost of just a few thousand dollars per month thanks to the use of the cloud (AWS), Hadoop and Voldemort. Below we explain the main characteristics of the proposed architecture.

Data, goals and first decisions

Click to read more ...