Entries by HighScalability Team (1576)

Wednesday
Jun 20, 2012

iDoneThis - Scaling an Email-based App from Scratch

This is a guest post by Rodrigo Guzman, CTO of iDoneThis, which makes status reporting happen at your company with the lightest possible touch.

iDoneThis is a simple management application that emails your team at the end of every day to ask, "What'd you get done today?" Just reply with a few lines of what you got done. The following morning everyone on your team gets a digest with what the team accomplished the previous day to keep everyone in the loop and kickstart another awesome day.

Before we launched, we built iDoneThis over a weekend in the most rudimentary way possible. I kid you not, we sent the first few batches of daily emails using the BCC field of a Gmail inbox. The upshot is that we’ve had users on the site from Day 3 of its existence on.

We’ve gone from launch in January 2011 when we sent hundreds of emails out per day by hand to sending out over 1 million emails and handling over 200,000 incoming emails per month. In total, customers have recorded over 1.7 million dones.

Stats 

Click to read more ...

Monday
Jun 18, 2012

Google on Latency Tolerant Systems: Making a Predictable Whole Out of Unpredictable Parts  

In Taming The Long Latency Tail we covered Luiz Barroso’s exploration of the long tail latency (some operations are really slow) problems generated by large fanout architectures (a request is composed of potentially thousands of other requests). You may have noticed there weren’t a lot of solutions. That’s where a talk I attended, Achieving Rapid Response Times in Large Online Services (slide deck), by Jeff Dean, also of Google, comes in:

In this talk, I’ll describe a collection of techniques and practices lowering response times in large distributed systems whose components run on shared clusters of machines, where pieces of these systems are subject to interference by other tasks, and where unpredictable latency hiccups are the norm, not the exception.

The goal is to use software techniques to reduce variability given the increasing variability in underlying hardware, the need to handle dynamic workloads on a shared infrastructure, and the need to use large fanout architectures to operate at scale.
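To see why large fanout makes slow leaves matter so much, here is a minimal sketch of the arithmetic. It assumes each sub-request is independently slow with some fixed probability; the 1% figure and the function name `prob_slow_request` are illustrative assumptions, not numbers from the talk summary above.

```python
def prob_slow_request(p_slow_leaf: float, fanout: int) -> float:
    """Probability that at least one of `fanout` sub-requests is slow.

    Assumes sub-requests are slow independently of each other, which is a
    simplification: real shared clusters have correlated hiccups.
    """
    return 1.0 - (1.0 - p_slow_leaf) ** fanout


if __name__ == "__main__":
    # Hypothetical: each leaf has a 1% chance of a latency hiccup.
    for fanout in (1, 10, 100, 1000):
        print(f"fanout={fanout:5d}  P(slow request)={prob_slow_request(0.01, fanout):.3f}")
```

Under these assumptions, a request that waits on 100 leaves is slow about 63% of the time even though each leaf is slow only 1% of the time, which is why the whole has to be made more predictable than its parts.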

Two forces motivate Google’s work on latency tolerance:

Click to read more ...

Monday
Jun 18, 2012

The Clever Ways Chrome Hides Latency by Anticipating Your Every Need

Ilya Grigorik has written another wonderful article lavishly detailing the extraordinary tactics Chrome employs to hide network latency from users: Chrome Networking: DNS Prefetch & TCP Preconnect. Ilya springs some surprising factoids on us, revealing how the web has slowed and supersized:

  • The size of an average page has grown to 1059kB and is now composed of over 80 subresource requests.
  • An average DNS lookup takes between 60 and 120ms. Add the full round-trip (RTT) needed to perform the TCP handshake and you get 100-200ms of latency before a request can be sent.
  • Slow mobile experiences are largely due to the much higher RTTs (200-1000ms) on wireless networks. Reducing the number of outbound connections and the total byte size of your pages is the single best optimization you can make for mobile today.
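
The connection-setup arithmetic in the bullets above can be sketched as a toy model. Everything here is an assumption made for illustration: the function `setup_latency_ms` is hypothetical, the RTT values are picked only so the totals match the quoted 100-200ms range, and TLS and other overheads are ignored. It is not a description of how Chrome measures or schedules anything.

```python
def setup_latency_ms(dns_ms, rtt_ms, dns_prefetched=False, preconnected=False):
    """Delay in milliseconds before the first request byte can be sent."""
    if preconnected:
        # The TCP connection (and the lookup it needed) already exists.
        return 0
    if dns_prefetched:
        # The hostname is already resolved; only the handshake remains.
        return rtt_ms
    # Cold start: resolve the name, then spend one RTT on the TCP handshake.
    return dns_ms + rtt_ms


# Hypothetical RTTs of 40 ms and 80 ms, chosen so the totals line up with
# the 100-200ms range quoted above; they are not taken from the article.
best_case = setup_latency_ms(dns_ms=60, rtt_ms=40)
worst_case = setup_latency_ms(dns_ms=120, rtt_ms=80)
after_prefetch = setup_latency_ms(dns_ms=120, rtt_ms=80, dns_prefetched=True)
after_preconnect = setup_latency_ms(dns_ms=120, rtt_ms=80, preconnected=True)
```

In this model, resolving a name ahead of time removes the DNS term and opening the connection ahead of time removes the handshake as well, which is the kind of cost that DNS prefetch and TCP preconnect aim to hide.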

Chrome reduces apparent latency using a host of clever anticipatory mechanisms:

Click to read more ...

Friday
Jun 15, 2012

Stuff The Internet Says On Scalability For June 15, 2012

It's HighScalability Time:

  • 100PB : Facebook HDFS Cluster; One Trillion : Objects in S3
  • Quotable quotes:
    • @mwinkle : Listening to NASA big data challenges at ‪#hadoopSummit‬, the square kilometer array project will produce 700tb per second. TB. Per second.
    • @imrantech : #hadoopsummit‬ @twitter - 400M tweets, 80-100TB per day
    • @r39132 : At Netflix talk at ‪#hadoopsummit‬ : 2 B hours streamed in Q4 2011, 75% of the 30M daily movie starts are sourced from recommendations
    • @nattybnatkins : Run job. Identify bottleneck. Address bottleneck. Repeat. Sage wisdom from @tlipcon on optimizing MR jobs ‬ ‪#HadoopSummit‬
    • @chiradeep :  mainframe cost of operation - $5k per MIP per year ‪#hadoopsummit‬
    • @MCanalytics : #hadoopsummit‬ Yahoo metrics - 140pb on 42k nodes with 500 users on 360k Hadoop jobs for 100b events/day Holy smokes!
    • @M_Wein : Domain expertise is the wave of the future: it's more about "Hadoop and Healthcare" than "Using Bayesian counters with Hadoop" ‪#hadoopsummit‬
  • Twitter was an unexpected pleasure at the Hadoop Summit with many quality and interesting talks. Dmitriy Ryaboy in Twitter at the Hadoop Summit gives an overview of the talks and what they covered. I found Large-scale machine learning at Twitter very well done, not so much for the use of Pig, but for the process involved in mapping learning to creating non-iterative aggregators.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Jun 13, 2012

Why My Soap Film is Better than Your Hadoop Cluster

The ever amazing slime mold is not the only way to solve complex compute problems without performing calculations. There is another: soap film. Unfortunately, soap film isn’t nearly as photogenic as slime mold, so all we get are boring-looking pictures, but the underlying idea is still fascinating and ten times less spooky.

As a quick introduction we’ll lean on Long Ouyang, who has a really straightforward explanation of how soap film works in Approaching P=NP: Can Soap Bubbles Solve The Steiner Tree Problem In Polynomial.

It’s computers, so playing the role of the motivating graph problem we have the Steiner tree problem, which Ouyang explains as:

Find the minimum spanning tree for a bunch of vertices, given that you can add additional points.
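
A small worked example may make "you can add additional points" concrete. The sketch below is an illustration chosen here, not something from Ouyang's article: it takes the three corners of a unit equilateral triangle, computes the minimum spanning tree length with Prim's algorithm, then repeats the calculation with one extra point at the triangle's centroid (for an equilateral triangle this is the optimal extra point). The helper name `mst_length` is hypothetical.

```python
import math


def mst_length(points):
    """Total edge length of a Euclidean minimum spanning tree (Prim's algorithm)."""
    in_tree = {0}
    total = 0.0
    while len(in_tree) < len(points):
        # Pick the shortest edge that connects the tree to a new point.
        dist, nxt = min(
            (math.dist(points[i], points[j]), j)
            for i in in_tree
            for j in range(len(points))
            if j not in in_tree
        )
        total += dist
        in_tree.add(nxt)
    return total


# Corners of an equilateral triangle with side length 1.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
# The extra point: the centroid of the triangle.
centroid = (0.5, math.sqrt(3) / 6)

without_extra = mst_length(triangle)             # two sides of the triangle
with_extra = mst_length(triangle + [centroid])   # three spokes to the centroid

print(f"without extra point: {without_extra:.3f}")
print(f"with extra point:    {with_extra:.3f}")
```

Without the extra point the tree uses two sides of the triangle, for a length of 2. With the centroid added it uses three spokes of length 1/√3 each, for a total of √3 ≈ 1.732. The hard part of the Steiner tree problem is finding where such extra points should go when there are many vertices.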

Soap helps solve this problem because:

Click to read more ...

Monday
Jun 11, 2012

Monday Fun: Seven Databases in Song

If you understand things best when they're formatted as a musical, this video is for you. It teaches the essentials of PostgreSQL, Riak, HBase, MongoDB, CouchDB, Neo4J and Redis in the style of My Fair Lady. And for a change, it's very SFW.

Friday
Jun 8, 2012

Stuff The Internet Says On Scalability For June 8, 2012

It's HighScalability Time:

  • 21TB : Tumblr relational data
  • Quotable Quotes:
    • @ajbaird: Scalability is not a "feature" tacked on at the end development.
    • @h_ingo: I like Doron's comparison: Build a MySQL scale-out cluster instead, then buy 2 Ferrari's with the money saved :-) 
  • You might figure Harry Potter would have some sort of scaling spell, but no, he has to rely on the muggle powered Azure. Pottermore uses Azure to handle 110 million page impressions a day. 
  • Ian Bogost in What Should We Do for a Living? brings up a sobering idea from the Facebook Illusion: the Internet economy will not save us. It sucks at scaling jobs and exists because it is subsidized by surpluses from the old economy it was supposed to replace. Where is that replicator when we need it?
  • In Praise of Idleness. Bruce Dawson argues against busy waiting and for locks, in most cases. Couldn't agree more, that's a lot of CPU doing nothing and programmers quickly lose track of the overall flow of the program, though Adaptive spinning looks interesting. On Reddit.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Tuesday
Jun 5, 2012

Sponsored Post: Digital Ocean, NetDNA, Torbit, Velocity, Reality Check Network, Gigaspaces, AiCache, Logic Monitor, Attribution Modeling, AppDynamics, CloudSigma, ManageEngine, Site24x7

Who's Hiring? 

  • Torbit is hiring. Care about performance? Care about making the internet faster and better? At Torbit we use lots of Golang, Node.js, JavaScript and PHP to solve big challenges.

Fun and Informative Events

Cool Products and Services

  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • Digital Ocean is a Simple Cloud Hosting platform that offers Free Unlimited Bandwidth and Virtual Servers from $10 per month. Sign up for free and set-up your virtual server in 60 seconds or less.
  • Reality Check Network offers powerful hosting solutions and managed servers for high traffic/bandwidth websites backed by unlimited network, server and application support.
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your web-site. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility style high performance cloud servers in the US and Europe delivered on all 10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
Jun 4, 2012

OpenFlow/SDN is Not a Silver Bullet for Network Scalability

Ivan Pepelnjak (CCIE#1354 Emeritus) is Chief Technology Advisor at NIL Data Communications, author of numerous webinars and advanced networking books, and a prolific blogger. He’s focusing on data center and cloud networking, network virtualization, and scalable application design.

OpenFlow is an interesting emerging networking technology that appeared seemingly out of nowhere, with much hype and fanfare, in March 2011. More than a year later, there are two commercial products based on OpenFlow (NEC’s Programmable Flow and Nicira’s Network Virtualization Platform) and probably fewer than a dozen production-grade implementations (including Google’s G-Scale network and Indiana University’s campus network). Is this an expected result for an emerging technology or another case of overhyped technology hitting limits imposed by reality?

OpenFlow-based solutions have to overcome numerous problems every emerging technology is facing, in OpenFlow’s case ranging from compatibility with existing chipsets to incomplete and fast-changing specifications (and related compatibility issues), but they’re also hitting some hard scalability limits that we’ll explore in the rest of this article.

What Is OpenFlow?

Click to read more ...

Friday
Jun 1, 2012

Stuff The Internet Says On Scalability For June 1, 2012

It's HighScalability Time:

  • Yottabytes : What NSA knows about US; 214ms : ping between San Jose and Fez; $42M : MongoDB is funding scale!; 20K : lines of THX sound code
  • @adrianco: My takeaway from the MongoDB talk at ‪#gluecon‬ is that Mongo is implementing eventual scalability in the next version
  • The death of the general purpose computer is causing strange events like Facebook making their own smart phone. Adam Smith said we all benefit when our neighbors get richer, it creates a bigger pie. We are heading back to the mercantilist notion of a zero sum game. Google is also racing to the bottom: Google Product Search To Become Google Shopping, Use Pay-To-Play Model. Zero sum thinking always leads to war. Just sayin.
  • Stuxnet, sometimes you just can't keep it in your pants and Pandora always complained that lid was never on very tight. Bad Prometheus.
Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...