Monday
Feb042013

Is Provisioned IOPS Better? Yes, it Delivers More Consistent and Higher Performance IO

Amazon created a whole new class of service with their Provisioned IOPS for RDS, EBS, and DynamoDB. The idea is simple. If you want more performance, you turn a dial up. If you want less, you turn a dial down. A beautifully simple model. You pay for the performance you want, which is different from their previous cloud model, where performance varied, but you paid only for what you used.

The question: Do these higher priced services really work better?

Rodrigo Campos put this question to the test (only for EBS) by running a benchmark he describes in IOMelt Provisioned IOPS EBS Benchmark Results - December 2012.

The result? Yes, AWS Provisioned IOPS Volumes Really Deliver More Consistent and Higher Performance IO:

Click to read more ...

Friday
Feb012013

Stuff The Internet Says On Scalability For February 1, 2013

Hey, it's HighScalability time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
Jan302013

Better Browser Caching is More Important than No Javascript or Fast Networks for HTTP Performance

Performance guru Steve Souders gave his keynote presentation, Cache is King! (slides), at HTML5DevCon. Besides being an extremely clear explanation of how caching works on the Internet and how to optimize your use of HTTP to get the best performance, the talk covers experiments Steve ran that produced some surprising results on what gave the best web site performance improvements.

In his baseline test, page loads took 7.65 seconds (median of three runs). What change--Fast Network, No JavaScript, or Primed Cache--would make the biggest performance improvement? It was Primed Cache.

  • Fast Network - Using a fast FIOS network the load time was 4.13 seconds. Steve was surprised how big a difference this made, given how much work must happen in the browser. 
  • No JavaScript - 4.74 seconds after disabling JavaScript. This both reduces transfers and skips parsing by the browser. Steve thought the effect would have been larger.
  • Primed Cache - 3.46 seconds using a warm cache, less than half the empty-cache page view time, because it reduced the number of HTTP requests and the total transfer time. Key for mobile, where higher latencies are common.
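
To make the comparison concrete, here is a minimal sketch of the arithmetic behind the three results. The load times are the ones quoted above; the variable and function names are made up for illustration and are not from Steve's talk.

```python
# Load times in seconds, as quoted in the summary above.
BASELINE_S = 7.65
load_times_s = {
    "Fast Network": 4.13,
    "No JavaScript": 4.74,
    "Primed Cache": 3.46,
}


def improvement(baseline_s, measured_s):
    """Return (seconds saved, fraction of the baseline load time that remains)."""
    return baseline_s - measured_s, measured_s / baseline_s


# Hypothetical helper names; only the numbers come from the article.
results = {name: improvement(BASELINE_S, t) for name, t in load_times_s.items()}
best = min(load_times_s, key=load_times_s.get)

for name, (saved_s, remaining) in results.items():
    print(f"{name}: {saved_s:.2f}s saved, {remaining:.0%} of baseline")
print("Biggest improvement:", best)
```

The primed cache run is the only one that lands under half of the baseline time, which is the point of the talk's title.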

The implication being that caching is important so you must understand how HTTP caching works and how to make the best use of it. That's the rest of the talk.

Some key takeaways: 

Click to read more ...

Monday
Jan282013

DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing

This is an interview with Gabriel Weinberg, founder of DuckDuckGo and general all-around startup guru, on what DDG’s architecture looks like in 2012.

Innovative search engine upstart DuckDuckGo had 30 million searches in February 2012 and averages over 1 million searches a day. It’s being positioned by super investor Fred Wilson as a clean, private, impartial and fast search engine. After talking with Gabriel I like what Fred Wilson said earlier, it seems closer to the heart of the matter: We invested in DuckDuckGo for the Reddit, Hacker News anarchists.
                  
Choosing DuckDuckGo can be thought of as not just a technical choice, but a vote for revolution. In an age when knowing your essence is not about love or friendship, but about more effectively selling you to advertisers, DDG is positioning themselves as the do not track alternative, keepers of the privacy flame. You will still be monetized of course, but in a more civilized and anonymous way.

Pushing privacy is a good way to carve out a competitive niche against Google et al, as by definition they can never compete on privacy. I get that. But what I found most compelling is DDG’s strong vision of a crowdsourced network of plugins giving broader search coverage by tying an army of vertical data suppliers into their search framework. For example, there's a specialized Lego plugin for searching against a complete Lego database. Use the name of a spice in your search query, for example, and DDG will recognize it and may trigger a deeper search against a highly tuned recipe database. Many different plugins can be triggered on each search and it’s all handled in real-time.

Can’t searching the Open Web provide all this data? Not really. This is structured data with semantics. Not an HTML page. You need a search engine that’s capable of categorizing, mapping, merging, filtering, prioritizing, searching, formatting, and disambiguating richer data sets and you can’t do that with a keyword search. You need the kind of smarts DDG has built into their search engine. One problem of course is that now that data has become valuable, many grown ups don’t want to share anymore.

Being ad supported puts DDG in a tricky position. Targeted ads are more lucrative, but ironically DDG’s do not track policy means they can’t gather targeting data. Yet that’s also a selling point for those interested in privacy. But as search is famously intent driven, DDG’s technology of categorizing queries and matching them against data sources is already a form of high value targeting.

It will be fascinating to see how these forces play out. But for now let’s see how DuckDuckGo implements their search engine magic...

Information Sources

Click to read more ...

Friday
Jan252013

Stuff The Internet Says On Scalability For January 25, 2013

Sorry, Stuff the Internet Says has been called on account of a power outage. Gods of rain and tree have interfered with thee. Instead, how about watching a little Python? (that's Monty, not the language)

Thursday
Jan242013

NoSQL Parody: say No! No! and No!

While certainly not in the same class as Hilarious Video: Relational Database vs NoSQL Fanbois or NSFW: Hilarious Fault-Tolerance Cartoon, this parody does have some really good moments:

Wednesday
Jan232013

Building Redundant Datacenter Networks is Not For Sissies - Use an Outside WAN Backbone

Ivan Pepelnjak, in his short and information packed REDUNDANT DATA CENTER INTERNET CONNECTIVITY video, shows why networking as played at the highest levels is something you want to leave to professionals, like a large animal country veterinarian delivering a stuck foal at 2AM on a dark and stormy night.

There are always a lot of questions about the black art of building redundant datacenter networks and there's a shortage of accessible explanations. What I liked about Ivan's video is how effortlessly he explains the issues and tradeoffs you can expect in designing your own solution, as well as giving creative solutions to those problems. A lot of years of experience are boiled down to a 17 minute video.

Ivan begins by showing what a canonical fully redundant datacenter would look like:

It's like an ark where everything goes two by two...

Click to read more ...

Tuesday
Jan222013

Sponsored Post: Amazon, Zoosk, Booking, aiCache, Teradata Aster, Aerospike, Percona, ScaleOut, New Relic, NetDNA, Logic Monitor, AppDynamics, ManageEngine, Site24x7

Who's Hiring?

  • The AWS Relational Database Service (RDS) automates management of relational databases in the cloud. We have a wide variety of customers and are part of many mission-critical applications, like the ones built by the 2012 Obama re-election campaign. If you're interested in joining a fast-growing service and team, please send your resume to rds-jobs@amazon.com.
  • Hiring! Director of Site Operations at Zoosk.  We’re looking for an innovator. Someone who wants to take site operations along with a smart team of Sys Admins to the next level. This is a very hands-on leadership role in a high-availability production environment. Full details here. 
  • We need awesome people @ Booking.com - We want YOU! Come design next generation interfaces, solve critical scalability problems, and hack on one of the largest Perl codebases. Apply: http://www.booking.com/jobs.en-us.html
  • Teradata Aster is looking for Distributed Systems, Analytic Applications,  and Performance Architects. As a member of the Architecture Group you will help define the technical roadmap for the product.
  • The New York Times is seeking a developer focused on infrastructure to join its newsroom development team. Read the full description here and send resumes to chadas@nytimes.com.
  • New Relic is looking for a Java Scalability Engineer in Portland, OR. Ready to scale a web service with more incoming bits/second than Twitter?  http://newrelic.com/about/jobs

Fun and Informative Events

Cool Products and Services

  • aiCache creates a better user experience by increasing the speed, scale, and stability of your web-site. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • New Benchmark shows Aerospike nearly 10x Faster than the Competition. Thumbtack Technology YCSB Benchmark shows Aerospike nearly 20x faster than Cassandra, Couchbase and MongoDB. Read it now!
  • ScaleOut Software. In-memory Data Grids for the Enterprise. Download a Free Trial.
  • NetDNA, a Tier-1 Global Content Delivery Network, offers a Dual-CDN strategy which allows companies to utilize a redundant infrastructure while leveraging the advantages of multiple CDNs to reduce costs.
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.

If any of these items interest you there's a full description of each sponsor below. Please click to read more...

Click to read more ...

Monday
Jan212013

Processing 100 Million Pixels a Day - Small Amounts of Contention Cause Big Problems at Scale

This is a guest post by Gordon Worley, a Software Engineer at Korrelate, where they correlate (see what they did there) online purchases to offline purchases.

Several weeks ago, we came into the office one morning to find every server alarm going off. Pixel log processing was behind by 8 hours and not making headway. Checking the logs, we discovered that a big client had come online during the night and was giving us 10 times more traffic than we were originally told to expect. I wouldn’t say we panicked, but the office was certainly more jittery than usual. Over the next several hours, though, thanks both to foresight and quick thinking, we were able to scale up to handle the added load and clear the backlog to return log processing to a steady state...
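
As a rough illustration of why "behind by 8 hours and not making headway" is so uncomfortable, here is a minimal sketch of backlog arithmetic. It is not from Gordon's post: the `capacity_ratio` values are hypothetical, and the model assumes a constant arrival rate. Only the 8 hour backlog comes from the text above.

```python
import math


def hours_to_clear(backlog_hours, capacity_ratio):
    """Estimate hours needed to clear a processing backlog.

    backlog_hours: how far behind processing is, in hours of incoming data.
    capacity_ratio: processing rate divided by arrival rate (hypothetical input).
    Assumes both rates stay constant while the backlog drains.
    """
    if capacity_ratio <= 1.0:
        # Processing no faster than data arrives: the backlog never shrinks.
        return math.inf
    # Each hour, capacity_ratio hours of data are processed while 1 more arrives.
    return backlog_hours / (capacity_ratio - 1.0)


BACKLOG_HOURS = 8  # from the post; the ratios below are made-up examples
for ratio in (1.0, 1.5, 2.0, 3.0):
    print(f"capacity {ratio}x arrival rate -> {hours_to_clear(BACKLOG_HOURS, ratio)} hours")
```

Under these assumptions, merely matching the new arrival rate is not enough. Capacity has to exceed it, and the size of the margin determines how quickly the backlog clears.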

Click to read more ...

Friday
Jan182013

Stuff The Internet Says On Scalability For January 18, 2013

Hey, it's HighScalability time:

  • 1 trillion nodes : The Near Future; 1 trillion connections : Facebook Now; 1 billion celestial objects observed : Gaia mission
  • Quotable Quotes:
    • Van Jacobson : IP started as an overlay on the phone system; today the phone system is an overlay on IP.
    • @MarkDurbin104 : Unit of Logic: a Fathom?
    • @somic : virtual infra with API is a cloud as much as a bunch of shell scripts are infra as code
    • @xaprb : I'm going to settle the argument about linear scalability once and for all. Pianos are linearly scalable. Fish are not. End of story.
    • @gigastacey : Facebook's cold storage is 1 exabyte per room with 1.5MW per room power requirement with no redundant power #ocpsummit
    • dgb75 : PHP, not my first choice but the right choice
    • @iaboyeji : Scalability is a rich man's problem
  • Joe Stump on Sprintly on the impact of reducing ORM overhead: On a primed cache, your @sprintly experience should be 100x faster now. On an unprimed cache a mere 10-15x faster.
  • Not a lot of technological detail on Facebook's new Graph Search, but here's the broad story of how it came about. Facebook has a lot of structured data about people so search needed to take advantage of that. They started in 2011 deciding to build a unified search. A quick prototype served as a proof of concept. Next they built a substring parser that could generate and rank all the potential page titles matching a query. To answer queries, with privacy filters applied, they leveraged an already existing search engine within Facebook called Unicorn. What's missing is an index of all posts and comments shared on Facebook.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...