Entries by HighScalability Team (1576)

Wednesday
May 30, 2012

Strategy: Get Servers for Free and Make Users Happy by Turning on Compression

Edward Capriolo has a really interesting article on the dramatic performance gains he saw after turning on compression for Cassandra. The idea:

  • Enabling compression shrank 71GB of data down to 31GB, which let more data fit in RAM, which reduced disk IO to nearly nothing.
  • Compression means more data can be stored, which is like buying more machines without having to spend more money.
  • Compression means serving more data out of RAM, which means clients are happier because of the performance improvements.
  • The cost is higher CPU usage to perform the compression and decompression. But disk IO is orders of magnitude slower than decompression, and most servers have CPU to burn.

Edward's article is well written, has the specifics on how to turn on compression for Cassandra, pretty graphs, and lots more details.
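Edward's numbers work out to 71/31 ≈ 2.3 times as much data per gigabyte of RAM. The trade-off itself (smaller data in exchange for CPU spent compressing and decompressing) can be sketched with Python's standard zlib module. This is an illustration under stated assumptions: zlib (DEFLATE) stands in for whatever compressor your Cassandra tables are actually configured with, and the row-like payload is hypothetical, so real ratios and timings will differ.

```python
import time
import zlib


def compression_tradeoff(data: bytes, level: int = 6) -> dict:
    """Compress and decompress `data`, reporting size savings and CPU time.

    zlib is a stand-in here; it is not necessarily the compressor Cassandra uses.
    """
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_s = time.perf_counter() - start

    # Compression is lossless: the round trip must give back the original bytes.
    assert restored == data
    return {
        "original_bytes": len(data),
        "compressed_bytes": len(compressed),
        "ratio": len(compressed) / len(data),
        "compress_seconds": compress_s,
        "decompress_seconds": decompress_s,
    }


def ram_equivalent(raw_gb: float, compressed_gb: float) -> float:
    """How many times more data fits in the same RAM after compression."""
    return raw_gb / compressed_gb


if __name__ == "__main__":
    # Hypothetical row-like payload; real Cassandra data will compress differently.
    rows = b"".join(
        b"user:%06d,status:active,country:US,plan:free\n" % i for i in range(10_000)
    )
    stats = compression_tradeoff(rows)
    print(f"{stats['original_bytes']} -> {stats['compressed_bytes']} bytes "
          f"(ratio {stats['ratio']:.2f})")
    print(f"decompress took {stats['decompress_seconds'] * 1e3:.2f} ms")
    # Edward's figures: 71GB shrank to 31GB.
    print(f"{ram_equivalent(71, 31):.2f}x more data per GB of RAM")
```

Repetitive, structured rows like these tend to compress well, which is why the cache-hit benefit can outweigh the extra CPU; measuring decompression time on your own data is the honest way to check that claim for your workload.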

Monday
May 28, 2012

The Anatomy of Search Technology: Crawling using Combinators

This is the second guest post (part 1, part 3) of a series by Greg Lindahl, CTO of blekko, the spam-free search engine. Previously, Greg was Founder and Distinguished Engineer at PathScale, where he was the architect of the InfiniPath low-latency InfiniBand HCA, used to build tightly-coupled supercomputing clusters.

What's so hard about crawling the web?

Web crawlers have been around as long as the web has, and before the web there were crawlers for gopher and ftp. You would think that 25 years of experience would render crawling a solved problem, but the vast growth of the web and new inventions in the technology of webspam and other unsavory content result in a constant supply of new challenges. The general difficulty of tightly-coupled parallel programming also rears its head, as the web has scaled from millions to hundreds of billions of pages.

Existing Open-Source Crawlers and Crawls

Click to read more ...

Friday
May 25, 2012

Stuff The Internet Says On Scalability For May 25, 2012

It's HighScalability Time:

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
May 23, 2012

Averages, web performance data, and how your analytics product is lying to you  

This guest post is written by Josh Fraser, co-founder and CEO of Torbit. Torbit creates tools for measuring, analyzing and optimizing web performance.  

Did you know that 5% of the pageviews on Walmart.com take over 20 seconds to load? Walmart discovered this recently after adding real user measurement (RUM) to analyze their web performance for every single visitor to their site. Walmart used JavaScript to measure their median load time as well as key metrics like their 95th percentile. While 20 seconds is a long time to wait for a website to load, the Walmart story is actually not that uncommon. Remember, this is the worst 5% of their pageviews, not the typical experience.
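Walmart's numbers are percentiles, not averages, and that distinction is the point of the post. A minimal sketch in Python of why it matters, using made-up load times (hypothetical data, not Walmart's) and the nearest-rank percentile definition. Nearest-rank is one of several common definitions; an analytics product may interpolate between samples and report slightly different values.

```python
import math
import statistics
from typing import Dict, Sequence


def nearest_rank_percentile(samples: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers p percent of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def share_slower_than(samples: Sequence[float], threshold_s: float) -> float:
    """Fraction of page loads that took longer than threshold_s seconds."""
    return sum(1 for t in samples if t > threshold_s) / len(samples)


def summarize(load_times_s: Sequence[float]) -> Dict[str, float]:
    """Mean, median and 95th percentile of a set of page load times (seconds)."""
    return {
        "mean": statistics.mean(load_times_s),
        "median": statistics.median(load_times_s),
        "p95": nearest_rank_percentile(load_times_s, 95),
    }


if __name__ == "__main__":
    # Hypothetical RUM samples in seconds: mostly fast, with a slow tail.
    load_times = [1.2, 1.5, 1.8, 2.0, 2.1, 2.3, 2.4, 2.6, 2.8, 3.0,
                  3.1, 3.3, 3.5, 3.9, 4.2, 4.8, 5.5, 7.0, 21.0, 24.0]
    stats = summarize(load_times)
    for name, value in stats.items():
        print(f"{name:>6}: {value:.2f} s")
    print(f"slower than 20 s: {share_slower_than(load_times, 20.0):.0%}")
```

With these made-up numbers the mean (5.1 s) matches no actual visitor, the median (3.05 s) describes the typical page view, and the 95th percentile (21 s) exposes the slow tail that an average hides.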

Click to read more ...

Tuesday
May 22, 2012

Sponsored Post: Torbit, Infragistics, Velocity, Reality Check Network, Gigaspaces, AiCache, Logic Monitor, Attribution Modeling, New Relic, AppDynamics, CloudSigma, ManageEngine, Site24x7

Who's Hiring? 

  • Torbit is hiring. Care about performance? Care about making the internet faster and better? At Torbit we use lots of Golang, Node.js, JavaScript and PHP to solve big challenges.

Fun and Informative Events

  • The DevOps PaaS Infusion Meetup NYC - Taking Mission-Critical Apps to the Cloud. You’ll hear real world use cases from Microsoft, Aditi, Cisco, GigaSpaces, and C24. Register here:  http://bit.ly/IpgpaN
  • O'Reilly Velocity, the Web Performance and Operations conference, is happening in Santa Clara, CA from June 25-27. Learn from your peers, exchange ideas with experts, and share best practices and lessons learned. Register here.
  • Sign up for this free 30-minute webinar, which explores how new technology can determine which ads have been seen by users and discusses the C3 Metrics Labs analysis of over 2 billion impressions.

Cool Products and Services

  • Reality Check Network offers powerful hosting solutions and managed servers for high traffic/bandwidth websites backed by unlimited network, server and application support.
  • When you’re looking for the fastest, lightest, most complete toolset for rapidly building high performance Web 2.0 applications, you want NetAdvantage for ASP.NET.
  • Create your most stunning, highly performant, and completely mobile HTML5 applications and dashboards on any browser, platform or device – only with NetAdvantage for jQuery.
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website. Test aiCache acceleration for free. No sign-up required. http://aicache.com/deploy
  • LogicMonitor - Hosted monitoring of your entire technology stack. Dashboards, trending graphs, alerting. Try it free and be up and running in just 15 minutes.
  • New Relic - real user monitoring: optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
  • AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
  • CloudSigma. Utility style high performance cloud servers in the US and Europe delivered on all 10GigE networking. Run any OS, take advantage of SSD storage and tailored infrastructure options.
  • ManageEngine Applications Manager: Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com: Monitor End User Experience from a global monitoring network.

For a longer description of each sponsor, please read more below...

Click to read more ...

Monday
May 21, 2012

Pinterest Architecture Update - 18 Million Visitors, 10x Growth, 12 Employees, 410 TB of Data

Since our last post, A Short on the Pinterest Stack for Handling 3+ Million Users, there has been an update on Pinterest: Pinterest growth driven by Amazon cloud scalability.

With Pinterest we see a story very similar to that of Instagram. Huge growth, lots of users, lots of data, with remarkably few employees, all on the cloud.

While it's true that neither Pinterest nor Instagram is making great advances in science and technology, that is more an indicator of the easy power of today's commodity environments than a sign of Silicon Valley's lack of innovation. The numbers are so huge and the valuations are so high that we naturally want some sort of fundamental technological revolution to underlie their growth. The revolution is more subtle. It really is just that easy to attain such growth these days, if you can execute on the right idea. Get used to it. This is the new normal.

Here's what Pinterest looks like today: 

Click to read more ...

Friday
May 18, 2012

Stuff The Internet Says On Scalability For May 18, 2012

It's HighScalability Time:

  • 42 Billion: Netflix API Requests/Month
  • Quotable quotes:
    • @commonlisp: Ideas from the talk: In Haskell laziness + thunks + garbage collection (GC) impede multicore scalability. Parallel GC is crucial.
    • @Bulldozer0: Global state is the enemy of scalability; not only in software, but in governance.
  • If you've ever, as I have, felt the terror of a misplaced "rm -rf /", you'll love this story: Did Pixar accidentally delete Toy Story 2 during production? It did, the reconstruction was heroic, and parts of Toy Story 2 were lost forever.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...

Wednesday
May 16, 2012

Big List of 20 Common Bottlenecks

In Zen And The Art Of Scaling - A Koan And Epigram Approach, Russell Sullivan offered an interesting conjecture: there are 20 classic bottlenecks. This sounds suspiciously like the idea that there are only 20 basic story plots. And depending on how you chunkify things, it may be true, but in practice we all know bottlenecks come in infinite flavors, all tasting of sour and ash.

One day Aurelien Broszniowski from Terracotta emailed me his list of bottlenecks, we cc’ed Russell in on the conversation, he gave me his list, I have a list, and here’s the resulting stone soup.

Russell said this is his “I wish I knew when I was younger” list and I think that’s an enriching way to look at it. The more experience you have, the more different types of projects you tackle, the more lessons you’ll be able to add to a list like this. So when you read this list, and when you make your own, you are stepping through years of accumulated experience and more than a little frustration, but in each there is a story worth grokking.

  • Database:

Click to read more ...

Monday
May 14, 2012

DynamoDB Talk Notes and the SSD Hot S3 Cold Pattern

My impression of DynamoDB before attending an Amazon DynamoDB for Developers talk was that it’s the usual quality service produced by Amazon: simple, fast, scalable, geographically redundant, expensive enough to make you think twice about using it, and delightfully NoOp.

After the talk my impression has become more nuanced. The quality impression still stands. Look at the forums and you’ll see the typical issues every product has, but no real surprises. And as a SimpleDB++, DynamoDB seems to have avoided second system syndrome and produced a more elegant design.

What was surprising is how un-cloudy DynamoDB appears to be. The cloud pillars of pay for what you use and quick elastic response to bursty traffic have been abandoned, for some understandable reasons, but the result is you really have to consider your use cases before making DynamoDB the default choice.

Here are some of my impressions from the talk...

Click to read more ...

Friday
May 11, 2012

Stuff The Internet Says On Scalability For May 11, 2012

It's HighScalability Time:

  • 2.5M : Erlang Concurrent Connections; 20 Billion : Urban Airship Push Notifications. 
  • @agentdero: "You go to production with the code you have, not the code you wish you had" - Devops Rumsfeld
  • @PatrickMcFadin: After talking to a lot of big #aws customers tonight, the big non-secret is we'll be seeing #ssd instances soon.
  • Goodbye, CouchDB. Steven Hazel shares his experience report with CouchDB. Like many relationships it all started great, but reliability, performance, and maintenance problems drove him into the arms of Percona MySQL. They use MySQL in NoSQL mode and in return they get better performance and a love that never fails.

Don't miss all that the Internet has to say on Scalability, click below and become eventually consistent with all scalability knowledge...

Click to read more ...