Entries by HighScalability Team (1576)

Sunday
Sep052010

Hilarious Video: Relational Database vs NoSQL Fanbois

This is so funny I laughed until I cried! Definitely NSFW. OMG it's hilarious, but it's also not a bad overview of the issues. Especially loved: You read the latest post on HighScalability.com and think you are a f*cking Google and architect and parrot slogans like Web Scale and Sharding but you have no idea what the f*ck you are talking about. There are so many more gems like that.

Thanks to Alex Popescu for posting this on MongoDB is Web Scale. Whoever made this deserves a Webby.

Friday
Sep032010

Hot Scalability Links For Sep 3, 2010

 With summer almost gone, it's time to fall into some good links...

  • Hibari - distributed, fault tolerant, highly available key-value store written in Erlang. In this video Scott Lystig Fritchie gives a very good overview of the newest key-value store. 
  • Tweets of Gold
    • lenidot: with 12 staff, @tumblr serves 1.5billion pageviews/month and 25,000 signups/day. Now that's scalability!
    • jmtan24: Funny that whenever a high scalability article comes out, it always mention the shared nothing approach
    • mfeathers: When life gives you lemons, you can have decades-long conquest to convert lemons to oranges, or you can make lemonade.
    • OyvindIsene: Met an old man with mustache today, he had no opinion on #noSQL. Note to myself: Don't grow a mustache, now or later. 
    • vlad003: Isn't it interesting how P2P distributes data while Cloud Computing centralizes it? And they're both said to be the future.
  • You may be interested in a new DevOps Meetup organized by Dave Nielson, so you know it will be good.

Click to read more ...

Wednesday
Sep012010

Paper: The Case for Determinism in Database Systems  

Can you have your ACID cake and eat your distributed database too? Yes explains Daniel Abadi, Assistant Professor of Computer Science at Yale University, in an epic post, The problems with ACID, and how to fix them without going NoSQL, coauthored with Alexander Thomson, on their paper The Case for Determinism in Database Systems. We've already seen VoltDB offer the best of both worlds, this sounds like a completely different approach.

The solution, they propose, is: 

Click to read more ...

Monday
Aug302010

Pomegranate - Storing Billions and Billions of Tiny Little Files

Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:

We have demonstrate that file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increased more than 100,000 aggregate read and write requests served per second (RPS). 

Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into file system. The idea is that the file system API is common to every platform so it wouldn't require a separate API to use. Every application could use it out of the box.

The features of Pomegranate are:

  • It handles billions of small files efficiently, even in one directory;
  • It provide separate and scalable caching layer, which can be snapshot-able;
  • The storage layer uses log structured store to absorb small file writes to utilize the disk bandwidth;
  • Build a global namespace for both small files and large files;
  • Columnar storage to exploit temporal and spatial locality;
  • Distributed extendible hash to index metadata;
  • Snapshot-able and reconfigurable caching to increase parallelism and tolerant failures;
  • Pomegranate should be the first file system that is built over tabular storage, and the building experience should be worthy for file system community. 

Can Ma, who leads the research on Pomegranate, was kind enough to agree to a short interview.

Click to read more ...

Friday
Aug272010

OpenStack - The Answer to: How do We Compete with Amazon?

The Silicon Valley Cloud Computing Group had a meetup Wednesday on OpenStack, whose tag line is the open source, open standards cloud. I was shocked at the large turnout. 287 people registered and it looked like a large percentage of them actually showed up. I wonder, was it the gourmet pizza, the free t-shirts, or are people really that interested in OpenStack? And if they are really interested, why are they that interested? On the surface an open cloud doesn't seem all that sexy a topic, but with contributions from NASA, from Rackspace, and from a very avid user community, a lot of interest there seems to be. 

The brief intro blurb to OpenStack is:

Click to read more ...

Tuesday
Aug242010

21 Quality Screencasts on Scaling Rails

This a follow-up post to an earlier post on the Scaling Rails Screencast Series by Gregg Pollack, when there were only 13 screencasts, now there are 21. Eight more have been added on topics like load testing and database scaling. This series is of surprisingly high quality. While I didn't view every screencast, I sampled a large set and found them to have solid content and high production values. In fact, how did they make these things? The instructor moves around in a little box while the content flows around him. A very cool effect. But that wouldn't matter if the content didn't deliver, here's what's new:

Click to read more ...

Tuesday
Aug242010

Sponsored Post: deviantART, Okta, EzRez, Cloud Sigma, ManageEngine, Site24x7

Who's Hiring?

Cool Products and Services

Click to read more ...

Monday
Aug232010

6 Ways to Kill Your Servers - Learning How to Scale the Hard Way

This is a guest post by Steffen Konerow, author of the High Performance Blog.

Learning how to scale isn’t easy without any prior experience. Nowadays you have plenty of websites like highscalability.com to get some inspiration, but unfortunately there is no solution that fits all websites and needs. You still have to think on your own to find a concept that works for your requirements. So did I.

A few years ago, my bosses came to me and said “We’ve got a new project for you. It’s the relaunch of a website that has already 1 million users a month. You have to build the website and make sure we’ll be able to grow afterwards”. I was already an experienced coder, but not in these dimensions, so I had to start learning how to scale – the hard way.

Click to read more ...

Friday
Aug202010

Hot Scalability Links For Aug 20, 2010

Lots of good links this week...

  • Membase, powering Farmville's 500k operations *per second*. Of course, some people contend they could do this on their old Vic-20, but this is a useful, vigorous discussion thread on Reddit.
  • Tweets of Gold:
    • kbsingh: I dont understand why some developers think its ok to leave operations people out of scalability decisions
    • karmazilla: I find it a little odd when a database claims to support "massive scalability" when it is not distributed.
    • pcapr: OH: teenagers are eventually consistent
    • tv: Verb suggestion for the act of mapreducing data: "marinating". "Then we marinade it to get the n-gram frequencies."

    Click to read more ...

Wednesday
Aug182010

Misco: A MapReduce Framework for Mobile Systems - Start of the Ambient Cloud?

Misco: A MapReduce Framework for Mobile Systems is a very exciting paper to me because it's really one of the first explorations of some of the ideas in Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. What they are trying to do is efficiently distribute work across a set cellphones using a now familiar MapReduce interface. Usually we think of MapReduce as working across large data center hosted clusters. Here, the cluster nodes are cellphones not contained in any data center, but compute nodes potentially distributed everywhere.

I talked briefly with Adam Dou, one of the paper's authors, and he said they don't see cellphone clusters replacing dedicated computer clusters, primarily because of the power required for both network communication and the map-reduce computations. Large multi-terabyte jobs aren't in the cards...yet. Adam estimates computationally that cellphones are performing similarly to desktops of ten years ago. Instead, they want to focus on the unique characteristics of the mobile devices--camera, microphone, GPS and other directly collectable data--so the data can be processed where collected.

MapReduce was selected as the programming interface because it is familiar to programmers, it transparently supports programming multiple devices, and can be implemented--especially using Python---in such a way that programmers are freed from all the underlying details like concurrency, data distribution, and code management. A very smart move in my estimation. 

It's interesting to contrast the economics of the ambient cloud to the economics of the data center cloud. The goal of a data center cloud is 100 percent utilization. Use every possible CPU cycle or money is being wasted money on unused equipment. In an ambient cloud the idea is more parasitic, deploy to more resources yet leave the primary function of the device unaffected. It's a different perspective that may lead to different architectures.

A quick introduction to Misco from the abstract:

Click to read more ...