High Scalability

Peecho Architecture - scalability on a shoestring

Monday, August 1, 2011 at 9:01AM

This is a guest post by Marcel Panse and Sander Nagtegaal from Peecho.

Although architecture descriptions are an interesting read, the problems that start-ups face are hardly ever addressed. We would like to change that, so here is our architecture story.

Introducing a start-up

The Amsterdam-based company Peecho offers print-as-a-service. Our embeddable print button allows you to sell your digital content as professionally printed products, like photo books, magazines or canvases - straight from your own website. There is an API, too.

Printcloud is the system that powers the print button. It exists in the cloud only, growing when needed and becoming smaller if it can. The system takes in print orders, magically transforms tough data into print-ready files and routes the orders to the production facility that is closest to the intended recipient.

To preserve the environment, Peecho's philosophy is to facilitate global ordering, but to aim for local production only.
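The post only says that Printcloud routes each order to the facility "closest to the intended recipient". Here is a minimal sketch of that routing rule. It assumes each facility is known by latitude/longitude and that "closest" means great-circle distance. The `Facility` type, the `route_order` and `haversine_km` functions, and the three sample locations are all hypothetical. They are not Peecho's actual partners or code.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Facility:
    """A print facility (hypothetical) located by latitude/longitude in degrees."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a sphere, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def route_order(recipient_lat: float, recipient_lon: float, facilities: list) -> Facility:
    """Return the facility with the shortest great-circle distance to the recipient."""
    if not facilities:
        raise ValueError("no production facilities available")
    return min(
        facilities,
        key=lambda f: haversine_km(recipient_lat, recipient_lon, f.lat, f.lon),
    )


# Hypothetical sample facilities, not Peecho's real production partners.
FACILITIES = [
    Facility("amsterdam", 52.37, 4.90),
    Facility("new-york", 40.71, -74.01),
    Facility("sydney", -33.87, 151.21),
]

if __name__ == "__main__":
    # A recipient in Berlin is routed to the Amsterdam facility.
    print(route_order(52.52, 13.40, FACILITIES).name)
```

A real router would probably also weigh product availability and capacity at each facility. Straight-line distance is just the simplest stand-in for "local production".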

Expensive stuff does not scale

Stuff The Internet Says On Scalability For July 29, 2011

Friday, July 29, 2011 at 9:14AM

Submitted for your end of July scaling pleasure:

YouTube: 3 billion videos viewed a day; 48 hours of footage uploaded every minute. 64-core Tilera chip.
Google wants to be your CDN. They figure the only way to make the web faster...is to host it. Page Speed Service - Web Performance, Delivered. An eventually-for-pay service that caches your website and distributes it around the world. No cost information. Your speed may vary. See the longish list of limitations.
Nobody said anything interesting on scalability this week! A disaster of non-quotable proportions. If I missed something, now is your chance.
Moving an Elephant: Large Scale Hadoop Data Migration at Facebook. Paul Yang describes the greatest westward expansion since the land bridge across the Bering Strait. It's a story of moving a 30PB Hadoop cluster from an overpopulated datacenter to the wide open spaces of a new continent. Unlike the early settlers, Facebook did not move the boxes over, since that would disrupt service. Instead, they mirrored the data to their new datacenter.
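The mirroring approach can be illustrated with a generic copy-then-catch-up pattern. First, start recording writes. Next, bulk-copy a snapshot. Then replay the writes that arrived during the copy. Finally, cut clients over to the new store. This is a toy sketch built on a hypothetical in-memory `Store`. It is not Facebook's actual replication system, and the summary above does not describe that system's details.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory key-value store standing in for a cluster (hypothetical)."""
    data: dict = field(default_factory=dict)
    changelog: list = field(default_factory=list)  # writes recorded while tracking is on
    tracking: bool = False

    def write(self, key, value):
        self.data[key] = value
        if self.tracking:
            self.changelog.append((key, value))


def migrate(old: Store, new: Store, writes_during_copy=()) -> Store:
    """Copy-then-catch-up migration: snapshot, bulk copy, replay, cut over."""
    # 1. Start recording writes so nothing is lost while the bulk copy runs.
    old.tracking = True
    # 2. Take a snapshot of the existing data to bulk-copy.
    snapshot = dict(old.data)
    # Simulated live traffic that keeps hitting the old store during the copy.
    for key, value in writes_during_copy:
        old.write(key, value)
    new.data.update(snapshot)
    # 3. Replay the writes captured during the copy, in order.
    for key, value in old.changelog:
        new.data[key] = value
    old.changelog.clear()
    old.tracking = False
    # 4. Cut over: from here on, clients would be pointed at `new`.
    return new
```

At 30PB the bulk copy takes a long time. In practice, the replay step is therefore repeated until the backlog is small, and writes are paused only briefly for the final catch-up. The sketch also ignores deletes.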

Making Hadoop 1000x Faster for Graph Problems

Wednesday, July 27, 2011 at 9:07AM

Dr. Daniel Abadi is the author of the DBMS Musings blog and co-founder of Hadapt, which offers a product that improves Hadoop performance by 50x on relational data. He is now taking his talents to graph data. His post Hadoop's tremendous inefficiency on graph data management (and how to avoid it) shares the secrets of getting Hadoop to perform 1000x better on graph data.

TL;DR:

Who's Hiring?

BetterWorks is hiring a PHP Software Engineer in Los Angeles to help make enterprise software as beautiful and usable as an Apple product. Please apply here.
TripAdvisor is Hiring Engineers at all Levels: Scalable Web Engineering Program. To apply for our Scalable Web Engineering Program, visit http://www.tripadvisor.com/careers/webprogram
Are you a scalability expert? eHarmony is looking for Senior Java Engineers to help implement and scale our Matching compatibility systems. Please visit: http://tinyurl.com/3g8mxks.
Aconex is looking for a Systems Engineer in San Bruno. Please apply here.
MathWorks is looking for multiple full-time scaling experts. Apply now: http://matlab.my/lVmunb

Fun and Informative Events

NoSQL Now! is a new conference covering the dynamic field of NoSQL technologies. August 23-25 in San Jose. For more information please visit: http://www.NoSQLNow.com
Surge 2011: The Scalability and Performance Conference. Surge is a chance to identify emerging trends and meet the architects behind established technologies. Early Bird Registration.
Join our webinar as we introduce Tungsten Enterprise Summer '11 Edition with improved usability, performance and ease of management for MySQL and PostgreSQL clusters.
Couchbase is having a Special Offer for Apache CouchDB Developer Training! http://www.couchbase.com/couchdb-training/portland-june-2011

Cool Products and Services

New Relic - real user monitoring: optimize for humans, not bots. Live application stats, SQL/NoSQL performance, web transactions, proactive notifications. Take 2 minutes to sign up for a free trial.
AppDynamics is the very first free product designed for troubleshooting Java performance while getting full visibility in production environments. Visit http://www.appdynamics.com/free.
ScaleOut StateServer - Scale Out Your Server Farm Applications!
CloudSigma. Instantly scalable European cloud servers.
ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
www.site24x7.com : Monitor End User Experience from a global monitoring network.

Is NoSQL a Premature Optimization that's Worse than Death? Or the Lady Gaga of the Database World?

Monday, July 25, 2011 at 9:03AM

Michael Stonebraker sure knows how to stir up a storm. Unlike with others, that doesn't make him a troll in my mind; he's way too accomplished in the field to be that. But he does have a bit of Barnum & Bailey in him, which serves to get the discussion flowing, and that's a good thing. A lot of previously hidden wisdom and passion gets unlocked, which we'll try to capture here.

This disturbance in the force is over OldSQL vs NoSQL vs NewSQL. Warning: these are not crisp categories. There's leakage all over the place, so watch your step:

OldSQL (Oracle, MySQL, etc.) refers to what some want to term legacy relational databases, like MySQL, that don't scale out horizontally with aplomb.
NoSQL (CouchDB, Redis, Cassandra, HBase, MongoDB, Riak, Neo4j, etc.) refers to, well, a collection of technologies that aren't OldSQL. These are often designed to scale out horizontally, aren't on ACID, and use schemaless, non-relational data models.
NewSQL (Xeround, Clustrix, NimbusDB, GenieDB, ScaleBase, VoltDB) are databases that preserve SQL, the relational model, ACID, schemas, and are scalable, though not necessarily horizontally (which I don't quite understand). Sharding should be transparent. The general pitch is once you have ACIDy SQL goodness and elasticity, all on commodity hardware, then there's no reason to use NoSQL.
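Here is what "sharding should be transparent" means in concrete terms. The application talks to one store, and a routing layer decides which shard owns each key. The minimal sketch below uses hash-modulo placement. `ShardedStore` is a hypothetical class, not the API of any of the products listed. It also deliberately ignores the hard part that NewSQL vendors claim to solve: ACID transactions that span shards.

```python
import hashlib


class ShardedStore:
    """Hypothetical router that hides N shards behind a single get/put interface."""

    def __init__(self, num_shards: int):
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict:
        # Use a stable hash: Python's built-in hash() for str is randomized per process,
        # so it could send the same key to different shards after a restart.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key: str, value) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str, default=None):
        return self._shard_for(key).get(key, default)


if __name__ == "__main__":
    store = ShardedStore(num_shards=4)
    for i in range(1000):
        store.put(f"user:{i}", {"id": i})
    # The caller never names a shard; the per-shard counts show how keys were spread.
    print([len(s) for s in store.shards])
```

Modulo placement moves most keys whenever the shard count changes. Consistent hashing is the usual way to limit that reshuffling.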

OK, got it? Then you might be the only one...

The disturbance first started with this article by Derrick Harris, which gets a lot of mileage out of a few quotes by Stonebraker. The short of it is:

Stuff The Internet Says On Scalability For July 22, 2011

Friday, July 22, 2011 at 8:55AM

Submitted for your scaling pleasure:

Google's PageRank involves 500 million variables and 2 billion terms. SeaMicro Packs 768 Cores Into its Atom Server. Twitter: 1 Billion Items Delivered A Day Is Nice, Google+. We Do 350 Billion.
Potent quotables:
- @merv - Does elastic scalability matter? Ask Apple about 1 million e-transactions in a day. $30M worth of multi-gig downloads that must work.
- @lmacvittie - Cloud has shifted the focus of scalability from applications to architecture.
- @bartbohn - Love the phrase "anarchic scalability" by Roy Fielding
- @iAjayMe - Programming: I have come to the conclusion, if you want to do something right the first time (scalability, performance) write it in C++

Netflix: Harden Systems Using a Barrel of Problem Causing Monkeys - Latency, Conformity, Doctor, Janitor, Security, Internationalization, Chaos

Wednesday, July 20, 2011 at 9:21AM

With a new Planet of the Apes coming out, this may be a touchy subject with our new overlords, but Netflix is using a whole lot more trouble-injecting monkeys to test and iteratively harden their systems. We learned previously how Netflix used Chaos Monkey, a tool to test failover handling by continuously failing EC2 nodes. That was just a start. More monkeys have been added to the barrel. Node failure is just one problem in a system. Imagine a problem and you can imagine creating a monkey to test whether your system handles that problem properly. Yury Izrailevsky talks about just this approach in this very interesting post: The Netflix Simian Army.
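The idea behind Chaos Monkey is simple enough to sketch. Regularly pick a random instance in each group and kill it. Failover paths then get exercised all the time, not only during real outages. The code below is a simulation under that assumption. `Instance`, `unleash_monkey`, `groups_still_serving` and the opt-out set are hypothetical names. The real tool's scheduling and cloud API calls are not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    instance_id: str
    group: str  # e.g. an auto-scaling group serving one role
    running: bool = True


def unleash_monkey(instances, opted_out_groups=frozenset(), rng=None):
    """Simulated chaos run: 'terminate' one random running instance per group.

    Groups in opted_out_groups are left alone. Returns the ids of killed instances.
    """
    rng = rng or random.Random()
    by_group = {}
    for inst in instances:
        if inst.running and inst.group not in opted_out_groups:
            by_group.setdefault(inst.group, []).append(inst)
    killed = []
    for group in sorted(by_group):
        victim = rng.choice(by_group[group])
        victim.running = False  # stand-in for a real cloud "terminate instance" call
        killed.append(victim.instance_id)
    return killed


def groups_still_serving(instances):
    """Groups that still have at least one running instance."""
    return {inst.group for inst in instances if inst.running}
```

A group with a single instance drops out of `groups_still_serving` after a run. That is exactly the kind of single point of failure the monkey is meant to flush out. Passing a seeded `random.Random` makes a run reproducible.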

I know what you are thinking: if monkeys are so great, then why has Netflix been down lately? Dmuino addressed this potential embarrassment, putting all fears of cloud inferiority to rest:

Unfortunately we're not running 100% on the cloud today. We're working on it, and we could use more help. The latest outage was caused by a component that still runs in our legacy infrastructure where we have no monkeys :)

To continuously test the resilience of Netflix's system to failures, they've added a number of new monkeys, and even a gorilla:

New Relic Architecture - Collecting 20+ Billion Metrics a Day

Monday, July 18, 2011 at 9:01AM

This is a guest post by Brian Doll, Application Performance Engineer at New Relic.

New Relic’s multitenant, SaaS web application monitoring service collects and persists over 100,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. We believe that good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. Here we'll show you how we do it.
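A quick back-of-the-envelope check links the per-second rate above to the per-day figure in the title. It assumes the rate holds around the clock (86,400 seconds per day):

$$
100{,}000\ \tfrac{\text{metrics}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} = 8.64 \times 10^{9}\ \tfrac{\text{metrics}}{\text{day}}
$$

$$
\frac{20 \times 10^{9}\ \text{metrics/day}}{86{,}400\ \text{s/day}} \approx 231{,}000\ \text{metrics/s}
$$

So "20+ billion a day" corresponds to a sustained average of roughly 231,000 metrics per second. That is consistent with "over 100,000 metrics every second".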

New Relic is Application Performance Management (APM) as a Service
In-app agent instrumentation (bytecode instrumentation, etc.)
Support for 5 programming languages (Ruby, Java, PHP, .NET, Python)
175,000+ app processes monitored globally
10,000+ customers

The Stats

Stuff The Internet Says On Scalability For July 15, 2011

Friday, July 15, 2011 at 9:18AM

Submitted for your scaling pleasure:

That's a lot of data... CERN: ATLAS produces up to 320 MB per second, followed by CMS with 220 MB per second. Amazon Cloud Now Stores 339 Billion Objects. CERN also has an open source hardware effort.
Domas Mituzas on why Facebook may just outlast their MySQL heritage: I feel somewhat sad that I have to put this truism out here: disks are way more cost efficient, and if used properly can be used to facilitate way more long-term products, not just real time data. Think Wikipedia without history, think comments that disappear on old posts, together with old posts, think all 404s you hit on various articles you remember from the past and want to read. Building the web that lasts is completely different task from what academia people imagine building the web is. What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched.
Quotes that are quoted because they are quotable:
- @Werner - If you have never developed anything of that scale you cannot be taken serious if you call for the reengineering of facebook's data store
- @Werner - Scaling data systems in real life has humbled me. I would not dare criticize an architecture that holds social graphs of 750M and works
- Dwight Merriman - I'm not smart enough to do distributed joins that scale horizontally, widely, and are super fast. You have to choose something else. We have no choice but to not be relational.
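For scale, suppose ATLAS and CMS both ran at the peak rates quoted in the CERN item above for a full day. This is an upper bound, not a measured figure. Taking M as $10^6$ bytes:

$$
(320 + 220)\ \text{MB/s} \times 86{,}400\ \text{s/day} = 46{,}656{,}000\ \text{MB/day} \approx 46.7\ \text{TB/day}
$$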
