Entries by HighScalability Team (1576)

Friday
Apr 08, 2011

Stuff The Internet Says On Scalability For April 8, 2011

Submitted for your reading pleasure on this tomato killing frosty morn...

For the rest of the Stuff the Internet Says please read on...


Thursday
Apr 07, 2011

Paper: A Co-Relational Model of Data for Large Shared Data Banks

Let's play a quick game of truth or sacrilege: are SQL and NoSQL really just two sides of the same coin? That's what Erik Meijer and Gavin Bierman would have us believe in their "we can all get along and make a lot of money" article in the Communications of the ACM, A Co-Relational Model of Data for Large Shared Data Banks. You don't believe it? It's math, so it must be true :-) Some key points:

In this article we present a mathematical data model for the most common noSQL databases—namely, key/value relationships—and demonstrate that this data model is the mathematical dual of SQL's relational data model of foreign-/primary-key relationships

...we believe that our categorical data-model formalization and monadic query language will allow the same economic growth to occur for coSQL key-value stores.

...In contrast to common belief, the question of big versus small data is orthogonal to the question of SQL versus coSQL. While the coSQL model naturally supports extreme sharding, the fact that it does not require strong typing and normalization makes it attractive for "small" data as well. On the other hand, it is possible to scale SQL databases by careful partitioning.
What this all means is that coSQL and SQL are not in conflict, like good and evil. Instead they are two opposites that coexist in harmony and can transmute into each other like yin and yang. Because of the common query language based on monads, both can be implemented using the same principles.
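To make the duality claim a little more concrete, here is a minimal Python sketch. It is not taken from the paper and does not attempt its categorical formalization. It assumes a toy dataset (the `authors` and `books` tables, their fields, and both helper functions are invented for illustration) and stores the same information two ways: relationally, where each child row points to its parent through a foreign key, and as key/value data, where each parent holds its children. Converting between the two simply reverses the direction of the references.

```python
from collections import defaultdict

# Relational (SQL-style) form: child rows point to their parent via a foreign key.
# The table contents are hypothetical.
authors = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
books = [
    {"id": 10, "title": "Notes", "author_id": 1},
    {"id": 11, "title": "Sketches", "author_id": 1},
    {"id": 12, "title": "Compilers", "author_id": 2},
]


def to_key_value(parents, children, fk):
    """Reverse the references: each parent now holds its children."""
    grouped = defaultdict(list)
    for child in children:
        # Drop the foreign key; the nesting itself records the relationship.
        row = {k: v for k, v in child.items() if k != fk}
        grouped[child[fk]].append(row)
    return {p["id"]: {**p, "children": grouped[p["id"]]} for p in parents}


def to_relational(store, fk):
    """Reverse them back: each child points to its parent again."""
    parents, children = [], []
    for key, value in store.items():
        parents.append({k: v for k, v in value.items() if k != "children"})
        for child in value["children"]:
            children.append({**child, fk: key})
    return parents, children


kv = to_key_value(authors, books, "author_id")
round_trip = to_relational(kv, "author_id")
```

In this toy case `round_trip` reproduces the original `authors` and `books` lists, which is the informal sense in which the two shapes carry the same information.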

I'm certainly in no position to judge this work, or what it means at some deep level. After reading a thousand treatments of monads I still have no idea what they are. But, like the Standard Model in physics, it would be satisfying if some unifying principles underlay all this stuff. Would we all get along? That's a completely different question...

Wednesday
Apr 06, 2011

Netflix: Run Consistency Checkers All the time to Fixup Transactions

You might have consistency problems if you have: multiple datastores in multiple datacenters, without distributed transactions, and with the ability to alternately execute out of each datacenter; syncing protocols that can fail or sync stale data; distributed clients that cache data and then write old data back to the central store; a NoSQL database that doesn't have transactions between updates of multiple related key-value records; application-level integrity checks; client-driven optimistic locking.

Sounds a lot like many evolving, loosely coupled, autonomous, distributed systems these days. How do you solve these consistency problems? Siddharth "Sid" Anand of Netflix talks about how they solved theirs in his excellent presentation, NoSQL @ Netflix : Part 1, given to a packed crowd at a Cloud Computing Meetup.
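As a rough illustration of the idea in the title, and not a description of what Netflix actually runs, a background consistency checker can be sketched in Python. The sketch assumes two in-memory dicts standing in for datastores and a per-record `version` counter. The `find_and_fix` function, the store names, and the newest-version-wins rule are all assumptions made for this example.

```python
def find_and_fix(primary, replica):
    """Compare two stores key by key and copy the newer record over the older.

    Both stores map key -> {"version": int, "value": ...}. Newest version wins,
    with ties going to the primary. That policy is an assumption of this sketch.
    Returns the list of keys that were repaired.
    """
    repaired = []
    for key in sorted(set(primary) | set(replica)):
        a, b = primary.get(key), replica.get(key)
        if a == b:
            continue  # already consistent
        # A missing record counts as older than any existing one.
        a_version = a["version"] if a else -1
        b_version = b["version"] if b else -1
        if a_version >= b_version:
            replica[key] = dict(a)
        else:
            primary[key] = dict(b)
        repaired.append(key)
    return repaired


# Hypothetical data: one stale record, plus records missing from each store.
east = {
    "user:1": {"version": 3, "value": "gold"},
    "user:2": {"version": 1, "value": "basic"},
}
west = {
    "user:1": {"version": 2, "value": "silver"},
    "user:3": {"version": 1, "value": "trial"},
}

fixed = find_and_fix(east, west)
```

Run on a schedule, a pass that returns an empty list means the two stores agree. A real checker would also need to handle conflicting writes with equal versions, which this sketch resolves arbitrarily in favor of the primary.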

You might be inclined to say how silly it is to have these problems in the first place, but just hold on. See if you might share some of their problems, before getting all judgy:


Friday
Apr 01, 2011

Stuff The Internet Says On Scalability For April 1, 2011

Submitted for your reading pleasure, no foolin'...

  • Quotable Quotes:
    • @zateriosystems: thinking about scalability?, are you OK to double your capacity in one week?, a startup should be ready...ready to jump.
    • @sklacy: Maybe what I should have said is "Design for scalability, deploy without it."
    • @MikeHale: Scalability is customer 2000 having the same experience as customer 1 #sqlsat67
    • @LusciousPear: The meaning of #NoSQL is shut up
    • @deobrat: The biggest bottleneck to scalability are ignorant developers. Most don't even try saving extra CPU cycles or memory bytes :(
    • @w_westendorp: .@ijansch: Cloud computing is like outsourcing your scalability problems
    • @edyavno: Billy Newport essentially just affirmed the theme I've been propagating: "Distributed Caching is the enterprise NoSQL" #strangeloop #nosql
    • @monkchips: HP CEO Leo Apotheker says "relational databases are becoming less and less relevant to the future stack"
    For more Stuff the Internet says please keep on reading...


Thursday
Mar 31, 2011

8 Lessons We Can Learn from the MySpace Incident - Balance, Vision, Fearlessness

A surprising amount of heat and light was generated by the whole Microsoft vs MySpace discussion. Why people feel so passionate about this I'm not quite sure, but fortunately for us, in the best sense of the web, it generated an amazing number of insightful comments and observations. If we stand back and take a look at the whole incident, what can we take away that might help us in the future?


Tuesday
Mar 29, 2011

Sponsored Post: OPOWER, Data 2.0, ClearStone, Schooner, deviantART, ScaleOut, aiCache, WAPT, Karmasphere, Kabam, Newrelic, Cloudkick, Membase, Joyent, CloudSigma, ManageEngine, Site24x7

Who's Hiring?

Fun and Informative Events

  • Interested in CouchDB Training? The CouchDB Training World Tour starts this month with new CouchDB training classes in five major cities.
  • Data 2.0 is the next step in the evolution of the Internet, creating new channels for facilitating connectivity and communication between websites. 

Cool Products and Services

  • APM (Application Performance Management) for NOSQL, Java and More - Try ClearStone 5.0. Download ClearStone 5.0 today!  http://www.evidentsoftware.com/download/
  • Schooner delivers true enterprise-grade performance and availability with Membrain: a flash-optimized memcached cache and NoSQL data store for x86 servers. Download a free trial copy today!
  • ScaleOut StateServer - Scale Out Your Server Farm Applications!
  • aiCache creates a better user experience by increasing the speed, scale, and stability of your website.
  • WAPT is a load, stress and performance testing tool for websites and web-based applications.
  • Karmasphere is bringing Apache Hadoop power to developers and analysts. Download your Free Community Edition today!
  • Newrelic - What are you doing to ensure the performance of your apps?
  • Cloudkick - monitor & manage your servers better with a FREE Cloudkick developer account.
  • Learn how two game developers prepared for rapid user growth in this recorded Joyent webinar: http://bit.ly/hzBoib.
  • CloudSigma. Instantly scalable European cloud servers.
  • ManageEngine Applications Manager : Monitor physical, virtual and Cloud Applications.
  • www.site24x7.com : Monitor End User Experience from a global monitoring network.
For more information on each sponsor please see below...


Monday
Mar 28, 2011

Aztec Empire Strategy: Use Dual Pipes in Your Aqueduct for High Availability

With the Chapultepec aqueduct, also named the great aqueduct, the Aztecs built a novel uninterruptible water supply for providing fresh water to Tenochtitlan, their fast growing jewel of a capital city. A section of the aqueduct is still around today: 

It's fun to think about how, even 600 years ago, it was built with high availability in mind. We find engineers being engineers, no matter the age:


Friday
Mar 25, 2011

Did the Microsoft Stack Kill MySpace?

Robert Scoble wrote a fascinating case study, MySpace’s death spiral: insiders say it’s due to bets on Los Angeles and Microsoft, in which he reports that MySpace insiders blame the Microsoft stack for their loss to Facebook in the great social network race.

Does anyone know if this is true? What's the real story?

I was wondering because it doesn't seem to track with the MySpace Architecture post that I did in 2009, where they seemed happy with their choices and had stats to back up their improvements. This matters because it's a fascinating model for startups to learn from. What does it really take to succeed? Is it the people or the stack? Is it the organization or the technology? Is it the process or the competition? Is it the quality of the site or the love of the users? So much to consider and learn from.

Some conjectures from the article:


Thursday
Mar 24, 2011

Strategy: Disk Backup for Speed, Tape Backup to Save Your Bacon, Just Ask Google

In Stack Overflow Architecture Update - Now At 95 Million Page Views A Month, a commenter expressed surprise about Stack Overflow's backup strategy: 

Backup is to disk for fast retrieval and to tape for historical archiving.

The comment was:

Really? People still do this? I know some organizations invested a tremendous amount in automated, robotic tape backup, but seriously, a site founded in 2008 is backing up to tape?

The Case of the Missing Gmail Accounts

I admit that I was surprised at this strategy too. In this age of copying data to disk three times for safety, I also wondered if tape backups were still necessary. Then, like in a movie, an event happened that made sense of everything: Google suffered the quintessential #firstworldproblem, Gmail accounts went missing! Cue emphatic music. And what's more, they were taking a long time to come back. There was a palpable fear in the land that email accounts might never be restored. Think about that. They might never be restored...


Tuesday
Mar 22, 2011

Facebook's New Realtime Analytics System: HBase to Process 20 Billion Events Per Day

Facebook did it again. They've built another system capable of doing something useful with ginormous streams of realtime data. Last time we saw Facebook release their New Real-Time Messaging System: HBase To Store 135+ Billion Messages A Month. This time it's a realtime analytics system handling over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds.

Alex Himel, Engineering Manager at Facebook, explains what they've built (video) and the scale required:

Social plugins have become an important and growing source of traffic for millions of websites over the past year. We released a new version of Insights for Websites last week to give site owners better analytics on how people interact with their content and to help them optimize their websites in real time. To accomplish this, we had to engineer a system that could process over 20 billion events per day (200,000 events per second) with a lag of less than 30 seconds. 
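A quick back-of-the-envelope check of those numbers, sketched in Python. It assumes events arrive evenly across a 24-hour day, which real traffic will not do, so treat the result as an average rather than a peak. The variable names are made up for the example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

events_per_day = 20_000_000_000
average_per_second = events_per_day / SECONDS_PER_DAY

# Events that arrive during the quoted 30-second lag window, at the average rate.
events_in_lag_window = average_per_second * 30

print(f"{average_per_second:,.0f} events/second on average")
print(f"{events_in_lag_window:,.0f} events arrive within a 30-second window")
```

The average works out to roughly 231,000 events per second, in line with the rounded 200,000 figure in the quote, and it means close to 7 million events arrive inside any 30-second window at that rate.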

Alex does an excellent job with the presentation. Highly recommended. But let's take a little deeper look at what's going on...
