High Scalability

Strategy: Scale Writes to 734 Million Records Per Day Using Time Partitioning

Thursday, May 20, 2010 at 6:43AM

In Scaling writes in MySQL (slides), Philip Tellis, while working for Yahoo, describes how time-based partitions let his team increase their write capability from 2100 inserts per second (7 million a day) to a sustained 8500 inserts per second (734 million a day). That was enough capacity to handle the load during Michael Jackson's memorial service. The full post summarizes the secrets to scalable writes.
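To make the idea of time-based partitioning concrete, here is a minimal sketch, not the approach from the slides. It assumes one table per day (the `events_YYYYMMDD` naming and the `insert_event` and `drop_partition` helpers are hypothetical) and uses Python's built-in `sqlite3` as a stand-in for MySQL, which has its own native partitioning syntax. It also checks the arithmetic behind the headline figure.

```python
import sqlite3
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60


def inserts_per_day(rate_per_second: int) -> int:
    """Convert a sustained per-second insert rate into a daily total."""
    return rate_per_second * SECONDS_PER_DAY


def partition_for(ts: datetime) -> str:
    """Name of the (hypothetical) per-day table that holds rows for ts."""
    return f"events_{ts:%Y%m%d}"


def insert_event(conn: sqlite3.Connection, ts: datetime, payload: str) -> str:
    """Route a row to its day partition, creating the partition on first use."""
    table = partition_for(ts)
    # The table name is derived only from a formatted date, never from user input.
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (ts TEXT, payload TEXT)")
    conn.execute(
        f"INSERT INTO {table} (ts, payload) VALUES (?, ?)",
        (ts.isoformat(), payload),
    )
    return table


def drop_partition(conn: sqlite3.Connection, day: datetime) -> None:
    """Expire a whole day at once instead of deleting rows one by one."""
    conn.execute(f"DROP TABLE IF EXISTS {partition_for(day)}")
```

With this layout, new writes only ever touch the current day's table, and old data is removed by dropping a table rather than by running large deletes. A sustained 8500 inserts per second works out to `inserts_per_day(8500) == 734_400_000`, which matches the 734 million in the title.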

Posted by HighScalability Team in MySQL, Strategy

7 Lessons Learned While Building Reddit to 270 Million Page Views a Month

Monday, May 17, 2010 at 8:05AM

Steve Huffman, co-founder of social news site Reddit, gave an excellent presentation (slides, transcript) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers.

Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing differently. And if Steve didn't know about these lessons, I'm betting others don't either.

There are seven lessons, each with its own summary section: Lesson 1: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline.

By far the most surprising feature of their architecture is in Lesson 6, Store Redundant Data, whose essential idea is explained in the full post.

Posted by HighScalability Team in Example, queue

Hot Scalability Links for May 14, 2010

Friday, May 14, 2010 at 7:59AM

Lots of good ones this week...

Scalability, Availability & Stability Patterns. Jonas Bonér has 197 slides covering a very wide range of scalability topics. One-stop scalability shopping.
Horizontal Scalability via Transient, Shardable, and Share-Nothing Resources. Heroku's Adam Wiggins shares what they've learned about scaling based on their experiences building a cloud platform and the hundreds of apps running on it. He describes the next generation architecture he thinks all software should follow in the future.
Scalability of the Hadoop Distributed File System. Konstantin V. Shvachko writes a great post analyzing whether a distributed file system limited by a single-node namespace server architecture can support 100,000 clients and petabytes of files.
Cassandra by Example. Eric Evans created a nice Cassandra tutorial that uses building a Twitter clone as its example. Many people want to see more data modeling examples. Here you are.

Posted by HighScalability Team in hot links

The Rise of the Virtual Cellular Machines

Wednesday, May 12, 2010 at 8:44AM

My apologies if you were looking for a post about cell phones. This post is about high density nanodevices. It's a follow-up to How will memristors change everything? for those wishing to pursue these revolutionary ideas in more depth. This is one of those areas where, if you are in the space, there's a lot of available information, and if you are on the outside, it doesn't even seem to exist. Fortunately, Ben Chandler from The SyNAPSE Project was kind enough to point me to a great set of presentations given at the 12th IEEE CNNA - International Workshop on Cellular Nanoscale Networks and their Applications - Towards Megaprocessor Computing. WARNING: these papers contain extreme technical content. If you are like me and you aren't an electrical engineer, much of it may make a sort of surface sense, but the deep and twisty details will fly overhead. For the more software minded there are a couple of more accessible presentations:

Intelligent Machines built with Memristive Nanodevices by Greg Snider of Hewlett-Packard Laboratories.
Virtual and Physical Cellular Architectures for Kilo-processor Chip Computers by Prof. Tamas Roska of SZTAKI & Pazmany University, Budapest.

The full post collects a few excerpts from the presentations, just things I found particularly interesting. I'm still trying to make sense of it all and I thought you might be interested too. It's clear there's something new here and it will require different algorithms and programming models to work. What will those be and who will invent them?

Posted by HighScalability Team

Sify.com Architecture - A Portal at 3900 Requests Per Second

Monday, May 10, 2010 at 7:44AM

Sify.com is one of the leading portals in India. Samachar.com is owned by the same company and is one of the top content aggregation sites in India, primarily targeting non-resident Indians around the world. Ramki Subramanian, an Architect at Sify, has been generous enough to describe the common back-end for both these sites. One of the most notable aspects of their architecture is that Sify does not use a traditional database. They query Solr and then retrieve records from a distributed file system. Over the years many people have argued for file systems over databases. File systems can work for key-value lookups, but they don't work for queries; using Solr is a good way around that problem. Another interesting aspect of their system is the use of Drools for intelligent cache invalidation. As more and more data is duplicated in multiple specialized services, keeping the copies synchronized becomes a difficult problem. A rules engine is a clever approach.
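The "query an index, then fetch records by key" pattern described above can be sketched roughly as follows. This is an illustration only, not Sify's implementation: `TinyIndex` is a toy in-memory inverted index standing in for Solr, `FileStore` is a local directory standing in for a distributed file system, and all names are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class TinyIndex:
    """Toy inverted index standing in for a search server such as Solr."""

    def __init__(self):
        self._postings = {}  # word -> set of document ids

    def add(self, doc_id, text):
        for word in set(text.lower().split()):
            self._postings.setdefault(word, set()).add(doc_id)

    def search(self, query):
        """Return sorted ids of documents containing every query word."""
        words = query.lower().split()
        if not words:
            return []
        matches = set.intersection(*(self._postings.get(w, set()) for w in words))
        return sorted(matches)


class FileStore:
    """Key-value lookups backed by one JSON file per record."""

    def __init__(self, root: Path):
        self.root = root

    def put(self, doc_id, record):
        (self.root / f"{doc_id}.json").write_text(json.dumps(record))

    def get(self, doc_id):
        return json.loads((self.root / f"{doc_id}.json").read_text())


def query_records(index, store, query):
    """The index answers the query; the file store only does lookups by key."""
    return [store.get(doc_id) for doc_id in index.search(query)]
```

The division of labor is the point: the file store never has to answer a query, and the index never has to hold full records.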

Posted by HighScalability Team in Example

Going global on EC2

Thursday, May 6, 2010 at 8:24PM

Since its inception, Amazon EC2 has enabled companies to run highly scalable infrastructure with minimal overhead. Over the years, Amazon Web Services has expanded with new offerings and additional regions around the world.

All this growth has made establishing a global footprint easier than ever. And yet, most EC2 customers still choose to operate in a single region. While this is fine for many applications, customers with significant web infrastructure are depriving users of drastically improved performance. Deploying infrastructure in EC2's new regions cuts out one of the biggest sources of latency: distance.

In this post, I describe how Bizo significantly reduced load times by implementing Global Server Load Balancing (GSLB) to distribute traffic across all Amazon regions.
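As a rough illustration of what a global load-balancing decision involves, the sketch below picks the healthy region with the lowest measured latency for a client. The teaser does not say how Bizo's GSLB chooses a region, so the selection rule, the `pick_region` helper, and the latency figures are all assumptions.

```python
from typing import Mapping, Optional, Set


def pick_region(latency_ms: Mapping[str, float], healthy: Set[str]) -> Optional[str]:
    """Return the healthy region with the lowest latency, or None if none is healthy."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Illustrative latencies as seen from a hypothetical client in Europe.
latencies = {"us-east-1": 180.0, "eu-west-1": 35.0, "ap-southeast-1": 240.0}
nearest = pick_region(latencies, {"us-east-1", "eu-west-1", "ap-southeast-1"})
fallback = pick_region(latencies, {"us-east-1", "ap-southeast-1"})
```

Here `nearest` is `"eu-west-1"`, and if that region is marked unhealthy the client falls back to `"us-east-1"`, the next closest.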

The full post is on Bizo's dev blog.

Posted by Mike Babineau, tagged aws, bizo, dns, ec2, gslb

How will memristors change everything?

Wednesday, May 5, 2010 at 7:36AM

A non-random sample of my tech friends shows that not many have heard of memristors (though I do suspect vote tampering). I'd read a little about memristors in 2008 when the initial hubbub about the existence of memristors was raised. I, however, immediately filed them into that comforting conceptual bucket of potentially revolutionary technologies I didn't have to worry about because, like most wondertech, nothing would ever come of them. Wrong. After watching Finding the Missing Memristor by R. Stanley Williams I've had to change my mind. Memristors have gone from "maybe never" to "holy cow, this could happen soon and it could change everything."

Let's assume for the sake of dreaming memristors do prove out. How will we design systems when we have access to a new material that is two orders of magnitude more efficient from a power perspective than traditional transistor technologies, contains multiple petabits (1 petabit = 128TB) of persistent storage, and can be reconfigured to be either memory or CPU in a package as small as a sugar cube (in a stacked configuration)?

Posted by HighScalability Team in memristor

Business continuity with real-time data integration

Tuesday, May 4, 2010 at 4:16AM

Enterprises want to protect their data. As the appetite for data volumes grows, storage technology becomes a critical business asset on which business continuity relies. My recent survey in the medium-size enterprise segment shows the five dominant investment directions at the level of data management architecture: disaster recovery (DR), high availability (HA), backup, data processing performance and migration to more advanced databases.

This suggests that corporations generally have sufficiently structured data collections but are concerned with business continuity and continuous availability of data. What infrastructures can provide these assurances? In this post I want to focus on yet another option, and that is the Real-Time Data Integration model. As an example I am going to discuss Oracle GoldenGate, which permits you to manage the data critical to your business in safety, ensuring business continuity without disruption even if the data is distributed among multiple, heterogeneous business applications and architectures.

MocoSpace Architecture - 3 Billion Mobile Page Views a Month

Monday, May 3, 2010 at 7:23AM

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system is: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; selecting PostgreSQL over MySQL as a relational DB that can scale.

MocoSpace is a mobile social network with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone Web browsers, ranging from high end smartphones to lower end devices, as well as from the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards and blogs. The monetization strategy is focused on advertising, on both the mobile site and the Website, as well as a virtual currency system and a handful of premium feature upgrades.
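The multi-tiered, local/distributed caching mentioned above can be pictured with a small sketch. This is not MocoSpace's design, which the teaser does not detail: it assumes a per-process LRU tier in front of a shared tier, with a plain `dict` standing in for a distributed cache, and the `TwoTierCache` class and `loader` callback are hypothetical.

```python
from collections import OrderedDict


class TwoTierCache:
    """Small local LRU tier in front of a shared tier, with a loader as last resort."""

    def __init__(self, shared, loader, local_capacity=128):
        self.local = OrderedDict()  # per-process tier, fastest, bounded
        self.shared = shared        # stand-in for a distributed cache
        self.loader = loader        # reads from the source of truth, e.g. a database
        self.capacity = local_capacity

    def get(self, key):
        if key in self.local:
            self.local.move_to_end(key)  # mark as most recently used
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.loader(key)
            self.shared[key] = value     # populate the shared tier for other processes
        self._remember(key, value)
        return value

    def put(self, key, value):
        """Write through to the shared tier, then refresh the local copy."""
        self.shared[key] = value
        self._remember(key, value)

    def _remember(self, key, value):
        self.local[key] = value
        self.local.move_to_end(key)
        while len(self.local) > self.capacity:
            self.local.popitem(last=False)  # evict the least recently used entry
```

Two processes sharing the same `shared` tier will only hit the loader once per key. The sketch leaves out what a real system must handle, such as invalidating stale local copies in other processes after a write.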

Posted by HighScalability Team in Example, Postgres

100 Node Hazelcast cluster on Amazon EC2

Monday, May 3, 2010 at 7:00AM

Deploying, running and monitoring an application on a big cluster is a challenging task. Recently the Hazelcast team deployed a demo application on the Amazon EC2 platform to show how a Hazelcast p2p cluster scales, and screen recorded the entire process from deployment to monitoring.

Hazelcast is an open source (Apache License), transactional, distributed caching solution for Java. It is a little more than a cache, though, as it provides distributed implementations of map, multimap, queue, topic, lock and executor service.

Details of running a 100 node Hazelcast cluster on Amazon EC2 are in the full write-up. Make sure to watch the screencast!

Posted by Talip Ozturk in Hazelcast, distributed caching; tagged Distributed Computing, Hazelcast, in-memory