High Scalability -

Entries by HighScalability Team (1576)

Monday

May172010

7 Lessons Learned While Building Reddit to 270 Million Page Views a Month

Monday, May 17, 2010 at 8:05AM

Steve Huffman, co-founder of social news site Reddit, gave an excellent presentation (slides, transcript) on the lessons he learned while building and growing Reddit to 7.5 million users per month, 270 million page views per month, and 20+ database servers.

Steve says a lot of the lessons were really obvious, so you may not find a lot of completely new ideas in the presentation. But Steve has an earnestness and genuineness about him that is so obviously grounded in experience that you can't help but think deeply about what you could be doing different. And if Steve didn't know about these lessons, I'm betting others don't either.

There are seven lessons, each has their own summary section: Lesson one: Crash Often; Lesson 2: Separation of Services; Lesson 3: Open Schema; Lesson 4: Keep it Stateless; Lesson 5: Memcache; Lesson 6: Store Redundant Data; Lesson 7: Work Offline.

By far the most surprising feature of their architecture is in Lesson Six, whose essential idea is:

Click to read more ...

HighScalability Team |

28 Comments |

Permalink |

Print Article

Email Article

Example,

queue

Friday

May142010

Hot Scalability Links for May 14, 2010

Friday, May 14, 2010 at 7:59AM

Lots of good ones this week...

Scalability, Availability & Stability Patterns. Jonas Boner has 197 slides covering a very wide range of scalability topics. One stop scalability shopping.
Horizontal Scalability via Transient, Shardable, and Share-Nothing Resources. Heroku's Adam Wiggins shares what they've learned about scaling based on their experiences building a cloud platform and the hundreds of apps running on it. He describes the next generation architecture he thinks all software should follow in the future.
Scalability of the Hadoop Distributed File System. Konstantin V. Shvachko writes a great post analyzing if the limitations imposed on a distributed file system by the single-node namespace server architecture can support 100,000 clients and petabytes of files.
Cassandra by Example. Eric Evans created a nice Cassandra tutorial using building a Twitter clone as an example. Many people want to see more data modeling examples. Here you are.

Click to read more ...

HighScalability Team |

1 Comment |

Permalink |

Print Article

Email Article

hot links

Wednesday

May122010

The Rise of the Virtual Cellular Machines

Wednesday, May 12, 2010 at 8:44AM

My apologies if you were looking for a post about cell phones. This post is about high density nanodevices. It's a follow up to How will memristors change everything? for those wishing to pursue these revolutionary ideas in more depth. This is one of those areas where if you are in the space then there's a lot of available information and if you are on the outside then it doesn't even seem to exist. Fortunately, Ben Chandler from The SyNAPSE Project, was kind enough to point me to a great set of presentations given at the 12th IEEE CNNA - International Workshop on Cellular Nanoscale Networks and their Applications - Towards Megaprocessor Computing. WARNING: these papers contain extreme technical content. If you are like me and you aren't an electrical engineer, much of it may make a sort of surface sense, but the deep and twisty details will fly over head. For the more software minded there are a couple more accessible presentations:

Intelligent Machines built with Memristive Nanodevices by Greg Snider of Hewlett-Packard Laboratories.
Virtual and Physical Cellular Architectures for Kilo-processor Chip Computers by Prof. Tamas Roska of SZTAKI & Pazmany University, Budapest.

Here a few excerpts from the presentations, just things I found particularly interesting. I'm still trying to make sense of it all and I thought you might be interested too. It's clear there's something new here and it will require different algorithms and programming models to work. What will those be and who will invent them?

Click to read more ...

HighScalability Team |

1 Comment |

Permalink |

Print Article

Email Article

Monday

May102010

Sify.com Architecture - A Portal at 3900 Requests Per Second

Monday, May 10, 2010 at 7:44AM

Sify.com is one of the leading portals in India. Samachar.com is owned by the same company and is one of the top content aggregation sites in India, primarily targeting Non-resident Indians from around the world. Ramki Subramanian, an Architect at Sify, has been generous enough to describe the common back-end for both these sites. One of the most notable aspects of their architecture is that Sify does not use a traditional database. They query Solr and then retrieve records from a distributed file system. Over the years many people have argued for file systems over databases. Filesystems can work for key-value lookups, but they don't work for queries, using Solr is a good way around that problem. Another interesting aspect of their system is the use of Drools for intelligent cache invalidation. As we have more and more data duplicated in multiple specialized services, the problem of how to keep them synchronized is a difficult one. A rules engine is a clever approach.

Click to read more ...

HighScalability Team |

18 Comments |

Permalink |

Print Article

Email Article

Example

Wednesday

May052010

How will memristors change everything?

Wednesday, May 5, 2010 at 7:36AM

A non-random sample of my tech friends shows that not many have heard of memristors (though I do suspect vote tampering). I'd read a little about memristors in 2008 when the initial hubbub about the existence of memristors was raised. I, however, immediately filed them into that comforting conceptual bucket of potentially revolutionary technologies I didn't have to worry about because like most wondertech, nothing would ever come of it. Wrong. After watching Finding the Missing Memristor by R. Stanley Williams I've had to change my mind. Memristors have gone from "maybe never" to holy cow this could happen soon and it could change everything.

Let's assume for the sake of dreaming memristors do prove out. How will we design systems when we have access to a new material that is two orders of magnitude more efficient from a power perspective than traditional transistor technologies, contains multiple petabits (1 petabit = 128TB) of persistent storage, and can be reconfigured to be either memory or CPU in a package as small as a sugar cube (in a stacked configuration)?

Click to read more ...

HighScalability Team |

24 Comments |

Permalink |

memristor

Monday

May032010

MocoSpace Architecture - 3 Billion Mobile Page Views a Month

Monday, May 3, 2010 at 7:23AM

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system is: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; selecting PostgreSQL over MySQL as a relational DB that can scale.

MocoSpace is a mobile social network, with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone Web browser, ranging from high end smartphones to lower end devices, as well as the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards and blogs. The monetization strategy is focused on advertising, on both the mobile and Websites, as well as a virtual currency system and a handful of premium feature upgrades.

Stats

Click to read more ...

HighScalability Team |

7 Comments |

Permalink |

Print Article

Email Article

Example,

Postgres

Friday

Apr302010

Hot Scalability Links for April 30, 2010

Friday, April 30, 2010 at 7:56AM

I Want a New Data Store. Jeremy Zawodny of Craigslist wants a new database, one that can do what it should: perform alter table operations faster, has efficient queries when most of the data is on disk and not in RAM, and matches their data that now looks more document oriented than relational. A lot of people willing to help.
Computer Science Unplugged. An extensive collection of free resources that teach principles of Computer Science such as binary numbers, algorithms and data compression through engaging games and puzzles that use cards, string, crayons and lots of running around. And it's free! Fascinating Interview with Tim Bell on teaching complex computing concepts, creating makers not just users, and how to change schools. From O'Reilly Radar.
Akamai’s Network Now Pushes Terabits of Data Every Second. Akamai handles 12 million requests per second, logs more than 500 billion requests for content per day, and sends 3.45 terabits per second of data.

Click to read more ...

HighScalability Team |

Product: SciDB - A Science-Oriented DBMS at 100 Petabytes

Thursday, April 29, 2010 at 6:42AM

Scientists are doing it for themselves. Doing what? Databases. The idea is that most databases are designed to meet the needs of businesses, not science, so scientists are banding together at scidb.org to create their own Domain Specific Database, for science. The goal is to be able to handle datasets in the 100PB range and larger.

SciDB, Inc. is building an open source database technology product designed specifically to satisfy the demands of data-intensive scientific problems. With the advice of the world's leading scientists across a variety of disciplines including astronomy, biology, physics, oceanography, atmospheric sciences, and climatology, our computer scientists are currently designing and prototyping this technology

The scientists that are participating in our open source project believe that the SciDB database — when completed — will dramatically impact their ability to conduct their experiments faster and more efficiently and further improve the quality of life on our planet by enabling them to run experiments that were previously impossible due to the limitations of existing database systems and infrastructure. Many of the world's leading computer scientists with expertise in database systems have contributed to the design and architecture of the system to meet the needs of the world's scientists.

SciDB looks like a cool project and follows what might be considered a trend, instead of beating a general tool into submission, build a specialized tool that does what you need it to do. More details about SciDB can be found in the paper A Demonstration of SciDB: A Science-Oriented DBMS. A nice succinct poster is available summarizing the product.

Some interesting bits from the paper:

Click to read more ...

HighScalability Team |

4 Comments |

Permalink |

Print Article

Email Article

Product,

nosql

Tuesday

Apr272010

Paper: Dapper, Google's Large-Scale Distributed Systems Tracing Infrastructure

Tuesday, April 27, 2010 at 7:25AM

Imagine a single search request coursing through Google's massive infrastructure. A single request can run across thousands of machines and involve hundreds of different subsystems. And oh by the way, you are processing more requests per second than any other system in the world. How do you debug such a system? How do you figure out where the problems are? How do you determine if programmers are coding correctly? How do you keep sensitive data secret and safe? How do ensure products don't use more resources than they are assigned? How do you store all the data? How do you make use of it?

That's where Dapper comes in. Dapper is Google's tracing system and it was originally created to understand the system behaviour from a search request. Now Google's production clusters generate more than 1 terabyte of sampled trace data per day. So how does Dapper do what Dapper does?

Click to read more ...

HighScalability Team |

Sponsored Post: Event - Social Developer Summit

Tuesday, April 20, 2010 at 7:25AM

Social Developer Summit - June 29, 2010 - San Franciso, CA

A meeting of the technically social - Building, scaling, and profiting in a social age

Whether it's social games, social news, social discovery, social search, or other forms of social solutions, developers today are facing new hurdles in building instantly scalable products. As new technologies emerge to address the challenges faced by social application developers, it's increasingly important to come together for knowledge sharing purposes.

Click to read more ...

HighScalability Team |