High Scalability -

Entries by Todd Hoff (380)

Thursday

Oct182007

another approach to replication

Thursday, October 18, 2007 at 12:42PM

File replication based on erasure codes can reduce total replicas size 2 times and more.

Click to read more ...

Todd Hoff |

4 Comments |

Permalink |

Print Article

Email Article

replication

Tuesday

Oct162007

How Scalable are Single Page Ajax Apps?

Tuesday, October 16, 2007 at 6:06AM

I've been using GWT for an application and I get the same feeling using it that I first got using html. I've always sucked at building UIs. Starting with programming HP terminals, moving on to the Apple Lisa, then X Windows, and Microsoft Windows, I just never had IT, whatever IT is. On the Beauty and the Geek scale my interfaces are definitely horned-rimmed and pocket protector friendly. Html helped free me from all that to just build stuff that worked, but didn't have to look all that great. Expectations were pretty low and I eagerly fulfilled them. With Ajax expectations have risen again and I find myself once more easily identifiable as a styless geek. Using GWT I have some hopes I can suck a little less. In working with GWT I was so focussed on its tasty easily digestible Ajaxy goodness, I didn't stop to think about the topic of this site: scalability. When I finally brought my distracted mind around to consider the scalability of the single page webs site I was building, I became a bit concerned. Many of the strategies that are typically used to achieve scalability don't seem to apply in single page land. Here are the issues I see. Maybe you can tell me where I am off in my analysis?

Plus: a lot of state is maintained in the client. You don't need to keep session state on the server side. This is a win because you aren't slamming the database to reconstitute state. It's cached on the client. After more consideration it seems this is not always the case. Take your typical shopping cart scenario. You have the old problem of not storing prices in the client so some evil Mallory can attack your system by changing prices. And my shopping cart must outlast my browser session so its still there when I return. I would be heart broken if my carefully crafted Amazon cart disappeared every time Firefox went away. So server side state is often still necessary. Yet a lot of state is kept on the client side and that's a better thing.

Plus: a lot of business logic in on the client. The client can do a lot of the work which saves making calls to the server. An interesting comparison of the effects of Ajax on business logic partitioning is Google Calendar: Not As Fat as Other Ajax Apps by Dietrich Kappe.

Minus: Can't offload searching. The lack of a proper link structure means your site can't be spidered, which means it can't be searched. One useful scalability strategy is to offload search to something like Google's Custom Search Engine, not for the ad revenue (because there's little), but because it means I don't have to devote any resources to searching. That's a huge win.

Minus: SEO problems suck up developer time. The common response to the previously mentioned search engine optimization (SEO) problems are to make a shadow text site or insert hidden divs. But that's a lot of pretty useless effort. I would like to spend my time elsewhere.

Minus: Can't load balance static content from the client. RPC is used to slurp up data from the server and these requests must go back to the originating domain. This counters one common strategy of using a CDN and/or multiple host names for serving content so you can trick your browser into starting multiple simultaneous connections to different hosts when loading page content. This speeds up your site and spreads the load across different servers. Using RPC to serve content seems to lose this advantage.

Minus: Ajax calls add server load. You buy into that with Ajax, but it's still a concern, especially if you have to poll frequently for updates. Dietrich found that the Ajax requests may not be that much smaller than before, so you can't depend on smaller work loads to make up for the increased number of calls. See Yahoo Mail, Ajax and Your Server.

Minus: Lack of monetary scalability with AdSense. Without a page to parse AdSense can't figure out which ads to display on your site. So one common monetization strategy isn't open to you.

Unsure: When using a caching proxy like Squid, a major scalability strategy, is my cacheable content effectively cached when using RPC? I couldn't find a resolution to this issue. One solution around many of these problems is to use a combination of REST and JASONP. This converts your client into a big mashup, even if all the parts you are mashing are your own. And this approach makes a lot of sense to me, but then I don't really see the purpose of having a RPC mechanism. There are surely issues I've missed and misunderstood, but it seems single page apps present some distinct scalability challenges. Your thoughts would be appreciated.

Click to read more ...

Todd Hoff |

7 Comments |

Permalink |

Print Article

Email Article

AJAX,

GWT

Sunday

Oct142007

Product: The Spread Toolkit

Sunday, October 14, 2007 at 9:10AM

Complex applications coordinating work across a lot of machines often need a highly performing fault tolerant message layer. Though a blast to write, it's probably a better use of your time to use an off the shelf solution. And that's where Spread comes in. Flickr, for example, uses Spread to create real-time event feeds from their web server logs. What exactly is Spread? From the Spread website:

Spread is an open source toolkit that provides a high performance messaging service that is resilient to faults across local and wide area networks. Spread functions as a unified message bus for distributed applications, and provides highly tuned application-level multicast, group communication, and point to point support. Spread services range from reliable messaging to fully ordered messages with delivery guarantees. Spread can be used in many distributed applications that require high reliability, high performance, and robust communication among various subsets of members. The toolkit is designed to encapsulate the challenging aspects of asynchronous networks and enable the construction of reliable and scalable distributed applications. Some of the services and benefits provided by Spread:
Reliable and scalable messaging and group communication.
A very powerful but simple API simplifies the construction of distributed architectures.
Easy to use, deploy and maintain.
Highly scalable from one local area network to complex wide area networks.
Supports thousands of groups with different sets of members.
Enables message reliability in the presence of machine failures, process crashes and recoveries, and network partitions and merges.
Provides a range of reliability, ordering and stability guarantees for messages.
Emphasis on robustness and high performance.
Completely distributed algorithms with no central point of failure.

In Building Scalable Web Sites Cal Henderson describes how Flickr uses Spread to create a log of real-time events, like photos uploaded and discussions started, as they happen. Spread is connected to their web servers. As photos are uploaded these web server events are messaged in real-time to agents consuming the feed. The advantage of this architecture is it sheds load away from the database. Otherwise the database would have to be continuously polled for new events by each agent.

LAMP and the Spread Toolkit

The Spread Toolkit: Architecture and Performance

Click to read more ...

Todd Hoff |

11 Comments |

Permalink |

Print Article

Email Article

Messaging,

Product,

Strategy,

flickr

Thursday

Oct112007

How Flickr Handles Moving You to Another Shard

Thursday, October 11, 2007 at 3:25AM

Colin Charles has cool picture showing Flickr's message telling him they'll need about 15 minutes to move his 11,500 images to another shard. One, that's a lot of pictures! Two, it just goes to show you don't have to make this stuff complicated. Sure, it might be nice if their infrastructure could auto-balance shards with no down time and no loss of performance, but do you really need to go to all the extra complexity? The manual system works and though Colin would probably like his service to have been up, I am sure his day will still be a pleasant one.

Click to read more ...

Todd Hoff |

3 Comments |

Permalink |

Print Article

Email Article

Strategy,

flickr

Wednesday

Oct102007

WAN Accelerate Your Way to Lightening Fast Transfers Between Data Centers

Wednesday, October 10, 2007 at 3:18PM

How do you keep in sync a crescendo of data between data centers over a slow WAN? That's the question Alberto posted a few weeks ago. Normally I'm not into all boy bands, but I was frustrated there wasn't a really good answer for his problem. It occurred to me later a WAN accelerator might help turn his slow WAN link into more of a LAN, so the overhead of copying files across the WAN wouldn't be so limiting. Many might not consider a WAN accelerator in this situation, but since my friend Damon Ennis works at the WAN accelerator vendor Silver Peak, I thought I would ask him if their product would help. Not surprisingly his answer is yes! Potentially a lot, depending on the nature of your data. Here's a no BS overview of their product:

What is it? - Scalable WAN Accelerator from Silver Peak (http://www.silver-peak.com)

What does it do? - You can send 5x-100x times more data across your expensive, low-bandwidth WAN link.

Why should you care? - Your data centers become more like co-located real-time peers. - You can sync a lot more media and other large files across data centers. 50x improvement in data replication performance over a WAN. - You may be able to operate on remote database more like a local database. 5x-20x improvement is SQL data manipulation and unique query performance. - A 2 hour database backup would take 4 minutes. 10x-30x improvement in transferring large data sets over SQL. A good disaster planning feature.

How does it work? - You buy an accelerator appliance for both sides of you link. All your WAN traffic flows through these boxes. - The appliances then use various techniques to effectively decrease latency and increase bandwidth across the link: -- Traffic reduction. Accelerators look for patterns in data across a link, caching the data on either side of the link, and then not sending the data when similar patterns are seen again. This can lead to a 90% reduction in traffic. -- Compression. Data are compressed across the link. compression ratios from 0 to 2-5x are seen, depending on the content type. -- TCP Manipulation. The TCP/IP protocol is gamed to yield better performance. For example, a proxy on both sides is used to get a bigger window size. -- Application Manipulation. Various application protocols, like CIFS, NFS, and Outlook, can be gamed to improve performance.

How much does it cost? - $10k to $130k per box. $10k for the 2Mbps appliance and $130k for the 500Mbps. - They are the scale leaders and are specifically good at "high-end" (> 50Mbps) replication.

Who uses it? - Fidelity Bank, Ernst & Young, Panasonic.

Is it for real? - Yes. It works and is installed and running in many data centers.

How do you get it? - Contact sales at http://www.silver-peak.com/Contact/contact.asp.

Where do you go for more information? - White paper Directory - http://www.silver-peak.com/InfoCenter/index.htm#whitepapers - Understanding WAN Acceleration Techniques - http://www.silver-peak.com/assets/download/pdf/technologydescriptions.pdf

Is there anything else interesting you should know? - The appliance performs encryption and compression so you don't need perform those functions on your own CPUs. - The appliances fail to wire so if a box fails traffic passes unaccelerated. If you can't live with that you need to buy 2 boxes per end of the link (4 boxes total).

How much will you benefit? - The more duplication in your data the better job they can do. There's tons of duplicated data in a database feed , for example, so they can really help supercharge database performance. - Latency/time improvements depend on the link. The higher the latency the link has the less bandwidth you can use. For example, a 100ms link is limited to 5Mbps throughput per flow due to the TCP window size (64KB/100ms ~ 5Mbps). They can take this to several hundred Mbps per flow. - Image files are often pre-compressed. As compression removes duplicate information they can't be as efficient at the de-duplication as in other scenarios, though they can still improve throughput. An interesting side-effect of speeding up the WAN link is that it often reveals bottlenecks in other parts of the system. A slow WAN might be hiding:

Underpowered servers. Servers that could process a trickle of data may be overwhelmed by a flood of data.

Slow applications. Apps that could pump data at slow WAN speeds may not be able drive a faster WAN. You may need to take a look at your software architecture or storage network.

Underpowered server links. Accelerate a 2mbps link to a 20mbps link and your network infrastructure on the data center side may not be able to handle the truth. Obviously the cost of the solution means its targeted more for moderate sized companies or a service provider offering their customers a quality upsell. But if you are stuck wondering how the heck you are going to squeeze more bits between your data centers, it may be just the magic bullet you need.

Click to read more ...

Todd Hoff |

Paper: Understanding and Building High Availability/Load Balanced Clusters

Monday, October 8, 2007 at 10:30PM

A superb explanation by Theo Schlossnagle of how to deploy a high availability load balanced system using mod backhand and Wackamole. The idea is you don't need to buy expensive redundant hardware load balancers, you can make use of the hosts you already have to the same effect. The discussion of using peer-based HA solutions versus a single front-end HA device is well worth the read. Another interesting perspective in the document is to view load balancing as a resource allocation problem. There's also a nice discussion of the negative of effect of keep-alives on performance.

Click to read more ...

Todd Hoff |

3 Comments |

Permalink |

Print Article

Email Article

Load Balancing,

Paper

Monday

Oct082007

Lessons from Pownce - The Early Years

Monday, October 8, 2007 at 10:35AM

Pownce is a new social messaging application competing micromessage to micromessage with the likes of Twitter and Jaiku. Still in closed beta, Pownce has generously shared some of what they've learned so far. Like going to a barrel tasting of a young wine and then tasting the same wine after some aging, I think what will be really interesting is to follow Pownce and compare the Pownce of today with the Pownce of tomorrow, after a few years spent in the barrel. What lessons lie in wait for Pownce as they grow? Site: http://www.pownce.com/

Information Sources

Pownce Lessons Learned - FOWA 2007

Scoble on Twitter vs Pownce

Founder Leah Culver's Blog

The Platform

Python

Django for the website framework

Amazon's S3 for file storage.

Adobe AIR (Adobe Integrated Runtime) for desktop application

Memcached

Available on Facebook

Timeplot for charts and graphs.

The Stats

Developed in 4 months and went to an invite-only launch in June.

Began as Leah's hobby project and then it snowballed into a real horse with the addition of Digg's Daniel Burka and Kevin Rose.

Small 4 person team with one website developer.

Self funded.

One MySQL database.

Features include: - Short messaging, invites for events, links, file sharing (you can attach mp3s to messages, for example). - You can limit usage to a specific subset of friends and friends can be grouped in sets. So you can send your mp3 to a specific group of friends. - It does not have an SMS gateway, IM gateway, or an API.

The Architecture

Chose Django because it had an active community, good documentation, good readability, it is open to growth, and auto generated administration.

Chose S3 because it minimized maintenance and was inexpensive. It has been reliable for them.

Chose AIR because it had a lot of good buzz, ease of development, creates a nice UI, and is cross platform.

Database has been the main bottleneck. Attack and fix slow queries.

Static pages, objects, and lists are cached using memcached.

Queuing is used to defer more complex work, like sending notes, until later.

Use pagination and a good UI to limit the amount of work performed.

Good indexing helped improve the performance for friend searching.

In a social site: - Make it easy to create and destroy relationships. - Friend relationships are the most important information to display correctly because people really care about it. - Friends in the online world have real-world effects.

Features are "biased" for scalability - You must get an invite from someone on already on Pownce. - Invites are limited to their data center's ability to keep up with the added load. Blindly uploading address books can bring on new users exponentially. Limiting that unnatural growth is a good idea.

Their feature set will expand but they aren't ready to commit to an API yet.

Revenue model: ads between posts.

Lessons Learned

The four big lessons they've experienced so far are: - Think about technology choices. - Do a lot with a little. - Be kind to your database. - Expect anything.

Have a small dedicated team where people handle multiple jobs.

Use open source. There's lots of it, it's free, and there's a lot of good help.

Use your resources. Learn from website doc, use IRC, network, participate in communities and knowledge exchange.

Shed work off the database by making sure that complex features are really needed before implementing them.

Cultivate a prepared mind. Expect the unexpected and respond quickly to the inevitable problems.

Use version control and make backups.

Maintain a lot of performance related stats.

Don't promise users a deadline because you just might not make it.

Commune with your community. I especially like this one and I wish it was done more often. I hope this attitude can survive growth. - Let them know what you are working on and about new features and bug fixes. - Respond personally to individual bug creators.

Take a look at your framework's automatically generated queries. They might suck.

A sexy UI and a good buzz marketing campaign can get you a lot of users.

Scaling Twitter: Making Twitter 10000 Percent Faster.

Click to read more ...

Todd Hoff |

7 Comments |

Permalink |

Django,

Python,

Sunday

Oct072007

Paper: Architecture of a Highly Scalable NIO-Based Server

Sunday, October 7, 2007 at 4:53PM

The article describes the basic architecture of a connection-oriented NIO-based java server. It takes a look at a preferred threading model, Java Non-blocking I/O and discusses the basic components of such a server.

Click to read more ...

Todd Hoff |

5 Comments |

Permalink |

Print Article

Email Article

Java server socket nio Scalable nonblocking,

Paper

Sunday

Oct072007

Product: Wackamole

Sunday, October 7, 2007 at 1:45AM

Wackamole is an application that helps with making a cluster highly available. It manages a bunch of virtual IPs, that should be available to the outside world at all times. Wackamole ensures that a single machine within a cluster is listening on each virtual IP address that Wackamole manages. If it discovers that particular machines within the cluster are not alive, it will almost immediately ensure that other machines acquire these public IPs. At no time will more than one machine listen on any virtual IP. Wackamole also works toward achieving a balanced distribution of number IPs on the machine within the cluster it manages. There is no other software like Wackamole. Wackamole is quite unique in that it operates in a completely peer-to-peer mode within the cluster. Other products that provide the same high-availability guarantees use a "VIP" method. Wackamole is an application that runs as root in a cluster to make it highly available. It uses the membership notifications provided by the Spread toolkit to generate a consistent state that is agreed upon among all of the connected Wackamole instances. Wackamole is released under the CNDS Open Source License. Note: This post has been adapted from the linked to web site.

White paper on building HA/LB Clusters by Theo Schlossnagle.

Click to read more ...

Todd Hoff |

7 Comments |

Permalink |

Print Article

Email Article

Load Balancing,

Product

Thursday

Oct042007

You Can Now Store All Your Stuff on Your Own Google Like File System

Thursday, October 4, 2007 at 3:38AM

New update: Parascale’s CTO on what’s different about Parascale. Let's say you have gigglebytes of data to store and you aren't sure you want to use a CDN. Amazon's S3 doesn't excite you. And you aren't quite ready to join the grid nation. You want to keep it all in house. Wouldn't it be nice to have something like the Google File System you could use to create a unified file system out of all your disks sitting on all your nodes? According to Robin Harris, a.k.a StorageMojo (a great blog BTW), you can now have your own GFS: Parascale launches Google-like storage software. Parascale calls their softwate a Virtual Storage Network (VSN). It "aggregates disks across commodity Linux x86 servers to deliver petabyte-scale file storage. With features such as automated, transparent file replication and file migration, Parascale eliminates storage hotspots and delivers massive read/write bandwidth." Why should you care? I don't know about you, but the "storage problem" is one the most frustrating parts of building websites. There's never a good answer that is affordable. Should you build a SAN or a NAS? How do you make it redundant? How do you make it perform? How do you back it up? How do you grow it without a defense appropriations sized budget? Should you use RAID? Which level and where for what reason? Should you use SCSI, iSCSI, SAS, SATA, or alpha beta? Which vendor should you use? There are so many conflicting opinions about everything. It's all a confusing mess to me. So I like the simplicity of buying commodity nodes with just a bunch of disks attached. But the question has always been how do you turn all those disks into a unified storage system without writing a ton of software on top? Harris says this is what Parascale has done for you:

VSN, like GFS, builds availability and scalability around low-cost servers and disks. NAS appliances rely on costly low-volume boxes that are closed and don't scale. GFS has been deployed in production clusters of over 5,000 servers, proving the scalability of the architecture. Fast, reliable, low-cost and massively scalable storage powers the growth of new applications like Web 2.0, video-on-demand, and hi-resolution image archiving. Parascale is the first of a new generation of software-only storage solutions.

They make a big deal out of it being a software only system. Harris says why this is a good thing:

I like software-based systems because hardware is a commodity. When you create custom hardware you also create low-volume, high-cost components whose economics go from bad to worse. If you *need* to do it, then go for it. But data is getting cooler and the requirement for specialized high-performance hardware is shrinking relative to the market.

Other systems use an appliance model. Appliances can add a lot of value, but they are also a way of monetizing you. A software system on commodity hardware has the potential to give good value. Will it? I didn't see pricing so it's hard to tell. Even odder is their pricing model. You are leasing the software per year, per disk spindle. Do you have any idea how much this will cost? Neither do I. I sounds like it could be horribly expensive or really reasonable. We'll have to see. Another thing that bothers me is that you can't run a database on top of their file system. This means I need an entire separate storage system for my database. You can run a database on a NAS or SAN, so this is a definite disadvantage. Anyway, it's just another interesting option to consider when architecting your website.

LiveJournal created an open source distributed file system called MogileFS that builders may find useful.

Parascale Announces Industry's First Software-Only Storage Solution for Digital Content

Click to read more ...

Todd Hoff |

3 Comments |

Permalink |

Print Article

Email Article

Clustered Storage System,

Product