Information Sources

Product,

operations

11 Secrets of a Cloud Scale Consultant That They Don't Want You to Know

Friday, October 24, 2008 at 12:11AM

OK, there is no "they" and "they" wouldn't care if you knew anyway. After all, this isn't a blog about really important stuff like investing, acne cures, or cheap natural cleansing products. But the secrets are real. Super cloud scaling consultant Kent Langley has put together a comprehensive checklist to consider when developing for the cloud:

ORM for Data Partitioning and Query Splitting - Split queries between reads and writes from the start

Monitoring process, resources, and uptime - Process Monitoring, Resource Monitoring, UpTime Monitoring

Performance Testing and Capacity Planning - Can't make good decisions without doing some degree of Performance Testing and Capacity planning.

Static vs. Dynamic Content splitting / CDN - Reverse Proxy, Splitting Static and Dynamic content

Bundling and Compressing JS and CSS - Bundle them, compress, version, and then properly cache those bundles

Logging - Log appropriately and monitor those logs

Pragmatic Caching - Most current web applications will have between 3-5 layers of caching

Functional Decomposition - Decompose your entire application into functional silos

Deployment - It should be efficient, it should have a rollback capability, and it should be almost entirely automated

Asynchronous Practices - In most cases work can be queued and done by a separate process

Make sure your application processes are as lean as possible - More efficient code means fewer servers

Please follow the link to Kent's post for a full explanation. To some this may seem obvious, but that doesn't mean it gets done. Good helpful stuff.
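As one illustration of the Asynchronous Practices item in the checklist, here is a minimal sketch of queueing slow work for a separate worker. It uses only Python's standard `queue` and `threading` modules; the `handle_request` function, the status string, and the mail message are hypothetical stand-ins, not anything from Kent's post.

```python
import queue
import threading

def start_worker(jobs, results):
    """Run queued jobs in a background thread until a None sentinel arrives."""
    def run():
        while True:
            job = jobs.get()
            if job is None:           # sentinel: stop the worker
                jobs.task_done()
                break
            results.append(job())     # the slow work happens off the request path
            jobs.task_done()
    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker

def handle_request(jobs, address):
    """Hypothetical request handler: enqueue the slow part and return at once."""
    jobs.put(lambda: f"welcome mail sent to {address}")
    return "202 Accepted"

jobs = queue.Queue()
results = []
start_worker(jobs, results)
status = handle_request(jobs, "user@example.com")
jobs.put(None)   # ask the worker to stop once the queue drains
jobs.join()      # wait until every queued item has been processed
```

In a real deployment the queue would more likely be an external broker and the worker a separate process, but the shape is the same: the request returns immediately and the work completes later.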

Joyent - Cloud Computing Built on Accelerators by Kent Langley

Strategy

Scalability Best Practices: Lessons from eBay

Wednesday, October 22, 2008 at 6:42AM

At eBay, one of the primary architectural forces we contend with every day is scalability. It colors and drives every architectural and design decision we make. With hundreds of millions of users worldwide, over two billion page views a day, and petabytes of data in our systems, this is not a choice - it is a necessity.

In a scalable architecture, resource usage should increase linearly (or better) with load, where load may be measured in user traffic, data volume, etc. Where performance is about the resource usage associated with a single unit of work, scalability is about how resource usage changes as units of work grow in number or size. Said another way, scalability is the shape of the price-performance curve, as opposed to its value at one point in that curve.
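The distinction between performance and scalability can be made concrete with a little arithmetic. The sketch below is only an illustration under assumed numbers: the measurements are invented, and the 5% tolerance is an arbitrary choice. It checks whether resource cost per unit of work stays flat (linear scaling or better) as load grows.

```python
def cost_per_unit(total_cost, units):
    """Performance view: resources spent on a single unit of work."""
    return total_cost / units

def scales_linearly_or_better(measurements, tolerance=0.05):
    """Scalability view: does cost per unit stay flat as load grows?

    measurements is a list of (units_of_work, total_resource_cost) pairs,
    ordered from lowest to highest load.
    """
    first_units, first_cost = measurements[0]
    baseline = cost_per_unit(first_cost, first_units)
    return all(
        cost_per_unit(cost, units) <= baseline * (1 + tolerance)
        for units, cost in measurements
    )

# Invented sample data: (requests per second, servers needed).
linear_system = [(1000, 10), (2000, 20), (4000, 40)]
degrading_system = [(1000, 10), (2000, 25), (4000, 70)]

linear_ok = scales_linearly_or_better(linear_system)
degrading_ok = scales_linearly_or_better(degrading_system)
```

Both systems have the same performance at 1,000 requests per second; only the shape of the curve beyond that point tells them apart.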

There are many facets to scalability - transactional, operational, development effort. In this article, I will outline several of the key best practices we have learned over time to scale the transactional throughput of a web-based system. Most of these best practices will be familiar to you. Some may not. All come from the collective experience of the people who develop and operate the eBay site.

Read the rest of the article on InfoQ.

mg1313 |

Server load balancing architectures, Part 2: Application-level load balancing

Wednesday, October 22, 2008 at 6:38AM

The transport-level server load balancing architectures described in the first half of this article are more than adequate for many Web sites, but more complex and dynamic sites can't depend on them. Applications that rely on cache or session data must be able to handle a sequence of requests from the same client accurately and efficiently, without failing. In this follow-up to his introduction to server load balancing, Gregor Roth discusses various application-level load balancing architectures, helping you decide which one will best meet the business requirements of your Web site.

The first half of this article describes transport-level server load balancing solutions, such as TCP/IP-based load balancers, and analyzes their benefits and disadvantages. Load balancing on the TCP/IP level spreads incoming TCP connections over the real servers in a server farm. It is sufficient in most cases, especially for static Web sites. However, support for dynamic Web sites often requires higher-level load balancing techniques. For instance, if the server-side application must deal with caching or application session data, effective support for client affinity becomes an important consideration.

Here in Part 2, I'll discuss techniques for implementing server load balancing at the application level to address the needs of many dynamic Web sites.
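To make client affinity concrete, here is a minimal sketch of one common approach: hashing a session identifier so that every request in a session lands on the same server. This is not taken from Gregor Roth's article; the server names and session id are hypothetical, and simple modulo hashing is used only because it is short.

```python
import hashlib

def pick_server(session_id, servers):
    """Map a session id to one server so repeat requests reach the same node."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

servers = ["app-1", "app-2", "app-3"]   # hypothetical server names
first = pick_server("session-abc", servers)
second = pick_server("session-abc", servers)
```

One caveat with this design: when the server list changes, modulo hashing remaps most sessions, so real balancers often use cookies or consistent hashing instead.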

Read the rest of the article on JavaWorld.

mg1313 |

Server load balancing architectures, Part 1: Transport-level load balancing

Wednesday, October 22, 2008 at 6:34AM

Server farms achieve high scalability and high availability through server load balancing, a technique that makes the server farm appear to clients as a single server. In this two-part article, Gregor Roth explores server load balancing architectures, with a focus on open source solutions. Part 1 covers server load balancing basics and discusses the pros and cons of transport-level server load balancing.
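As a rough illustration of what "appearing as a single server" means at the transport level, the sketch below hands each incoming connection to the next real server in turn. Round robin is just one possible policy, chosen here as an assumption; the addresses and the `RoundRobinBalancer` class are hypothetical.

```python
import itertools

class RoundRobinBalancer:
    """Assign each new connection to the next server in a fixed rotation."""

    def __init__(self, servers):
        self._rotation = itertools.cycle(servers)

    def assign(self, connection_id):
        # The balancer never looks at request content, only at the connection.
        return connection_id, next(self._rotation)

balancer = RoundRobinBalancer(["10.0.0.1", "10.0.0.2"])
assignments = [balancer.assign(conn)[1] for conn in range(4)]
```

Because the decision ignores what the client is asking for, consecutive connections from the same client can land on different servers.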

The barrier to entry for many Internet companies is low. Anyone with a good idea can develop a small application, purchase a domain name, and set up a few PC-based servers to handle incoming traffic. The initial investment is small, so the start-up risk is minimal. But a successful low-cost infrastructure can become a serious problem quickly. A single server that handles all the incoming requests may not have the capacity to handle high traffic volumes once the business becomes popular. In such situations, companies often start to scale up: they upgrade the existing infrastructure by buying a larger box with more processors, or they add more memory to run the applications.

Read the rest of the article on JavaWorld.

mg1313 |

Java,

Scalability,

server architecture

EVE Online Architecture

Wednesday, October 22, 2008 at 4:15AM

EVE Online is "The World's Largest Game Universe", a massively multiplayer online game (MMO) made by CCP. EVE Online's architecture is unusual for an MMOG because it doesn't divide the player load among different servers or shards. Instead, the same cluster handles the entire EVE universe. It is interesting to compare this with the Architecture of the Second Life Grid. How do they manage to scale?

Information Sources

Platform

Stackless Python used for both server and client game logic. It allows programmers to reap the benefits of thread-based programming without the performance and complexity problems associated with conventional threads.
SQL Server
Blade servers with SSDs for high IOPS
Plans to use Infiniband interconnects for low latency networking
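The appeal of Stackless-style programming is that each game entity can be written as its own sequential task while a scheduler switches between tasks cooperatively. The sketch below imitates that idea with ordinary Python generators; it does not use the Stackless Python API, and the `ship` task and `run_all` scheduler are hypothetical names.

```python
from collections import deque

def ship(name, ticks, log):
    """A cooperative task: do one step of game logic, then yield control."""
    for tick in range(ticks):
        log.append(f"{name} tick {tick}")
        yield

def run_all(tasks):
    """Round-robin scheduler: resume each task in turn until all finish."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)            # run the task until its next yield
            ready.append(task)    # still alive, so schedule it again
        except StopIteration:
            pass                  # task finished, drop it

log = []
run_all([ship("frigate", 2, log), ship("cruiser", 1, log)])
```

Switching happens only at explicit yield points, so tasks never preempt each other and shared state needs no locks, which is the property the platform notes above are pointing at.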

What's Inside?

The Stats

Founded in 1997
~300K active users
Up to 40K concurrent users
Battles involving hundreds of ships
250M transactions per day

Architecture

Proxy Blades - These are the public-facing segment of the EVE Cluster. They are responsible for taking player connections and establishing player communication within the rest of the cluster.
SOL Blades - These are the workhorses of Tranquility. The cluster is divided across 90 to 100 SOL blades, which run 2 nodes each. A node is the primarily CPU-intensive EVE server process running on one core. Some SOL blades are dedicated to a single busy solar system, such as Jita, Motsu, and Saila.
Database Cluster - This is the persistence layer of EVE Online. The running nodes interact heavily with the database, and of course pretty much everything to do with the game lives here. Thanks to solid-state drives, the database is able to keep up with the enormous I/O load that Tranquility generates.
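One way to picture the SOL layer is as a mapping from solar systems to server nodes, with the busiest systems getting a node to themselves. The sketch below is a guess at such a policy, not CCP's actual allocator; the function name, node ids, and the quiet system names are all hypothetical.

```python
def assign_systems(systems, nodes, dedicated):
    """Give each busy system its own node; spread the rest over what is left.

    systems: solar system names, nodes: node ids,
    dedicated: set of busy system names that must not share a node.
    """
    nodes = list(nodes)
    busy = [s for s in systems if s in dedicated]
    shared = [s for s in systems if s not in dedicated]
    if len(busy) >= len(nodes):
        raise ValueError("need at least one node left over for shared systems")
    assignment = dict(zip(busy, nodes))       # one whole node per busy system
    shared_nodes = nodes[len(busy):]
    for i, system in enumerate(shared):
        assignment[system] = shared_nodes[i % len(shared_nodes)]
    return assignment

assignment = assign_systems(
    systems=["Jita", "Motsu", "quiet-1", "quiet-2", "quiet-3"],
    nodes=["node-1", "node-2", "node-3", "node-4"],
    dedicated={"Jita", "Motsu"},
)
```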

Lessons Learned

With innovative ideas, MMO games can scale to hundreds of players in the same battle.
SSDs will in fact bridge the huge performance gap between memory and disks to some extent.
Low latency Infiniband network interconnect will enable larger clusters.

geekr |

Example,

MMO,

Python,

SSD,

games

Alternatives to Google App Engine

Sunday, October 19, 2008 at 5:22PM

One particularly interesting EC2 third-party provider is GigaSpaces, whose XAP platform provides in-memory transactions backed up to a database. The in-memory transactions appear to scale linearly across machines, providing a distributed in-memory datastore that gets backed up to persistent storage.
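The general pattern of an in-memory store backed by a database can be sketched in a few lines. The example below is not the GigaSpaces XAP API: `BackedStore` is a hypothetical single-machine class, and an in-memory SQLite database stands in for the persistent store.

```python
import sqlite3

class BackedStore:
    """Serve reads from memory; persist every committed batch to a database."""

    def __init__(self, db):
        self.memory = {}
        self.db = db
        db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")

    def commit(self, changes):
        """Apply a batch to memory, then write it in one database transaction."""
        self.memory.update(changes)
        with self.db:  # commits on success, rolls back on error
            self.db.executemany(
                "INSERT OR REPLACE INTO kv (k, v) VALUES (?, ?)",
                list(changes.items()),
            )

    def get(self, key):
        return self.memory.get(key)

    def recover(self):
        """Rebuild the in-memory view from the backing database."""
        self.memory = dict(self.db.execute("SELECT k, v FROM kv"))

store = BackedStore(sqlite3.connect(":memory:"))
store.commit({"cart:1": "book", "cart:2": "lamp"})
store.memory.clear()   # simulate losing the in-memory copy
store.recover()
```

This sketch writes to the database synchronously; a platform aiming for linear scaling would more plausibly partition the data across machines and persist asynchronously.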

natis |

Grid,

google

A High Performance Memory Database for Web Application Caches

Friday, October 17, 2008 at 1:22AM

Abstract—This paper presents the architecture and characteristics of a memory database intended to be used as a cache engine for web applications. Primary goals of this database are speed and efficiency while running on SMP systems with several CPU cores (four and more). A secondary goal is the support for simple metadata structures associated with cached data that can aid in efficient use of the cache. Due to these goals, some data structures and algorithms normally associated with this field of computing needed to be adapted to the new environment.
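To show what "simple metadata structures associated with cached data" might look like in practice, here is a small sketch of a cache whose entries carry tags and an expiry time. It is an assumption-laden illustration, not the paper's design: the `TaggedCache` class is hypothetical, and a single global lock is used for brevity even though a cache tuned for many CPU cores would want finer-grained locking.

```python
import threading
import time

class TaggedCache:
    """In-memory cache whose entries carry an expiry time and a set of tags."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = {}  # key -> (value, expires_at, tags)

    def put(self, key, value, ttl_seconds, tags=()):
        with self._lock:
            expires_at = time.monotonic() + ttl_seconds
            self._items[key] = (value, expires_at, frozenset(tags))

    def get(self, key):
        with self._lock:
            entry = self._items.get(key)
            if entry is None:
                return None
            value, expires_at, _tags = entry
            if time.monotonic() >= expires_at:
                del self._items[key]   # lazily evict expired entries
                return None
            return value

    def invalidate_tag(self, tag):
        """Drop every entry carrying the tag; return how many were removed."""
        with self._lock:
            stale = [k for k, (_v, _e, tags) in self._items.items() if tag in tags]
            for key in stale:
                del self._items[key]
            return len(stale)

cache = TaggedCache()
cache.put("page:1", "<html>one</html>", ttl_seconds=60, tags=["user:7"])
cache.put("page:2", "<html>two</html>", ttl_seconds=60, tags=["user:8"])
removed = cache.invalidate_tag("user:7")
```

The tags let a web application invalidate everything related to one user or object in a single call, without tracking individual keys.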

Designing games with a purpose

Caching,

Paper

Scaling Spam Eradication Using Purposeful Games: Die Spammer Die!

Friday, October 17, 2008 at 1:01AM

Update: As expected, I'm undergoing a massive spam attack for speaking truth to dark powers. This is the time to be strong. Together we can make a change. What change, you may ask? I can't say, just change and lots more change. Let's link arms together and bravely stand against the forces of chaos for a better yesterday and a better tomorrow.

CAPTCHA doesn't work. Even Google can't make CAPTCHA work (Spammers Choose GMail). And even if CAPTCHA worked it wouldn't really work, because CAPTCHA solving markets (Inside India’s CAPTCHA solving economy) have evolved where for a mere $2 you can buy 1,000 human-broken CAPTCHAs. And we know once the free market tackles a problem that's it. Game over :-) Making ever more clever CAPTCHA programs won't outwit and outlast the CAPTCHA solving markets. Until Skynet evolves the only way to defeat humans is with humans.

Using Games to Get Humans to Do Work (like CAPTCHA) for Free

How do we harness the power of humans to do battle with the CAPTCHA solving networks, without, of course, paying them anything? We make it a game! In particular we make a Game With a Purpose (GWAP). Read all about GWAPs in Designing games with a purpose. A GWAP is a game in which people, as a side effect of playing, perform tasks computers are unable to perform.

Google's Image Labeler

A good example of a GWAP is Google's Image Labeler, a game in which people provide meaningful, accurate labels for images on the Web as a side effect of playing the game; for example, an image of a man and a dog is labeled "dog," "man," and "pet." Now this sounds like work. And it is. But because it's made into a game, people will do it for free! An example Labeler session looks like:

In the game two people are matched at random to label the same set of images. Points are awarded when you and your partner match labels. Top scores are kept so you can earn your label street cred. But can't people cheat? GWAP games include cheating detection mechanisms, but we won't go into detail here; see Designing games with a purpose for cheater foiling strategies.
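The matching rule is simple enough to sketch. The snippet below is a hypothetical scoring function, not Google's implementation: the point value and the case-insensitive comparison are assumptions made for illustration.

```python
def score_round(labels_a, labels_b, points_per_match=10):
    """Return the labels both players agreed on and the points they earn."""
    def normalize(labels):
        return {label.strip().lower() for label in labels}

    agreed = normalize(labels_a) & normalize(labels_b)
    return sorted(agreed), len(agreed) * points_per_match

# Two players label the same image of a man and a dog.
agreed, points = score_round(["Dog", "man", "park"], ["pet", "dog", "Man"])
```

Only labels that two independent strangers both produce are kept, which is what makes the output trustworthy enough to use.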

ESP Game, Tag a Tune, and Squigl

More games can be found at the GWAP Home Page. They have the ESP Game which is like Labeler. Tag a Tune is a game where players hear tunes, describe them, and through the description guess if they are listening to the same tune. In Squigl partners see an image and a word. Using the mouse each player traces the object described by the word in the image. Winning is when both players trace the same image. Here's what a Squigl session looks like:

So you see the pattern. Players are picked from a pool. They are asked to do some task that's hard for computers to do. The task must be structured so that winning enables the system to learn something valid while providing a feeling of game play for the humans. Points are awarded and scores are kept to keep the poor human slaves playing.

Creating a Spam Catcher Game

With the basic ideas in place, let's create a game for identifying and filtering out comment spam. According to Designing games with a purpose, this appears to be an output-agreement game, which has the following structure:

Initial setup. Two strangers are randomly chosen by the game itself from among all potential players;

Rules. In each round, both are given the same input and must produce outputs based on the input. Game instructions indicate that players should try to produce the same output as their partners. Players cannot see one another's outputs or communicate with one another; and

Winning condition. Both players must produce the same output; they do not have to produce it at the same time but must produce it at some point while the input is displayed onscreen.

Simple enough. But comments exist as a part of blogs, websites, microblogging engines, and other programs. Any game has to interface with live systems. Integrating the game with a comment system might work something like:

User comments are sent from an originating system to a decentralized game comment queue.

Comments are pulled from the queue as new games start. Posts are stripped of identifying information and presented to the players.

Points are allocated if both players agree that a comment is spam or not spam within a very short period of time. With comments, latency is the name of the game, so they need to be processed as fast as possible.

Comments and the spam judgments are sent back to the originating system for handling.

It's not too hard to imagine such a system being used for content other than comments, and for making judgments like age appropriateness and other subtle criteria that could be communicated using site metadata. One UI idea is to make the game like a first-person shooter. Spam is blasted into 1,000 pieces. Oh, that would be rewarding, but you can also imagine all the usual game-type mechanisms to keep people interested. An accuracy feedback loop would be useful to rate players so less accurate players could be dropped from the game.

Players would be recruited from the general population. Another good source of players is the site owners and site participants whose sites are the source of comments. This would be a sort of Internet Comment Tax for keeping the Internet safe and sane. I, for example, would sign up to process 500 comments a week in order to have HighScalability.com comments processed by the game. Everyone else taking advantage of the system could pledge a number that made sense for their site. This would provide a ready pool of motivated players and docents to keep the game running efficiently. A nice widget system would make it possible to play the game from any site.
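The four integration steps can be sketched end to end. Everything here is hypothetical: the field names, the 5-second limit, the point value, and the `fake_votes` function that stands in for two human players. A plain in-process `deque` also stands in for what would really be a decentralized queue.

```python
from collections import deque

def strip_identity(comment):
    """Keep only the text so players cannot see who posted it or where."""
    return {"id": comment["id"], "text": comment["text"]}

def judge(vote_a, vote_b, time_limit_seconds=5.0):
    """Each vote is (is_spam, seconds_taken). Return (verdict, points).

    The verdict is None when the players disagree or either one is too slow.
    """
    (spam_a, time_a), (spam_b, time_b) = vote_a, vote_b
    if spam_a == spam_b and max(time_a, time_b) <= time_limit_seconds:
        return spam_a, 10
    return None, 0

def play_queue(comment_queue, get_votes):
    """Pull comments, show them anonymized, and collect results to send back."""
    results = []
    while comment_queue:
        comment = comment_queue.popleft()
        verdict, points = judge(*get_votes(strip_identity(comment)))
        results.append({"id": comment["id"], "source": comment["source"],
                        "is_spam": verdict, "points": points})
    return results

def fake_votes(shown):
    """Stand-in for two human players."""
    if "cheap" in shown["text"]:
        return (True, 1.2), (True, 2.0)    # both say spam, quickly
    return (False, 1.0), (True, 1.5)       # players disagree

comments = deque([
    {"id": 1, "source": "blog.example", "author": "x", "text": "Buy cheap pills"},
    {"id": 2, "source": "blog.example", "author": "y", "text": "Great post"},
])
results = play_queue(comments, fake_votes)
```

A comment with no verdict could simply be requeued for another pair of players, and per-player agreement rates from these results would feed the accuracy feedback loop.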

The Final Move

Spam crushes many sites. Many site owners don't even allow comments anymore because of the time it takes to deal with spam, which is a shame, because without interactivity the internet might as well be a newspaper. We can't let those spammers win! A system like the Spam Catcher Game might be able to provide the human oversight, low latency, and high throughput needed to outcompete the CAPTCHA solving networks. The game is finally afoot!

GWAP Home

Inside India’s CAPTCHA solving economy

Spammers Choose GMail

Google's Image Labeler

Google Crashing

games,

spam

Oracle opens Coherence Incubator

Wednesday, October 15, 2008 at 10:05AM

During the Coherence Special Interest Group meeting in London, Brian Oliver from Oracle yesterday announced the start of the Coherence Incubator project. Coherence Incubator is a new online repository of projects that provides reference implementation examples for commonly used design patterns and integration solutions based on Oracle Coherence.

gojko |
