Entries by geekr (38)

Wednesday
Nov192008

High Definition Video Delivery on the Web?

How would you architect and implement an SD and HD internet video delivery system such as the BBC iPlayer or Recast Digital's RDV1? What do you need to consider on top of the Lessons Learned section in the YouTube Architecture post? How is it possible to compete with big players like Google? Can you just use a CDN and scale efficiently? Would Amazon's cloud services be a viable platform for high-definition video streaming?

Click to read more ...

Tuesday
Nov182008

Scalability Perspectives #2: Van Jacobson – Content-Centric Networking

Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind.

Van Jacobson

Van Jacobson is a Research Fellow at PARC. Prior to that he was Chief Scientist and co-founder of Packet Design. Prior to that he was Chief Scientist at Cisco. Prior to that he was head of the Network Research group at Lawrence Berkeley National Laboratory. He's been studying networking since 1969. He still hopes that someday something will start to make sense.

Scaling the Internet – Does the Net Need an Upgrade?

As the Internet is being overrun with video traffic, many wonder if it can survive. With challenges being thrown down over the imbalances that have been created and their impact on the viability of monopolistic business models, the Internet is under constant scrutiny. Will it survive? Or will it succumb to the burden of a billion-plus community that is constantly demanding more and more? Does the Net need an upgrade? To answer this question a distinguished panel of Van Jacobson, Rick Hutley, Norman Lewis and David S. Isenberg discussed the issue at the Supernova conference. In this compelling debate, available on IT Conversations, the panel addresses the question and provides some differing perspectives. One of these perspectives is content-centric networking, described by Van Jacobson.

A New Way to look at Networking

Today's research community congratulates itself for the success of the internet and passionately argues whether circuits or datagrams are the One True Way. Meanwhile the list of unsolved problems grows. Security, mobility, ubiquitous computing, wireless, autonomous sensors, content distribution, digital divide, third world infrastructure, etc., are all poorly served by what's available from either the research community or the marketplace. In this amazing Google Tech Talk Van Jacobson uses various strained analogies and contrived examples to argue that network research is moribund because the only thing it knows how to do is fill in the details of a conversation between two applications. Today, as in the 60s, problems go unsolved due to our tunnel vision and not because of their intrinsic difficulty. And now, like then, simply changing our point of view may make many hard things easy.

Content-centric networking

The founding principle of content-centric networking is that a communication network should allow a user to focus on the data he or she needs, rather than having to reference a specific, physical location the data is to be retrieved from. This stems from the fact that the vast majority of current Internet usage (a "high 90% level of traffic") consists of data being disseminated from a source to a number of users. The current architecture of the Internet revolves around a conversation model, created in the 1970s to allow geographically distributed users to use a few big, immobile computers. The content-centric approach seeks to adapt the basic architecture of the network to current usage patterns. The new approach comes with a wide range of benefits, one of which is building security (both authentication and encryption) into the network at the data level. Despite all its advantages, this idea doesn't seem to map very well to some of the current uses of the Web (like web applications, where data is generated on the fly according to user actions) or to real-time applications like VoIP and instant messaging. But one can envision an Internet where content-centric protocols take care of the diffusion-based uses of the network, forming an overlay network, while genuine conversation-centric protocols stay on the current infrastructure.
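To make the shift concrete, here is a minimal, hypothetical sketch of the content-centric idea (not from Jacobson's talk; all names are invented, and an HMAC stands in for a real public-key signature): consumers request data by name, any node holding the named object can answer, and the signature travels with the data so it can be verified wherever it was served from.

```python
import hashlib
import hmac

SECRET = b"publisher-signing-key"  # stand-in for a real public-key signature scheme

def sign(name: str, data: bytes) -> bytes:
    # In content-centric networking the signature binds the name to the data,
    # so a copy from any replica is as trustworthy as one from the origin.
    return hmac.new(SECRET, name.encode() + data, hashlib.sha256).digest()

class ContentStore:
    """Any node (origin, cache, peer) can hold named, signed content objects."""
    def __init__(self):
        self.store = {}

    def publish(self, name: str, data: bytes):
        self.store[name] = (data, sign(name, data))

    def satisfy_interest(self, name: str):
        # Content is requested by name, not by host address.
        return self.store.get(name)

def fetch(name: str, nodes):
    # An "interest" is answered by whichever node happens to hold the content.
    for node in nodes:
        found = node.satisfy_interest(name)
        if found:
            data, sig = found
            if hmac.compare_digest(sig, sign(name, data)):  # verify at the data level
                return data
    return None

origin, cache = ContentStore(), ContentStore()
origin.publish("/videos/talk/seg1", b"first segment")
cache.store.update(origin.store)                     # a cache replicates the signed object
print(fetch("/videos/talk/seg1", [cache, origin]))   # answered by the cache, not the origin
```

The point of the sketch is that trust attaches to the named data itself rather than to the connection it arrived over, which is what lets caches and peers serve content safely.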

Solutions or workarounds?

There are many solutions or workarounds for the problems posed by traditional conversation-based networking, such as Content Delivery Networks, caching, distributed filesystems, P2P and PKI. By taking Van Jacobson's perspective we can investigate new dimensions of these problems. What could be the impact of this perspective on the future of the Internet architecture? What do you think? I recommend the A New Way to Look at Networking video by Van Jacobson. He tells the brief history of networking from the phone system to the Internet and lays out his vision for dissemination networking.

Information Sources

Click to read more ...

Friday
Nov142008

Paper: Pig Latin: A Not-So-Foreign Language for Data Processing

Yahoo has developed a new language called Pig Latin that fits into a sweet spot between high-level declarative querying in the spirit of SQL and low-level, procedural programming à la MapReduce, combining the best of both worlds. The accompanying system, Pig, is fully implemented, and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source MapReduce implementation. Pig has just graduated from the Apache Incubator and joined Hadoop as a subproject. The paper has a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. References: Apache Pig Wiki
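To give a feel for the language, here is the paper's running example, reproduced approximately: find, for each sufficiently large category, the average pagerank of its high-pagerank urls.

```pig
good_urls = FILTER urls BY pagerank > 0.2;
groups = GROUP good_urls BY category;
big_groups = FILTER groups BY COUNT(good_urls) > 10^6;
output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
```

Each step is an explicit dataflow transformation (procedural, like MapReduce), yet the operations themselves are declarative and SQL-like; Pig compiles the sequence into a plan of Hadoop jobs.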

Click to read more ...

Monday
Nov102008

Scalability Perspectives #1: Nicholas Carr – The Big Switch

Scalability Perspectives is a series of posts that highlights the ideas that will shape the next decade of IT architecture. Each post is dedicated to a thought leader of the information age and his vision of the future. Be warned though – the journey into the minds and perspectives of these people requires an open mind.

Nicholas Carr

A former executive editor of the Harvard Business Review, Nicholas Carr writes and speaks on technology, business, and culture. His provocative 2004 book Does IT Matter? set off a worldwide debate about the role of computers in business.

The Big Switch – Rewiring the World, From Edison to Google

Carr's core insight is that the development of the computer and the Internet remarkably parallels that of the last radically disruptive technology, electricity. He traces the rapid morphing of electrification from an in-house competitive advantage into a ubiquitous utility, and how the business advantage rapidly shifted from the innovators and early adopters to corporate titans who made their fortunes from controlling a commodity essential to everyday life. He envisions a similar future for the IT utility in his new book.

... and likewise all parts of the system must be constructed with reference to all other parts, since, in one sense, all the parts form one machine. - Thomas Edison

Carr's vision is that IT services delivered over the Internet are replacing traditional software applications from our hard drives. We rely on the new utility grid to connect with friends at social networks, track business opportunities, manage photo collections or stock portfolios, watch videos and write blogs or business documents online. All these services hint at the revolutionary potential of the new computing grid and the information utilities that run on it.

In the years ahead, more and more of the information-processing tasks that we rely on, at home and at work, will be handled by big data centers located out on the Internet. The nature and economics of computing will change as dramatically as the nature and economics of mechanical power changed with the rise of electric utilities in the early years of the last century. The consequences for society - for the way we live, work, learn, communicate, entertain ourselves, and even think - promise to be equally profound. If the electric dynamo was the machine that fashioned twentieth-century society - that made us who we are - the information dynamo is the machine that will fashion the new society of the twenty-first century.

The utilitarians, as Carr calls them, can deliver breakthrough IT economics through the use of highly efficient data centers and scalable, distributed computing, networking and storage architecture. There's a new breed of Internet company on the loose. They grow like weeds, serve millions of customers a day and operate globally. And they have very, very few employees. Look at YouTube, the video network. When it was bought by Google in 2006, for more than $1 billion, it was one of the most popular and fastest growing sites on the Net, broadcasting more than 100 million clips a day. Yet it employed a grand total of 60 people. Compare that to a traditional TV network like CBS, which has more than 23,000 employees.

Goodbye, Mr. Gates

So is the title for Chapter 4 of the book. “The Next Sea change is upon us.” Those words appeared in an extraordinary memorandum that Bill Gates sent to Microsoft's top managers and engineers on October 30, 2005. “Services designed to scale to tens or hundreds of millions [of users] will dramatically change the nature and cost of solutions deliverable to enterprise or small businesses.” This new wave, he concluded, “will be very disruptive.”

IT in 2018: From Turing’s Machine to the Computing Cloud

Carr's new internet.com eBook concludes that thanks to the theory of Alan Turing's Universal Computing Machine and the rise of modern virtualization technologies:
  • With enough memory and enough speed, Turing’s work implies, a single computer could be programmed, with software code, to do all the work that is today done by all the other physical computers in the world.
  • Once you virtualize the computing infrastructure, you can run any application, including a custom-coded one, on an external computing grid.
  • In other words: Software (coding) can always be substituted for hardware (switching).

Into the Cloud

Carr demonstrates the power of the cloud through the example of the answering machine, which has been vaporized into the cloud. This is happening to our e-mails, documents, photo albums, movies, friends and world (Google Earth?), too. If you’re of a certain age, you’ll probably remember that the first telephone answering machine you used was a bulky, cumbersome device. It recorded voices as analog signals on spools of tape that required frequent rewinding and replacing. But it wasn’t long before you replaced that machine with a streamlined digital answering machine that recorded messages as strings of binary code, allowing all sorts of new features to be incorporated into the device through software programming. But the virtualization of telephone messaging didn’t end there. Once the device became digital, it didn’t have to be a device anymore – it could turn into a service running purely as code out in the telephone company’s network. And so you threw out your answering machine and subscribed to a service. The physical device vaporized into the “cloud” of the network.

The Great Enterprise of the 21st Century

Carr considers building scalable web sites and services a great opportunity for this century. Good news for highscalability.com :-) Just as the last century’s electric utilities spurred the development of thousands of new consumer appliances and services, so the new computing utilities will shake up many markets and open myriad opportunities for innovation. Harnessing the power of the computing grid may be the great enterprise of the twenty-first century.

Information Sources

Click to read more ...

Thursday
Oct302008

Olio Web2.0 Toolkit - Evaluate Web Technologies and Tools

How do you evaluate and decide which web technologies (and there are myriads out there) to use for your new web application? Which one potentially gives you the best performance? Which one will likely give you the shortest time-to-market? The Apache incubator project Olio might help. Olio is an open source web 2.0 toolkit to help evaluate the suitability, functionality and performance of web technologies. Olio defines an example web 2.0 application (an events site somewhat like yahoo.com/upcoming) and provides three initial implementations: PHP, Java EE and Ruby on Rails (RoR). The toolkit also defines ways to drive load against the application in order to measure performance. Apache Olio can be used to

  • Understand how to use various web 2.0 technologies such as AJAX, memcached, MogileFS etc. Use the code in the application to understand the subtle complexities involved and how to get around issues with these technologies.
  • Evaluate the differences in the three implementations (PHP, Ruby and Java) to understand which might work best for your situation.
  • Within each implementation, evaluate different infrastructure technologies by changing the servers used (e.g. Apache vs. lighttpd, MySQL vs. PostgreSQL, Ruby vs. JRuby etc.)
  • Drive load against the application to evaluate the performance and scalability of the chosen platform.
  • Experiment with different algorithms (e.g. memcache locking, a different DB access API) by replacing portions of code in the application.
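The "memcache locking" item above refers to a common idiom: memcached's atomic add either creates a key or fails if it already exists, which is enough to build a simple distributed lock. A minimal sketch follows, using an invented in-memory stand-in for a real memcached client so it runs anywhere; production code would point a real client library at a memcached server and would also set an expiry on the lock key.

```python
class FakeMemcache:
    """Stand-in for a memcached client; only the atomic add/delete used below."""
    def __init__(self):
        self.data = {}

    def add(self, key, value):
        # memcached's add is atomic: it fails if the key already exists.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)

def with_lock(mc, key, fn):
    """Run fn only if we win the lock - the memcache-locking idiom in miniature."""
    if not mc.add("lock:" + key, "1"):
        return None               # someone else holds the lock
    try:
        return fn()
    finally:
        mc.delete("lock:" + key)  # release so others can proceed

mc = FakeMemcache()
print(with_lock(mc, "event:42", lambda: "recomputed"))  # prints "recomputed"
mc.add("lock:event:42", "1")                            # simulate a concurrent holder
print(with_lock(mc, "event:42", lambda: "recomputed"))  # prints None
```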
Olio started its life as the web2.0kit developed by Sun Microsystems in collaboration with the U.C. Berkeley RAD Lab and was presented at Velocity 2008.

Click to read more ...

Wednesday
Oct222008

EVE Online Architecture

EVE Online is "The World's Largest Game Universe", a massively multiplayer online game (MMO) made by CCP. EVE Online's architecture is unusual for an MMOG because it doesn't divide the player load among different servers or shards. Instead, the same cluster handles the entire EVE universe. It is interesting to compare this with the architecture of the Second Life Grid. How do they manage to scale?

Information Sources

Platform

  • Stackless Python used for both server and client game logic. It allows programmers to reap the benefits of thread-based programming without the performance and complexity problems associated with conventional threads.
  • SQL Server
  • Blade servers with SSDs for high IOPS
  • Plans to use Infiniband interconnects for low latency networking
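Stackless Python's tasklets are cooperative microthreads scheduled in user space. As a rough illustration of that style (this is not CCP's code; plain generators stand in for tasklets), a round-robin scheduler can interleave many tasks on one thread with no locks:

```python
from collections import deque

def tasklet(name, n):
    # A cooperative "microthread": it yields control instead of blocking.
    for i in range(n):
        yield f"{name} step {i}"

class Scheduler:
    """Round-robin scheduler, loosely mimicking Stackless tasklet switching."""
    def __init__(self):
        self.ready = deque()

    def spawn(self, gen):
        self.ready.append(gen)

    def run(self):
        trace = []
        while self.ready:
            task = self.ready.popleft()
            try:
                trace.append(next(task))   # run until the tasklet yields
                self.ready.append(task)    # reschedule it at the back of the queue
            except StopIteration:
                pass                       # tasklet finished
        return trace

sched = Scheduler()
sched.spawn(tasklet("npc", 2))
sched.spawn(tasklet("player", 2))
print(sched.run())  # steps interleave: npc, player, npc, player
```

Because switches happen only at explicit yield points, thousands of such tasks can share a core without the synchronization overhead of preemptive threads, which is the property a Stackless-based game server exploits.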
What's Inside?

The Stats

  • Founded in 1997
  • ~300K active users
  • Up to 40K concurrent users
  • Battles involving hundreds of ships
  • 250M transactions per day

Architecture

The EVE cluster is broken into 3 distinct layers:
  • Proxy Blades - the public-facing segment of the EVE cluster. They are responsible for taking player connections and establishing player communication within the rest of the cluster.
  • SOL Blades - the workhorses of Tranquility. The cluster is divided across 90-100 SOL blades which run 2 nodes each. A node is the primary CPU-intensive EVE server process running on one core. Some SOL blades are dedicated to single busy solar systems such as Jita, Motsu and Saila.
  • Database Cluster - the persistence layer of EVE Online. The running nodes interact heavily with the database, and of course pretty much everything to do with the game lives here. Thanks to solid-state drives, the database is able to keep up with the enormous I/O load that Tranquility generates.

Lessons Learned

There are many interesting facts about the architecture of the EVE Online MMOG, such as the use of Stackless Python and SSDs.
  • With innovative ideas MMO games can scale up to hundreds of players in the same battle.
  • SSDs will bridge the huge performance gap between memory and disks to some extent.
  • A low latency Infiniband network interconnect will enable larger clusters.
Check out the information sources for detailed insights into the development and operation of the EVE Online game.

Click to read more ...

Monday
Oct062008

Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview

Although the problem of scaling human genome sequencing is not exactly about building bigger, faster and more reliable websites, it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for a fraction of the cost of earlier approaches. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?

Click to read more ...

Sunday
Oct052008

Paper: Scalability Design Patterns

I have introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving the highest possible scalability is a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes patterns to:

  • Introduce Scalability
  • Optimize Algorithm
  • Add Hardware
  • Add Parallelism
    • Add Intra-Process Parallelism
    • Add Inter-Process Parallelism
    • Add Hybrid Parallelism
  • Optimize Decentralization
  • Control Shared Resources
  • Automate Scalability
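As a small illustration of the "Add Intra-Process Parallelism" pattern (a sketch of mine, not code from the paper), independent units of work can be fanned out to a pool of workers inside a single process:

```python
from concurrent.futures import ThreadPoolExecutor

def checksum(chunk):
    # Stand-in for an independent, parallelizable unit of work.
    return sum(chunk) % 251

def serial(chunks):
    return [checksum(c) for c in chunks]

def parallel(chunks, workers=4):
    # "Add Intra-Process Parallelism": fan the independent units out to a pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(checksum, chunks))

chunks = [list(range(i, i + 100)) for i in range(0, 1000, 100)]
assert serial(chunks) == parallel(chunks)  # same results, work done concurrently
```

The pattern only pays off when the units really are independent; otherwise the "Control Shared Resources" pattern has to be applied alongside it.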

Click to read more ...

Wednesday
Oct012008

The Pattern Bible for Distributed Computing

Software design patterns are an emerging tool for guiding and documenting system design. Patterns usually describe software abstractions used by advanced designers and programmers in their software. Patterns can provide guidance for designing highly scalable distributed systems. Let's see how! Patterns are in essence solutions to problems. Most of them are expressed in a format called Alexandrian form which draws on constructs used by Christopher Alexander. There are variants but most look like this:

  • The pattern name
  • The problem the pattern is trying to solve
  • Context
  • Solution
  • Examples
  • Design rationale: This tells where the pattern came from, why it works, and why experts use it
Patterns rarely stand alone. Each pattern works on a context, and transforms the system in that context to produce a new system in a new context. New problems arise in the new system and context, and the next "layer" of patterns can be applied. A pattern language is a structured collection of such patterns that build on each other to transform needs and constraints into an architecture.

The latest POSA book, Pattern-Oriented Software Architecture Volume 4: A Pattern Language for Distributed Computing, guides readers through the best practices and introduces them to key areas of building distributed software systems using patterns. The book pulls together 114 patterns and shows how to use them in the context of distributed software architectures. Although somewhat theoretical, it is still a great resource for practicing distributed-systems architects. It is as close as you're going to get to a one-stop "encyclopedia" of patterns relevant to distributed computing. However, it is not a true encyclopedia, since "over 150" patterns are referenced but not described in POSA Volume 4. The book does not go into the details of the patterns' implementations, so the reader should already be familiar with the patterns, or be prepared to spend some time researching them.

The pattern language for distributed computing includes patterns such as:
  • Broker
  • Client-Dispatcher-Server
  • Pipes and Filters
  • Leaders/Followers
  • Reactor
  • Proactor
Patterns can indeed be useful in designing highly scalable systems and in solving various problems related to concurrency, synchronization, resource management and other topics. Wikipedia has more details on pattern languages to check out.
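Of the patterns listed above, Pipes and Filters is the easiest to show in miniature. In this hypothetical sketch (illustrative, not from the book), each filter consumes a stream and yields a transformed one, and the pipeline composes them:

```python
def pipeline(source, *filters):
    """Pipes and Filters: each stage consumes a stream and yields a new one."""
    stream = iter(source)
    for f in filters:
        stream = f(stream)  # connect the next filter via a "pipe"
    return stream

def strip_blank(lines):
    # Filter 1: drop empty or whitespace-only lines.
    for line in lines:
        if line.strip():
            yield line

def upper(lines):
    # Filter 2: transform each remaining line.
    for line in lines:
        yield line.upper()

raw = ["hello", "", "pattern languages", "  "]
print(list(pipeline(raw, strip_blank, upper)))  # ['HELLO', 'PATTERN LANGUAGES']
```

New behavior is added by inserting another filter; no existing stage needs to change, which is the composability payoff that makes the pattern a scalability building block.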

Click to read more ...

Tuesday
Sep232008

Event: CloudCamp Silicon Valley Unconference on 30th September

CloudCamp is an interesting unconference where early adopters of Cloud Computing technologies exchange ideas. With the rapid change occurring in the industry, we need a place where we can meet to share our experiences, challenges and solutions. At CloudCamp, you are encouraged to share your thoughts in several open discussions, as we strive for the advancement of Cloud Computing. End users, IT professionals and vendors are all encouraged to participate. CloudCamp Silicon Valley 08 is scheduled for Tuesday, September 30, 2008 from 06:00 PM - 10:00 PM at Sun Microsystems' EBC Briefing Center, 15 Network Circle, Menlo Park, CA 94025. CloudCamp follows an interactive, unscripted unconference format. You can propose your own session or you can attend a session proposed by someone else. Either way, you are encouraged to engage in the discussion and “vote with your feet”, which means “find another session if you don’t find the session helpful”. Pick and choose from the conversations; rant and rave, or sit back and watch. At CloudCamp, we tend to discuss the following topics:
  • Infrastructure as a service (Joyent, Amazon EC2, Nirvanix, etc.)
  • Platform as a service (BungeeLabs, AppEngine, etc.)
  • Software as a service (salesforce.com)
  • Application / Data / Storage (development in the cloud)

Click to read more ...