Entries by geekr (38)

Monday
Sep 8, 2008

Guerrilla Capacity Planning and the Law of Universal Scalability

In the era of Web 2.0, traditional approaches to capacity planning are often difficult to implement. Guerrilla Capacity Planning facilitates rapid forecasting of capacity requirements based on the opportunistic use of whatever performance data and tools are available. One unique Guerrilla tool is Virtual Load Testing, based on Dr. Gunther's "Universal Law of Computational Scaling", which provides a highly cost-effective method for assessing application scalability. Neil Gunther, M.Sc., Ph.D. is an internationally recognized computer system performance consultant who founded Performance Dynamics Company in 1994.

Some reasons why you should understand this law:

  1. A lot of people use the term "scalability" without clearly defining it, let alone defining it quantitatively. Computer system scalability must be quantified. If you can't quantify it, you can't guarantee it. The universal law of computational scaling provides that quantification.
  2. One of the greatest impediments to applying queueing theory models (whether analytic or simulation) is the inscrutability of service times within an application. Every queueing facility in a performance model requires a service time as an input parameter. No service time, no queue. Without the appropriate queues in the model, system performance metrics like throughput and response time cannot be predicted. The universal law of computational scaling leapfrogs this entire problem by NOT requiring ANY low-level service time measurements as inputs.

The universal scalability model is a single equation expressed in terms of two parameters, α and β. The relative capacity C(N) is a normalized throughput given by:

C(N) = N / ( 1 + α(N − 1) + βN(N − 1) )

where N represents either:

  1. (Software Scalability) the number of users or load generators on a fixed hardware configuration. In this case, the number of users acts as the independent variable while the CPU configuration remains constant for the range of user load measurements.
  2. (Hardware Scalability) the number of physical processors or nodes in the hardware configuration. In this case, the number of user processes executing per CPU (say 10) is assumed to be the same for every added CPU. Therefore, on a 4 CPU platform you would run 40 virtual users.

Here α (alpha) is the contention parameter and β (beta) is the coherency-delay parameter. A small numerical sketch of the equation follows the list below. This model has widespread applicability, including:

  • Accounts for such effects as VM thrashing and cache-miss latencies.
  • Can also be used to model disk arrays, SANs, and multicore processors.
  • Can also be used to model certain types of network I/O.
  • The user-load form is the most common application of the equation.
  • Can be used in combination with measurement tools like LoadRunner, Benchmark Factory, etc.
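As a rough illustration of how the equation is used in practice, here is a minimal Python sketch that evaluates C(N) over a range of loads. The α and β values below are made up for demonstration; in a real Guerrilla exercise they would be fitted by regression against measured throughput.

```python
def usl_capacity(n, alpha, beta):
    """Universal Scalability Law: relative capacity C(N) for load N,
    given a contention parameter alpha and a coherency-delay parameter beta."""
    return n / (1 + alpha * (n - 1) + beta * n * (n - 1))

# Hypothetical parameters, e.g. obtained by fitting measured throughput data.
alpha, beta = 0.02, 0.0005

for n in (1, 2, 4, 8, 16, 32, 64, 128):
    c = usl_capacity(n, alpha, beta)
    print(f"N={n:4d}  C(N)={c:7.2f}  efficiency={c / n:.2%}")
```

Because β > 0 eventually drags C(N) down, the fitted curve also predicts the load beyond which adding more users or processors actually reduces throughput.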
Gunther's book Guerrilla Capacity Planning: A Tactical Approach to Planning for Highly Scalable Applications and Services and its companion Analyzing Computer Systems Performance: With Perl: PDQ offer practical advice on capacity planning. Distilled notes from his Guerrilla Capacity Planning (GCaP) classes are available online in The Guerrilla Manual.

Click to read more ...

Sunday
Aug 17, 2008

Wuala - P2P Online Storage Cloud

How do you design a reliable distributed file system when the expected availability of the individual nodes is only ~1/5? That is the case for P2P systems. Dominik Grolimund, the founder of the Swiss startup Caleido, will show you how! They have launched Wuala, a social online storage service which scales as new nodes join the P2P network. The goal of Wua.la is to provide distributed online storage that is:

  • large
  • scalable
  • reliable
  • secure
by harnessing the idle resources of participating computers. This challenge is an old dream of computer science. In fact, as Andrew Tanenbaum wrote in 1995: "The design of a world-wide, fully transparent distributed filesystem for simultaneous use by millions of mobile and frequently disconnected users is left as an exercise for the reader."

After three years of research and development at ETH Zurich, the Swiss Federal Institute of Technology, Caleido is ready to unveil the result: Wuala. Wuala is a new way of storing, sharing, and publishing files on the internet. It enables its users to trade parts of their local storage for online storage, and it allows us to provide a better service for free. In this Google Tech Talk, Dominik will explain what Wuala is and how it works, and he will also show a demo.

The availability problem is solved by redundancy (just like in the Google File System). However, simple replication would result in too much overhead because of the low availability of the nodes. Instead, Wuala employs erasure coding and splits the data into small pieces. Optimal erasure codes produce n/r fragments, where any n fragments are sufficient to recover the original message. These pieces are then distributed in the P2P network, providing good availability at a reasonable overhead. The P2P network consists of client, storage and routing nodes, and the Wuala architecture uses a mix of regular and random graphs to optimize routing.

Dominik also explains how the Wuala architecture is designed to provide security and fairness. Wuala employs the 128-bit AES algorithm for encryption and the 2048-bit RSA algorithm for authentication. If you're interested in how Wuala manages encryption, have a look at their publication on Cryptree. They have also implemented distributed reputation audit and maintenance functions. Check out the Tech Talk! It is worth the time!
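To see why erasure coding beats plain replication when nodes are only ~20% available, here is a minimal Python sketch comparing the two at the same storage overhead. The fragment counts (20-of-160) and the 8x replication factor below are illustrative assumptions, not Wuala's actual parameters.

```python
from math import comb

def replication_availability(p, copies):
    """Probability that at least one of `copies` full replicas is online."""
    return 1 - (1 - p) ** copies

def erasure_availability(p, k, m):
    """Probability that at least k of m erasure-coded fragments are online,
    which is what an optimal k-of-m code needs to reconstruct the data."""
    return sum(comb(m, i) * p**i * (1 - p) ** (m - i) for i in range(k, m + 1))

p = 0.2  # assumed per-node availability (~1/5, as in the talk)
print(replication_availability(p, copies=8))      # 8x overhead, ≈ 0.83 availability
print(erasure_availability(p, k=20, m=160))       # same 8x overhead, ≈ 0.99 availability
```

The larger the number of fragments, the more the fraction of online pieces concentrates around its average, which is why coding over many small pieces gives much better availability than a few full copies for the same overhead.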

Click to read more ...

Wednesday
May 28, 2008

Webinar: Designing and Implementing Scalable Applications with Memcached and MySQL

The following technical Webinar could be of interest to the community. WHO:

  • Farhan "Frank" Mashraqi, Director of Business Operations and Technical Strategy, Fotolog Inc
  • Monty Taylor, Senior Consultant, Sun Microsystems
  • Jimmy Guerrero, Sr Product Marketing Manager, Sun Microsystems - Database Group
WHAT:
  • Designing and Implementing Scalable Applications with Memcached and MySQL web presentation.
WHEN:
  • Thursday, May 29, 2008, 10:00 am PST, 1:00 pm EST, 18:00 GMT
  • The presentation will be approximately 45 minutes long followed by Q&A.
Check out the details here!

Click to read more ...

Wednesday
Apr 23, 2008

Behind The Scenes of Google Scalability

The recent Data-Intensive Computing Symposium brought together experts in system design, programming, parallel algorithms, data management, scientific applications, and information-based applications to better understand existing capabilities in the development and application of large-scale computing systems, and to explore future opportunities.

Google Fellow Jeff Dean gave a very interesting presentation on Handling Large Datasets at Google: Current Systems and Future Directions. He discussed:

  • Hardware infrastructure
  • Distributed systems infrastructure: the scheduling system, GFS, BigTable, and MapReduce
  • Challenges and future directions: infrastructure that spans all datacenters, and more automation

It is really a "How does Google work" presentation in about 60 slides. Check out the slides and the video!
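For readers unfamiliar with the MapReduce programming model mentioned in the talk, here is a tiny in-memory Python sketch of the idea: a toy word count, not Google's implementation, with function names made up purely for illustration.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values for each key (here, sum the counts)."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
print(reduce_phase(shuffle(map_phase(docs))))
# {'the': 3, 'quick': 2, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 2}
```

The real system distributes the map and reduce phases across thousands of machines and handles partitioning, scheduling, and failures; the programming model itself is this simple.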

Click to read more ...

Sunday
Mar 9, 2008

Best Practices for Speeding Up Your Web Site

The Exceptional Performance group at Yahoo! has identified 14 best practices for making web pages faster. These best practices have proven to reduce response times of Yahoo! properties by 25-50%. They focus on the front end: for example, why it's bad to use "@import" for including stylesheets and why default ETag configurations can defeat browser caching. This Google Tech Talk presents these best practices and demonstrates YSlow. Relevant links:

  • 14 Rules for Exceptional Web Performance: http://developer.yahoo.com/performance/rules.html
  • YSlow: http://developer.yahoo.com/yslow/

Check out the book for details: High Performance Web Sites: Essential Knowledge for Front-End Engineers

Click to read more ...

Monday
Feb 4, 2008

Streaming Video on Amazon EC2?

An Amazon EC2 Flash Video Streaming solution has been announced by Wowza Media. What do you think about the future of similar solutions? Are Amazon EC2 and S3 ready for video streaming? I have found threads on their forums related to the performance, scalability and high availability of the hosted streaming solution. How would you make it scalable? Is it really cheaper than traditional hosting? Looking forward to your thoughts!

Click to read more ...

Tuesday
Jan 15, 2008

Sun to Acquire MySQL

"So what are we announcing today? That in addition to acquiring MySQL, Sun will be unveiling new global support offerings into the MySQL marketplace. We'll be investing in both the community and the marketplace - to accelerate the industry's phase change away from proprietary technology to the new world of open web platforms." Read more on Jonathan Schwartz's blog. What do you think about this?

Click to read more ...

Sunday
Jan 13, 2008

Google Reveals New MapReduce Stats

The Google Operating System blog has an interesting post on Google's scale, based on an updated version of Google's paper about MapReduce. The input data for some of the MapReduce jobs run in September 2007 was 403,152 TB (terabytes), the average number of machines allocated for a MapReduce job was 394, and the average completion time was six and a half minutes. The paper mentions that Google's indexing system processes more than 20 TB of raw data.

Niall Kennedy calculates that the average MapReduce job runs across a $1 million hardware infrastructure, assuming that Google still uses the same cluster configuration as in 2004: two 2 GHz Intel Xeon processors with Hyper-Threading enabled, 4 GB of memory, two 160 GB IDE hard drives and a gigabit Ethernet link. Greg Linden notes that Google's infrastructure is an important competitive advantage: "Anyone at Google can process terabytes of data. And they can get their results back in about 10 minutes, so they can iterate on it and try something else if they didn't get what they wanted the first time." It is interesting to compare this to Amazon EC2:

  • $0.40 Large Instance price per hour x 400 instances x 10 minutes = $26.7
  • 1 TB data transfer in at $0.10 per GB = $100
For a bit over a hundred bucks you could also process a TB of data!
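As a quick sanity check of the arithmetic above, here is a minimal Python sketch. The prices are the 2008 EC2 figures quoted in the post, and the 400-instance, 10-minute job size mirrors the averages reported for Google's MapReduce jobs.

```python
# 2008 Amazon EC2 prices quoted above
large_instance_per_hour = 0.40   # USD per Large Instance hour
transfer_in_per_gb = 0.10        # USD per GB transferred in

instances = 400                  # roughly the average machines per MapReduce job
runtime_hours = 10 / 60          # ~10 minutes

compute_cost = large_instance_per_hour * instances * runtime_hours
transfer_cost = transfer_in_per_gb * 1000          # 1 TB ≈ 1000 GB of input data

print(f"compute:  ${compute_cost:.2f}")                    # ≈ $26.67
print(f"transfer: ${transfer_cost:.2f}")                   # $100.00
print(f"total:    ${compute_cost + transfer_cost:.2f}")    # ≈ $126.67
```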

Click to read more ...
