Entries by Todd Hoff (380)

Tuesday
Dec 16, 2008

Facebook is Hiring

I thought with the job situation these days that people might be interested in some open jobs at Facebook. Here's what's available:


Facebook is hiring! We are looking for a Systems Engineer/Architect and Site Reliability Engineer. I have attached the job descriptions below. If you are interested, please contact Michelle Bostock mbostock-at-facebook.com. Thanks and Happy Holidays!

Systems Architect, Palo Alto, CA

Description: Facebook is seeking a seasoned Systems Architect to join the Operations team. The position is full-time and is based in our main office in downtown Palo Alto and will report to the Manager of Systems Operations.

Responsibilities:
  • Analyze application flow and infrastructure design to improve performance and scalability of the site
  • Collaborate on design of services infrastructure from servers to networking
  • Monitor, analyze, and make recommendations as appropriate to improve site stability and availability
  • Evaluate hardware and software technologies to improve site efficiency and performance
  • Troubleshoot and solve issues with hardware, applications, and network components
  • Lead team efforts from design to implementation, prioritize tasks and resources while interacting with Engineering and Operations
  • Document current and future configuration processes and policies
  • Participate in 24x7 on-call support

Requirements:
  • B.S. in Computer Science or equivalent experience
  • 4+ years of experience in Operations with large web farms
  • Extensive knowledge of web architecture and technologies, including Linux, Apache, MySQL, PHP, TCP/IP, security, HTTP, LDAP and MTAs
  • Strong background/interest in application and infrastructure design
  • Scripting and programming skills
  • Excellent verbal and written communication skills
Site Reliability Engineer, Palo Alto, CA

Description: Facebook is seeking talented operations engineers to join the Site Reliability Engineering team. The ideal candidate will have strong communication skills, a passion for tinkering with Linux, and an almost insane fondness for fast-paced, seat-of-your-pants troubleshooting and crisis management. The position is full-time and is based in our main office in downtown Palo Alto. This position reports to the Manager of Site Reliability Engineering.

Responsibilities:
  • Monitor the stability and performance of the website
  • Remotely troubleshoot and diagnose hardware problems
  • Debug issues with Linux software, applications and network
  • Resolve technical challenges encountered in LAMP technologies
  • Develop and maintain monitoring tools and automation systems
  • Predict and respond to utilization variances across multiple datacenters
  • Identify and triage all outage related events
  • Facilitate communication, coordinate escalation, and work with subject matter experts to implement critical fixes
  • Automate and streamline processes
  • Track issues and run reports

Requirements:
  • 2-3 years+ Linux support/sys admin experience in an Internet operations environment
  • BA/BS in Computer Science or a related field, or equivalent experience
  • Working knowledge of Linux, Cisco, TCP/IP, Apache and MySQL
  • Experience working with network management systems and monitoring tools, such as Nagios, Ganglia and Cacti
  • Competency in Shell, PHP, Perl or Python. C is a plus
  • Solid understanding of web services architecture and commonly employed technologies
  • A sense of urgency in responding to and resolving critical issues that relate to the performance of the site and/or core infrastructure
  • Excellent verbal and written communication skills
  • Participation in a shifted coverage schedule, including working nights and on-call rotations


Saturday
Dec 13, 2008

Strategy: Facebook Tweaks to Handle 6 Times as Many Memcached Requests

Our latest strategy is taken from a great post by Paul Saab of Facebook, detailing how, with the changes Facebook has made to memcached, they have:

...been able to scale memcached to handle 200,000 UDP requests per second with an average latency of 173 microseconds. The total throughput achieved is 300,000 UDP requests/s, but the latency at that request rate is too high to be useful in our system. This is an amazing increase from 50,000 UDP requests/s using the stock version of Linux and memcached.

To scale, Facebook has hundreds of thousands of TCP connections open to their memcached processes. First, this is still amazing. Not so long ago you simply could not have done this. Optimizing connection use was always a priority because the OS couldn't handle large numbers of connections, threads, or CPUs. Getting to this point is a big accomplishment. Still, at that scale a new set of problems must be solved.

Some of the problems Facebook faced and fixed:

  • Per connection consumption of resources. What works well at a small number of inputs can totally kill a system as inputs grow. Memcached uses a per-connection buffer, which adds up to a lot of memory that could be used to store data. There's nothing wrong with this design choice, but Facebook made changes to use a per-thread shared connection buffer and reclaimed gigabytes of RAM on each server.
  • Kernel lock contention. Facebook discovered under load there was lock contention when transmitting through a single UDP socket from multiple threads. Sockets are data structures too and they are subject to the usual lock contention issues. Facebook got around this issue by maintaining separate reply sockets in different threads so they would not contend with the receive sockets. They found another bottleneck in Linux’s “netdevice” layer that sits in-between IP and device drivers. They changed the dequeue algorithm to batch dequeues so more work was done when they had the CPU.
  • Application lock contention. Nothing brings out lock issues like moving to more cores. Facebook found that when they moved to 8-core machines a global lock protecting stats collection used 20-30% of CPU. In applications that require little processing per request, as memcached does, this is not unexpected, but spending your CPU on real work is a better idea. So they collected stats on a per-thread basis and then calculated a global view on demand (a minimal sketch of this pattern follows the list).
  • Interrupt floods and starvation. With so much traffic directed at a single server the hardware can flood the CPU(s) with interrupts and keep the CPU from doing "real" work. To get around this problem Facebook implements some complicated strategies to load balance IO across all the cores. As I am less clever I might try more network cards with a TCP Offload engine.
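
To make the per-thread stats idea concrete, here is a minimal Java sketch of the general pattern (Facebook's actual fix is in memcached's C code; the class and stat names here are made up for illustration). Each update only touches a striped, mostly per-thread counter cell, and the more expensive global view is computed only when stats are actually requested.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: the hot path bumps a striped counter with no global lock;
// a global view is summed up only when someone asks for stats.
public class PerThreadStats {
    // One LongAdder per stat name; LongAdder stripes updates across per-thread
    // cells internally, so concurrent increments rarely contend.
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Hot path: called on every request by any thread.
    public void increment(String statName) {
        counters.computeIfAbsent(statName, k -> new LongAdder()).increment();
    }

    // Cold path: called only when stats are requested; sums the striped cells.
    public Map<String, Long> snapshot() {
        Map<String, Long> view = new java.util.HashMap<>();
        counters.forEach((name, adder) -> view.put(name, adder.sum()));
        return view;
    }

    public static void main(String[] args) throws InterruptedException {
        PerThreadStats stats = new PerThreadStats();
        Runnable worker = () -> {
            for (int i = 0; i < 100_000; i++) stats.increment("get_hits");
        };
        Thread t1 = new Thread(worker), t2 = new Thread(worker);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(stats.snapshot()); // {get_hits=200000}
    }
}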

When you read Paul's article keep in mind the incredible number of man hours that went into profiling the system, not just their application, but the entire software and hardware stack. Then add in the research, planning, and trying different solutions to see if anything changed for the better. It's a lot of work. Notice that using a nifty new parallel language or moving to a cloud wouldn't have made a bit of difference. It's complete mastery of their system that made the difference.

    A summary of potential strategies:
  • Profile everything. Problems are always specific. The understanding of the problem must be specific. The fix must be specific.
  • Burn profiling into your regression tests. Detect when and where performance tanks as a regular part of your build.
  • Use resources in proportion to what grows slowest. This requires multiplexing, but at least your resource usage is more predictable and bounded.
  • Batch work. When you have the CPU, do all the work you possibly can in that quantum, or the whole system grinds to a halt in processing overhead.
  • Do work and maintain resources per task. Otherwise locking for shared resources takes more and more time when there's less and less time to do the work that needs to be done.
  • Change algorithms. Sometimes you simply need to do things differently. Tweaking will only get you so far.

    You can find their changes on github, the hub that says "git."
Saturday
Dec 6, 2008

    Paper: Real-world Concurrency

    An excellent article by Bryan Cantrill and Jeff Bonwick on how to write multi-threaded code. With more processors and no magic bullet solution for how to use them, knowing how to write multiprocessor code that doesn't screw up your system is still a valuable skill. Some topics:

  • Know your cold paths from your hot paths.
  • Intuition is frequently wrong—be data intensive.
  • Know when—and when not—to break up a lock.
  • Be wary of readers/writer locks.
  • Consider per-CPU locking.
  • Know when to broadcast—and when to signal.
  • Learn to debug postmortem.
  • Design your systems to be composable.
  • Don't use a semaphore where a mutex would suffice.
  • Consider memory retiring to implement per-chain hash-table locks.
  • Be aware of false sharing.
  • Consider using nonblocking synchronization routines to monitor contention.
  • When reacquiring locks, consider using generation counts to detect state change.
  • Use wait- and lock-free structures only if you absolutely must.
  • Prepare for the thrill of victory—and the agony of defeat.

While I don't agree that code using locks can be made composable, this article covers a lot of very useful nitty-gritty details that will up your expert rating a couple of points.


Friday
Dec 5, 2008

    Sprinkle - Provisioning Tool to Build Remote Servers

Joshua Sierles describes how 37 Signals uses Sprinkle to configure their servers within EC2. Sprinkle defines a domain specific meta-language for describing and processing the installation of software. You can find an interesting discussion of Sprinkle's creation story by the creator himself, Marcus Crafter, in Sprinkle Some Powder!. Marcus divides provisioning tools into two categories:

  • Task Based - the tool issues a list of commands to run on the remote system, either remotely via a network connection or smart client.
  • Policy/state Based - the tool determines what needs to be run on the remote system by examining its current and final state.

Sprinkle combines both models in a chocolate-in-my-peanut-butter approach, using normal Ruby code as the DSL (domain specific language) to declaratively describe remote system configurations. 37 Signals likes the use of Ruby as the DSL because it makes learning a separate syntax unnecessary. I've successfully done similar things in Perl. You already have a scripting language, why layer another one on top? One reason not to is that you've now tied configuration and execution together so that only one tool can control the process, but the leverage is so high with this approach it's hard to ignore. There are all the usual bits about defining packages, dependencies, installation logic, pre and post actions, etc. The format is compact and clear because that's how Ruby is, and the operations are task specific so there's no fluff. Capistrano is used to communicate with remote systems, though that is pluggable. 37 Signals uses the EC2 security group as a way to specify the role an instance should take on when it boots. A configuration script that can handle all roles is shipped with a nearly complete functional base image. Sprinkle then configures the system the rest of the way based on the passed-in role. Joshua says they like this approach better than Puppet because it doesn't rely on a centralized configuration server or "pushing large sets of commands over SSH manually." There's always more than one way to do "it" and Sprinkle carves out an interesting niche in the provisioning space. The 37 Signals approach doesn't scale to a large organization with many different flavors of servers, but for a specific set of tightly cooperating servers it's a very simple, clean, and robust way of doing business.

Related Articles: Product: Puppet the Automated Administration System.


Wednesday
Dec 3, 2008

    Java World Interview on Scalability and Other Java Scalability Secrets

OK, this interview is with me on Java scalability issues. I sound like a bigger idiot than I would like, but I suppose it could have been worse. The Java World folks were very nice and did a good job, so there's no blame on them :-) The interview went in an interesting direction, but there's more I'd like to add and I will do so here. Two major rules regarding Java and scalability have popped out at me:

  • Java – It's the platform stupid. Java the language isn't the big win. The big win is the ecosystem building up around the JVM, the libraries, and the toolsets.
  • Java – It's the community stupid. A lot of creativity is being expended on leveraging the Java platform to meet scalability challenges. The amazing community that has built up around Java is pushing Java to the next level in almost every direction imaginable.

The fecundity of the Java ecosystem can most readily be seen in the efforts to tame our multi-core future. There's a multi-core crisis going on, in case you haven't heard. It's all you'll hear mentioned in the halls of the Pentagon. The CPU wizards have maxed out on clock speed and the only way we can scale is by adding more cores. And we don't know how to do that. At 100 cores common ways of doing things break down. Locks don't scale. Cache contention for shared memory slows us down. Bandwidth on the bus is limited. The TLB for managing more memory is in short supply. And we need more high speed network cards to handle faster CPUs. And that's just at the hardware level. It's worse for programmers. Locking is just a nightmare. I didn't believe that at first. Early in my career I worked on several multi-core systems. I thought everything was cool. Be careful and it will all work out. But work with a group and it all goes to hell. People add functions, take locks, hold them too long. Problems like deadlock, priority inversion, and high latency all kill a system. What can we do? As we'll see, Java and more importantly the JVM have become a platform for many interesting technologies and scalability patterns. Let's take a look at a few.

    How Java affects both Performance and Scalability

    Peter Williams in this very informative blog post discusses how Java affects both performance and scalability. The main points are:
  • Java does not scale any better, or worse, than any other general purpose language.
  • How easily you can increase the number of requests/sec the system can handle derives from the architecture of the system (sharding, caching, avoiding the database, etc).
  • Java's culture encourages practices that scale only to medium sized systems because it favors techniques that do not scale well: multi-threading, shared state, vertical scaling, large monolithic components, and multiple tiers with lots of layers.
  • Java is about performance, and performance is the reason to use Java; that good performance means it scales pretty well out of the box.
  • Scaling with Java requires going your own way and bucking the culture to implement more scalable practices.

Alain Penders makes some good points in the comments:
  • Java is scalable not because it's better suited for building scalable systems, but because how to build scalable systems with it is well understood and the components you need to do so are widely available. Not only are those components widely available, they are available from multiple vendors — some free, some commercial — so you're never locked in, an aspect that's extremely valuable when producing commercial software.
  • Ruby & Rails applications can be built to scale well, but the techniques are not as well understood as for Java. Ron takes an apposing view and says Java is less scalable:
  • During garbage collection all threads are blocked and the garbage collection time can expand to minutes. These huge latencies effectively limit memory which limits scalability.
  • Increased garbage collection latencies make Java less useful for applications that use heartbeats, make real-time trades, etc.
  • Real Time Java API extensions were developed as a way of addressing garbage collection problems.

Greg Frank weighs in with some excellent points:
  • The Java language itself has nothing to do with scalability.
  • Ron's comments about garbage collection don't apply to post-1.4 JVMs.
  • Old-school J2EE is dead.
  • The monolithic server is being replaced with advanced patterns based on new technologies like jgroups, ehcache and terracotta.
  • The true force pushing scalability is the java community itself. Java is merely a convenient and commonly spoken language that allows for the formation of a global community.
  • We should stop debating language internals and start debating sophisticated design patterns that promote scalability.

    The Top 10 Ways to Botch Enterprise Java Application Scalability and Reliability

    This is a wonderful presentation by Oracle’s Cameron Purdy. Here’s a PDF. Cameron was CEO of Tangosol before Oracle bought them out. Tangosol made Coherence, a distributed cache. Cameron is a long time prolific contributor to the Java community. His presentation is a must see. He’s both entertaining and technically excellent. The main points he makes in the presentation are:
  • 1. Avoid proprietary features/Believe product claims.
  • 2. Assume the network works
  • 3. Use big JVM machine heaps
  • 4. Use a one-size-fits-all architecture
  • 5. Assume disaster-recovery can be added when it becomes necessary
  • 6. Abuse Abstractions/Avoid abstractions
  • 7. Introduce a single point of bottleneck/Introduce a single point of failure
  • 8. Abuse the database/Avoid the database
  • 9. Assume you are smarter than the infrastructure/Follow the rules blindly
  • 10. Optimize performance assuming that it will translate to scalability/Ignore the potential impact of performance on scalability (and vice-versa).

If these seem backwards, remember these are botching strategies. You'll want to see the presentation to see each point fleshed out in more detail.

    Azul

Azul is a Java Compute Appliance and is the ultimate scale-up play for Java. It kind of does what Google App Engine does at the framework level, but does it for the JVM at the hardware level. Current standard practice is to deploy Java applications across a cluster of commodity servers. Azul does the opposite. It goes big. The most recent release can contain up to 864 processor cores and 768 GB of memory. That's big. Azul transparently runs unmodified Java applications on their specialized hardware platform, which allows even the most mild mannered of Java apps to scale. Their hardware-assisted garbage collector dramatically reduces application pauses and gives access to hundreds of gigabytes of RAM. Some very impressive performance improvements are possible. In one case study, Breakthrough Scalability of an Application Constrained to an x86 Server, an application was given access to 384 cores and 128 GB of memory on an Azul compute appliance. The result was a 45x improvement in scalability. Scalability was increased along a number of dimensions (quoted from their article):
  • Thread count: Initially the application was artificially limiting the number of threads that were allowed to execute. With few threads, the application started at 2,200 OPS on Azul (below the 14,000 native score) but by simply increasing the thread pool size and expanding the heap throughput jumped to 60,000 OPS.
  • Memory locks: When many threads access the same piece of memory, traditional systems force developers to serialize the threads to make sure two threads do not simultaneously change the same memory location (similar to how airlines make sure the same seat is not assigned to two different people.) Azul appliances can detect such collisions in real time and assure correct execution. The removal of two such "hot locks" and a further increase in the number of threads and heap size achieved a new peak of 115,000 OPS.
  • Logging: Slow applications can afford to maintain logs of unneeded events or even multiple copies of the same information. As throughput increases, the sheer volume of such logs makes them difficult to analyze and it becomes more practical to log only what is needed. Reducing the amount of logging and addressing a related single threaded bottleneck raised the peak to 350,000 OPS.
  • More locks: With the above steps taken, a new lock was exposed as a barrier to scalability. Once that was resolved the compute appliance was able to deliver 630,000 OPS – a 45x improvement over the original native performance!

If Azul is so cool, why aren't all applications being run on Azul? Buying a completely proprietary hardware platform is too big a risk without a gigantic throbbing pain point. I would like to see Azul open their own cloud so we could get in with low cost and risk.

    X10

    X10 is not just an inexpensive home automation control system. X10 is also:
    A type-safe, modern, parallel, distributed object-oriented language intended to be very easily accessible to Java(TM) programmers. It is targeted to future low-end and high-end systems with nodes that are built out of multi-core SMP chips with non-uniform memory hierarchies, and interconnected in scalable cluster configurations. A member of the Partitioned Global Address Space (PGAS) family of languages, X10 highlights the explicit reification of locality in the form of places; lightweight activities embodied in async, future, foreach, and ateach constructs; constructs for termination detection (finish) and phased computation (clocks); the use of lock-free synchronization (atomic blocks); and the manipulation of global arrays and data structures. An Eclipse-based Integrated Development Environment (IDE) has been developed at IBM for X10 to help further increase programmer productivity by providing state-of-the-art functionality for viewing, editing, navigating, executing, and manipulating X10 programs.
X10 is built on top of Java. X10 adds:
  • value types, nullable
  • an array language: multi-dimensional arrays, aggregate operations
  • new concurrency features: activities (async, future), atomic blocks, clocks
  • distribution: places, distributed arrays

X10 does not have:
  • dynamic class loading
  • Java's concurrency features: the thread library, volatile, synchronized, wait, notify

X10 restricts:
  • class variables and static initialization

The result is hopefully a language that can be scaled across a cluster of multi-core processors yet still has the familiar Java syntax and is developed using familiar Java development tools like Eclipse.

Clojure, JRuby, Jython

    More of the "it’s the platform stupid." We have many different and interesting languages being built on the JVM platform.

JRuby

JRuby is a 100% pure-Java implementation of the Ruby programming language.

Jython

Jython is a 100% pure-Java implementation of the Python programming language.

    Clojure

I first came upon Clojure while researching software transactional memory (STM) as a solution to the problem of how to create easy-to-write massively parallel programs. STM is a concurrency control mechanism, analogous to database transactions, for controlling access to shared memory in concurrent computing. It functions as an alternative to lock-based synchronization. It's supposed to make writing parallel programs easier. The idea is you can do away with all those nasty locks that cause so many problems. Some have found STM's performance very disappointing for larger scale applications. And it may ultimately fail simply because not everything inside a transaction boundary is memory. Programs routinely call out to other services and peripherals. How can STM work in real world environments? STM may not turn out to be the savior of the multi-core world, but Clojure explores some very new Java territory:
    Clojure is a dynamic programming language that targets the Java Virtual Machine. It is designed to be a general-purpose language, combining the approachability and interactive development of a scripting language with an efficient and robust infrastructure for multithreaded programming. Clojure is a compiled language - it compiles directly to JVM bytecode, yet remains completely dynamic. Every feature supported by Clojure is supported at runtime. Clojure provides easy access to the Java frameworks, with optional type hints and type inference, to ensure that calls to Java can avoid reflection. Clojure is a dialect of Lisp, and shares with Lisp the code-as-data philosophy and a powerful macro system. Clojure is predominantly a functional programming language, and features a rich set of immutable, persistent data structures. When mutable state is needed, Clojure offers a software transactional memory system and reactive Agent system that ensure clean, correct, multithreaded designs.
Clojure doesn't fit my aging mental model. The message-passing actor model of Erlang is more my style. Interestingly, the difference between Erlang and Clojure is quite purposeful. Clojure wants to be efficient while operating in the same process rather than taking a message passing hit for every operation. Clojure requires specifying an agent as the receiver of a message, whereas I prefer a more publish-subscribe approach where message senders and consumers are independent. Clojure's use of Java threads makes latency difficult to control. And I'm not sure an S-expression based language can ever become popular. But these are relatively minor issues compared to the task of making Java safe for parallelism. Java does OOP well enough, but sucks at concurrency. Clojure is a nice middle-ground that may be able to make concurrency-oriented programming by real humans in Java a reality.

    GigaSpaces, GridGain, Terracotta

GigaSpaces, GridGain, and Terracotta take Java objects and fairly transparently turn them into highly distributed in-memory grids with amazing out-of-the-box functionality. If RAM is the new disk these products make it very easy to hop on the next generation architecture. Again, their ability to do so much behind the scenes is based on the power of the JVM. Try doing any of this with C++. Can't happen. Only in the Java world will you see so much cutting edge innovation.

    Open Services Gateway Initiative (OSGi)

One of the major problems with using Java is that it's a pain to deploy a new release across many machines in a data center. Deploying patches and upgrades requires bringing containers down. There also isn't a good way to architect WAR files. Creating one WAR file with multiple services doesn't work for developers. Creating N WAR files with one service doesn't scale for containers. And how do you run multiple versions of the same service in the same container? OSGi is a solution that should make dynamic and high availability deployment of Java web services a reality. OSGi is a dynamic module system for Java. Class loading done right. OSGi defines an architecture for developing and deploying modular applications and libraries by creating a microkernel-style architecture. There's a core set of modules that make up a basic platform and new functionality is dynamically layered in with plugins. Using OSGi these plugins are isolated, secured and controlled from the rest of the code. The unit of deployment is an OSGi bundle, which is simply a JAR file with an OSGi manifest. This approach allows loosely-coupled application modules to be developed by a team of developers. Everything is kept in sync using version numbers and module dependency ranges. If you've ever worked with Linux this should sound familiar. It's basically how packages are installed on Linux.

    Many Companies are Successfully Using Java

Many companies are using Java on their websites; they just don't use the full stack. Java is the ultimate service implementation language. There's a trend, exemplified by Amazon, to develop in terms of separate services that are composed together to produce pages. Put up a cluster of applications, load balance between them, and you are set. This is a big move for internal architecture. Web services now have external APIs. Those same APIs can be used internally to build your site. Java is great for larger web sites that need to start thinking in terms of services.
  • Fotolog. Fotolog is the poster boy for Java scalability. They migrated from PHP to a new, Java-based architecture that, in addition to giving greater flexibility and reuse for future code, allows for a faster response time while halving the number of servers.
  • Amazon. Amazon uses a serviced based architecture. They are not stuck with one particular approach. Some places they use jboss/java, but they use only servlets, not the rest of the J2EE stack.
  • eBay. Use what you like and toss what you don't need. Ebay didn't feel compelled to use full blown J2EE stack. They liked Java and Servlets so that's all they used. You don't have to buy into any framework completely. Just use what works for you.
  • Mailinator. Handles over 1.2 billion emails a year on one rickety old server. The web application, the email server, and all email storage run in one JVM. Java doesn't have to be slow.
  • Tailrank. They use Java, MySQL and Linux for their cluster. Java is a great language for writing crawlers. The library support is pretty solid (though it seems like Java 7 is going to be killer when they add closures).
  • Flickr. They use Java for their node service and as a FTP daemon and for several other services.
  • Linkedin. LinkedIn’s architecture has evolved to scale up to 22 million users. LinkedIn is 99% Pure Java. They use a service oriented architecture, java, jetty, eh-cache, and spring. Clients post messages via asynchronous Java Communications API using JMS. The WebApp doesn’t do everything itself anymore: they split parts of its business logic into Services. Each Service has its own domain-specific database (i.e., vertical partitioning).
  • Amazon's Dynamo. In Dynamo, each storage node has three main software components: request coordination, membership and failure detection, and a local persistence engine. All these components are implemented in Java.
  • GoogleTalk. Google’s IM system implemented in Java.
  • FeedBurner. A news feed management system written in Java.

We've covered a lot of ground. We've seen how the excesses of old J2EE scalability failures can be routed around with ease using a number of different scalability patterns. We've seen some really innovative and amazing products available to Java developers. And we've seen a lot of successful websites use Java. Ah, after all that hard work it's time for another cup of java. Peet's coffee is my favorite.

    Related Articles

  • Beautiful concurrency by Simon Peyton Jones, Microsoft Research, Cambridge.
  • Concurrency Shysters by Bryan Cantrill.
  • Transactions are tomorrow's loads and stores by Nir Shavit.
  • Software transactional memory: why is it only a research toy? by too many people to list.
  • Real-world Concurrency by Bryan Cantrill and Jeff Bonwick.
  • Cantrill and Bonwick get all concurrent-y up in there... by Keith Adams
  • Fortress - Fortress is a new programming language designed for high-performance computing (HPC) with high programmability.
  • Azul's Cliff Click Jr.'s Blog. The folks at Azul have some very interesting blogs. Check them out.
  • We Don't Know How To Program by Cliff Click Jr.
  • JavaOne: Cameron Purdy & 'The Top 10 Ways to Botch Enterprise Java Scalability and Reliability' by Ben Teese
  • Why most large-scale Web sites are not written in Java by Nati Shalom of GigaSpaces. There are also a number very good blogs written by the GigaSpaces folks.
  • Raising Web Service Updates Efficiency with Dynamic Technologies by Valery Abu-Eid
  • Building LinkedIn's Next Generation Architecture with OSGi by Yan Pujante
  • Clojure could be to Concurrency-Oriented Programming what Java was to OOP by Bill Clementson.


Monday
Nov 24, 2008

    Product: Scribe - Facebook's Scalable Logging System


In Log Everything All the Time I advocate applications shouldn't bother logging at all. Why waste all that time and code? No, wait, that's not right. I preach logging everything all the time. Doh. Facebook obviously feels similarly, which is why they open sourced Scribe, their internal logging system, capable of logging tens of billions of messages per day. These messages include access logs, performance statistics, actions that went to News Feed, and many others.

Imagine hundreds of thousands of machines across many geographically dispersed datacenters just aching to send their precious log payload to the central repository of all knowledge. Because really, when you combine all the metadata with all the events you pretty much have a complete picture of your operations. Once in the central repository, logs can be scanned, indexed, summarized, aggregated, refactored, diced, data cubed, and mined for every scrap of potentially useful information.

Just imagine the log stream from all of Facebook's Apache servers alone. Brutal. My guess is these are not real-time feeds, so there are no streaming query issues, but the task is still daunting. At tens of billions of messages a day that works out to hundreds of thousands of messages per second on average, approaching a million at the high end!

    When no off the shelf products worked for them they built their own. Scribe can be downloaded from Sourceforge. But the real action is on their wiki. It's here you'll find some decent documentation and their support forums. Not much activity on the site so you haven't missed your chance to be a charter member of the Scribe guild.

    A logging system has three broad components:

  • Client Code Interface - How does your code interact with the log system? Scribe doesn't do much for you here. There's a simple Thrift interface for logging from a large set of languages, but the bulk of the work is still up to you.
  • Distribution System - This is where Scribe fits. It reliably (mostly) moves large numbers of messages around. A few error cases lead to data loss: 1) If a client can't connect to either the local or central scribe server the message will be lost; 2) If a scribe server crashes it could lose a small amount of data that's in memory but not on disk; 3) Some multiple component failure cases, such as a resender that can't connect to any central server while its local disk fills up; 4) Some rare timeout conditions can lead to duplicate messages.
  • Do Something Usefullizer - How do you do anything useful with on the order of a million messages per second? Good question. Scribe doesn't help here. But Scribe will get your data there.

I browsed around the source and it's a well crafted, straightforward socket server that forwards messages to other servers and can write messages to disk. Nothing fancy, which is probably why it works for them. Its basic function is:

    Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. There is a scribe server running on every node in the system, configured to aggregate messages and send them to a central scribe server (or servers) in larger groups. If the central scribe server isn't available the local scribe server writes the messages to a file on local disk and sends them when the central server recovers. The central scribe server(s) can write the messages to the files that are their final destination, typically on an nfs filer or a distributed file system, or send them to another layer of scribe servers.
In some ways it could be fancier. For example, there's no throttle on incoming connections, so a server can chew up memory. And there is a max_msg_per_second throttle on message processing, but this is really too simple. Throttling needs to be adaptive, based on local conditions and the conditions of downstream servers. Under load you want to push flow control back to the client so the data stays there until resources become available. Simple configuration file settings rarely work when the world starts getting weird.

    Client Code Interface

    Here's what the Thrift interface looks like:

enum ResultCode
{
  OK,
  TRY_LATER
}

struct LogEntry
{
  1: string category,
  2: string message
}

service scribe extends fb303.FacebookService
{
  ResultCode Log(1: list<LogEntry> messages);
}
    I know, I thought the same thing. Thank God there's another IDL syntax. We simply did not have enough of them. Thrift translates this IDL into the glue code necessary for making cross-language calls (marshalling arguments and responses over the wire). The Thrift library also has templates for servers and clients.

    Here's what a call looks like in PHP:

    $messages = array();
    $entry = new LogEntry;
    $entry->category = "buckettest";
    $entry->message = "something very interesting happened";
    $messages []= $entry;
    $result = $conn->Log($messages);


    Pretty simple. Usually in C++, for example, there's an elaborate set of macros for logging that provide sophisticated control of log generation. It might look something like:

    MSG(msg) - a simple message. It only prints out msg. None of the other information is printed out.
    NOTE(const char* name, const char* reason, const char* what, Module* module, msg) - something to take note of.
    WARN(const char* name, const char* reason, const char* what, Module* module, msg) - a warning.
ERR(const char* name, const char* reason, const char* what, Module* module, msg) - an error occurred.
    CRIT(const char* name, const char* reason, const char* what, Module* module, msg) - a critical error occurred.
    EMERG(const char* name, const char* reason, const char* what, Module* module, msg) - an emergency occurred.


There's lots more to handle streams and behind-the-scenes things like time stamps, thread ids, function names, and line numbers. Scribe has wisely not done any of that. It has an RPC-like interface to send a list of messages and that's it. It's up to you to write the wrappers.

    You'll no doubt have noticed Scribe only logs a category and message, both strings:

    Scribe is unique in that clients log entries consisting of two strings, a category and a message. The category is a high level description of the intended destination of the message and can have a specific configuration in the scribe server, which allows data stores to be moved by changing the scribe configuration instead of client code. The server also allows for configurations based on category prefix, and a default configuration that can insert the category name in the file path. Flexibility and extensibility is provided through the "store" abstraction. Stores are loaded dynamically based on a configuration file, and can be changed at runtime without stopping the server. Stores are implemented as a class hierarchy, and stores can contain other stores. This allows a user to chain features together in different orders and combinations by changing only the configuration.

    Distribution System

    The payload has whatever structure you give it. Scribe is policy neutral and doesn't push a logging model on you.

    The configuration file looks something like this:

# BUCKETIZER TEST
<store>
  category=buckettest
  type=buffer
  target_write_size=20480
  max_write_interval=1
  buffer_send_rate=2
  retry_interval=30
  retry_interval_range=10
  <primary>
    type=bucket
    num_buckets=6
    bucket_subdir=bucket
    bucket_type=key_hash
    delimiter=1
    <bucket>
      type=file
      fs_type=std
      file_path=/tmp/scribetest
      base_filename=buckettest
      max_size=1000000
      rotate_period=hourly
      rotate_hour=0
      rotate_minute=30
      write_meta=yes
    </bucket>
  </primary>
  <secondary>
    type=file
    fs_type=std
    file_path=/tmp
    base_filename=buckettest
    max_size=30000
  </secondary>
</store>
    The types of stores currently available are:
  • file - writes to a file, either local or nfs.
  • network - sends messages to another scribe server.
  • buffer - contains a primary and a secondary store. Messages are sent to the primary store if possible, and otherwise the secondary. When the primary store becomes available the messages are read from the secondary store and sent to the primary.
  • bucket - contains a large number of other stores, and decides which messages to send to which stores based on a hash.
  • null - discards all messages.
  • thriftfile - similar to a file store but writes messages into a Thrift TFileTransport file.
  • multi - a store that forwards messages to multiple stores.

Certainly a flexible and useful set of logging capabilities. You can build a hierarchy of log servers to do pretty much anything you want. You could imagine having a log server on each server, with a file store to handle upstream server failures. This log server forwards messages on to a centralized server for the datacenter. And all the datacenter servers forward their logs on to the centralized data warehouse. To scale, adjust fan-in and fan-out as necessary.

    Do Something Usefullizer

You may not have a million log messages a second to process, but you are likely to have your own tanker truck full of log messages. How do you do something useful with them?
  • Log messages stored in log files are next to useless. Grep'ing on a terabyte of logs to answer simple questions about your data just doesn't work.
  • You may have a sharded data warehouse you can pump log messages into and do a reasonably effective job of querying.
  • Or you can set up a Hadoop/HDFS-style system. The idea here is you need a distributed file system to handle the continual stream of log messages. And once you have all the data stored safely away you'll need to use map-reduce to do anything with such a large amount of data.

If you want to ask, for example, how many of your users are from Asia, log files won't work. It's likely your data warehouse can't handle it. Hadoop/HDFS is a practical option.

If that's the direction you are going, what does it imply about your log system? I would say it makes even the simple category-payload system of Scribe overkill. The idea with a scalable backend is to move log payloads from applications to the centralized store as quickly as possible. By definition the central store can handle the load, so there's no reason to use intermediate servers to scale. From an application, write directly to the central store, even from multiple datacenters. The payload structure is unimportant until it hits the central store. If the application can't hit the central store then it queues into the file system until it can. Ideally log messages never hit the file system until HDFS is writing them to their final destination. This makes for low latency, high throughput logging and is even simpler than Scribe.
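
Here's a rough Java sketch of that idea, under the assumption of some central store client; the CentralStore interface and the class names below are hypothetical stand-ins for illustration, not a real API. Write straight to the central store and spill to a local file only when it is unreachable.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Hypothetical sketch of "write directly to the central store, queue locally only
// during outages." CentralStore stands in for whatever the real backend is (HDFS,
// a data warehouse, etc.).
interface CentralStore {
    void append(String category, String message) throws IOException;
}

class DirectLogger {
    private final CentralStore store;
    private final Path spillFile;   // local queue used only while the store is unreachable

    DirectLogger(CentralStore store, Path spillFile) {
        this.store = store;
        this.spillFile = spillFile;
    }

    void log(String category, String message) {
        try {
            store.append(category, message);            // normal case: no intermediaries
        } catch (IOException unreachable) {
            spillLocally(category + "\t" + message);     // outage: queue on local disk
        }
    }

    private void spillLocally(String line) {
        try {
            Files.write(spillFile, (line + "\n").getBytes(StandardCharsets.UTF_8),
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        } catch (IOException e) {
            // Out of options: dropping here is the price of never blocking the application.
        }
    }

    // Called periodically, or on reconnect, to drain the local queue back to the store.
    void drainSpill() throws IOException {
        if (!Files.exists(spillFile)) return;
        for (String line : Files.readAllLines(spillFile, StandardCharsets.UTF_8)) {
            String[] parts = line.split("\t", 2);
            store.append(parts[0], parts.length > 1 ? parts[1] : "");
        }
        Files.delete(spillFile);
    }
}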

If you don't have a scalable central store then Scribe is a good option. It gives you all the flexibility you need to compose your logging system in a way that is mostly reliable and scalable.
Saturday
Nov 22, 2008

    Google Architecture

Update 2: Sorting 1 PB with MapReduce. PB is not peanut-butter-and-jelly misspelled. It's 1 petabyte or 1000 terabytes or 1,000,000 gigabytes. It took six hours and two minutes to sort 1PB (10 trillion 100-byte records) on 4,000 computers and the results were replicated thrice on 48,000 disks.

Update: Greg Linden points to a new Google article MapReduce: simplified data processing on large clusters. Some interesting stats: 100k MapReduce jobs are executed each day; more than 20 petabytes of data are processed per day; more than 10k MapReduce programs have been implemented; machines are dual processor with gigabit ethernet and 4-8 GB of memory.

Google is the King of scalability. Everyone knows Google for their large, sophisticated, and fast searching, but they don't just shine in search. Their platform approach to building scalable applications allows them to roll out internet scale applications at an alarmingly high competition crushing rate. Their goal is always to build a higher performing, higher scaling infrastructure to support their products. How do they do that?

    Information Sources

  • Video: Building Large Systems at Google
  • Google Lab: The Google File System
  • Google Lab: MapReduce: Simplified Data Processing on Large Clusters
  • Google Lab: BigTable.
  • Video: BigTable: A Distributed Structured Storage System.
  • Google Lab: The Chubby Lock Service for Loosely-Coupled Distributed Systems.
  • How Google Works by David Carr in Baseline Magazine.
  • Google Lab: Interpreting the Data: Parallel Analysis with Sawzall.
  • Dare Obasanjo's Notes on the scalability conference.

    Platform

  • Linux
  • A large diversity of languages: Python, Java, C++

    What's Inside?

    The Stats

  • Estimated 450,000 low-cost commodity servers in 2006
  • In 2005 Google indexed 8 billion web pages. By now, who knows?
  • Currently there are over 200 GFS clusters at Google. A cluster can have 1,000 or even 5,000 machines. Pools of tens of thousands of machines retrieve data from GFS clusters that run as large as 5 petabytes of storage. Aggregate read/write throughput can be as high as 40 gigabytes/second across the cluster.
  • Currently there are 6000 MapReduce applications at Google and hundreds of new applications are being written each month.
  • BigTable scales to store billions of URLs, hundreds of terabytes of satellite imagery, and preferences for hundreds of millions of users.

    The Stack

    Google visualizes their infrastructure as a three layer stack:
  • Products: search, advertising, email, maps, video, chat, blogger
  • Distributed Systems Infrastructure: GFS, MapReduce, and BigTable.
  • Computing Platforms: a bunch of machines in a bunch of different data centers
  • Make it easy for folks in the company to deploy at a low cost.
  • Look at price performance data on a per application basis. Spend more money on hardware to not lose log data, but spend less on other types of data. Having said that, they don't lose data.

    Reliable Storage Mechanism with GFS (Google File System)

  • Reliable scalable storage is a core need of any application. GFS is their core storage platform.
  • Google File System - large distributed log structured file system in which they throw in a lot of data.
  • Why build it instead of using something off the shelf? Because they control everything and it's the platform that distinguishes them from everyone else. They required: - high reliability across data centers - scalability to thousands of network nodes - huge read/write bandwidth requirements - support for large blocks of data which are gigabytes in size. - efficient distribution of operations across nodes to reduce bottlenecks
  • System has master and chunk servers. - Master servers keep metadata on the various data files. Data are stored in the file system in 64MB chunks. Clients talk to the master servers to perform metadata operations on files and to locate the chunk servers that contain the data they need on disk (a minimal sketch of this read path follows the list). - Chunk servers store the actual data on disk. Each chunk is replicated across three different chunk servers to create redundancy in case of server crashes. Once directed by a master server, a client application retrieves files directly from chunk servers.
  • A new application coming online can use an existing GFS cluster or they can make their own. It would be interesting to understand the provisioning process they use across their data centers.
  • Key is enough infrastructure to make sure people have choices for their application. GFS can be tuned to fit individual application needs.
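
As a rough illustration of the read path described above, here is a hypothetical Java sketch; the Master and ChunkServer interfaces are stand-ins for GFS's actual RPC protocols, not real APIs. The client gets chunk metadata from the master, then reads the bytes directly from one of the chunk server replicas.

import java.util.List;

// Hypothetical sketch: metadata goes through the master, data bypasses it.
interface Master {
    // Returns the chunk handle plus replica locations for (file, chunk index).
    ChunkLocation locate(String path, long chunkIndex);
}

interface ChunkServer {
    byte[] read(String chunkHandle, long offsetInChunk, int length);
}

record ChunkLocation(String chunkHandle, List<ChunkServer> replicas) {}

class GfsStyleClient {
    private static final long CHUNK_SIZE = 64L * 1024 * 1024;  // files are stored as 64MB chunks
    private final Master master;

    GfsStyleClient(Master master) { this.master = master; }

    // Ignores reads that span a chunk boundary, to keep the sketch short.
    byte[] read(String path, long fileOffset, int length) {
        long chunkIndex = fileOffset / CHUNK_SIZE;              // which chunk holds this offset
        long offsetInChunk = fileOffset % CHUNK_SIZE;
        ChunkLocation loc = master.locate(path, chunkIndex);    // metadata from the master
        ChunkServer replica = loc.replicas().get(0);            // any of the 3 replicas will do
        return replica.read(loc.chunkHandle(), offsetInChunk, length);  // data comes straight from the chunk server
    }
}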

    Do Something With the Data Using MapReduce

  • Now that you have a good storage system, how do you do anything with so much data? Let's say you have many TBs of data stored across 1,000 machines. Databases don't scale, or don't scale cost effectively, to those levels. That's where MapReduce comes in.
  • MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
  • Why use MapReduce? - Nice way to partition tasks across lots of machines. - Handle machine failure. - Works across different application types, like search and ads. Almost every application has map reduce type operations. You can precompute useful data, find word counts, sort TBs of data, etc. - Computation can automatically move closer to the IO source.
  • The MapReduce system has three different types of servers. - The Master server assigns user tasks to map and reduce servers. It also tracks the state of the tasks. - The Map servers accept user input and performs map operations on them. The results are written to intermediate files - The Reduce servers accepts intermediate files produced by map servers and performs reduce operation on them.
  • For example, you want to count the number of words in all web pages. You would feed all the pages stored on GFS into MapReduce. This would all be happening on 1000s of machines simultaneously and all the coordination, job scheduling, failure handling, and data transport would be done automatically (a toy sketch of the flow follows this list). - The steps look like: GFS -> Map -> Shuffle -> Reduction -> Store Results back into GFS. - In MapReduce a map maps one view of data to another, producing a key value pair, which in our example is word and count. - Shuffling aggregates key types. - The reduction sums up all the key value pairs and produces the final answer.
  • The Google indexing pipeline has about 20 different map reductions. A pipeline starts with a whole bunch of records and aggregates them by key. A second map-reduce comes along, takes that result, and does something else. And so on.
  • Programs can be very small. As little as 20 to 50 lines of code.
  • One problem is stragglers. A straggler is a computation that is going slower than others, which holds up everyone. Stragglers may happen because of slow IO (say a bad controller) or from a temporary CPU spike. The solution is to run multiple copies of the same computation and when one is done kill all the rest.
  • Data transferred between map and reduce servers is compressed. The idea is that because servers aren't CPU bound it makes sense to spend on data compression and decompression in order to save on bandwidth and I/O.
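
To make the word count example concrete, here is a toy, single-process Java sketch of the Map -> Shuffle -> Reduce flow. Nothing here is Google- or Hadoop-specific; in the real system each phase runs on thousands of machines with GFS in between, while this version keeps everything in memory.

import java.util.*;
import java.util.function.BiConsumer;

// Toy sketch of the word count flow described above.
public class WordCountSketch {

    // Map: emit (word, 1) for every word on a "page".
    static void map(String page, BiConsumer<String, Integer> emit) {
        for (String word : page.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) emit.accept(word, 1);
        }
    }

    // Reduce: sum all the counts emitted for one word.
    static int reduce(String word, List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> pages = List.of("the quick brown fox", "the lazy dog", "the fox");

        // Shuffle: group the intermediate (word, 1) pairs by key.
        Map<String, List<Integer>> shuffled = new TreeMap<>();
        for (String page : pages) {
            map(page, (word, one) ->
                    shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(one));
        }

        // Reduce each key and print the final answer.
        shuffled.forEach((word, counts) ->
                System.out.println(word + " = " + reduce(word, counts)));
    }
}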

    Storing Structured Data in BigTable

  • BigTable is a large scale, fault tolerant, self managing system that includes terabytes of memory and petabytes of storage. It can handle millions of reads/writes per second.
  • BigTable is a distributed hash mechanism built on top of GFS. It is not a relational database. It doesn't support joins or SQL type queries.
  • It provides a lookup mechanism to access structured data by key. GFS stores opaque data, while many applications need data with structure.
  • Commercial databases simply don't scale to this level and they don't work across 1000s of machines.
  • By controlling their own low level storage system Google gets more control and leverage to improve their system. For example, if they want features that make cross data center operations easier, they can build it in.
  • Machines can be added and deleted while the system is running and the whole system just works.
  • Each data item is stored in a cell which can be accessed using a row key, column key, or timestamp (a minimal sketch of this cell model follows the list).
  • Each row is stored in one or more tablets. A tablet is a sequence of 64KB blocks in a data format called SSTable.
  • BigTable has three different types of servers: - The Master servers assign tablets to tablet servers. They track where tablets are located and redistribute tasks as needed. - The Tablet servers process read/write requests for tablets. They split tablets when they exceed size limits (usually 100MB - 200MB). When a tablet server fails, 100 tablet servers each pick up 1 new tablet and the system recovers. - The Lock servers form a distributed lock service. Operations like opening a tablet for writing, Master arbitration, and access control checking require mutual exclusion.
  • A locality group can be used to physically store related bits of data together for better locality of reference.
  • Tablets are cached in RAM as much as possible.
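
As a rough illustration of the data model, here is a hypothetical in-memory Java sketch of the (row key, column key, timestamp) cell lookup described above. A real tablet server stores this sorted data as SSTable blocks on GFS rather than in a TreeMap; the class and method names are made up for illustration.

import java.util.*;

// Hypothetical sketch of BigTable-style cell lookup: row -> column -> timestamped versions.
public class CellStoreSketch {
    // row key -> column key -> (timestamp -> value), all kept in sorted order
    private final TreeMap<String, TreeMap<String, TreeMap<Long, byte[]>>> rows = new TreeMap<>();

    public void put(String row, String column, long timestamp, byte[] value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(column, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Returns the newest version at or before the given timestamp, or null if absent.
    public byte[] get(String row, String column, long asOfTimestamp) {
        TreeMap<String, TreeMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) return null;
        TreeMap<Long, byte[]> versions = columns.get(column);
        if (versions == null) return null;
        Map.Entry<Long, byte[]> cell = versions.floorEntry(asOfTimestamp); // greatest timestamp <= asOf
        return cell == null ? null : cell.getValue();
    }

    public static void main(String[] args) {
        CellStoreSketch store = new CellStoreSketch();
        store.put("com.cnn.www", "contents:", 3L, "<html>v3</html>".getBytes());
        store.put("com.cnn.www", "contents:", 5L, "<html>v5</html>".getBytes());
        System.out.println(new String(store.get("com.cnn.www", "contents:", 4L))); // prints <html>v3</html>
    }
}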

    Hardware

  • When you have a lot of machines how do you build them to be cost efficient and use power efficiently?
  • Use ultra cheap commodity hardware and built software on top to handle their death.
  • A 1,000-fold computer power increase can be had for a 33 times lower cost if you use a failure-prone infrastructure rather than an infrastructure built on highly reliable components. You must build reliability on top of unreliability for this strategy to work.
  • Linux, in-house rack design, PC-class motherboards, low end storage.
  • Price per wattage on performance basis isn't getting better. Have huge power and cooling issues.
  • Use a mix of collocation and their own data centers.

    Misc

  • Push changes out quickly rather than wait for QA.
  • Libraries are the predominant way of building programs.
  • Some applications are provided as services, like crawling.
  • An infrastructure handles versioning of applications so they can be released without fear of breaking things.

    Future Directions for Google

  • Support geo-distributed clusters.
  • Create a single global namespace for all data. Currently data is segregated by cluster.
  • More and better automated migration of data and computation.
  • Solve consistency issues that happen when you couple wide area replication with network partitioning (e.g. keeping services up even if a cluster goes offline for maintenance or due to some sort of outage).

    Lessons Learned

  • Infrastructure can be a competitive advantage. It certainly is for Google. They can roll out new internet services faster, cheaper, and at a scale few others can compete with. Many companies take a completely different approach and treat infrastructure as an expense. Each group will use completely different technologies and there will be little planning and commonality in how to build systems. Google thinks of themselves as a systems engineering company, which is a very refreshing way to look at building software.
  • Spanning multiple data centers is still an unsolved problem. Most websites are in one and at most two data centers. How to fully distribute a website across a set of data centers is, shall we say, tricky.
  • Take a look at Hadoop (product) if you don't have the time to rebuild all this infrastructure from scratch yourself. Hadoop is an open source implementation of many of the same ideas presented here.
  • An under appreciated advantage of a platform approach is junior developers can quickly and confidently create robust applications on top of the platform. If every project needs to create the same distributed infrastructure wheel you'll run into difficulty because the people who know how to do this are relatively rare.
  • Synergy isn't always crap. By making all parts of a system work together an improvement in one helps them all. Improve the file system and everyone benefits immediately and transparently. If every project uses a different file system then there's no continual incremental improvement across the entire stack.
  • Build self-managing systems that work without having to take the system down. This allows you to more easily rebalance resources across servers, add more capacity dynamically, bring machines off line, and gracefully handle upgrades.
  • Create a Darwinian infrastructure. Perform time consuming operations in parallel and take the winner.
  • Don't ignore the Academy. Academia has a lot of good ideas that don't get translated into production environments. Most of what Google has done has prior art, just not prior large scale deployment.
  • Consider compression. Compression is a good option when you have a lot of CPU to throw around and limited IO.
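    A minimal sketch of the "Darwinian" idea above, assuming nothing about Google's actual implementation: issue the same operation against several replicas in parallel and keep whichever answer comes back first.

      # Hedged / "Darwinian" execution sketch: run the same time-consuming
      # operation against several workers in parallel and take the first result
      # that completes. The workers here are simulated stand-ins for replicas.
      import concurrent.futures
      import random
      import time

      def query_replica(replica_id: int) -> str:
          """Simulated replica with variable latency."""
          time.sleep(random.uniform(0.05, 0.5))
          return f"result from replica {replica_id}"

      def darwinian_call(num_replicas: int = 3) -> str:
          with concurrent.futures.ThreadPoolExecutor(max_workers=num_replicas) as pool:
              futures = [pool.submit(query_replica, i) for i in range(num_replicas)]
              # Take the winner; the slower attempts are simply ignored.
              done, _ = concurrent.futures.wait(
                  futures, return_when=concurrent.futures.FIRST_COMPLETED)
              return next(iter(done)).result()

      print(darwinian_call())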


    Friday
    Nov142008

    Private/Public Cloud

    Data centers are reshaping themselves by taking ideas from public cloud providers such as Amazon and Google. The idea is to make the data center more cost-effective by enabling on-demand, utility-based computing rather than dedicated machines. At the same time, it is clear that to make IT operations more effective it doesn't make sense to run all of the applications currently hosted in a company's data center in the private cloud. This calls for integration between private and public clouds. In this post I discuss some of the challenges involved in making that happen:

  • How do we design applications to be cloud-agnostic? (See the sketch below.)
  • How do we enable seamless fail-over to a public cloud?
  • Future-proofing: in many cases we can't decide where an application should run at the time it is written or developed. We would like to be able to change that decision even after the application has been completely developed.
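    One common way to approach the cloud-agnostic question, offered here only as a hedged illustration (the interface, class names, and configuration switch are invented for this sketch, not taken from the post), is to write application code against a neutral interface and hide provider-specific APIs behind swappable adapters:

      # Illustrative sketch of a cloud-agnostic design: the application depends
      # on a neutral storage interface, and provider-specific adapters are
      # selected by configuration. Names are invented for this example.
      from abc import ABC, abstractmethod

      class BlobStore(ABC):
          @abstractmethod
          def put(self, key: str, data: bytes) -> None: ...
          @abstractmethod
          def get(self, key: str) -> bytes: ...

      class LocalBlobStore(BlobStore):
          """Private-cloud / on-premises backend (here just an in-memory dict)."""
          def __init__(self):
              self._data = {}
          def put(self, key, data):
              self._data[key] = data
          def get(self, key):
              return self._data[key]

      class PublicCloudBlobStore(BlobStore):
          """Placeholder for a public-cloud backend; a real adapter would wrap
          the provider's SDK behind the same two methods."""
          def put(self, key, data):
              raise NotImplementedError("wire up the provider SDK here")
          def get(self, key):
              raise NotImplementedError("wire up the provider SDK here")

      def build_store(target: str) -> BlobStore:
          # Fail-over (or future-proofing) becomes a configuration change,
          # not an application rewrite.
          return LocalBlobStore() if target == "private" else PublicCloudBlobStore()

      store = build_store("private")
      store.put("user:42", b"profile data")
      print(store.get("user:42"))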


    Friday
    Nov142008

    Useful Cloud Computing Blogs

    Update 2: Overcast: Conversations on Cloud Computing. Listened to the first two podcasts and they're doing a great job. Worth a look. The singing and dance routines are way over the top however :-)

    Update: 9 Sources of Cloud Computing News You May Not Know About by James Urquhart. I folded in these recommendations.

    Can't get enough cloud computing? Then you must really be a glutton for punishment! But just in case, here are some cloud computing resources, collected from various sources, that will help you transform into a Tesla silently flying solo down the diamond lane.

    Meta Sources

  • Cloud Computing Email List: An often lively email list discussing cloud computing.
  • Cloud Computing Blogs & Resources. An excellent and big list of cloud resources.
  • Cloud Computing Portal: A community edited database for making the vendor selection process easier.
  • List of Cloud Platforms, Providers, and Enablers.
  • datacenterknowledge.com's Recap: More than 70 Industry Blogs: A nice set of blogs for: Data Center, Web Hosting, Content Delivery Network (CDN), Cloud Computing
  • Cloud Computing Wiki: A cloud computing wiki started by participants of the cloud email list.

    Specific Blogs

  • Cloud Computing on Twitter : Geva Perry's Big List of People Who Twitter About Cloud Computing
  • Overcast: Conversations on Cloud Computing : Podcast series on cloud computing by James Urquhart and Geva Perry.
  • James Urquhart's The Wisdom of Clouds : Cloud Computing and Utility Computing for the Enterprise and the Individual. James writes great articles and has a regular can't-miss links-style post summarizing much of what you need to know in the cloud world.
  • http://Blog.RightScale.com: Cloud Computing. Delivered.
  • Randy Bias's Cloudscaling: State of the Art for Startups.
  • http://elasticserver.blogspot.com/: Elastic Server - CohesiveFT team blog.
  • Nicholas Carr's Roughtype : Author of The Big Switch: Rewiring the World From Edison to Google.
  • Christofer Hoff's Rational Survivability: Ramblings about Information Survivability, Information Centricity, Risk Management and Disruptive Innovation. Oh, I have a fondness for virtualization, too.
  • Tim Freeman's Virtualization and Grid Computing: Primary developer of the Virtual Workspaces project.
  • Kent Langley's ProductionScale: Scalable Web Infrastructure and Technology Operations.
  • Kevin Jackson's Cloud Musings: Personal comments and insight on cloud computing and its relationship to net-centric warfare.
  • GoGrid Blog: Blog with product and industry news related to Cloud Computing and GoGrid.
  • John Willis' IT Management and Cloud Blog: Personal comments and podcasts.
  • Bert Armijo's Head In The Clouds: SVP at 3tera, includes product info as well as comments on industry events
  • Ross Cooney's SpoutingShite: MD of Rozmic. Cloud computing, email and spam.
  • TodoOnDemand: Blog about SaaS, Cloud Computing, On Demand Software, Business models, etc...
  • Jason Meiers' CAM Blog: Monitoring composite applications for cloud computing.
  • Sam Johnston: Random rants about tech stuff.
  • Jian Zhen's and Michael Mucha's On SaaS
  • Dana Gardner's BriefingsDirect
  • Cloud - Web and Service Cloud
  • Virtualization and Grid Computing: On distributed computing, VMs, Globus, Xen, Nimbus, and other technology.
  • Reuven Cohen's ElasticVapor Blog. The ramblings of Reuven Cohen, co-Founder & CTO Enomaly Inc.
  • ENKI Blog: Managed Cloud Computing Blog.
  • Cirrhus9's and M-E Consulting's Working in the Cloud: Cloud computing solutions for the world - or at least for Southern California.
  • Craig Balding's Cloud Security Blog: This blog is dedicated to Cloud Computing and Security.
  • Dell's Cloud Computing Blog
  • Chirag Mehta's Cloud Computing Blog: Architecture, strategy, design, and innovation ramblings.
  • GigaOm's Infrastructure Blog
  • Markus Klems' Cloudy Times Blog
  • Geva Perry's Thinking Out Cloud: Cloud Computing, Grids, Everything-as-a-Service and more.
  • James Hamilton's Perspective Blog
  • SearchDataCenter.com’s Server Farming Blog: Discusses the latest in server hardware, systems management, Unix-Linux-Wintel operating systems and large distributed computing systems
  • William Vambenepe's blog: IT management in a changing IT world
  • Toon Vanagt's virtualization.com/
  • Data Center Knowledge: News and analysis about data centers, managed hosting and disaster recovery.
  • Nati Shalom's Blog: Discussions about middleware and distributed technologies.
  • Appistry Blogs: At the convergence of Grid Computing, Virtualization and SOA
  • Avastu's Blog: Sustainable Global Clouds - REAL-TIME MARKET ANALYSIS & RESEARCH ON CLOUD COMPUTING, VIRTUALIZATION, GLOBAL SOURCING, EMERGING TRENDS AND BUSINESS STRATEGIES
  • Dan Kusnetzky's & Paula Rooney's Virtually Speaking
  • Phil Wainewright's Software as Services
  • Grid Gurus: helping realize the value from cluster, distributed and grid computing.
  • Joyent's Blog: Cloud computing vendor.
  • Grid Designer's Blog: Consulting firm specializing exclusively in "extreme" applications and systems.
  • Rob Thorsten's Why Amazon’s RightScale Blog: Primarily talks about Amazon, but there's a lot of good general cloud info too.
  • On-Demand Enterprise: Tracks the greater on-demand world.
  • Google Alerts: "Cloud Computing" | "Utility Computing"
  • Jian Zhen's and Michael Mucha's cloudfeed.net: An automated feed of cloud computing and SaaS related stories.
  • On-Demand Enterprise's Cloud Computing Topic: Excellent coverage of the vendor, traditional enterprise data center software, and virtualization space.
  • TechCrunchIT: Dedicated to obsessively profiling products and companies in the Enterprise Technology space.

    Know any other good blogs that should be on this list?


    Thursday
    Nov132008

    Plenty of Fish Says Scaling for Free Doesn't Pay

    Plenty of Fish CEO Markus Frind, famous nerd hero for making over $10 million a year from Google ads on a free dating site he made and ran all by himself, now sees a problem with the free model:

    The problem with free is that every time you double the size of your database the cost of maintaining the site grows 6 fold. I really underestimated how much resources it would take, I have one database table now that exceeds 3 billion records. The bigger you get as a free site the less money you make per visit and the more it costs to service a visit...There is really no money in being free and we have to start experimenting with other models now or we won’t be able to compete in 3 or 4 years.
    As one commenter succinctly put it: the “golden time” of AdSense is over. Time to look at costs. The POF architecture is to run scarily huge tables on single machines. They also buy and maintain their own SAN. So it seems scaling up is what is increasing costs and decreasing profits. I wonder if the economics of cloud storage and cloud architectures might have a more linear cost curve?
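    To make the quoted claim concrete, here is some purely illustrative arithmetic (the starting cost is invented; only the "double the size, 6x the cost" relationship comes from the quote):

      # Illustrative arithmetic for: "every time you double the size of your
      # database the cost of maintaining the site grows 6 fold."
      import math

      base_cost = 1.0          # arbitrary starting cost at size 1x
      for d in range(5):       # database size: 1x, 2x, 4x, 8x, 16x
          size = 2 ** d
          cost_superlinear = base_cost * 6 ** d   # cost grows 6x per doubling
          cost_linear = base_cost * size          # a hypothetical linear cost curve
          print(f"size {size:>2}x  superlinear cost {cost_superlinear:>6.0f}  "
                f"linear cost {cost_linear:>3.0f}")

      # Equivalently, cost ~ size ** (log 6 / log 2) ≈ size ** 2.58, so revenue
      # that grows roughly linearly with traffic can't keep up for long.
      print(math.log(6, 2))   # ≈ 2.585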
