Entries in Paper (127)

Monday
Dec 29, 2008

Paper: Spamalytics: An Empirical Analysis of Spam Marketing Conversion

Under the philosophy that the best way to analyze spam is to become a spammer, this absolutely fascinating paper recounts how a team of UC Berkeley researchers went undercover to infiltrate a spam network. Part CSI, part Mission Impossible, and part MacGyver, the team hijacked the botnet so that their code actually became part of the dark network itself. Once inside, they figured out the architecture and protocols of the botnet and tallied how many sales the spam actually produced. Truly elegant work.

Two different spam campaigns were run on a Storm botnet network of 75,800 zombie computers. Storm is a peer-to-peer botnet that uses spam to creep its tentacles through the world wide computer network. One of the campaigns distributed viruses in order to recruit new bots into the network. This is normally accomplished by enticing people to download email attachments. An astonishing one in ten people downloaded the executable and ran it, which means we won't run out of zombies soon. The downloaded components include: a backdoor/downloader, an SMTP relay, an e-mail address stealer, an e-mail virus spreader, a distributed denial of service (DDoS) attack tool, and an updated copy of the Storm Worm dropper. The second campaign sent pharmaceutical spam (“libido boosting herbal remedy”) over the network.

Haven't you always wondered who clicks on spam and how much spammers could possibly make? In the study, only 28 sales resulted from 350 million spam e-mail messages sent over 26 days, a conversion rate of well under 0.00001% (a typical advertising campaign might have a conversion rate of 2-3%). The average purchase price was about $100, for $2,731.88 in total revenue. The researchers estimate that the total daily revenue attributable to Storm's pharmacy campaign is about $7,000 and that the botnet picks up between 3,500 and 8,500 new bots per day through its Trojan distribution system. And this is with only 1.5% of the entire network in use.
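
A quick back-of-the-envelope check on those figures, using only the numbers quoted above (a sketch, not code from the paper):

```python
# Sanity check on the Spamalytics numbers quoted above.
emails_sent = 350_000_000      # pharmacy spam messages sent over 26 days
sales = 28                     # resulting purchases
total_revenue = 2731.88        # dollars

conversion_rate = sales / emails_sent
avg_purchase = total_revenue / sales

print(f"conversion rate: {conversion_rate:.10f} ({conversion_rate * 100:.8f}%)")
print(f"average purchase: ${avg_purchase:.2f}")
# conversion rate: 0.0000000800 (0.00000800%)  -- well under the 0.00001% ceiling
# average purchase: $97.57                     -- roughly the "$100" quoted above
```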

So the spammers would take in about $3.5 million a year in total revenue from one product on one network. Imagine the take with multiple products and multiple networks. That's why we still have spam. And since the conversion rate is already so low, it seems spam will always be with us.

As fascinating as the spamonomics are, the explanation of the botnet architecture is just as interesting. Storm uses a three-level self-organizing hierarchy:

  • worker bots - request work and, upon receiving orders, send spam as requested. Workers pull work from higher layers.
  • proxy bots - act as coordinators between workers and master servers.
  • master servers - send commands to the workers and receive their status reports. A small number of master servers are hosted at “bullet-proof” hosting centers and are likely directly managed by the botmaster.

    A host selects its worker or proxy role automatically. If a firewall doesn't prevent inbound communication, the infected host becomes a proxy; otherwise it becomes a worker. Since workers pull work from proxies, there's no need to contact them directly. Proxies, on the other hand, are directly contacted by master servers, so their communication must be bidirectional.
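
A minimal sketch of that role-selection rule. The reachability flag is taken as a given input here, since how the real bot probes its own connectivity isn't spelled out above:

```python
# Sketch of Storm's worker/proxy role selection as described above.
# Whether inbound communication is blocked is assumed to be known already;
# the real bot's reachability test is not modeled.

def select_role(inbound_reachable: bool) -> str:
    """Return the role an infected host assumes."""
    if inbound_reachable:
        # Master servers can contact this host directly, so it coordinates workers.
        return "proxy"
    # Firewalled/NATed hosts can only make outbound connections,
    # so they pull work from proxies instead.
    return "worker"

print(select_role(True))   # proxy
print(select_role(False))  # worker
```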

    Storm communicates using two separate protocols:
  • An encrypted version of the UDP-based Overnet protocol, used primarily as a directory service to find other nodes. Overnet is a peer-to-peer protocol that uses a distributed hash table mechanism to find peers.
  • A custom TCP-based protocol used by masters to send command-and-control traffic to proxies and workers. Command-and-control traffic to the worker bots is unencrypted, which makes a man-in-the-middle attack possible and is how the researchers carried out their caper.

    According to Brandon Enright: when a peer wants to find content in the network, it computes (or is given) the hash of that content and then searches adjacent peers. Those peers respond with their adjacent peers that are closer. This is repeated until the searching peer gets close enough to the content that a node there will be able to provide a search result. It's a complicated and interesting process that the Spamalytics paper covers in much more detail, as do some of the references at the end of this post.
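
Here's a toy sketch of that iterative search. It assumes a Kademlia/Overnet-style XOR distance metric and hides the network behind a `get_closer_peers` callback; Storm's actual key sizes, message formats, and encryption are not modeled:

```python
# Toy iterative DHT lookup in the spirit of the Overnet search described above.
# Node IDs and content hashes share one ID space; XOR distance is the metric
# (an assumption borrowed from Kademlia-style DHTs, not Storm's exact wire format).

def xor_distance(a: int, b: int) -> int:
    return a ^ b

def iterative_lookup(start_peer, target_hash, get_closer_peers, k=3, max_hops=20):
    """Repeatedly ask the closest known peers for peers still closer to target_hash."""
    known = {start_peer}
    for _ in range(max_hops):
        closest = sorted(known, key=lambda p: xor_distance(p, target_hash))[:k]
        learned = set()
        for peer in closest:
            learned |= set(get_closer_peers(peer, target_hash))
        if learned <= known:           # no new, closer peers: we've converged
            return closest
        known |= learned
    return sorted(known, key=lambda p: xor_distance(p, target_hash))[:k]

# Example with a fake, fully known overlay: each peer just returns the overlay's
# two closest nodes to the target, standing in for a real network round trip.
overlay = list(range(0, 256, 5))
def get_closer_peers(peer, target):
    return sorted(overlay, key=lambda p: xor_distance(p, target))[:2]

print(iterative_lookup(start_peer=0, target_hash=201, get_closer_peers=get_closer_peers))
```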

    Storm harnesses a large, unreliable, constantly changing distributed system to do work. It's an architecture worth learning from and we'll explore some of those lessons in a later post.

    Related Articles

  • On the Spam Campaign Trail
  • Scaling Spam Eradication Using Purposeful Games: Die Spammer Die!
  • Can cloud computing smite down evil zombie botnet armies?
  • Inside the Storm: Protocols and Encryption of the Storm Botnet by Joe Stewart, GCIG Director of Malware Research, SecureWorks
  • Exposing Stormworm by Brandon Enright. A lot of excellent low level protocol details.
  • Storm Botnet
  • Global Guerrillas by John Robb - Networked tribes, systems disruption, and the emerging bazaar of violence. Resilient Communities, decentralized platforms, and self-organizing futures.
    Saturday
    Dec 6, 2008

    Paper: Real-world Concurrency

    An excellent article by Bryan Cantrill and Jeff Bonwick on how to write multi-threaded code. With more processors and no magic bullet solution for how to use them, knowing how to write multiprocessor code that doesn't screw up your system is still a valuable skill. Some topics:

  • Know your cold paths from your hot paths.
  • Intuition is frequently wrong—be data intensive.
  • Know when—and when not—to break up a lock.
  • Be wary of readers/writer locks.
  • Consider per-CPU locking.
  • Know when to broadcast—and when to signal.
  • Learn to debug postmortem.
  • Design your systems to be composable.
  • Don't use a semaphore where a mutex would suffice.
  • Consider memory retiring to implement per-chain hash-table locks.
  • Be aware of false sharing.
  • Consider using nonblocking synchronization routines to monitor contention.
  • When reacquiring locks, consider using generation counts to detect state change.
  • Use wait- and lock-free structures only if you absolutely must.
  • Prepare for the thrill of victory—and the agony of defeat.

    While I don't agree that code using locks can be made composable, this article covers a lot of very useful nitty-gritty details that will up your expert rating a couple of points. One of those points is sketched below.
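
As a concrete, simplified illustration of the broadcast-versus-signal advice, here is a minimal producer/consumer sketch: when one item is produced, exactly one waiter can make progress, so signal a single thread; when a state change affects every waiter (shutdown), broadcast. Python's threading.Condition stands in for whatever primitives your platform provides; this is not code from the article.

```python
# Minimal signal-vs-broadcast sketch using Python's threading.Condition.
import threading
from collections import deque

queue = deque()
shutting_down = False
cond = threading.Condition()           # guards both `queue` and `shutting_down`

def produce(item):
    with cond:
        queue.append(item)
        cond.notify()                  # one item, one runnable waiter: signal, don't broadcast

def shutdown():
    global shutting_down
    with cond:
        shutting_down = True
        cond.notify_all()              # every waiter must re-evaluate its predicate: broadcast

def consume():
    with cond:
        while not queue and not shutting_down:
            cond.wait()                # re-check the predicate every time we wake up
        return queue.popleft() if queue else None
```

The loop around wait() is deliberate: a woken thread must re-check its predicate, because it may have been woken by a broadcast meant for everyone or by a spurious wakeup.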


    Friday
    Nov 14, 2008

    Paper: Pig Latin: A Not-So-Foreign Language for Data Processing

    Yahoo has developed a new language called Pig Latin that fits in a sweet spot between high-level declarative querying in the spirit of SQL and low-level, procedural programming à la map-reduce, combining the best of both worlds. The accompanying system, Pig, is fully implemented and compiles Pig Latin into physical plans that are executed over Hadoop, an open-source map-reduce implementation. Pig has just graduated from the Apache Incubator and joined Hadoop as a subproject. The paper has a few examples of how engineers at Yahoo! are using Pig to dramatically reduce the time required for the development and execution of their data analysis tasks, compared to using Hadoop directly. References: Apache Pig Wiki
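
To get a feel for the gap Pig Latin fills, here is the kind of hand-written, Hadoop-streaming-style map/reduce pair (sketched in Python) that a short grouping-and-counting Pig Latin script would replace. The word-count task and helper functions are illustrative stand-ins, not examples from the paper:

```python
# Hand-rolled, Hadoop-streaming-style word count: the low-level, procedural end of
# the spectrum that Pig Latin abstracts away. Illustrative only; not from the paper.
import sys
from itertools import groupby

def mapper(lines):
    for line in lines:
        for word in line.split():
            yield word, 1

def reducer(pairs):
    # Hadoop's shuffle delivers pairs grouped by key; sorting below stands in for that.
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

if __name__ == "__main__":
    shuffled = sorted(mapper(sys.stdin))     # stand-in for the shuffle/sort phase
    for word, total in reducer(shuffled):
        print(f"{word}\t{total}")
```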


    Friday
    Oct 17, 2008

    A High Performance Memory Database for Web Application Caches

    Abstract—This paper presents the architecture and characteristics of a memory database intended to be used as a cache engine for web applications. Primary goals of this database are speed and efficiency while running on SMP systems with several CPU cores (four and more). A secondary goal is the support for simple metadata structures associated with cached data that can aid in efficient use of the cache. Due to these goals, some data structures and algorithms normally associated with this field of computing needed to be adapted to the new environment.


    Monday
    Oct 6, 2008

    Paper: Scaling Genome Sequencing - Complete Genomics Technology Overview

    Although the problem of scaling human genome sequencing is not exactly about building bigger, faster, and more reliable websites, it is most interesting in terms of scalability. The paper describes a new technology by the startup company Complete Genomics to sequence the full human genome for a fraction of the cost of earlier approaches. Complete Genomics is building the world’s largest commercial human genome sequencing center to provide turnkey, outsourced complete human genome sequencing to customers worldwide. By 2010, their data center will contain approximately 60,000 processors with 30 petabytes of storage running their sequencing software on Linux clusters. Do you find this interesting and relevant to HighScalability.com?


    Sunday
    Oct 5, 2008

    Paper: Scalability Design Patterns

    I introduced pattern languages in my earlier post on The Pattern Bible for Distributed Computing. Achieving the highest possible scalability is a complex combination of many factors. This PLoP 2007 paper presents a pattern language that can be used to make a system highly scalable. The Scalability Pattern Language introduced by Kanwardeep Singh Ahluwalia includes the following patterns (a rough sketch of the parallelism patterns appears after the list):

    • Introduce Scalability
    • Optimize Algorithm
    • Add Hardware
    • Add Parallelism
      • Add Intra-Process Parallelism
      • Add Inter-Process Parallelism
      • Add Hybrid Parallelism
    • Optimize Decentralization
    • Control Shared Resources
    • Automate Scalability
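
As a rough, generic illustration of the intra-process and inter-process flavors of the Add Parallelism pattern (not an example from the paper), here is a small Python sketch; `handle_request` is a hypothetical stand-in for real per-request work:

```python
# Generic sketch of the two Add Parallelism flavors listed above (not from the paper).
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def handle_request(n: int) -> int:
    return n * n   # hypothetical stand-in for real per-request work

if __name__ == "__main__":
    requests = list(range(8))

    # Add Intra-Process Parallelism: threads inside one process sharing memory.
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(handle_request, requests)))

    # Add Inter-Process Parallelism: separate worker processes with isolated memory.
    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(handle_request, requests)))
```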


    Monday
    Sep 22, 2008

    Paper: On Delivering Embarrassingly Distributed Cloud Services

    How do we scale datacenters? Should we build a few mammoth million machine datacenters or many smaller micro datacenters? Intuitively we usually go with a bigger is better economies of scale type argument, but it may not be so. What works for Walmart may not work for White Box World. Mega datacenters may actually exhibit diseconomies of scale. It may be better to run applications over many distributed micro datacenters instead of one large one. This paper by Ken Church, Albert Greenberg, and James Hamilton, all from Microsoft, takes a look at the different issues and concludes:

    Putting it all together, the micro model offers a design point with attractive performance, reliability, scale and cost. Given how much the industry is currently investing in the mega model, the industry would do well to consider the micro alternative.

    Related Articles

  • Embarrassingly Distributed Cloud Services by James Hamilton
  • Diseconomies of Scale by James Hamilton.
  • Architecture for Modular Datacenters by James Hamilton.
  • Enterprise Data Center Design and Methodology by Rob Snevely. Enterprise Data Center Design and Methodology is a practical guide to designing a data center from inception through construction. The fundamental design principles take a simple, flexible, and modular approach based on accurate, real-world requirements and capacities. This approach contradicts the conventional (but totally inadequate) method of using square footage to determine basic capacities like power and cooling requirements.


    Saturday
    Aug 30, 2008

    Paper: Gargantuan Computing—GRIDs and P2P

    I found the discussion of the available bandwidth of tree vs. higher-dimensional virtual network topologies quite, to quote Spock, fascinating:

    A mathematical analysis by Ritter (2002) (one of the original developers of Napster) presented a detailed numerical argument demonstrating that the Gnutella network could not scale to the capacity of its competitor, the Napster network. Essentially, that model showed that the Gnutella network is severely bandwidth-limited long before the P2P population reaches a million peers. In each of these previous studies, the conclusions have overlooked the intrinsic bandwidth limits of the underlying topology in the Gnutella network: a Cayley tree (Rains and Sloane 1999) (see Sect. 9.4 for the definition). Trees are known to have lower aggregate bandwidth than higher dimensional topologies, e.g., hypercubes and hypertori. Studies of interconnection topologies in the literature have tended to focus on hardware implementations (see, e.g., Culler et al. 1996; Buyya 1999), which are generally limited by the cost of the chips and wires to a few thousand nodes. P2P networks, on the other hand, are intended to support from hundreds of thousands to millions of simultaneous peers, and since they are implemented in software, hyper-topologies are relatively unfettered by the economics of hardware. In this chapter, we analyze the scalability of several alternative topologies and compare their throughput up to 2–3 million peers. The virtual hypercube and the virtual hypertorus offer near-linear scalable bandwidth subject to the number of peer TCP/IP connections that can be simultaneously kept open.
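
To make the topology argument a bit more concrete, here is a small sketch comparing the bisection link count of a binary tree with that of a hypercube of the same size. The one-link-versus-N/2-links model is a deliberate simplification, not the chapter's actual analysis:

```python
# Rough bisection-link counts for two virtual topologies of N peers (simplified model).
# A tree funnels all cross-half traffic through a single link near the root, while a
# d-dimensional hypercube has N/2 links crossing its bisection.
import math

def tree_bisection_links(n_peers: int) -> int:
    return 1                        # splitting a tree in half severs one root link

def hypercube_bisection_links(n_peers: int) -> int:
    d = int(math.log2(n_peers))     # hypercube dimension for n_peers = 2**d nodes
    return 2 ** (d - 1)             # N/2 links cross the bisection

for n in (1024, 2 ** 20):           # roughly 1K and 1M peers
    print(n, tree_bisection_links(n), hypercube_bisection_links(n))
```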


    Sunday
    Aug 24, 2008

    A Scalable, Commodity Data Center Network Architecture

    Looks interesting... Abstract: Today’s data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost. Nonuniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today’s higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.
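
For a sense of the scale involved: as I recall the paper's fat-tree construction, a fabric built entirely from k-port commodity switches supports k^3/4 hosts at full bisection bandwidth. The sketch below follows that formula, so verify it against the paper before relying on it:

```python
# Fat-tree sizing sketch (based on my reading of the paper; verify against the original).

def fat_tree_capacity(k: int) -> dict:
    """Hosts and switches for a fat-tree built from k-port switches (k even)."""
    return {
        "hosts": k ** 3 // 4,                 # k pods * (k/2)^2 hosts per pod
        "core_switches": (k // 2) ** 2,
        "pod_switches": k * k,                # k pods * (k/2 aggregation + k/2 edge)
    }

print(fat_tree_capacity(48))   # 48-port switches -> 27,648 hosts
```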


    Saturday
    Jul 26, 2008

    Google's Paxos Made Live – An Engineering Perspective

    This is an unusually well written and useful paper. It talks in detail about experiences implementing a complex project, something we don't see very often. They shockingly even admit that creating a working implementation of Paxos was more difficult than just translating the pseudo code. Imagine that, programmers aren't merely typists! I particularly like the explanation of the Paxos algorithm and why anyone would care about it, working with disk corruption, using leases to support simultaneous reads, using epoch numbers to indicate a new master election, using snapshots to prevent unbounded logs, using MultiOp to implement database transactions, how they tested the system, and their openness with the various problems they had. A lot to learn here.

    From the paper: We describe our experience building a fault-tolerant database using the Paxos consensus algorithm. Despite the existing literature in the field, building such a database proved to be non-trivial. We describe selected algorithmic and engineering problems encountered, and the solutions we found for them. Our measurements indicate that we have built a competitive system.

    Introduction: It is well known that fault-tolerance on commodity hardware can be achieved through replication [17, 18]. A common approach is to use a consensus algorithm [7] to ensure that all replicas are mutually consistent [8, 14, 17]. By repeatedly applying such an algorithm on a sequence of input values, it is possible to build an identical log of values on each replica. If the values are operations on some data structure, application of the same log on all replicas may be used to arrive at mutually consistent data structures on all replicas. For instance, if the log contains a sequence of database operations, and if the same sequence of operations is applied to the (local) database on each replica, eventually all replicas will end up with the same database content (provided that they all started with the same initial database state). This general approach can be used to implement a wide variety of fault-tolerant primitives, of which a fault-tolerant database is just an example. As a result, the consensus problem has been studied extensively over the past two decades. There are several well-known consensus algorithms that operate within a multitude of settings and which tolerate a variety of failures. The Paxos consensus algorithm [8] has been discussed in the theoretical [16] and applied community [10, 11, 12] for over a decade. We used the Paxos algorithm (“Paxos”) as the base for a framework that implements a fault-tolerant log. We then relied on that framework to build a fault-tolerant database. Despite the existing literature on the subject, building a production system turned out to be a non-trivial task for a variety of reasons:

  • While Paxos can be described with a page of pseudo-code, our complete implementation contains several thousand lines of C++ code. The blow-up is not due simply to the fact that we used C++ instead of pseudo notation, nor because our code style may have been verbose. Converting the algorithm into a practical, production-ready system involved implementing many features and optimizations – some published in the literature and some not.
  • The fault-tolerant algorithms community is accustomed to proving short algorithms (one page of pseudo code) correct. This approach does not scale to a system with thousands of lines of code. To gain confidence in the “correctness” of a real system, different methods had to be used.
  • Fault-tolerant algorithms tolerate a limited set of carefully selected faults. However, the real world exposes software to a wide variety of failure modes, including errors in the algorithm, bugs in its implementation, and operator error. We had to engineer the software and design operational procedures to robustly handle this wider set of failure modes.
  • A real system is rarely specified precisely. Even worse, the specification may change during the implementation phase. Consequently, an implementation should be malleable. Finally, a system might “fail” due to a misunderstanding that occurred during its specification phase.

    This paper discusses a selection of the algorithmic and engineering challenges we encountered in moving Paxos from theory to practice. This exercise took more R&D efforts than a straightforward translation of pseudo-code to C++ might suggest. The rest of this paper is organized as follows. The next two sections expand on the motivation for this project and describe the general environment into which our system was built. We then provide a quick refresher on Paxos. We divide our experiences into three categories and discuss each in turn: algorithmic gaps in the literature, software engineering challenges, and unexpected failures. We conclude with measurements of our system, and some broader observations on the state of the art in our field.
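
The core idea in that introduction, an agreed-upon log of operations applied identically on every replica, is easy to sketch; the consensus step that actually produces the agreed log, which is the hard part the paper is about, is simply assumed here:

```python
# Replicated state machine sketch: apply one agreed-upon log of operations to every
# replica and they converge to the same state. The consensus step (Paxos) that
# produces the agreed log is the hard part and is only assumed here.

class Replica:
    def __init__(self):
        self.db = {}              # the "(local) database" from the paper's example
        self.applied = 0          # index of the next log entry to apply

    def apply_log(self, log):
        for op, key, value in log[self.applied:]:
            if op == "put":
                self.db[key] = value
            elif op == "delete":
                self.db.pop(key, None)
            self.applied += 1

# Pretend consensus already produced this identical log on every replica.
agreed_log = [("put", "x", 1), ("put", "y", 2), ("delete", "x", None)]

replicas = [Replica() for _ in range(3)]
for r in replicas:
    r.apply_log(agreed_log)

assert all(r.db == {"y": 2} for r in replicas)   # all replicas end up identical
```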

    Related Articles

  • ZooKeeper - A Reliable, Scalable Distributed Coordination System
