High Scalability -

Hot Scalability Links for January 28 2010

Wednesday, January 27, 2010 at 9:52PM

Google's Research Areas of Interest: Building scalable, robust cluster applications. At Google we see distributed systems as a technology in its infancy, with huge gaps in the supporting research that represent some of the most important problems in the space. Here are some examples: Resource sharing, Balancing cost, performance, and reliability, Self-maintaining systems.
Amazon SimpleDB: A Simple Way to Store Complex Data by Paul Tremblett. The most effective way I have found to understand SimpleDB is to think about it in terms of something else we all use and understand -- a spreadsheet.
Rackspace Cloud Servers versus Amazon EC2: Performance Analysis. The Bitsource conducted a review of the two cloud computing platforms, Rackspace Cloud Servers and Amazon Elastic Compute Cloud (EC2), to get a general idea of overall system performance.
Private Clouds Are Not The Future by James Hamilton. Private clouds are better than nothing but an investment in a private cloud is an investment in a temporary fix that will only slow the path to the final destination: shared clouds.
What is the right way to measure scale? by Daniel Abadi. So which scales better? Is using the number of nodes a better proxy than size of data? Hadoop can “scale” to 3800 nodes. So far, all we know is that Greenplum can “scale” to 96 nodes. Can it handle more nodes?
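
The spreadsheet analogy in the SimpleDB item above can be made concrete with a small sketch. The Python below is not the SimpleDB API; it is a hypothetical in-memory model in which a domain plays the role of a worksheet, each item is a row, and each attribute is a column whose cell may hold more than one value. The names `Domain`, `put`, `get`, and `select`, and the sample data, are assumptions made purely for illustration.

```python
class Domain:
    """Hypothetical in-memory stand-in for a SimpleDB domain (a 'worksheet')."""

    def __init__(self, name):
        self.name = name
        # item name (row id) -> attribute name (column) -> set of string values
        self.items = {}

    def put(self, item_name, attribute, value):
        # Unlike a spreadsheet cell, an attribute may hold several values.
        row = self.items.setdefault(item_name, {})
        row.setdefault(attribute, set()).add(str(value))

    def get(self, item_name):
        # Return a plain copy so callers cannot mutate internal state.
        row = self.items.get(item_name, {})
        return {attr: sorted(values) for attr, values in row.items()}

    def select(self, attribute, value):
        # Names of all items (rows) whose attribute contains the given value.
        return sorted(
            name for name, row in self.items.items()
            if str(value) in row.get(attribute, ())
        )


books = Domain("books")
books.put("item1", "title", "Dune")
books.put("item1", "keyword", "scifi")
books.put("item1", "keyword", "classic")   # second value in the same "cell"
books.put("item2", "title", "Emma")
books.put("item2", "keyword", "classic")

print(books.get("item1"))
print(books.select("keyword", "classic"))
```

Rows do not have to share the same columns, which is where the analogy departs from a rigid relational table.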

10 Hot Scalability Links for January 13, 2010

Wednesday, January 13, 2010 at 7:31AM

Has Amazon EC2 become over subscribed? by Alan Williamson. Systemic problems hit AWS as users experience problems across Amazon's infrastructure. It seems the strange attractor of a cloud may be the same as for a shared hosting service.

Understanding Infrastructure 2.0 by James Urquhart. We need to take a systems view of our entire infrastructure, and build our automation around the end-to-end architecture of that system.

Hey You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds. We show that it is possible to map the internal cloud infrastructure.

Hadoop World: Building Data Intensive Apps with Hadoop and EC2 by Pete Skomoroch. Dives into detail about how he built TrendingTopics.org using Hadoop and EC2.

A Crash Course in Modern Hardware by Cliff Click. Yes, your mind will hurt after watching this. And no, you probably don't know what your microprocessor is doing anymore.

Hot Holiday Scalability Links for 2009

Monday, December 21, 2009 at 7:25AM

Building Super Scalable Systems: Blade Runner Meets Autonomic Computing in the Ambient Cloud. The only independent platform most of us will have access to capable of hosting planet-scale applications is the Ambient Cloud. It forms a sort of digital potluck where everyone contributes memory, network, and other compute resources from whatever they happen to have available.
Top 10 Internet Startup Scalability Killers. Strategies taken from The Art of Scalability. 1. Thinking Scalability Is Just About Technology; 2. Overuse of Synchronous Calls; 3. Failure to Weed or Seed Soon Enough; 4. Inappropriate Use of Databases; 5. Cesspools Instead of Swim Lanes; 6. Reliance on Vertical Scale; 7. Failure to Learn from History; 8. Changing Development Methodologies to Fix Problems; 9. Too Little Caching, Too Late; 10. Overreliance on Third Parties to Scale.
The New Google: Internet Giant Opens Up About Real-Time and Local Search, Cloud Computing, and Data Liberation. In four separate interviews, Google delved into some of the most important topics of the day, from its advances in real-time and local search to cloud computing and a “data liberation” effort to help consumers export their files and digital information from Google products.
Ask HN: What are the best technologies you've worked with this year? Quite a nice variety, no clear winner. Some honorable mentions: Django, Redis, Clojure, XMPP, Node.js, AMQP, Rails, jQuery, Solr, Hadoop.
Why I think Mongo is to Databases what Rails was to Frameworks. We have been amazed at how much code we cut out of Harmony with the switch from MySQL to Mongo.
A Deluge of Data Shapes a New Era in Computing. Dr. Gray called the shift a “fourth paradigm.” The first three paradigms were experimental, theoretical and, more recently, computational science. He explained this paradigm as an evolving era in which an “exaflood” of observational data was threatening to overwhelm scientists.
MySpace Replaces Storage with Solid-State Drive Technology in 150 Standard Load Servers. Using SSD reduced the headcount of their heavy load servers from 80 to 30.
Amazon's CloudFront Now Offers Flash Streaming, This Will Disrupt The Market. Amazon does have the potential to take many of the mid-sized customers who spend between $3-5k a month on video.
How would you design an AppEngine datastore for a social site like Twitter? Using Jaiku's reimplementation on Google App Engine is a good reference.
Query Processing for NOSQL DB. It seems to me that the responsibility of building an indexing and query mechanism lands on the NoSQL user.
Persistent Trees in git, Clojure and CouchDB. There are some really neat software projects emerging at the moment, and as a developer I always find it interesting to take a look at the implementation details, because there is often a lot to be learned.
Transcendent Memory. Transcendent memory is a new memory-management technique which, it is hoped, will improve the system's use of scarce RAM, regardless of whether virtualization is being used.
Trading Shares in Milliseconds. By the end of the day, his computers will have bought and sold about 60 million to 80 million shares.

Hot Scalability Links for Nov 11 2009

Wednesday, November 11, 2009 at 8:03AM

The Cost of Latency by James Hamilton. James summarizes latency info from Steve Souders, Greg Linden, and Marissa Mayer. Speed [is] an undervalued and under-discussed asset on the web.
Dynamo - Part I: a followup and re-rebuttals. Dynamo comes under attack for alleged design flaws, and draws a resounding rebuttal in response.
Programming Bits and Atoms. Thinking about programming and scaling as a problem in physics. Absolutely fascinating and inspiring.
Scaling Servers with the Cloud: Amazon S3. Build a static site using S3 for pennies. An oldie but still a goodie.
Are Wireless Road Trains the Cure for Traffic Congestion? The concept of road trains--up to eight vehicles zooming down the road together--has long been considered a faster, safer, and greener way of traveling long distances by car.
Erlang at Facebook by Eugene Letuchy. How Facebook uses Erlang to implement Chat, AIM Presence, and Chat Jabber support.
Yahoo Open Sources Traffic Server. Traffic Server enables the session management, authentication, configuration management, load balancing, and routing for an entire cloud computing stack.
How Complex Systems Fail by Richard Cook. Being a Short Treatise on the Nature of Failure; How Failure is Evaluated; How Failure is Attributed to Proximate Cause; and the Resulting New Understanding of Patient Safety.
Heroku vs EngineYard Cloud vs Joyent by Eliot Sykes. Rails hosting options head-to-head.

Hot Scalability Links for October 30 2009

Friday, October 30, 2009 at 6:58AM

Life beyond Distributed Transactions: an Apostate’s Opinion by Pat Helland. In particular, we focus on the implications that fall out of assuming we cannot have large-scale distributed transactions.
Tragedy of the Commons, and Cold Starts - Cold application starts on Google App Engine kill your application's responsiveness.
Intel’s 1M IOPS desktop SSD setup by Kevin Burton. What do you get when you take 7 Intel SSDs and throw them in a desktop? 1M IOPS.
Videos from NoSQL Berlin sessions. Nicely done talks on CAP, MongoDB, Redis, 4th generation object databases, CouchDB, and Riak.
Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean of Google describing how they do their thing. Here are some glosses on the talk by Greg Linden and James Hamilton. You really can't do better than Greg and James.
- Advice from Google on Large Distributed Systems by Greg Linden. A nice summary of Jeff Dean's talk. A standard Google server appears to have about 16G RAM and 2T of disk; Things will crash. Deal with it!; When designing for scale, you should design for expected load, ensure it still works at x10, but don't worry about scaling to x100.
- Jeff Dean: Design Lessons and Advice from Building Large Scale Distributed Systems by James Hamilton. A data center wide storage hierarchy; Failure Inevitable; Excellent set of distributed systems rules of thumb; Typical first year for a new cluster; GFS Usage at Google; Working on next generation Big Table system called Spanner.

Hot Scalability Links for Oct 15 2009

Thursday, October 15, 2009 at 9:22AM

Update: Social networks in the database: using a graph database. Anders Nawroth puts graphs through their paces by representing, traversing, and performing other common social network operations using a graph database.

Update: Deployment with Capistrano by Charles Max Wood. Simple step-by-step for using Capistrano for deployment.

Log-structured file systems: There's one in every SSD by Valerie Aurora. SSDs have totally changed the performance characteristics of storage! Disks are dead! Long live flash!

An Engineer's Guide to Bandwidth by DGentry. It's a rough world out there, and we need to do a better job of thinking about and testing under realistic network conditions.

Analyzing air traffic performance with InfoBright and MonetDB by Vadim of the MySQL Performance Blog.

Scalable Delivery of Stream Query Result by Y. Zhou, A. Salehi, and K. Aberer. In this paper, we leverage Distributed Publish/Subscribe System (DPSS), a scalable data dissemination infrastructure, for efficient stream query result delivery.

Hot Links for 2009-9-17

Thursday, September 17, 2009 at 4:11AM

Save 25% on Hadoop Conference Tickets
Apache Hadoop is a hot technology getting traction all over the enterprise and in the Web 2.0 world. Now, there's going to be a conference dedicated to learning more about Hadoop. It'll be Friday, October 2 at the Roosevelt Hotel in New York City.

Hadoop World, as it's being called, will be the first Hadoop event on the east coast. Morning sessions feature talks by Amazon, Cloudera, Facebook, IBM, and Yahoo! Then it breaks out into three tracks: applications, development / administration, and extensions / ecosystems. In addition to the conference itself, there will also be 3 days of training prior to the event for those looking to go deeper. In addition to general session speakers, presenters include Hadoop project creator Doug Cutting, as well as experts on large-scale data from Intel, Rackspace, Softplayer, eHarmony, Supermicro, Impetus, Booz Allen Hamilton, Vertica, About.com, and other companies.

Readers get a 25% discount if you register by Sept. 21: http://hadoop-world-nyc.eventbrite.com/?discount=hadoopworld_promotion_highscalability.

Essential storage tradeoff: Simple Reads vs. Simple Writes by Stephan Schmidt. Data in denormalized chunks is easy to read and complex to write.
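
The read/write tradeoff in the item above can be sketched with a hypothetical example. The data, field names, and functions below are assumptions for illustration and are not taken from the linked article. In the normalized layout a rename touches one record but every read needs a second lookup; in the denormalized layout a read is a single lookup but a rename must rewrite every copy.

```python
# Normalized layout: each fact is stored once, so reads must join.
users = {1: {"name": "alice"}}
posts = {
    10: {"author_id": 1, "text": "hello"},
    11: {"author_id": 1, "text": "world"},
}


def read_post_normalized(post_id):
    post = posts[post_id]
    author = users[post["author_id"]]  # second lookup (the "join")
    return {"author": author["name"], "text": post["text"]}


def rename_user_normalized(user_id, new_name):
    users[user_id]["name"] = new_name  # a single write
    return 1  # number of records written


# Denormalized layout: the author's name is copied into every post chunk.
posts_denorm = {
    10: {"author_id": 1, "author": "alice", "text": "hello"},
    11: {"author_id": 1, "author": "alice", "text": "world"},
}


def read_post_denormalized(post_id):
    post = posts_denorm[post_id]  # one lookup, no join
    return {"author": post["author"], "text": post["text"]}


def rename_user_denormalized(user_id, new_name):
    writes = 0
    for post in posts_denorm.values():  # every copy must be found and updated
        if post["author_id"] == user_id:
            post["author"] = new_name
            writes += 1
    return writes  # number of records written


before_normalized = read_post_normalized(10)
before_denormalized = read_post_denormalized(10)
normalized_writes = rename_user_normalized(1, "bob")
denormalized_writes = rename_user_denormalized(1, "bob")
print(normalized_writes, denormalized_writes)
```

The write cost of the denormalized layout grows with the number of copies, which is the "complex to write" half of the tradeoff.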

Kickfire's approach to parallelism by Daniel Abadi. Kickfire uses column-oriented storage and execution to address I/O bottlenecks and FPGA-based data-flow architecture to address processing and memory bottlenecks.

"Just in Time" Decompression in Analytic Databases by Michael Stonebraker. A DBMS that is optimized for compression through and through--especially with a query executor that features just in time decompression will not just reduce IO and storage overhead, but also offer better query performance with lower CPU resource utilization.

Reverse Proxy Performance – Varnish vs. Squid (Part 2) by Bryan Migliorisi. My results show that in raw cache hit performance, Varnish puts Squid to shame.

Building Scalable Databases: Denormalization, the NoSQL Movement and Digg by Dare Obasanjo. As a Web developer it's always a good idea to know what the current practices are in the industry even if they seem a bit too crazy to adopt…yet.

How To Make Life Suck Less (While Making Scalable Systems) by Bradford Stephens. Scalable doesn’t imply cheap or easy. Just cheaper and easier.

Some perspective to this DIY storage server mentioned at Storagemojo by Joerg Moellenkamp. It's about making decision. Application and hardware has to be seen as one. When your application is capable to overcome the limitations and problems of such ultra-cheap storage

Todd Hoff |

Hot Links for 2009-9-4

Friday, September 4, 2009 at 1:10AM

A tour through hybrid column/row-oriented DBMS schemes by Daniel Abadi. Approaches: PAX, Fractured Mirrors, and Fine-grained hybrids.

The Future of Database Clustering by Robert Hodges. Simple management and monitoring, Fast, flexible replication, Top-to-bottom data protection, Partition management, Cloud and virtualized operation, Transparent application access, Open source.

Some perspective to this DIY storage server mentioned at Storagemojo by Joerg Moellenkamp. Quality costs. Period.

Turn up the volume: API Scalability with Caching by Scott.

Disk I/O Bottlenecks by Ryan Thiessen. My first approach to diagnosing a performance problem is to start by trying to find the system’s bottleneck.

Patterns for Cloud Computing by Simon Guest. Using the Cloud for Scale, Using the Cloud for Multi-Tenancy, Using the Cloud for Compute, Using the Cloud for Storage, Using the Cloud for Communications.

Server Processor Roadmaps Show Change in Direction by Michael J. Miller. What fascinates me is the big change in direction we're seeing on server chips... The focus seemed to be on putting more cores on a chip, something we're still seeing with these new 8-, 12-, and 16-core chips. But now a lot of focus seems to be going into increasing memory bandwidth and new cache architectures, as designers are addressing the memory issues that are often the bottleneck in a multicore system, as well as core-to-core communications.

Azul's Experiences With Hardware / Software Co-Design by Dr. Cliff Click. Owning whole stack allows progress, Some really hard HW problems “solved” in SW, GC is “solved” w/HW Read Barrier, Simple HTM can do Lock Elision, Huge count of simple cores really useful in production.

Java Memory Problems - Memory problems in Java applications are manifold and easily lead to performance and scalability problems. Especially in Java EE applications with a high number of parallel users, memory management must be a central part of the application architecture.

Noob question: how do you [Reddit] join on so much data?

Transactional Memory versus Locks - A Comparative Case Study by Victor Pankratius. TM alone is no silver bullet.

Looking at Redis by Peter Zaitsev. With Redis I got about 3 times more updates/sec – close to 100,000 updates/sec with about 1.5 cores being used.

The fantasy sponsor for this post are those little food kiosks outside Home Depot stores. I love their Fire Dogs. Hot and yummy. I bet most home improvement projects in America are inspired by cravings for one of these little beauties.

Hot Links for 2009-8-26

Wednesday, August 26, 2009 at 1:35AM

I'm Going To Scale My Foot Up Your Ass - Shut up about scalability, no one is using your app anyway.

Multi-Tenant Data Architecture - Microsoft's take on different approaches to multitenancy.

Cloud computing rides on spiraling Energy costs - A report by US researchers has shown the increasing cost of power and cooling in the data centre is a driver towards cloud computing.

Interview: Apple’s Gigantic New Data Center Hints at Cloud Computing - Companies building centers this big are getting into cloud computing. Running apps in the cloud requires massive infrastructure: Google-size infrastructure.

What Does Cloud Computing Actually Cost? An Analysis of the Top Vendors - Amazon is currently the lowest cost cloud computing option overall. At least for production applications that need more than 6.5 hours of CPU/day, otherwise GAE is technically cheaper because it's free until this usage level.

no:sql(east) - October 28–30, 2009, Atlanta, GA. Very cute page playing off of SQL syntax.

New Products and Updates

Gear6 Web Cache Virtual Appliance - a feature complete virtual machine (VM) of the Gear6 Web Cache software. It includes all the functionality of the Gear6 Web Cache including simulating Gear6 high density RAM-flash architecture.

Seamlessly Extending the Data Center - Introducing Amazon Virtual Private Cloud (VPC) - We have developed Amazon VPC to allow our customers to seamlessly extend their IT infrastructure into the cloud while maintaining the levels of isolation required for their enterprise management tools to do their work.

NetApp reveals cloud computing plan, new Data OnTap OS - "Our research shows users are very interested in scale-out technology," she said. "What's nice about it is as you add processor and storage resources, you get much higher storage utilization rates and the new scale-out system grows up to 14 petabytes, but it can still be managed in a single array."

Updates to Articles on High Scalability

Streamy Explains CAP and HBase's Approach to CAP - We plan to employ inter-cluster replication, with each cluster located in a single DC. Remote replication will introduce some eventual consistency into the system, but each cluster will continue to be strongly consistent. Updated: How Google Serves Data from Multiple Datacenters.
