Entries in scaling (12)

Wednesday
Jul 09, 2014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase most of us are accustomed to hearing, and one that many still believe to be true when discussing the speed, scale, and cost of processing data. Reaching high-speed data processing requires more memory, which raises cost, because memory is on average far more expensive than a commodity disk drive. The idea that data systems cannot reliably provide both large memory capacity and fast access, let alone at the right cost, has long been debated, though the notion of such limitations was cemented by computer scientist Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Click to read more ...

Monday
Apr 04, 2011

Scaling Social Ecommerce Architecture Case Study

A recent study showed that over 92 percent of executives from leading retailers are focusing their marketing efforts on Facebook and related applications. Furthermore, over 71 percent of users have confirmed they are more likely to make a purchase after “liking” a brand they find online. (source)

Sears architect Tomer Gabel provides an insightful overview of how they built a Social Ecommerce solution for Sears.com that can handle complex relationship queries in real time. Tomer goes through:

  • the architectural considerations behind their solution
  • why they chose memory over disk
  • how they partitioned the data to gain scalability
  • why they chose to execute code with the data using the GigaSpaces Map/Reduce execution framework (a generic sketch of this code-to-data pattern follows the list)
  • how they integrated with Facebook
  • why they chose GigaSpaces over Coherence and Terracotta for in-memory caching and scale
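
To make the code-to-data idea concrete, below is a minimal, generic sketch of that pattern. It is not the GigaSpaces API; it is an illustration, with all names (Partition, friends_who_liked) and sample data assumed, of how user records could live in in-memory partitions while a small query function is shipped to every partition map/reduce style, so only the results cross the network.

```python
# Generic sketch of the "ship the code to the data" pattern used by
# in-memory data grids (illustrative only; not the GigaSpaces API).

from concurrent.futures import ThreadPoolExecutor

class Partition:
    """One in-memory partition holding a shard of the user 'like' data."""
    def __init__(self, records):
        self.records = records  # e.g. {user_id: set of liked brand ids}

    def execute(self, fn):
        # The function runs *inside* the partition, next to its data,
        # so only the small result set crosses the network.
        return fn(self.records)

def friends_who_liked(partitions, friend_ids, brand_id):
    """Map step runs on every partition in parallel; reduce step merges."""
    def local_query(records):
        return [uid for uid in friend_ids
                if brand_id in records.get(uid, set())]

    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda p: p.execute(local_query), partitions)
    return [uid for chunk in results for uid in chunk]  # reduce: flatten

# Example: which of user 1's friends (2 and 3) liked brand 42?
parts = [Partition({1: {42}, 2: set()}), Partition({3: {42, 7}})]
print(friends_who_liked(parts, friend_ids=[2, 3], brand_id=42))  # [3]
```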

In this post I tried to summarize the main takeaways from the interview.

You can also watch the full interview (highly recommended).

Read the full story here

Sunday
Apr 26, 2009

Scale-up vs. Scale-out: A Case Study by IBM using Nutch/Lucene

Scale-up solutions in the form of large SMPs have represented the mainstream of commercial computing for the past several years. The major server vendors continue to provide increasingly larger and more powerful machines. More recently, scale-out solutions, in the form of clusters of smaller machines, have gained increased acceptance for commercial computing.
Scale-out solutions are particularly effective in high-throughput web-centric applications. In this paper, we investigate the behavior of two competing approaches to parallelism, scale-up and scale-out, in an emerging search application. Our conclusions show that a scale-out strategy can be the key to good performance even on a scale-up machine.
Furthermore, scale-out solutions offer better price/performance, albeit with increased management complexity.

Read more about scaling up vs. scaling out, and about the study's conclusions, here (PDF, available for download).

Click to read more ...

Wednesday
Mar 11, 2009

Classifying XTP systems and how cloud changes which type startups will use

I group XTP systems into two main types, type 1 and type 2, and then subdivide type 2 into 2a and 2b. I describe how I arrive at this grouping and then expand on it a little in the context of cloud services.

Click to read more ...

Sunday
Dec 28, 2008

How to Organize a Database Table’s Keys for Scalability

The key (no pun intended) to organizing a sharded dataset is to think of your shards not as individual databases, but as parts of one large, singular database. Just as a normal single-server database assigns a unique key to each row within a table, each row key within each individual shard must be unique across the whole dataset partitioned over all shards. There are a few different ways to achieve uniqueness of row keys across a shard cluster; each has its pros and cons, and the one you choose should fit the problems you're trying to solve.
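
One common approach, sketched below, is to encode the shard ID into the high bits of the key so every shard can mint keys independently without collisions; another is to step auto-increment counters by the shard count. Both are generic illustrations under assumed names and bit widths, not a prescription from the article.

```python
# Two generic ways to keep row keys unique across a shard cluster
# (illustrative sketch; names and bit widths are assumptions).

import itertools

SHARD_BITS = 10  # up to 1024 shards; the low bits hold the local counter

def composite_key(shard_id: int, local_id: int) -> int:
    """Pack the shard ID into the high bits of a 64-bit key, so two
    shards can never generate the same key."""
    return (shard_id << (64 - SHARD_BITS)) | local_id

def shard_of(key: int) -> int:
    """Recover the owning shard directly from the key, with no lookup."""
    return key >> (64 - SHARD_BITS)

def offset_counter(shard_id: int, shard_count: int):
    """Alternative: interleaved auto-increments. Shard 0 of 4 yields
    0, 4, 8, ...; shard 1 yields 1, 5, 9, ...; they never collide."""
    return itertools.count(start=shard_id, step=shard_count)

key = composite_key(shard_id=3, local_id=12345)
assert shard_of(key) == 3

gen = offset_counter(shard_id=1, shard_count=4)
print([next(gen) for _ in range(3)])  # [1, 5, 9]
```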

Click to read more ...

Sunday
Dec 21, 2008

The I.H.S.D.F. Theorem: A Proposed Theorem for the Trade-offs in Horizontally Scalable Systems

Successful software design is all about trade-offs. In the typical (if there is such a thing) distributed system, recognizing the importance of trade-offs within the design of your architecture is integral to the success of your system. Despite this reality, time and time again I see developers choosing a particular solution based on a misplaced belief that their solution is a “silver bullet” that conquers all, despite the inevitability of changing requirements. Regardless of the reasons behind this phenomenon, I'd like to outline a few of the methods I use to ensure that I'm making good, scalable decisions without losing sight of the trade-offs that accompany them. I'd also like to compile (pun intended) the issues at hand by formulating a simple theorem that we can use to describe this oft-occurring situation.

Click to read more ...

Wednesday
Dec 17, 2008

Scalability Strategies Primer: Database Sharding

This article is a primer, intended to shine some much-needed light on the logical, process-oriented implementations of database scalability strategies in the form of a broad introduction. More specifically, the intent is to elaborate on most of these implementations by example.
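
As a taste of what such strategies look like in code, here is a minimal sketch, under assumed names, of the two routing schemes sharding primers usually contrast: hash-based routing, which spreads keys evenly, and range-based routing, which keeps related keys together. It is a generic illustration, not code from the article.

```python
# Generic sketch of two common shard-routing strategies
# (names and ranges are illustrative assumptions).

import hashlib

def hash_shard(key: str, shard_count: int) -> int:
    """Hash routing: even spread, but resharding moves most keys."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % shard_count

# Range routing table: (lower bound of user_id range, owning shard).
RANGES = [(0, "shard-a"), (1_000_000, "shard-b"), (2_000_000, "shard-c")]

def range_shard(user_id: int) -> str:
    """Range routing: easy range scans and splits, but hot spots happen."""
    owner = RANGES[0][1]
    for lower_bound, shard in RANGES:
        if user_id >= lower_bound:
            owner = shard
    return owner

print(hash_shard("user:42", shard_count=4))  # deterministic shard index
print(range_shard(1_500_000))                # shard-b
```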

Click to read more ...

Friday
Sep 26, 2008

Lucasfilm: The Real Magic is in the Data Center

Kevin Clark, director of IT operations for Lucasfilm, discusses how their data center works:

  • Linux-based platform, SUSE (looking to change), and a lot of proprietary and open source applications for content creation.
  • A 4,500-processor render farm in the datacenter; workstations are used off hours.
  • Developed their own proprietary scheduler to schedule their 5,500 available processors.
  • Render nodes, the blade racks (from Verari), run dual-core dual-Opteron chips with 32GB of memory on board, but are being expanded to quad-core. They are an AMD shop.
  • 400TB of storage online for production.
  • Every night they write out 10-20TB of new data on a render. A project will use up to a hundred-plus terabytes of storage.
  • Incremental backups are a challenge because the data changes up to 50 percent over a week.
  • NetApps are used for storage. They like the global namespace in the virtual file system.
  • A Foundry Networks architecture shop, and one of the larger 10-GbE-backbone facilities on the West Coast: 350-plus 10 GbE ports are used for distribution throughout the facility and the backend.
  • Grid computing has been in use for over 4 years.
  • A 10-Gig dark fiber connection links San Rafael and their home office, enabling co-rendering and co-storage between the two facilities, with no difference in performance in terms of where they go to look for their data and their shots.
  • Artists get server-class machines: HP 9400 workstations with dual-core dual-Opteron processors and 16GB of memory.
  • The challenge now is to better segment storage so as not to continue sinking costs into high-cost disks.
  • VMware is used to host a lot of development environments, allowing quick turn-up of testing since tests can be allocated across VMs.
  • PoE (power-over-Ethernet) is provided from the switch out to all of their Web farms.
  • The next push is on the facilities side: how to be more efficient at airflow management and power utilization.

Click to read more ...

Wednesday
Sep 24, 2008

Building a Scalable Architecture for Web Apps

By Bhavin Turakhia, CEO of Directi. Covers:

  • Why scalability is important. Viral marketing can result in instant success. With RSS/Ajax/SOA the number of requests grows exponentially with the user base. The goal is to build a web 2.0 app that can serve millions of users with zero downtime.
  • Introduction to the variables: scalability, performance, responsiveness, availability, downtime impact, cost, maintenance effort.
  • Introduction to the factors: platform selection, hardware, application design, database architecture, deployment architecture, storage architecture, abuse prevention, monitoring mechanisms, etc.
  • Building your own scalable architecture in incremental steps: vertical scaling, vertical partitioning, horizontal scaling, horizontal partitioning, etc. First buy bigger. Then deploy each service on a separate node. Then increase the number of nodes and load balance (see the sketch after this list). Deal with session management. Remove single points of failure. Use a shared-nothing cluster. Choose between master-slave and multi-master setups. Replication can be synchronous or asynchronous.
  • Platform selection considerations. Use a global redirector for serving multiple datacenters. Add object, session API, and page caches. Add a reverse proxy. Think about CDNs and grid computing.
  • Tips. Use a horizontal DB architecture from the beginning. Loosely couple all modules. Use a REST interface for easier caching. Size the application on an ongoing basis to ensure optimal hardware utilization.
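
As a minimal sketch of the horizontal-partitioning step above (all names assumed, not from the talk), each user can be mapped to a node with a stable hash, so the same user always lands on the same node:

```python
# Minimal sketch of horizontal partitioning: route each user's requests
# (and data) to one node. Names are illustrative assumptions.

import hashlib

NODES = ["app-node-1", "app-node-2", "app-node-3"]

def node_for_user(user_id: str) -> str:
    """Stable hash, so the same user always lands on the same node."""
    digest = hashlib.sha1(user_id.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# Caveat: plain modulo remaps most users when NODES changes size;
# consistent hashing or a user-to-node lookup table limits that churn.
print(node_for_user("alice"))  # always the same node for "alice"
```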

Click to read more ...

Tuesday
Sep 23, 2008

How to Scale with Ruby on Rails

By George Palmer of 3dogsbark.com. Covers:

  • How you start out: shared hosting, with the web server and DB on the same machine. Then move to 2 machines. Minimal code changes.
  • Scaling the database. Add read slaves on their own machines, then move to a master-master setup (see the read/write-splitting sketch after this list). Still minimal code changes.
  • Scaling the web server. Load balance across multiple application servers. Application servers scale, but the database doesn't.
  • User clusters. Partition and allocate users to their own dedicated cluster. Requires substantial code changes.
  • Caching. A large percentage of hits are read-only. Use a reverse proxy, memcached, and a language-specific cache.
  • Elastic architectures. Based on Amazon EC2: start and stop instances on demand. For global applications, keep a cache on each continent, assign users to clusters by location, maintain app servers on each continent, and use transaction replication software if you must replicate your site globally.
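
To make the read-slave step concrete, here is a minimal sketch (in Python with assumed names; a Rails app of that era would typically use a plugin for this) of routing writes to the master while spreading reads across replicas:

```python
# Minimal sketch of master/read-slave query routing
# (illustrative names; not from the talk itself).

import itertools

class ReadWriteRouter:
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # round-robin the reads

    def connection_for(self, sql: str):
        """Writes (and transactions) must hit the master; reads can go
        to any replica, at the cost of slight replication lag."""
        verb = sql.lstrip().split()[0].upper()
        is_write = verb in {"INSERT", "UPDATE", "DELETE", "REPLACE", "BEGIN"}
        return self.master if is_write else next(self._replicas)

router = ReadWriteRouter(master="db-master", replicas=["db-r1", "db-r2"])
print(router.connection_for("SELECT * FROM users"))        # db-r1, then db-r2
print(router.connection_for("UPDATE users SET name='x'"))  # db-master
```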

Click to read more ...