Entries in Scalability (52)

Tuesday
Feb192019

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we introduce you to Redis Cluster sharding, discuss its advantages and limitations, cover when you should deploy it, and show how to connect to your Redis Cluster.
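
To make the connection step concrete, here is a minimal sketch of talking to a Redis Cluster from Python with the redis-py client (version 4.1+, which bundles cluster support); the host and port are placeholders for one of your cluster's seed nodes, not anything specific to ScaleGrid.

    from redis.cluster import RedisCluster

    # The client needs only one reachable seed node; it discovers the
    # rest of the cluster topology (nodes and hash-slot map) from it.
    rc = RedisCluster(host="127.0.0.1", port=7000, decode_responses=True)

    # Each key is routed to the shard that owns CRC16(key) mod 16384.
    rc.set("user:1000", "alice")
    print(rc.get("user:1000"))

Because routing happens client-side, reads and writes go straight to the owning shard with no extra proxy hop.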

Sharding with Redis Cluster

Click to read more ...

Wednesday
Jul092014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase most of us are accustomed to hearing, and one that many still believe to be true when discussing the speed, scale, and cost of processing data. Reaching high-speed data processing requires more memory, which raises cost, because memory, on average, is far more expensive than commodity disk. The idea that data systems cannot reliably provide both capacity and fast access, not to mention at the right cost, has long been debated, though such limitations were cemented in our minds by computer scientist Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Click to read more ...

Monday
Dec162013

22 Recommendations for Building Effective High Traffic Web Software

This is a guest post by Ashwanth Fernando, a software engineer from the trenches at large-scale internet companies.

Inspired by the book "Effective Java" by Joshua Bloch, I wanted to share my holistic recommendations on building high traffic web software (i.e. web applications/services that serve high traffic loads). Some of these items are not just about software design; they also touch on surrounding areas such as the engineering organization and culture.

Two disclaimers up front:

1) This is my opinion.
2) As in all things "software", there will be real-world situations where the principles below are wrong. Please use common sense at all times.


Consider using more than one datacenter

There have been numerous horror stories about businesses, ahem, going out of business because they had just a single datacenter. It's really important to have more than one datacenter if you want to protect yourself from natural disasters or electrical supply failures. Run all your datacenters in an active-active configuration. It may cost extra money, but it's well worth it compared to running active-passive and then finding out, at the worst possible moment, that for some pieces of data your passive hardware was not consistent with the active one.
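
As a toy illustration of the active-active idea, here is a sketch of a client that rotates requests across two datacenters and fails over to the peer when one is unreachable; the endpoint URLs are invented for this example.

    import itertools
    import urllib.request

    # Both datacenters serve live traffic (active-active), so the client
    # can rotate between them and treat either one as a failover target.
    DATACENTERS = ["https://dc1.example.com", "https://dc2.example.com"]
    rotation = itertools.cycle(DATACENTERS)

    def fetch_with_failover(path, timeout=2):
        last_error = None
        for _ in range(len(DATACENTERS)):
            base = next(rotation)
            try:
                with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                    return resp.read()
            except OSError as err:
                last_error = err  # this DC is unreachable; try the peer
        raise RuntimeError("all datacenters unreachable") from last_error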

Consider a sparse datacenter deployment

Click to read more ...

Thursday
May032012

Snooze - Open-source, Scalable, Autonomic, and Energy-efficient VM Management for Private Clouds

Snooze is an open-source, scalable, autonomic, and energy-efficient virtual machine (VM) management framework for private clouds. Like other VM management frameworks such as Nimbus, OpenNebula, Eucalyptus, and OpenStack, it lets you build compute infrastructures from virtualized resources: once the system is installed and configured, users can submit and control the life cycle of a large number of VMs. However, contrary to existing frameworks, Snooze employs a self-organizing and self-healing hierarchical architecture (based on Apache ZooKeeper) for scalability and fault tolerance. Moreover, it performs distributed VM management and is designed to be energy efficient. To that end, it implements features to monitor and estimate VM resource demands (CPU, memory, network Rx, network Tx), detect and resolve overload/underload situations, perform dynamic VM consolidation through live migration, and apply power management to save energy. Last but not least, it integrates a generic scheduler that allows arbitrary VM placement algorithms to be plugged in. The system can be used either to manage production data centers or as an experimental testbed for advanced VM placement algorithms (i.e. those requiring live migration support).
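
For a flavor of what such a pluggable placement algorithm looks like, here is a minimal first-fit sketch; the dict-based host and VM descriptions are illustrative, not Snooze's actual API.

    def first_fit(vm, hosts):
        """Place a VM on the first host with enough spare CPU and memory."""
        for host in hosts:
            if host["cpu_free"] >= vm["cpu"] and host["mem_free"] >= vm["mem"]:
                host["cpu_free"] -= vm["cpu"]   # reserve the capacity
                host["mem_free"] -= vm["mem"]
                return host
        return None  # no capacity: consolidate or power on another node

    hosts = [{"name": "node1", "cpu_free": 4, "mem_free": 8192},
             {"name": "node2", "cpu_free": 8, "mem_free": 16384}]
    placed = first_fit({"cpu": 6, "mem": 4096}, hosts)
    print(placed["name"] if placed else "no capacity")  # -> node2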

Click to read more ...

Thursday
Sep032009

Storage Systems for Highly Scalable Systems presentation

Highly scalable systems (i.e. large websites such as Google, Facebook, and Amazon) need a highly scalable storage system that can deal with huge amounts of data with high availability and reliability. Building large systems on top of a traditional RDBMS data storage layer is no longer good enough. This presentation explores the landscape of new technologies available today for augmenting your data layer to improve performance and reliability.

Remember: all of my presentations' content is open source, so please feel free to use it, copy it, and redistribute it as you wish.

Download the presentation

Monday
Jun222009

Improving performance and scalability with DDD

Distributed systems are not typically a place where domain-driven design is applied. Distributed processing projects often start with an overall architecture vision and an idea about the processing model that basically drives the whole thing, including object design, if it exists at all. Elaborate object designs are thought of as something that just gets in the way of distribution and performance, so the idea of spending time applying DDD principles gets rejected in favour of raw throughput and processing power. However, in my experience, some of the more advanced DDD concepts can significantly improve the performance, scalability, and throughput of distributed systems when applied correctly.
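
One concrete example of such a concept is using the aggregate as the unit of distribution: if every command for an aggregate is routed to the partition that owns it, the aggregate's state stays local to one node and no cross-node coordination is needed. A minimal sketch follows (the partition count and routing function are my own illustration, not from the talk):

    import zlib

    NUM_PARTITIONS = 16

    def partition_for(aggregate_id):
        # Stable hash: commands for the same aggregate always land on the
        # same partition, so its state never needs distributed locking.
        return zlib.crc32(aggregate_id.encode("utf-8")) % NUM_PARTITIONS

    print(partition_for("order-1234"))  # always the same partition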

This article is a summary of the presentation "DDD in a distributed world" from DDD Exchange 09 in London.

Tuesday
May192009

Scaling Memcached: 500,000+ Operations/Second with a Single-Socket UltraSPARC T2

A software-based distributed caching system such as memcached is an important piece of today's largest Internet sites, which support millions of concurrent users and deliver user-friendly response times. The distributed nature of the memcached design transforms thousands of servers into one large caching pool with gigabytes of memory per node. This blog entry explores single-instance memcached scalability for a few usage patterns. The table below shows out-of-the-box performance (no custom OS rewrites or networking tuning required) with 10G networking hardware and one single-socket UltraSPARC T2-based server with 8 cores and 8 threads per core (64 threads on a chip):

    Object Size   Ops/Sec   Bandwidth
    100 bytes     530,000   1.2 Gb/s
    2048 bytes    370,000   6.9 Gb/s
    4096 bytes    255,000   9.2 Gb/s

Check out the link for more details!
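
For a sense of how the pooling works on the client side, here is a minimal sketch using the python-memcached client; listing several servers makes the client hash each key to one of them, so the nodes behave as a single logical cache (hostnames are placeholders).

    import memcache

    # The client hashes each key to pick a server from the list, turning
    # independent memcached nodes into one large caching pool.
    pool = memcache.Client(["cache1:11211", "cache2:11211", "cache3:11211"])

    pool.set("session:42", {"user": "alice"}, time=300)  # expire in 300s
    print(pool.get("session:42"))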

Click to read more ...

Friday
May152009

Wolfram|Alpha Architecture

Making the world's knowledge computable

Today's Wolfram|Alpha is the first step in an ambitious, long-term project to make all systematic knowledge immediately computable by anyone. You enter your question or calculation, and Wolfram|Alpha uses its built-in algorithms and growing collection of data to compute the answer.

Answer Engine vs Search Engine

When Wolfram|Alpha launches later today, it will be one of the most computationally intensive websites on the internet. The Wolfram|Alpha computational knowledge engine is an "answer engine" that is able to produce answers to various questions such as:
  • What is the GDP of France?
  • Weather in Springfield when David Ortiz was born
  • 33 g of gold
  • LDL vs. serum potassium 150 smoker male age 40
  • life expectancy male age 40 finland
  • highschool teacher median wage
Wolfram|Alpha excels in areas like mathematics, statistics, physics, engineering, astronomy, chemistry, life sciences, geology, business, and finance, as demonstrated by Stephen Wolfram in his Introduction screencast.

The Stats

  • About 10,000 CPU cores at launch
  • 10+ trillion pieces of data
  • 50,000+ types of algorithms
  • Able to handle about 175 million queries per day
  • 5+ million lines of symbolic Mathematica code

The Computers Powering Computable Knowledge

There is no way to know exactly how much traffic to expect, especially during the initial period immediately following the launch, but the Wolfram|Alpha team is working hard to put reasonable capacity in place. As Stephen writes in the Wolfram|Alpha blog, Alpha will run in 5 distributed colocation facilities. What computing power have they gathered in these facilities for launch day? Two supercomputers, just about 10,000 processor cores, hundreds of terabytes of disk, a heck of a lot of bandwidth, and what seems like enough air conditioning for the Sahara to host a ski resort. One of their launch partners, R Systems, created the world's 44th largest supercomputer (per the June 2008 TOP500 list; it is listed as 66th on the latest list). They call it R Smarr, and it will be running Wolfram|Alpha on launch day! R Smarr has an Rmax of 39,580 GFlops using Dell DCS CS23-SH, QC HT 2.8 GHz machines: 4,608 cores, 65,536 GB of RAM, and an InfiniBand interconnect. Dell is another launch partner, with a data center full of quad-board, dual-processor, quad-core Harpertown servers. What does it all add up to? The ability to handle 175 million queries (yielding maybe a billion results) per day, or over 5 billion queries (encompassing around 30 billion calculations) per month.
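
A quick back-of-the-envelope check of those figures (the 30-day month is my assumption):

    queries_per_day = 175_000_000
    queries_per_month = queries_per_day * 30    # 5,250,000,000: "over 5 billion"
    calcs_per_month = 30_000_000_000
    print(queries_per_month)                    # 5250000000
    print(calcs_per_month / queries_per_month)  # ~5.7 calculations per query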

The Launch of Wolfram|Alpha

Watch a live webcast of the Wolfram|Alpha system being brought online for the first time on
  • Friday, May 15, beginning at 7pm CST

The First Killer App of The New Kind of Science

The genius behind Wolfram|Alpha is Stephen Wolfram. He is best known for his ambitious projects: Mathematica and A New Kind of Science (NKS). May 14, 2009 marks the 7th anniversary of the publication of his book A New Kind of Science. Stephen explains in his blog post: "But for me the biggest thing that's happened this year is the emergence of Wolfram|Alpha. Wolfram|Alpha is, I believe, going to be the first killer app of NKS."

Status

That it should be possible to build Wolfram|Alpha as it exists today, in the first decade of the 21st century, was far from obvious. And yet there is much more to come. As of now, Wolfram|Alpha contains 10+ trillion pieces of data, 50,000+ types of algorithms and models, and linguistic capabilities for 1000+ domains. Built with Mathematica—which is itself the result of more than 20 years of development at Wolfram Research—Wolfram|Alpha's core code base now exceeds 5 million lines of symbolic Mathematica code. Running on supercomputer-class compute clusters, Wolfram|Alpha makes extensive use of the latest generation of web and parallel computing technologies, including webMathematica and gridMathematica.

How Mathematica Made Wolfram|Alpha Possible

Wolfram|Alpha is a major software engineering development to make all systematic knowledge immediately computable by anyone. It is developed and deployed entirely with Mathematica—in fact, Mathematica has uniquely made Wolfram|Alpha possible. Here's why.
  • Computational knowledge and intelligence
  • High-performance enterprise deployment
  • One coherent architecture
  • Smart method selection
  • Dynamic report generation
  • Database connectivity
  • Built-in, computable data
  • High-level programming language
  • Efficient text processing and linguistic analysis
  • Wide-ranging, automated visualization capabilities
  • Automated importing
  • Development environment

Information Sources

Congratulations Stephen!

Click to read more ...

Thursday
May142009

Who Has the Most Web Servers?

An interesting post on DataCenterKnowledge!

  • 1&1 Internet: 55,000 servers
  • Rackspace: 50,038 servers
  • The Planet: 48,500 servers
  • Akamai Technologies: 48,000 servers
  • OVH: 40,000 servers
  • SBC Communications: 29,193 servers
  • Verizon: 25,788 servers
  • Time Warner Cable: 24,817 servers
  • SoftLayer: 21,000 servers
  • AT&T: 20,268 servers
  • iWeb: 10,000 servers
  • How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!


Tuesday
May122009

GemStone Unveils GemFire Enterprise 6.0

GemFire Enterprise is an in-memory distributed data management platform that pools memory (and CPU, network, and optionally local disk) across multiple processes to manage application objects and behavior. With the 6.0 release, GemFire has reached a new stage of maturity in its evolution. GemStone touts this version as the true 'best of breed' distributed caching technology, solving scalability issues across all industries.

Click to read more ...