Entries in Scalability (52)

Tuesday
Feb192019

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections

Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we introduce you to Redis Cluster sharding, discuss its advantages and limitations, cover when you should deploy it, and show how to connect to your Redis Cluster.
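
To make the connection step concrete, here is a minimal sketch of talking to a Redis Cluster from Python with the redis-py client (version 4.1+, which bundles cluster support); the host and port are placeholders for one of your cluster's seed nodes, not anything specific to ScaleGrid.

    from redis.cluster import RedisCluster

    # The client needs only one reachable seed node; it discovers the
    # rest of the cluster topology (nodes and hash-slot map) from it.
    rc = RedisCluster(host="127.0.0.1", port=7000, decode_responses=True)

    # Each key is routed to the shard that owns CRC16(key) mod 16384.
    rc.set("user:1000", "alice")
    print(rc.get("user:1000"))

Because routing happens client-side, reads and writes go straight to the owning shard with no extra proxy hop.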

Sharding with Redis Cluster

Click to read more ...

Wednesday
Jul092014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase most of us are accustomed to hearing, and one that many still believe to be true when discussing the speed, scale, and cost of processing data. Reaching high-speed data processing requires more memory, which raises cost, because memory, on average, is far more expensive than commodity disk. The idea that data systems cannot reliably provide both capacity and fast access, not to mention at the right cost, has long been debated, though such limitations were cemented in our minds by computer scientist Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems

Click to read more ...

Monday
Dec162013

22 Recommendations for Building Effective High Traffic Web Software

This is a guest post by Ashwanth Fernando, a software engineer from the trenches at large-scale internet companies.

Inspired by the book "Effective Java" by Joshua Bloch, I wanted to share my holistic recommendations on building high traffic web software (i.e. web applications/services that serve high traffic loads). Some of these items are not just about software design; they also touch on surrounding areas such as the engineering organization and culture.

Two disclaimers up front:

1) This is my opinion.
2) As in all things "software", there will be real-world situations where the principles below are wrong. Please use common sense at all times.


Consider using more than one datacenter

There have been numerous horror stories about businesses, ahem, going out of business because they had just a single datacenter. It's really important to have more than one datacenter if you want to protect yourself from natural disasters or electrical supply failures. Run all your datacenters in an active-active configuration. It may cost extra money, but it's well worth it compared to running active-passive and then finding out, at the worst possible moment, that for some pieces of data your passive hardware was not consistent with the active one.
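
As a toy illustration of the active-active idea, here is a sketch of a client that rotates requests across two datacenters and fails over to the peer when one is unreachable; the endpoint URLs are invented for this example.

    import itertools
    import urllib.request

    # Both datacenters serve live traffic (active-active), so the client
    # can rotate between them and treat either one as a failover target.
    DATACENTERS = ["https://dc1.example.com", "https://dc2.example.com"]
    rotation = itertools.cycle(DATACENTERS)

    def fetch_with_failover(path, timeout=2):
        last_error = None
        for _ in range(len(DATACENTERS)):
            base = next(rotation)
            try:
                with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                    return resp.read()
            except OSError as err:
                last_error = err  # this DC is unreachable; try the peer
        raise RuntimeError("all datacenters unreachable") from last_error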

Consider a sparse datacenter deployment

Click to read more ...

Thursday
May032012

Snooze - Open-source, Scalable, Autonomic, and Energy-efficient VM Management for Private Clouds

Snooze is an open-source, scalable, autonomic, and energy-efficient virtual machine (VM) management framework for private clouds. Like other VM management frameworks such as Nimbus, OpenNebula, Eucalyptus, and OpenStack, it lets you build compute infrastructures from virtualized resources: once the system is installed and configured, users can submit and control the life cycle of a large number of VMs. However, contrary to existing frameworks, Snooze employs a self-organizing and self-healing hierarchical architecture (based on Apache ZooKeeper) for scalability and fault tolerance. Moreover, it performs distributed VM management and is designed to be energy efficient. To that end, it implements features to monitor and estimate VM resource demands (CPU, memory, network Rx, network Tx), detect and resolve overload/underload situations, perform dynamic VM consolidation through live migration, and apply power management to save energy. Last but not least, it integrates a generic scheduler that allows arbitrary VM placement algorithms to be plugged in. The system can be used either to manage production data centers or as an experimental testbed for advanced VM placement algorithms (i.e. those requiring live migration support).
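
For a flavor of what such a pluggable placement algorithm looks like, here is a minimal first-fit sketch; the dict-based host and VM descriptions are illustrative, not Snooze's actual API.

    def first_fit(vm, hosts):
        """Place a VM on the first host with enough spare CPU and memory."""
        for host in hosts:
            if host["cpu_free"] >= vm["cpu"] and host["mem_free"] >= vm["mem"]:
                host["cpu_free"] -= vm["cpu"]   # reserve the capacity
                host["mem_free"] -= vm["mem"]
                return host
        return None  # no capacity: consolidate or power on another node

    hosts = [{"name": "node1", "cpu_free": 4, "mem_free": 8192},
             {"name": "node2", "cpu_free": 8, "mem_free": 16384}]
    placed = first_fit({"cpu": 6, "mem": 4096}, hosts)
    print(placed["name"] if placed else "no capacity")  # -> node2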

Click to read more ...

Thursday
Sep032009

Storage Systems for Highly Scalable Systems presentation

Highly scalable systems (i.e. large websites such as Google, Facebook, and Amazon) need a highly scalable storage system that can deal with huge amounts of data with high availability and reliability. Building large systems on top of a traditional RDBMS data storage layer is no longer good enough. This presentation explores the landscape of new technologies available today for augmenting your data layer to improve performance and reliability.

Remember: all of my presentations' content is open source, so please feel free to use it, copy it, and redistribute it as you wish.

Download the presentation

Monday
Jun222009

Improving performance and scalability with DDD

Distributed systems are not typically a place where domain-driven design is applied. Distributed processing projects often start with an overall architecture vision and an idea about the processing model that basically drives the whole thing, including object design, if it exists at all. Elaborate object designs are thought of as something that just gets in the way of distribution and performance, so the idea of spending time applying DDD principles gets rejected in favour of raw throughput and processing power. However, in my experience, some of the more advanced DDD concepts can significantly improve the performance, scalability, and throughput of distributed systems when applied correctly.
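
One concrete example of such a concept is using the aggregate as the unit of distribution: if every command for an aggregate is routed to the partition that owns it, the aggregate's state stays local to one node and no cross-node coordination is needed. A minimal sketch follows (the partition count and routing function are my own illustration, not from the talk):

    import zlib

    NUM_PARTITIONS = 16

    def partition_for(aggregate_id):
        # Stable hash: commands for the same aggregate always land on the
        # same partition, so its state never needs distributed locking.
        return zlib.crc32(aggregate_id.encode("utf-8")) % NUM_PARTITIONS

    print(partition_for("order-1234"))  # always the same partition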

This article is a summary of the presentation "DDD in a distributed world" from DDD Exchange 09 in London.

Tuesday
May192009

Scaling Memcached: 500,000+ Operations/Second with a Single-Socket UltraSPARC T2

A software-based distributed caching system such as memcached is an important piece of today's largest Internet sites, which support millions of concurrent users and deliver user-friendly response times. The distributed nature of the memcached design transforms thousands of servers into one large caching pool with gigabytes of memory per node. This blog entry explores single-instance memcached scalability for a few usage patterns. The table below shows out-of-the-box performance (no custom OS rewrites or networking tuning required) with 10G networking hardware and one single-socket UltraSPARC T2-based server with 8 cores and 8 threads per core (64 threads on a chip):

    Object Size   Ops/Sec   Bandwidth
    100 bytes     530,000   1.2 Gb/s
    2048 bytes    370,000   6.9 Gb/s
    4096 bytes    255,000   9.2 Gb/s

Check out the link for more details!
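
For a sense of how the pooling works on the client side, here is a minimal sketch using the python-memcached client; listing several servers makes the client hash each key to one of them, so the nodes behave as a single logical cache (hostnames are placeholders).

    import memcache

    # The client hashes each key to pick a server from the list, turning
    # independent memcached nodes into one large caching pool.
    pool = memcache.Client(["cache1:11211", "cache2:11211", "cache3:11211"])

    pool.set("session:42", {"user": "alice"}, time=300)  # expire in 300s
    print(pool.get("session:42"))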

Click to read more ...

Friday
May152009

Wolfram|Alpha Architecture

Making the world's knowledge computable

Today's Wolfram|Alpha is the first step in an ambitious, long-term project to make all systematic knowledge immediately computable by anyone. You enter your question or calculation, and Wolfram|Alpha uses its built-in algorithms and growing collection of data to compute the answer.

Answer Engine vs Search Engine

When Wolfram|Alpha launches later today, it will be one of the most computationally intensive websites on the internet. The Wolfram|Alpha computational knowledge engine is an "answer engine" that is able to produce answers to various questions such as:
  • What is the GDP of France?
  • Weather in Springfield when David Ortiz was born
  • 33 g of gold
  • LDL vs. serum potassium 150 smoker male age 40
  • life expectancy male age 40 finland
  • highschool teacher median wage
Wolfram|Alpha excels in areas like mathematics, statistics, physics, engineering, astronomy, chemistry, life sciences, geology, business, and finance, as demonstrated by Stephen Wolfram in his Introduction screencast.

The Stats

  • About 10,000 CPU cores at launch
  • 10+ trillion pieces of data
  • 50,000+ types of algorithms
  • Able to handle about 175 million queries per day
  • 5+ million lines of symbolic Mathematica code

The Computers Powering Computable Knowledge

There is no way to know exactly how much traffic to expect, especially during the initial period immediately following the launch, but the Wolfram|Alpha team is working hard to put reasonable capacity in place. As Stephen writes in the Wolfram|Alpha blog, Alpha will run in 5 distributed colocation facilities. What computing power have they gathered in these facilities for launch day? Two supercomputers, just about 10,000 processor cores, hundreds of terabytes of disk, a heck of a lot of bandwidth, and what seems like enough air conditioning for the Sahara to host a ski resort. One of their launch partners, R Systems, created the world's 44th largest supercomputer (per the June 2008 TOP500 list; it is listed as 66th on the latest list). They call it R Smarr, and it will be running Wolfram|Alpha on launch day! R Smarr has an Rmax of 39,580 GFlops using Dell DCS CS23-SH, QC HT 2.8 GHz machines: 4,608 cores, 65,536 GB of RAM, and an InfiniBand interconnect. Dell is another launch partner, with a data center full of quad-board, dual-processor, quad-core Harpertown servers. What does it all add up to? The ability to handle 175 million queries (yielding maybe a billion results) per day, or over 5 billion queries (encompassing around 30 billion calculations) per month.
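
A quick back-of-the-envelope check of those figures (the 30-day month is my assumption):

    queries_per_day = 175_000_000
    queries_per_month = queries_per_day * 30    # 5,250,000,000: "over 5 billion"
    calcs_per_month = 30_000_000_000
    print(queries_per_month)                    # 5250000000
    print(calcs_per_month / queries_per_month)  # ~5.7 calculations per query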

The Launch of Wolfram|Alpha

Watch a live webcast of the Wolfram|Alpha system being brought online for the first time on
  • Friday, May 15, beginning at 7pm CST

The First Killer App of The New Kind of Science

The genius behind Wolfram|Alpha is Stephen Wolfram. He is best known for his ambitious projects: Mathematica and A New Kind of Science (NKS). May 14, 2009 marks the 7th anniversary of the publication of his book A New Kind of Science. Stephen explains in his blog post: "But for me the biggest thing that's happened this year is the emergence of Wolfram|Alpha. Wolfram|Alpha is, I believe, going to be the first killer app of NKS."

Status

That it should be possible to build Wolfram|Alpha as it exists today, in the first decade of the 21st century, was far from obvious. And yet there is much more to come. As of now, Wolfram|Alpha contains 10+ trillion pieces of data, 50,000+ types of algorithms and models, and linguistic capabilities for 1000+ domains. Built with Mathematica—which is itself the result of more than 20 years of development at Wolfram Research—Wolfram|Alpha's core code base now exceeds 5 million lines of symbolic Mathematica code. Running on supercomputer-class compute clusters, Wolfram|Alpha makes extensive use of the latest generation of web and parallel computing technologies, including webMathematica and gridMathematica.

How Mathematica Made Wolfram|Alpha Possible

Wolfram|Alpha is a major software engineering development to make all systematic knowledge immediately computable by anyone. It is developed and deployed entirely with Mathematica—in fact, Mathematica has uniquely made Wolfram|Alpha possible. Here's why.
  • Computational knowledge and intelligence
  • High-performance enterprise deployment
  • One coherent architecture
  • Smart method selection
  • Dynamic report generation
  • Database connectivity
  • Built-in, computable data
  • High-level programming language
  • Efficient text processing and linguistic analysis
  • Wide-ranging, automated visualization capabilities
  • Automated importing
  • Development environment

Information Sources

Congratulations Stephen!

Click to read more ...

Thursday
May142009

Who Has the Most Web Servers?

An interesting post on DataCenterKnowledge!

  • 1&1 Internet: 55,000 servers
  • Rackspace: 50,038 servers
  • The Planet: 48,500 servers
  • Akamai Technologies: 48,000 servers
  • OVH: 40,000 servers
  • SBC Communications: 29,193 servers
  • Verizon: 25,788 servers
  • Time Warner Cable: 24,817 servers
  • SoftLayer: 21,000 servers
  • AT&T: 20,268 servers
  • iWeb: 10,000 servers
  • How about Google, Microsoft, Amazon, eBay, Yahoo, GoDaddy, Facebook? Check out the post on DataCenterKnowledge and of course here on highscalability.com!


Tuesday
May122009

GemStone Unveils GemFire Enterprise 6.0

GemFire Enterprise is an in-memory distributed data management platform that pools memory (and CPU, network, and optionally local disk) across multiple processes to manage application objects and behavior. With the 6.0 release, GemFire has reached a new stage of maturity in its evolution. GemStone touts this version as the true 'best of breed' distributed caching technology, solving scalability issues across all industries.

Click to read more ...