Entries in distributed caching (5)

Tuesday
Feb 19, 2019

Intro to Redis Cluster Sharding – Advantages, Limitations, Deploying & Client Connections


Redis Cluster is the native sharding implementation available within Redis that allows you to automatically distribute your data across multiple nodes without having to rely on external tools and utilities. At ScaleGrid, we recently added support for Redis Clusters on our platform through our fully managed Redis hosting plans. In this post, we're going to introduce you to advanced Redis Cluster sharding, discuss its advantages and limitations, explain when you should deploy it, and show how to connect to your Redis Cluster.

Sharding with Redis Cluster

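As a quick preview of the client-connection topic, here is a minimal sketch using the Jedis Java client. The seed node addresses and the key are placeholders and assume a cluster already running locally on ports 7000 and 7001; this is illustrative, not the post's own example.

```java
import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

import java.util.HashSet;
import java.util.Set;

public class RedisClusterConnectDemo {
    public static void main(String[] args) throws Exception {
        // Seed the client with one or more known cluster nodes; it discovers the
        // rest of the topology and the slot-to-node map on its own.
        Set<HostAndPort> seedNodes = new HashSet<>();
        seedNodes.add(new HostAndPort("127.0.0.1", 7000)); // placeholder addresses
        seedNodes.add(new HostAndPort("127.0.0.1", 7001));

        JedisCluster cluster = new JedisCluster(seedNodes);
        try {
            // Each key hashes to one of 16384 slots; the client routes the command
            // to whichever master currently owns that slot.
            cluster.set("user:1001:name", "alice");
            System.out.println(cluster.get("user:1001:name"));
        } finally {
            cluster.close();
        }
    }
}
```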

Wednesday
Jul 9, 2014

Using SSD as a Foundation for New Generations of Flash Databases - Nati Shalom

“You just can't have it all” is a phrase most of us are accustomed to hearing, and one that many still believe to be true when discussing the speed, scale, and cost of processing data. Reaching high-speed data processing requires more memory resources, which increases cost, because memory is, on average, far more expensive than commodity disk drives. The idea that data systems cannot reliably provide both capacity and fast access, let alone at the right cost, has long been debated, and the notion of such limitations was cemented by computer scientist Eric Brewer, who introduced us to the CAP theorem.

The CAP Theorem and Limitations for Distributed Computer Systems


Monday
May 3, 2010

100 Node Hazelcast cluster on Amazon EC2

Deploying, running, and monitoring an application on a big cluster is a challenging task. Recently the Hazelcast team deployed a demo application on the Amazon EC2 platform to show how a Hazelcast p2p cluster scales, and screen-recorded the entire process from deployment to monitoring.

Hazelcast is an open source (Apache License), transactional, distributed caching solution for Java. It is a little more than a cache, though, as it provides distributed implementations of map, multimap, queue, topic, lock, and executor service.
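To make the "more than a cache" point concrete, here is a minimal sketch of the embedded-member style of the Hazelcast Java API; the map and queue names are arbitrary examples, not taken from the demo.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

import java.util.Map;
import java.util.Queue;

public class HazelcastBasicsDemo {
    public static void main(String[] args) {
        // Start an embedded member; additional members started the same way
        // discover each other and form a peer-to-peer cluster.
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();

        // A distributed map: entries are partitioned (with backups) across members.
        Map<String, String> capitals = hz.getMap("capitals");
        capitals.put("France", "Paris");

        // A distributed queue shared by every member of the cluster.
        Queue<String> tasks = hz.getQueue("tasks");
        tasks.offer("resize-image-42");

        System.out.println(capitals.get("France") + " / next task: " + tasks.peek());

        Hazelcast.shutdownAll();
    }
}
```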

Details of running a 100-node Hazelcast cluster on Amazon EC2 can be found here. Make sure to watch the screencast!

Friday
Jun 19, 2009

GemFire 6.0: New innovations in data management

GemStone has unveiled GemFire 6.0, the culmination of several years of development and of continuously solving some of the hardest data management problems in the world. With this release, GemFire touts some of the latest innovative features in data management.

In this release:

- GemFire introduces a resource manager that continuously monitors cache instances and protects them from running out of memory. It can trigger rebalancing to migrate data to less loaded nodes, and it allows the number of nodes hosting data to be increased or decreased dynamically for linear scalability, without impeding ongoing operations (no contention points).

- GemFire provides explicit control over when rebalancing can be triggered and on what class of data, and even allows the administrator to simulate a "rebalance" operation to quantify the benefits before actually doing it.

- With built-in instrumentation that captures throughput and latency metrics, GemFire now enables applications to sense changing performance patterns, proactively provision extra resources, and trigger rebalancing. The end result is predictable data access throughput and latency without the need to overprovision capacity.

- We continue down the path of making the product more resilient than ever before: handling complex membership issues when operating in large clusters, and allowing thresholds to be set on memory consumption in any server JVM, which significantly reduces the probability of "stop the world" garbage collection cycles.

- Advanced Data Partitioning: Applications are no longer restricted by the memory available across the cluster when managing partitioned data. They can pool available memory as well as disk, striping the data across both memory and disk throughout the cluster. When the data fabric is configured as a cache, partitioned data can be expired or evicted so that only the most frequently used data is managed.

- Data-aware application behavior routing: Several extensions have been added to the GemFire data-aware function execution service, a simple grid programming model that lets the application synchronously or asynchronously execute application behavior on the data nodes. Applications invoke functions with hints about the data they depend on, and the service parallelizes execution of the function on all the grid nodes where that data is managed. Applications can now define relationships between different classes of data to colocate all related data sets, and application functions, when routed to the data nodes, can execute complex queries on in-process data. These and other features of the function execution service offer linear scalability for compute- and data-intensive applications: simply add more nodes when demand spikes to rebalance data and behavior and increase the overall throughput of your application. (A sketch of this programming model follows the list below.)

- API additions for C++ and C#: Support for continuous querying, client-side connection pooling, dynamic load balancing, and the ability to invoke server-side functions.

- Cost-based query optimization: A new compact index to conserve memory utilization and an enhanced query processor design with cost-based optimization have been introduced as part of this release.

- Developer productivity tools: It can be daunting for developers to quickly develop and test a clustered application. They need the ability to browse the distributed data using ad-hoc queries, apply corrections, and monitor resource utilization and performance metrics. A new graphical data browser permits browsing and editing of data across the entire cluster, execution of ad-hoc queries, and even the creation of real-time table views that are continuously kept up to date through continuous queries. The GemFire Monitor tool (GFMon) also has several enhancements that make it much more developer friendly.
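As mentioned in the data-aware routing item above, functions can be shipped to the nodes that host the data they operate on. The sketch below is only illustrative: it uses the Java API of the later open-source Apache Geode line of the same product (the GemFire 6.0 packages were named differently), and the locator address, region name, value type, and registered function id are all assumptions rather than names from this release.

```java
import org.apache.geode.cache.Region;
import org.apache.geode.cache.client.ClientCache;
import org.apache.geode.cache.client.ClientCacheFactory;
import org.apache.geode.cache.client.ClientRegionShortcut;
import org.apache.geode.cache.execute.Execution;
import org.apache.geode.cache.execute.FunctionService;
import org.apache.geode.cache.execute.ResultCollector;

import java.util.Collections;
import java.util.List;

public class DataAwareRoutingSketch {
    // Placeholder value type for the partitioned region.
    static class Order {}

    public static void main(String[] args) {
        // Connect as a client through a locator; a PROXY region keeps all data on the servers.
        ClientCache cache = new ClientCacheFactory()
                .addPoolLocator("localhost", 10334)
                .create();
        Region<String, Order> orders = cache
                .<String, Order>createClientRegionFactory(ClientRegionShortcut.PROXY)
                .create("orders");

        // Route the call only to the members that host the filtered key's partition,
        // so the computation runs next to the data instead of pulling data to the client.
        Execution execution = FunctionService.onRegion(orders)
                .withFilter(Collections.singleton("order-42"))
                .setArguments("recalculate-total");

        // "OrderTotalsFunction" stands in for a function registered on the servers.
        ResultCollector<?, ?> collector = execution.execute("OrderTotalsFunction");
        List<?> results = (List<?>) collector.getResult();
        System.out.println("Results from data nodes: " + results);

        cache.close();
    }
}
```

The filter is the key idea: the client never pulls the partitioned data back; the function runs in-process on whichever members own the filtered keys, and only the results travel over the wire.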

For more information on GemFire, view our newly rewritten technical white paper at:
http://community.gemstone.com/download/attachments/4752318/GemFire+Data+Fabric+-+Technical+White+paper.pdf?version=1

Monday
Apr 27, 2009

Some Questions from a newbie

Hello highscalability world. I just discovered this site yesterday in a search for a scalability resource and was very pleased to find such useful information. I have some questions regarding distributed caching that I was hoping the scalability intelligentsia trafficking this forum could answer. I apologize for my lack of technical knowledge; I'm hoping this site will increase said knowledge! Feel free to answer all or as much as you want. Thank you in advance for your responses and thank you for a great resource!

1.) What are the standard benchmarks used to measure the performance of memcached, or of MySQL and memcached working together (from web 2.0 companies etc.)?

2.) The little research I've conducted on this site suggests that most web 2.0 companies use a combination of MySQL and a hacked memcached (and potentially sharding). Does anyone know if any of these companies use an enterprise vendor for their distributed caching layer? (At this point in time I've only heard of Jive Software using Coherence.)

3.) In terms of a web 2.0 oriented startup, what are the database/distributed caching requirements typically needed to get off the ground and grow at a fairly rapid pace?

4.) Given the major players in the web 2.0 industry (Facebook, Twitter, MySpace, PoF, Flickr, etc.; I'm ignoring Google/Amazon here because they have proprietary caching layers), what is the most common, scalable back-end setup (MySQL/memcached/sharding etc.)? What are its limitations/problems? What features does said setup lack that it really needs?

Thank you so much for your insight!
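Since several of these questions refer to the MySQL-plus-memcached combination, here is a minimal, illustrative sketch of the cache-aside pattern those setups typically use, written with the spymemcached Java client and plain JDBC. The connection details, table, TTL, and key naming are placeholders, not details reported by any of the companies mentioned.

```java
import net.spy.memcached.MemcachedClient;

import java.net.InetSocketAddress;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CacheAsideSketch {
    private static final int TTL_SECONDS = 3600; // let cached entries expire after an hour

    public static void main(String[] args) throws Exception {
        MemcachedClient cache = new MemcachedClient(new InetSocketAddress("localhost", 11211));
        System.out.println("user 1001 -> " + loadUsername(cache, 1001L));
        cache.shutdown();
    }

    static String loadUsername(MemcachedClient cache, long userId) throws Exception {
        String key = "user:" + userId + ":name";

        // 1. Try the cache first.
        Object cached = cache.get(key);
        if (cached != null) {
            return (String) cached;
        }

        // 2. On a miss, fall back to MySQL (or the appropriate shard).
        String name = null;
        try (Connection db = DriverManager.getConnection(
                     "jdbc:mysql://localhost:3306/app", "app", "secret");
             PreparedStatement stmt = db.prepareStatement(
                     "SELECT name FROM users WHERE id = ?")) {
            stmt.setLong(1, userId);
            try (ResultSet rs = stmt.executeQuery()) {
                if (rs.next()) {
                    name = rs.getString("name");
                }
            }
        }

        // 3. Populate the cache so the next read is served from memory.
        if (name != null) {
            cache.set(key, TTL_SECONDS, name);
        }
        return name;
    }
}
```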
