Entries in key-value store (19)

Thursday
Oct 03, 2019

Redis Cloud Gets Easier with Fully Managed Hosting on Azure

ScaleGrid, a rapidly growing leader in the Database-as-a-Service (DBaaS) space, has just launched its new fully managed Redis on Azure service. This Redis management solution allows organizations from startups to enterprises to automate their Redis operations on dedicated Microsoft Azure cloud servers, alongside their other open source database deployments, including MongoDB, MySQL, and PostgreSQL.

Redis, the #1 key-value store and a top 10 database in the world, has grown by over 300% in popularity over the past 5 years, per the DB-Engines knowledge base. Demand for Redis is skyrocketing across dozens of use cases, particularly caching, queues, geospatial data, and high-speed transactions. This simple database management system makes it very easy to store and retrieve pairs of keys and values, and is commonly paired with other database types to increase the speed and performance of an application. According to the 2019 Open Source Database Report, a majority of Redis deployments are used in conjunction with MySQL, and over half of Redis deployments are used with PostgreSQL, MongoDB, or Elasticsearch.
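To make the "paired with another database" idea concrete, here is a minimal sketch of the common cache-aside pattern. It is illustrative only: `SlowDatabase`, `KeyValueCache`, and `cached_read` are hypothetical names, and plain in-memory Python objects stand in for both the key-value store and the primary database, so nothing here is the Redis or ScaleGrid API.

```python
import time


class SlowDatabase:
    """Hypothetical stand-in for a primary database such as MySQL."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0  # counts how often the slow path is taken

    def fetch(self, key):
        self.reads += 1
        return self.rows.get(key)


class KeyValueCache:
    """Hypothetical stand-in for a key-value store: get/set with an optional TTL."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on read
            return None
        return value

    def set(self, key, value, ttl=None):
        expires_at = None if ttl is None else time.monotonic() + ttl
        self._data[key] = (value, expires_at)


def cached_read(cache, db, key, ttl=60.0):
    """Cache-aside read: try the cache first, fall back to the database."""
    value = cache.get(key)
    if value is None:
        value = db.fetch(key)
        if value is not None:
            cache.set(key, value, ttl=ttl)  # populate the cache for later reads
    return value


db = SlowDatabase({"user:1": "alice"})
cache = KeyValueCache()
first = cached_read(cache, db, "user:1")   # miss: goes to the database
second = cached_read(cache, db, "user:1")  # hit: served from the cache
```

In this sketch the second read never touches the database, which is where the speedup in such a pairing would come from.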

ScaleGrid’s Redis hosting service allows these organizations to automate all of their time-consuming management tasks, such as backups, upgrades, scaling, replication, sharding, monitoring, alerts, log rotations, and OS patching, so their DBAs, developers, and DevOps teams can focus on new product development and optimizing performance. Additionally, organizations can customize their Redis persistence and host through their own Azure account, which allows them to leverage advanced cloud capabilities like Azure Virtual Networks (VNET), Security Groups, and Reserved Instances to reduce long-term hosting costs by up to 60%.

“Cloud reliability has never been so important,” says Dharshan Rangegowda, Founder and CEO of ScaleGrid. “It’s crucial for organizations to properly configure their Redis deployments for high availability and disaster recovery, as a couple minutes of downtime can be detrimental to a company’s security and reputation.”

ScaleGrid is the only Redis cloud service that allows you to customize your master-slave and cross-datacenter configurations for 100% uptime and availability across 30 different Azure regions. They also allow you to keep full Redis admin access and SSH access to your machines, and you can learn more about their advantages over competitors Compose for Redis, RedisGreen, Redis Labs and Elasticache for Redis on their Compare Redis Providers page.

Tuesday
Sep 03, 2019

Top Redis Use Cases by Core Data Structure Types

Redis, short for Remote Dictionary Server, is a BSD-licensed, open-source, in-memory key-value data structure store written in C by Salvatore Sanfilippo and first released on May 10, 2009. Depending on how it is configured, Redis can act as a database, a cache, or a message broker. It’s important to note that Redis is a NoSQL database system. This means that, unlike SQL (Structured Query Language) driven database systems such as MySQL, PostgreSQL, and Oracle, Redis does not store data in well-defined database schemas made up of tables, rows, and columns. Instead, Redis stores data in data structures, which makes it very flexible to use. In this blog, we outline the top Redis use cases by the different core data structure types.

Data Structures in Redis

Click to read more ...

Thursday
Nov 29, 2012

Performance data for LevelDB, Berkeley DB and BangDB for Random Operations

This is a guest post by Sachin Sinha, Founder of Iqlect and developer of BangDB.

The goal of this paper is to provide performance data for the following embedded databases under various scenarios for random operations such as writes and reads. The data is presented graphically to make it self-explanatory to some extent.

  • LevelDB:

    LevelDB is a fast key-value storage library written at Google that provides an ordered mapping from string keys to string values. LevelDB is based on an LSM (Log-Structured Merge-Tree) and uses SSTables and a MemTable for the database implementation. It's written in C++ and available under a BSD license. LevelDB treats keys and values as arbitrary byte arrays and stores keys in sorted order. It uses Snappy for data compression. Writes and reads are concurrent, but writes perform best with a single thread, whereas reads scale with the number of cores.

  • Berkeley DB:

    Berkeley DB (BDB) is a library that provides a high-performance embedded database for key/value data. It's the most widely used database library, with millions of deployed copies. BDB can be configured to run as anything from a concurrent data store to a transactional data store to a fully ACID-compliant database. It's written in C and available under the Sleepycat Public License. BDB treats keys and values as arbitrary byte arrays and stores keys either in sorted order using BTREE or unordered using HASH. Writes and reads are concurrent and scale well with the number of cores, especially reads.

  • BangDB:

    BangDB is a high-performance embedded database for key-value data. It's a new entrant into the embedded database space. It's written in C++ and available under a BSD license. BangDB treats keys and values as arbitrary byte arrays and stores keys either in sorted order using BTREE or unordered using HASH. Writes and reads are concurrent and scale well with the number of cores.
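The LSM design mentioned for LevelDB above can be sketched in a few lines. This is a toy illustration under simplifying assumptions, not LevelDB's implementation: `TinyLSM` and `memtable_limit` are hypothetical names, the "SSTables" are just sorted in-memory lists, and there is no compaction, compression, or write-ahead log.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: a mutable memtable plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.memtable_limit = memtable_limit
        self.sstables = []  # newest run last; each run is a sorted list of (key, value)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs from newest to oldest so the latest write wins.
        for run in reversed(self.sstables):
            keys = [k for k, _ in run]
            i = bisect.bisect_left(keys, key)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def scan(self):
        """Return all pairs in key order, keeping the newest value per key."""
        merged = {}
        for run in self.sstables:
            merged.update(run)
        merged.update(self.memtable)
        return sorted(merged.items())


db = TinyLSM(memtable_limit=2)
db.put(b"b", b"1")
db.put(b"a", b"2")   # memtable is full, so it is flushed to a sorted run
db.put(b"b", b"3")   # newer value for b lives in the memtable
```

Writes only ever touch the in-memory table or append a new run, which is why this family of designs tends to favor write throughput; reads may have to consult several runs.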

The comparison was done on similar grounds (as far as possible) for all the databases, so that the data could be measured as crisply and accurately as possible.

The test results show BangDB to be faster in both reads and writes:

Click to read more ...

Thursday
Jan 27, 2011

Comet - An Example of the New Key-Code Databases

Comet is an active distributed key-value store built at the University of Washington. The paper describing it is Comet: An active distributed key-value store; there are also slides and an MP3 of a presentation given at OSDI '10. Here's a succinct overview of Comet:

Today's cloud storage services, such as Amazon S3 or peer-to-peer DHTs, are highly inflexible and impose a variety of constraints on their clients: specific replication and consistency schemes, fixed data timeouts, limited logging, etc. We witnessed such inflexibility first-hand as part of our Vanish work, where we used a DHT to store encryption keys temporarily. To address this issue, we built Comet, an extensible storage service that allows clients to inject snippets of code that control their data's behavior inside the storage service.

I found this paper quite interesting because it takes the initial steps of collocating code with a key-value store, which turns it into what might be called a key-code store. This is something I've been exploring as a way of moving behavior to data in order to overcome network limitations in the cloud and provide other benefits. An innovator in this area is the Alchemy Database, which has already combined Redis and Lua. A good platform for this sort of thing might be Node.js integrated with V8. This would allow complex JavaScript programs to run in an efficient evented container. There are a lot of implications of this sort of architecture, more about that later, but the Comet paper describes a very interesting start.
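To give a feel for what a key-code store might look like, here is a rough sketch in which each entry can carry a small piece of code that the store runs on every read. All names (`ActiveStore`, `on_get`, `read_limited`) are hypothetical and chosen for illustration; this does not reproduce Comet's actual interface or sandboxing.

```python
class ActiveStore:
    """Toy key-value store where each entry may carry its own handler code."""

    def __init__(self):
        self._entries = {}

    def put(self, key, value, on_get=None):
        # on_get is a hypothetical hook: a callable run inside the store on each read.
        self._entries[key] = {"value": value, "on_get": on_get, "state": {}}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        handler = entry["on_get"]
        if handler is None:
            return entry["value"]
        # The handler decides what the caller sees and may update per-entry state.
        result = handler(entry["value"], entry["state"])
        if entry["state"].get("delete"):
            del self._entries[key]
        return result


def read_limited(max_reads):
    """Build a handler that serves a value a fixed number of times, then deletes it."""
    def handler(value, state):
        state["reads"] = state.get("reads", 0) + 1
        if state["reads"] >= max_reads:
            state["delete"] = True
        return value
    return handler


store = ActiveStore()
store.put("secret", "s3cr3t", on_get=read_limited(2))
reads = [store.get("secret") for _ in range(3)]
```

Here the behavior (a value that disappears after two reads) travels with the data, so the client needs no extra round trips to enforce it.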

From the abstract and conclusion:

Click to read more ...

Monday
Aug 30, 2010

Pomegranate - Storing Billions and Billions of Tiny Little Files

Pomegranate is a novel distributed file system built over distributed tabular storage that acts an awful lot like a NoSQL system. It's targeted at increasing the performance of tiny object access in order to support applications like online photo and micro-blog services, which require high concurrency, high throughput, and low latency. Their tests seem to indicate it works:

We have demonstrate that file system over tabular storage performs well for highly concurrent access. In our test cluster, we observed linearly increased more than 100,000 aggregate read and write requests served per second (RPS). 

Rather than sitting atop the file system like almost every other K-V store, Pomegranate is baked into the file system. The idea is that the file system API is common to every platform, so it wouldn't require a separate API to use. Every application could use it out of the box.

The features of Pomegranate are:

  • It handles billions of small files efficiently, even in one directory;
  • It provides a separate and scalable caching layer, which can be snapshotted;
  • The storage layer uses a log-structured store to absorb small file writes and make better use of disk bandwidth;
  • It builds a global namespace for both small files and large files;
  • Columnar storage to exploit temporal and spatial locality;
  • Distributed extendible hashing to index metadata;
  • Snapshot-able and reconfigurable caching to increase parallelism and tolerate failures;
  • Pomegranate should be the first file system built over tabular storage, and the experience of building it should be valuable to the file system community.
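The log-structured idea in the feature list can be illustrated with a small sketch: many tiny files are appended to one sequential log, and an index remembers where each file's latest version lives. This is only an assumption-laden toy (`SmallFileLog` is a hypothetical name and an in-memory buffer stands in for the on-disk log), not Pomegranate's storage layer.

```python
import io


class SmallFileLog:
    """Toy log-structured store: small files are appended to one log."""

    def __init__(self):
        self._log = io.BytesIO()   # stands in for a large on-disk log file
        self._index = {}           # path -> (offset, length) of the latest version

    def write(self, path, data):
        self._log.seek(0, io.SEEK_END)
        offset = self._log.tell()
        self._log.write(data)      # sequential append, never an in-place update
        self._index[path] = (offset, len(data))

    def read(self, path):
        offset, length = self._index[path]
        self._log.seek(offset)
        return self._log.read(length)

    def log_size(self):
        return self._log.getbuffer().nbytes


fs = SmallFileLog()
fs.write("/photos/a.jpg", b"hello")
fs.write("/photos/b.jpg", b"hi")
fs.write("/photos/a.jpg", b"bye")  # new version is appended; the index now points to it
```

Every write becomes a sequential append, which is how a design like this can turn lots of tiny random writes into disk-friendly I/O. A real system would also need garbage collection for the superseded versions left behind in the log.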

Can Ma, who leads the research on Pomegranate, was kind enough to agree to a short interview.

Click to read more ...

Thursday
Nov 05, 2009

A Yes for a NoSQL Taxonomy

NorthScale's Steven Yen in his highly entertaining NoSQL is a Horseless Carriage presentation has come up with a NoSQL taxonomy that thankfully focuses a little more on what NoSQL is, than what it isn't:

  • key‐value‐cache
    • memcached, repcached, coherence, infinispan, eXtreme scale, jboss cache, velocity, terracotta
  • key‐value‐store
    • keyspace, flare, schema‐free, RAMCloud
  • eventually‐consistent key‐value‐store
    • dynamo, voldemort, Dynomite, SubRecord, MotionDb, Dovetaildb
  • ordered‐key‐value‐store
    • tokyo tyrant, lightcloud, NMDB, luxio, memcachedb, actord
  • data‐structures server
    • redis
  • tuple‐store
    • gigaspaces, coord, apache river
  • object database
    • ZopeDB, db4o, Shoal
  • document store
    • CouchDB, Mongo, Jackrabbit, XML Databases, ThruDB, CloudKit, Persevere, Riak Basho, Scalaris
  • wide columnar store
    • BigTable, Hbase, Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

"Who will win?" Steven asks. He answers: the most approachable API with enough power will win. Steven argues that the contender with the most devastating knockout punch will be document stores because "everyone groks documents." The thinking, though, is that there will be just a few winners and that products will converge in functionality.

Steven is banking on the "worse is better" model of dominance, which is hard to argue with as it has been so successful an adoption pattern in our field. The convergence idea is something I also agree with. What we have now are a lot of features masquerading as products. Over time they will merge together to become more full-featured offerings.

The key question though is what is enough power to win? Just getting a value back for a key won't be enough. Who are you putting your money on?

Click to read more ...

Thursday
Oct 29, 2009

Paper: No Relation: The Mixed Blessings of Non-Relational Databases

This excellent survey of the field was written by Ian Thomas Varley as part of his Master of Science in Engineering program.

The aim of this paper is to explore the conceptual design space of non-relational databases as compared to traditional relational databases. It is clear that the design needs of the two paradigms are different, but how fundamental are the differences, and what strategies can we use to transition our conceptual designs from one to the other?

There are a few things to like about this paper. A running example is used to show the different ways to model data depending on which type of solution you are targeting, especially covering how many-to-many relationships are modeled, data integrity, and how to support optional attributes. There's also a brief survey of some of the major systems.

The most interesting section of the report is where it tackles the problem of design for non-relational systems. The approach has two different phases: design questions and design strategies.

The questions you should ask yourself about your problem are:

Click to read more ...

Thursday
Oct 08, 2009

Riak - web-shaped data storage system

Update: Short presentation NYC by Bryan Fink demonstrating the Riak web-shaped data storage engine

Riak is another new and interesting key-value store entrant. Some of the features it offers are:

  • Document-oriented
  • Scalable, decentralized key-value store
  • Standard get, put, and delete operations.
  • Distributed, fault-tolerant storage solution.
  • Configurable levels of consistency, availability, and partition tolerance
  • Support for Erlang, Ruby, PHP, Javascript, Java, Python, HTTP
  • Open source and NoSQL
  • Pluggable backends
  • Eventing system
  • Monitoring
  • Inter-cluster replication
  • Links between records that can be traversed.
  • Map/Reduce. Functions are executed on the data node. One interesting difference is that a list of keys is required to specify which values are operated on, as opposed to running calculations on all values.
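The last point, map/reduce over an explicit list of keys rather than over every value, can be sketched as follows. This is a plain in-process illustration with hypothetical names (`map_reduce`, the `order:*` keys); it is not Riak's API, and a dict stands in for the store.

```python
from functools import reduce


def map_reduce(store, keys, map_fn, reduce_fn, initial):
    """Run map_fn over the values of the listed keys, then fold with reduce_fn."""
    mapped = (map_fn(key, store[key]) for key in keys if key in store)
    return reduce(reduce_fn, mapped, initial)


store = {
    "order:1": {"total": 30},
    "order:2": {"total": 12},
    "order:3": {"total": 50},
}

# Only the listed keys are touched; order:3 is never read.
total = map_reduce(
    store,
    keys=["order:1", "order:2"],
    map_fn=lambda key, value: value["total"],
    reduce_fn=lambda acc, x: acc + x,
    initial=0,
)
```

Requiring the key list up front means the job's input set is known before it starts, at the cost of the caller having to know (or look up) which keys matter.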

Related Articles

  • Hacker News Thread. More juicy details on how Riak compares to Cassandra, mongodb, couchdb, etc. 

 

Thursday
Sep 03, 2009

Storage Systems for High Scalable Systems presentation

Highly scalable systems (i.e., websites) such as Google, Facebook, and Amazon need highly scalable storage systems that can handle huge amounts of data with high availability and reliability. Building large systems on top of a traditional RDBMS data storage layer is no longer good enough. This presentation explores the landscape of new technologies available today to augment your data layer to improve performance and reliability.

Remember: all of my presentations' content is open source, so please feel free to use it, copy it, and redistribute it as you want.

Download the presentation

Saturday
Aug 08, 2009

Yahoo!'s PNUTS Database: Too Hot, Too Cold or Just Right?

So far every massively scalable database is a bundle of compromises. For some the weak guarantees of Amazon's eventual consistency model are too cold. For many the strong guarantees of standard RDBMS distributed transactions are too hot. Google App Engine tries to get it just right with entity groups. Yahoo! is also trying to get it just right by offering per-record timeline consistency, which hopes to serve up a heaping bowl of rich database functionality and low latency at massive scale:

We describe PNUTS [Platform for Nimble Universal Table Storage], a massively parallel and geographically distributed database system for Yahoo!’s web applications. PNUTS provides data storage organized as hashed or ordered tables, low latency for large numbers of concurrent requests including updates and queries, and novel per-record consistency guarantees. It is a hosted, centrally managed, and geographically distributed service, and utilizes automated load-balancing and failover to reduce operational complexity. The first version of the system is currently serving in production. We describe the motivation for PNUTS and the design and implementation of its table storage and replication layers, and then present experimental results.

Some of the cool things about PNUTS are:

  • They actually talk about the hard problem of how to scale a system to 10 different data centers (each with 1,000+ servers) while supporting secondary indexes, materialized views, the ability to create multiple tables, and hash-organized tables. Multi-datacenter operation is so difficult it's usually ignored. PNUTS is designed specifically to operate in many datacenters with a strongish consistency model, which makes it a very interesting design point.
  • You can subscribe to a reliable ordered stream of updates on a table. This is massively convenient. For many applications numerous processes are tied to data changes and this is normally a pain to implement.
  • The consistency model is per-record timeline consistency: all replicas of a given record apply all updates to the record in the same order. This provides a consistency model that sits between the two extremes of serialized transactions and eventual consistency. Unlike in Dynamo, conflicting versions of a record can't exist at the same time.
  • Supports records, but accepts queries only on individual tables. There's no fixed schema for records, and columns can be typed or be blobs. Transactions exist only at the record level. This means that, as with other NoSQL databases, denormalization is the modeling strategy, as there's no way to have transactions across tables.
  • The degree of read consistency desired can be specified. Records are versioned. You can ask for the latest record version or allow for potentially stale records.
  • Asynchronous replication is used to ensure low write latency while providing geographic replication. Reads should be fast everywhere and may return older versions; writes should be fast locally (in the same datacenter). [3]
  • A message broker serves as both the replication mechanism and the redo log of the database. The message broker guarantees that no replica can receive updates out of order because it provides a reliable, totally ordered message channel. This also means you can't have transactions across tables, consistent joins, or foreign keys. They chose this approach over a gossip mechanism (like Dynamo) because it can be optimized for geographically distant replicas and because replicas do not need to know the location of other replicas.
  • PNUTS is hosted and centrally managed. It was built to reduce the overhead of creating and maintaining new applications rather than every property creating their own system. Failover, adding capacity, resharding, performance isolation and supporting applications with different usage profiles are all completely automated.
  • Predicate queries are supported using a scatter-gather mechanism which sends the query to every relevant storage tablet at once. The results are gathered and sent back to the client.
  • Performance is 1-10ms/request when caching layers are in place.
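Per-record timeline consistency, as described in the list above, can be sketched with a single ordered update log that replicas consume at their own pace. This is a toy model under stated assumptions: `TimelineStore`, `sync`, `read_any`, and `read_latest` are hypothetical names, and one in-process list stands in for the message broker. It is not how Yahoo! implemented PNUTS.

```python
class TimelineStore:
    """Toy model: one ordered update log, replicas that apply it asynchronously."""

    def __init__(self, replica_names):
        self.log = []       # totally ordered stream of (key, version, value)
        self.versions = {}  # key -> latest version number assigned
        self.replicas = {name: {"applied": 0, "data": {}} for name in replica_names}

    def write(self, key, value):
        # Each record gets its own monotonically increasing version timeline.
        version = self.versions.get(key, 0) + 1
        self.versions[key] = version
        self.log.append((key, version, value))
        return version

    def sync(self, name, max_updates=None):
        """Apply the next updates from the log to one replica, in log order."""
        replica = self.replicas[name]
        end = len(self.log)
        if max_updates is not None:
            end = min(end, replica["applied"] + max_updates)
        for key, version, value in self.log[replica["applied"]:end]:
            replica["data"][key] = (version, value)
        replica["applied"] = end

    def read_any(self, name, key):
        """Fast local read; may return a stale (but never out-of-order) version."""
        return self.replicas[name]["data"].get(key)

    def read_latest(self, key):
        """Read the newest version by consulting the ordered log."""
        for k, version, value in reversed(self.log):
            if k == key:
                return (version, value)
        return None


store = TimelineStore(["west", "east"])
store.write("user:1", "a")           # version 1
store.write("user:1", "b")           # version 2
store.sync("west")                   # west has caught up
store.sync("east", max_updates=1)    # east lags one update behind
stale = store.read_any("east", "user:1")
fresh = store.read_latest("user:1")
```

A lagging replica can serve an older version, but because every replica applies the same log in the same order, it can never show versions out of sequence or hold a conflicting one.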

From a system perspective PNUTS offers a lot of good things: it's hosted, reliable, lowish latency, automated, scalable, and supports many application models, and there's a lot of room for improvement that all applications will be able to take advantage of when available.

From a programmer perspective there are also a lot of good things: it's hosted so there are fewer worries, notifications, flexible schemas, ordered records, secondary indexes, lowish latency, strong consistency on a single record, scalability, high write rates, reliability, and range queries over a small set of records.

Unfortunately Goldilocks still needs to keep searching for just right, though she may be getting closer. From a system perspective Yahoo!'s ideas are good, but they don't help you as the system isn't available for you to use. From a programmer perspective the programmer's job is still way too hard. To be just right programmers need low latency aggregate operators, complex transactions, scalable counters, automatic relationship management, and all the other features that will help them just buy instant porridge and be done with it.

Related Articles

  • Anti-RDBMS: A list of distributed key-value stores
  • Details on Yahoo's distributed database by Greg Linden
  • Thoughts on Yahoo's PNUTS distributed database by Marton Trencseni
  • Data Challenges at Yahoo! by Ricardo Baeza-Yates & Raghu Ramakrishnan, Yahoo! Research
  • PNUTS: Yahoo!'s Hosted Data Serving Platform by Lucian
  • Yahoo’s PNUTS by Henry Robinson. A very thoughtful and informative overview of the paper.
  • How robust are gossip-based communication protocols? by Lorenzo Alvisi et al.
  • Trading consistency for scalability in distributed architectures.
  • Asynchronous View Maintenance for VLSD Databases by Parag Agrawal et al.
  • BigTable
  • Dynamo
  • Are Cloud Based Memory Architectures the Next Big Thing?
  • The Story of Goldilocks and the Three Bears
  • PNUTS - Platform for Nimble Universal Table Storage