Entries in C++ (4)

Tuesday
Sep032019

Top Redis Use Cases by Core Data Structure Types

Top Redis Use Cases by Core Data Structure Types - ScaleGrid Blog

Redis, short for Remote Dictionary Server, is a BSD-licensed, open-source in-memory key-value data structure store written in C language by Salvatore Sanfillipo and was first released on May 10, 2009. Depending on how it is configured, Redis can act like a database, a cache or a message broker. It’s important to note that Redis is a NoSQL database system. This implies that unlike SQL (Structured Query Language) driven database systems like MySQL, PostgreSQL, and Oracle, Redis does not store data in well-defined database schemas which constitute tables, rows, and columns. Instead, Redis stores data in data structures which makes it very flexible to use. In this blog, we outline the top Redis use cases by the different core data structure types.

Data Structures in Redis

Click to read more ...

Wednesday
Aug132014

Hamsterdb: An Analytical Embedded Key-value Store

 

In this post, I’d like to introduce you to hamsterdb, an Apache 2-licensed, embedded analytical key-value database library similar to Google's leveldb and Oracle's BerkeleyDB.

hamsterdb is not a new contender in this niche. In fact, hamsterdb has been around for over 9 years. In this time, it has dramatically grown, and the focus has shifted from a pure key-value store to an analytical database offering functionality similar to a column store database. 

hamsterdb is single-threaded and non-distributed, and users usually link it directly into their applications. hamsterdb offers a unique (at least, as far as I know) implementation of Transactions, as well as other unique features similar to column store databases, making it a natural fit for analytical workloads. It can be used natively from C/C++ and has bindings for Erlang, Python, Java, .NET, and even Ada. It is used in embedded devices and on-premise applications with millions of deployments, as well as serving in cloud instances for caching and indexing.

hamsterdb has a unique feature in the key-value niche: it understands schema information. While most databases do not know or care what kind of keys are inserted, hamsterdb supports key types for binary keys...

Click to read more ...

Thursday
May282009

Scaling PostgreSQL using CUDA

Combining GPU power with PostgreSQL PostgreSQL is one of the world's leading Open Source databases and it provides enormous flexibility as well as extensibility. One of the key features of PostgreSQL is that users can define their own procedures and functions in basically any known programming language. With the means of functions it is possible to write basically any server side codes easily. Now, all this extensibility is basically not new. What does it all have to do with scaling and then? Well, imagine a world where the data in your database and enormous computing power are tightly integrated. Imagine a world where data inside your database has direct access to hundreds of FPUs. Welcome to the world of CUDA, NVIDIA's way of making the power of graphics cards available to normal, high-performance applications. When it comes to complex computations databases might very well turn out to be a bottleneck. Depending on your application it might easily happen that adding more CPU power does not improve the overall performance of your system – the reason for that is simply that bringing data from your database to those units which actually do the computations is ways too slow (maybe because of remote calls and so on). Especially when data is flowing over a network, copying a lot of data might be limited by network latency or simply bandwidth. What if this bottleneck could be avoided? CUDA is C / C++ Basically a CUDA program is simple a C program with some small extensions. The CUDA subsystem transforms your CUDA program to normal C code which can then be compiled and linked nicely with existing code. This also means that CUDA code can basically be used to work inside a PostgreSQL stored procedure easily. The advantages of this mechanism are obvious: GPUs can do matrix and FPU related operations hundreds of times faster than any CPU the GPU is used inside the database and thus no data has to be transported over slow lines basically any NVIDIA graphics card can be used you get enormous computing power for virtually zero cost you can even build functional indexes on top of CUDA stored procedures not so many boxes are needed because one box is ways faster How to make it work? How to make this all work now? The goal for this simplistic example is to generate a set of random number on the CPU, copy it to the GPU and make the code callable from PostgreSQL. Here is the function to generate random numbers and to copy them to the GPU: /* implement random generator and copy to CUDA */ nn_precision* generate_random_numbers(int number_of_values) { nn_precision *cuda_float_p; /* allocate host memory and CUDA memory */ nn_precision *host_p = (nn_precision *)pg_palloc(sizeof(nn_precision) * number_of_values); CUDATOOLS_SAFE_CALL( cudaMalloc( (void**) &cuda_float_p, sizeof(nn_precision) * number_of_values)); /* create random numbers */ for (int i = 0; i < number_of_values; i++) { host_p[i] = (nn_precision) drand48(); } /* copy data to CUDA and return pointer to CUDA structure */ CUDATOOLS_SAFE_CALL( cudaMemcpy(cuda_float_p, host_p, sizeof(nn_precision) * number_of_values, cudaMemcpyHostToDevice) ); return cuda_float_p; } Now we can go and call this function from a PostgreSQL stored procedure: /* import postgres internal stuff */ #include "postgres.h" #include "fmgr.h" #include "funcapi.h" #include "utils/memutils.h" #include "utils/elog.h" #include "cuda_tools.h" PG_MODULE_MAGIC; /* prototypes to silence compiler */ extern Datum test_random(PG_FUNCTION_ARGS); /* define function to allocate N random values (0 - 1.0) and put it into the CUDA device */ PG_FUNCTION_INFO_V1(test_random); Datum test_random(PG_FUNCTION_ARGS) { int number = PG_GETARG_INT32(0); nn_precision *p = generate_random_numbers(number); cuda_free_array(p); PG_RETURN_VOID(); } This code then now be nicely compiled just like any other PostgreSQL C extension. The test random function can be called just like this: SELECT test_random(1000); Of course this is a just brief introduction to see how things can practically be done. A more realistic application will need more thinking and can be integrated into the database even more closely. More information: Professional CUDA programming Professional PostgreSQL services The official PostgreSQL Website The official CUDA site

Click to read more ...

Sunday
Feb242008

Yandex Architecture

Update: Anatomy of a crash in a new part of Yandex written in Django. Writing to a magic session variable caused an unexpected write into an InnoDB database on every request. Writes took 6-7 seconds because of index rebuilding. Lots of useful details on the sizing of their system, what went wrong, and how they fixed it. Yandex is a Russian search engine with 3.5 billion pages in their search index. We only know a few fun facts about how they do things, nothing at a detailed architecture level. Hopefully we'll learn more later, but I thought it would still be interesting. From Allen Stern's interview with Yandex's CTO Ilya Segalovich, we learn:

  • 3.5 billion pages in the search index.
  • Over several thousand servers.
  • 35 million searches a day.
  • Several data centers around Russia.
  • Two-layer architecture.
  • The database is split in pieces and when a search is requested, it pulls the bits from the different database servers and brings it together for the user.
  • Languages used: c++, perl, some java.
  • FreeBSD is used as their server OS.
  • $72 million in revenue in 2006.

    Click to read more ...