Entries in gpu (2)

Friday
Dec 03, 2010

GPU vs CPU Smackdown : The Rise of Throughput-Oriented Architectures

In some ways the original Amazon cloud, the one most of us still live in, was like that really cool house where, the moment you stepped inside and saw the old green shag carpet in the living room, you knew it hadn't been updated in a while. The network is a little slow, the processors are a bit dated, and virtualization makes the house feel smaller. It has been difficult to run high bandwidth or low latency workloads in the cloud. Bottlenecks everywhere. Not a big deal for most applications, but for many high performance computing (HPC) applications it was a killer.

In a typical house you might just do a remodel. Upgrade a few rooms. Swap out builder-quality appliances for gleaming stainless steel monsters. But Amazon has a big lot, so instead of remodeling they simply keep adding entire new wings, kind of like the Winchester Mystery House of computing.

The first new wing added was a CPU based HPC system featuring blazingly fast Nehalem chips, virtualization replaced by a close-to-metal Hardware Virtual Machine (HVM) architecture, and a monster 10 gigabit network with the ability to specify placement groups that carve out a low-latency, high-bandwidth cluster. Bottlenecks removed. Most people probably still don't even know this part of the house exists.

The newest addition is a beauty: a graphics processing unit (GPU) cluster, as described by Werner Vogels in Expanding the Cloud - Adding the Incredible Power of the Amazon EC2 Cluster GPU Instances. It's completely modern and contemporary. The shag carpet is out. In are Nvidia Tesla M2050 GPU based clusters, which make short work of applications in the sciences, finance, oil & gas, movie studios and graphics.


Thursday
May 28, 2009

Scaling PostgreSQL using CUDA

Combining GPU power with PostgreSQL

PostgreSQL is one of the world's leading open source databases, and it provides enormous flexibility as well as extensibility. One of its key features is that users can define their own procedures and functions in almost any common programming language, which makes it easy to write server-side code. This extensibility is not new, so what does it have to do with scaling? Well, imagine a world where the data in your database and enormous computing power are tightly integrated. Imagine a world where data inside your database has direct access to hundreds of FPUs. Welcome to the world of CUDA, NVIDIA's way of making the power of graphics cards available to normal, high-performance applications.

When it comes to complex computations, the database might very well turn out to be the bottleneck. Depending on your application, adding more CPU power may not improve the overall performance of your system: bringing data from the database to the units that actually do the computations is simply too slow (because of remote calls, for example). Especially when data flows over a network, copying a lot of it can be limited by network latency or bandwidth. What if this bottleneck could be avoided?

CUDA is C / C++

A CUDA program is basically a C program with a few small extensions. The CUDA toolchain (the nvcc compiler) separates the GPU code from the host code and hands the host part to a regular C/C++ compiler, so the result can be compiled and linked with existing code. This also means that CUDA code can be used inside a PostgreSQL stored procedure.
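To put rough numbers on the data-movement bottleneck, here is a back-of-the-envelope sketch in plain C. It models transfer time as a fixed latency plus payload size divided by bandwidth. The function name transfer_seconds and every figure used with it are assumptions for illustration, not measurements from this article: a 1 Gbit/s network link (125e6 bytes/s) with 0.5 ms latency, and the 8 GB/s theoretical peak of a PCIe 2.0 x16 slot.

```c
#include <assert.h>

/* Hypothetical helper: rough transfer-time model.
 * Result = fixed latency + payload / bandwidth.
 * All inputs are caller-supplied estimates, not measurements. */
double transfer_seconds(double bytes, double bytes_per_second,
                        double latency_seconds)
{
    assert(bytes >= 0.0 && bytes_per_second > 0.0 && latency_seconds >= 0.0);
    return latency_seconds + bytes / bytes_per_second;
}
```

Under these assumed figures, 100 million 4-byte floats (4e8 bytes) take roughly 3.2 seconds over the network, transfer_seconds(4e8, 125e6, 0.0005), but only about 0.05 seconds over the bus, transfer_seconds(4e8, 8e9, 0.0). Real throughput will be lower on both paths, but the gap is what the argument for keeping the GPU in the database box rests on.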
The advantages of this mechanism are obvious:

- GPUs can run matrix and floating-point heavy operations up to hundreds of times faster than a CPU for well-suited workloads
- the GPU is used inside the database, so no data has to be transported over slow lines
- almost any recent (CUDA-capable) NVIDIA graphics card can be used
- you get enormous computing power for virtually zero cost
- you can even build functional indexes on top of CUDA stored procedures
- fewer boxes are needed because one box is way faster

How to make it work?

The goal of this simplistic example is to generate a set of random numbers on the CPU, copy it to the GPU, and make the code callable from PostgreSQL. Here is the function that generates the random numbers and copies them to the GPU:

    #include <stdlib.h>      /* drand48 */
    #include "cuda_tools.h"  /* the author's helper header (not shown); presumably
                                provides nn_precision, pg_palloc and CUDATOOLS_SAFE_CALL */

    /* implement random generator and copy to CUDA */
    nn_precision* generate_random_numbers(int number_of_values)
    {
        nn_precision *cuda_float_p;

        /* allocate host memory and CUDA memory */
        nn_precision *host_p = (nn_precision *)pg_palloc(sizeof(nn_precision) * number_of_values);
        CUDATOOLS_SAFE_CALL( cudaMalloc( (void**) &cuda_float_p, sizeof(nn_precision) * number_of_values));

        /* create random numbers */
        for (int i = 0; i < number_of_values; i++)
        {
            host_p[i] = (nn_precision) drand48();
        }

        /* copy data to CUDA and return pointer to CUDA structure */
        CUDATOOLS_SAFE_CALL( cudaMemcpy(cuda_float_p, host_p, sizeof(nn_precision) * number_of_values, cudaMemcpyHostToDevice) );

        /* note: host_p is not freed here; if pg_palloc wraps palloc (an assumption),
           PostgreSQL releases it when the current memory context ends */
        return cuda_float_p;
    }

Now we can call this function from a PostgreSQL stored procedure:

    /* import postgres internal stuff */
    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"
    #include "utils/memutils.h"
    #include "utils/elog.h"

    #include "cuda_tools.h"

    PG_MODULE_MAGIC;

    /* prototypes to silence compiler */
    extern Datum test_random(PG_FUNCTION_ARGS);

    /* define function to allocate N random values (0 - 1.0) and put it into the CUDA device */
    PG_FUNCTION_INFO_V1(test_random);

    Datum
    test_random(PG_FUNCTION_ARGS)
    {
        int number = PG_GETARG_INT32(0);

        /* generate the values on the host, copy them to the GPU, then release the GPU memory */
        nn_precision *p = generate_random_numbers(number);
        cuda_free_array(p);

        PG_RETURN_VOID();
    }

This code can now be compiled just like any other PostgreSQL C extension. Once the function is registered with a CREATE FUNCTION statement using LANGUAGE C, it can be called like this:

    SELECT test_random(1000);

Of course, this is just a brief introduction to show how things can be done in practice. A more realistic application will need more thought and can be integrated into the database even more closely.
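As one small step toward a more realistic application, any kernel that later processes the array copied by generate_random_numbers would need a launch configuration, that is, a number of thread blocks that covers all values. The sketch below is written in plain C so it can be checked without a GPU, and shows the usual ceiling division for that count. The helper name blocks_for and the block size of 256 threads are assumptions for illustration; neither appears in the original code.

```c
#include <assert.h>

/* Hypothetical helper (not part of the original example): the smallest
 * number of thread blocks such that blocks * threads_per_block >= n. */
int blocks_for(int n, int threads_per_block)
{
    assert(n >= 0 && threads_per_block > 0);
    /* ceiling division written to avoid integer overflow for large n */
    return n / threads_per_block + (n % threads_per_block != 0);
}
```

For the 1000 values requested by SELECT test_random(1000) and an assumed 256 threads per block, this gives 4 blocks, or 1024 threads in total. A kernel launched this way would need a bounds check so that the 24 surplus threads do not touch memory past the end of the array.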
