Entries in postgresql (11)

Wednesday, Jan 22, 2020

Follower Clusters – 3 Major Use Cases for Syncing SQL & NoSQL Deployments

Follower clusters are a ScaleGrid feature that lets you keep two independent database systems (of the same type) in sync. Unlike cloning or replication, this maintains an active, point-in-time copy of your production data. This extra cluster, known as a follower cluster, can be leveraged for multiple use cases, including analyzing, optimizing, and testing your application performance for MongoDB, MySQL, and PostgreSQL. In this blog post, we cover the top three scenarios for leveraging follower clusters in your application.

How Do Follower Clusters Differ From Replication?

Unlike a static clone, the data is imported on a set schedule, so your follower cluster is always in sync with your production cluster. Here are a few critical ways in which it differs from replication:

Click to read more ...

Wednesday, Jan 8, 2020

PostgreSQL Connection Pooling: Part 2 – PgBouncer

When it comes to connection pooling in the PostgreSQL world, PgBouncer is probably the most popular option. It’s a very simple utility that does exactly one thing – it sits between the database and the clients and speaks the PostgreSQL protocol, emulating a PostgreSQL server. A client connects to PgBouncer with the exact same syntax it would use when connecting directly to PostgreSQL – PgBouncer is essentially invisible.
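As a quick illustration (not from the post; the port and user names are assumptions), a session through PgBouncer on its default port 6432 looks like any other PostgreSQL session, and connecting to the special pgbouncer admin database lets you inspect the pooler itself:

    -- Connect through PgBouncer exactly as you would connect to PostgreSQL, e.g.:
    --   psql -h 127.0.0.1 -p 6432 -U app_user app_db
    -- Ordinary SQL passes straight through to the real server:
    SELECT current_database(), current_user, version();

    -- Connect to the virtual "pgbouncer" admin database (the user must be listed
    -- in admin_users or stats_users in pgbouncer.ini), e.g.:
    --   psql -h 127.0.0.1 -p 6432 -U admin_user pgbouncer
    SHOW POOLS;   -- client/server connection counts per pool
    SHOW STATS;   -- per-database traffic statistics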

Click to read more ...

Friday, Oct 18, 2019

PostgreSQL Connection Pooling: Part 1 – Pros & Cons

A long time ago, in a galaxy far, far away, ‘threads’ were a programming novelty, rarely used and seldom trusted. In that environment, the first PostgreSQL developers decided that forking a process for each connection to the database was the safest choice. It would be a shame if your database crashed, after all.
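That design is still visible today: every client connection is served by its own dedicated backend process. As a quick illustrative check (not part of the original article), pg_stat_activity shows one operating-system process per connected client:

    -- One forked backend process (pid) per client connection.
    -- The backend_type column requires PostgreSQL 10 or later.
    SELECT pid,            -- OS process ID of the backend serving the connection
           usename,        -- connected user
           client_addr,    -- client address
           backend_start,  -- when the process was started for this connection
           state
      FROM pg_stat_activity
     WHERE backend_type = 'client backend';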

Since then, a lot of water has flowed under that bridge, yet the PostgreSQL community has stuck by its original decision. It is difficult to fault the argument, as it's absolutely true that:

Click to read more ...

Monday, Sep 16, 2019

Managing High Availability in PostgreSQL – Part III: Patroni

In our previous blog posts, we discussed the capabilities and functioning of PostgreSQL Automatic Failover (PAF) by Cluster Labs and Replication Manager (repmgr) by 2ndQuadrant. In the final post of this series, we will review the last solution, Patroni by Zalando, and compare all three at the end so you can determine which high availability framework is best for your PostgreSQL hosting deployment.

Patroni for PostgreSQL

Click to read more ...

Thursday, Jun 27, 2019

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

Ready to transition from a commercial database to open source, and want to know which databases are most popular in 2019? Wondering whether an on-premise vs. public cloud vs. hybrid cloud infrastructure is best for your database strategy? Or, considering adding a new database to your application and want to see which combinations are most popular? We found all the answers you need at the Percona Live event last month, and broke down the insights into the following free trends reports:

Click to read more ...

Wednesday, Apr 3, 2019

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

PostgreSQL is an open source object-relational database system that has soared in popularity over the past 30 years thanks to its active, loyal, and growing community. For the 2nd year in a row, PostgreSQL has kept the title of #1 fastest-growing database in the world according to the DBMS of the Year report by the experts at DB-Engines. So what makes PostgreSQL so special, and how is it being used today? We found the answers at the Postgres Conference in March, where we surveyed PostgreSQL users, contributors, and SQL and NoSQL database administrators alike. In this free PostgreSQL Trends Report, we break down PostgreSQL hosting use across public cloud vs. private cloud vs. hybrid cloud, the most popular cloud providers, migration trends, database combinations with Postgres, and why PostgreSQL is preferred over popular RDBMS alternatives.

Private Cloud vs. Public Cloud vs. Hybrid Cloud

Click to read more ...

Tuesday, Oct 13, 2015

More concurrency: Improved locking in PostgreSQL

If you want to build a large-scale website, scaling out the webserver is not enough. It is also necessary to cleverly manage the database side. A key to high scalability is locking.

PostgreSQL has gained a couple of cool new features that reduce locking and speed things up through improved concurrency.

General recommendations: Before attacking locking, however, it makes sense to check what is actually going on on your PostgreSQL database server. To do so, I recommend taking a look at pg_stat_statements and carefully tracking down bottlenecks. Here is how it works:
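The full walkthrough is behind the link, but as a rough sketch, the usual pg_stat_statements workflow looks like this (in PostgreSQL 13 and later the timing column is total_exec_time rather than total_time):

    -- The extension must be preloaded (or set in postgresql.conf);
    -- this change requires a server restart.
    ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';

    -- After the restart, expose the statistics view in the database:
    CREATE EXTENSION pg_stat_statements;

    -- Then hunt for bottlenecks, e.g. the ten most expensive statements overall:
    SELECT query, calls, total_time, rows
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 10;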

Click to read more ...

Thursday, Mar 4, 2010

How MySpace Tested Their Live Site with 1 Million Concurrent Users

This is a guest post by Dan Bartow, VP of SOASTA, talking about how they pelted MySpace with 1 million concurrent users using 800 EC2 instances. I thought this was an interesting story because: that's a lot of users, it takes big cojones to test your live site like that, and not everything worked out quite as expected. I'd like to thank Dan for taking the time to write and share this article.

In December of 2009 MySpace launched a new wave of streaming music video offerings in New Zealand, building on the previous success of MySpace music.  These new features included the ability to watch music videos, search for artist’s videos, create lists of favorites, and more. The anticipated load increase from a feature like this on a popular site like MySpace is huge, and they wanted to test these features before making them live. 

If you manage the infrastructure that sits behind a high traffic application you don’t want any surprises.  You want to understand your breaking points, define your capacity thresholds, and know how to react when those thresholds are exceeded.  Testing the production infrastructure with actual anticipated load levels is the only way to understand how things will behave when peak traffic arrives. 

For MySpace, the goal was to test an additional 1 million concurrent users on their live site stressing the new video features.  The key word here is ‘concurrent’.  Not over the course of an hour or day… 1 million users concurrently active on the site. It should be noted that 1 million virtual users are only a portion of what MySpace typically has on the site during its peaks.  They wanted to supplement the live traffic with test traffic to get an idea of the overall performance impact of the new launch on the entire infrastructure.  This requires a massive amount of load generation capability, which is where cloud computing comes into play. To do this testing, MySpace worked with SOASTA to use the cloud as a load generation platform. 

Here are the details of the load that was generated during testing.

Click to read more ...

Thursday, May 28, 2009

Scaling PostgreSQL using CUDA

Combining GPU power with PostgreSQL

PostgreSQL is one of the world's leading Open Source databases, and it provides enormous flexibility as well as extensibility. One of the key features of PostgreSQL is that users can define their own procedures and functions in basically any known programming language. By means of functions it is possible to write basically any server-side code easily. Now, all this extensibility is basically not new. What does it all have to do with scaling, then?

Well, imagine a world where the data in your database and enormous computing power are tightly integrated. Imagine a world where data inside your database has direct access to hundreds of FPUs. Welcome to the world of CUDA, NVIDIA's way of making the power of graphics cards available to normal, high-performance applications.

When it comes to complex computations, databases might very well turn out to be a bottleneck. Depending on your application, it might easily happen that adding more CPU power does not improve the overall performance of your system – the reason is simply that bringing data from your database to the units which actually do the computations is way too slow (maybe because of remote calls and so on). Especially when data is flowing over a network, copying a lot of data might be limited by network latency or simply by bandwidth. What if this bottleneck could be avoided?

CUDA is C / C++

Basically, a CUDA program is simply a C program with some small extensions. The CUDA subsystem transforms your CUDA program into normal C code, which can then be compiled and linked nicely with existing code. This also means that CUDA code can basically be used inside a PostgreSQL stored procedure easily. The advantages of this mechanism are obvious:

  • GPUs can do matrix and FPU-related operations hundreds of times faster than any CPU
  • the GPU is used inside the database, so no data has to be transported over slow lines
  • basically any NVIDIA graphics card can be used
  • you get enormous computing power for virtually zero cost
  • you can even build functional indexes on top of CUDA stored procedures
  • fewer boxes are needed because one box is far faster

How to make it work?

How do you make all this work now? The goal for this simplistic example is to generate a set of random numbers on the CPU, copy it to the GPU, and make the code callable from PostgreSQL.
Here is the function to generate random numbers and to copy them to the GPU:

    /* implement random generator and copy to CUDA */
    nn_precision* generate_random_numbers(int number_of_values)
    {
        nn_precision *cuda_float_p;

        /* allocate host memory and CUDA memory */
        nn_precision *host_p = (nn_precision *) pg_palloc(sizeof(nn_precision) * number_of_values);
        CUDATOOLS_SAFE_CALL(cudaMalloc((void **) &cuda_float_p,
            sizeof(nn_precision) * number_of_values));

        /* create random numbers */
        for (int i = 0; i < number_of_values; i++)
        {
            host_p[i] = (nn_precision) drand48();
        }

        /* copy data to CUDA and return pointer to CUDA structure */
        CUDATOOLS_SAFE_CALL(cudaMemcpy(cuda_float_p, host_p,
            sizeof(nn_precision) * number_of_values,
            cudaMemcpyHostToDevice));

        return cuda_float_p;
    }

Now we can go and call this function from a PostgreSQL stored procedure:

    /* import postgres internal stuff */
    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"
    #include "utils/memutils.h"
    #include "utils/elog.h"

    #include "cuda_tools.h"

    PG_MODULE_MAGIC;

    /* prototypes to silence compiler */
    extern Datum test_random(PG_FUNCTION_ARGS);

    /* define function to allocate N random values (0 - 1.0)
       and put them into the CUDA device */
    PG_FUNCTION_INFO_V1(test_random);
    Datum test_random(PG_FUNCTION_ARGS)
    {
        int number = PG_GETARG_INT32(0);

        nn_precision *p = generate_random_numbers(number);
        cuda_free_array(p);

        PG_RETURN_VOID();
    }

This code can then be compiled nicely, just like any other PostgreSQL C extension. The test_random function can be called just like this (the SQL-level registration behind this call is sketched after the links below):

    SELECT test_random(1000);

Of course, this is just a brief introduction to see how things can practically be done. A more realistic application will need more thinking and can be integrated into the database even more closely.

More information:

  • Professional CUDA programming
  • Professional PostgreSQL services
  • The official PostgreSQL Website
  • The official CUDA site
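One step the listing above glosses over is how test_random becomes visible to SQL in the first place. As a rough sketch (not shown in the original post; the shared-library name is an assumption), a V1 C function is registered with a CREATE FUNCTION statement pointing at the compiled module:

    -- Hypothetical registration of the C function above; 'cuda_tools' is an
    -- assumed shared-library name, not taken from the original post.
    CREATE FUNCTION test_random(integer) RETURNS void
        AS 'cuda_tools', 'test_random'
        LANGUAGE C STRICT;

After that, SELECT test_random(1000); works as shown above.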

Click to read more ...

Monday, Jun 9, 2008

FaceStat's Rousing Tale of Scaling Woe and Wisdom Won

Lukas Biewald shares a fascinating slam-by-slam recount of how his FaceStat (upload your picture and be judged by the masses) site was battered by a link on Yahoo's main page that caused an almost instantaneous 650,000 page view jump on their site. Yahoo spends considerable effort making sure its own properties can handle the truly massive flow from the main page. Turning the Great Eye of the Internet towards an unsuspecting newborn site must be quite the diaper-ready experience.

Theo Schlossnagle eerily prophesied about such events in The Implications of Punctuated Scalabilium for Website Architecture: massive, unexpected and sudden traffic spikes will become more common as a fickle internet seeks ever for new entertainments (my summary). Exactly FaceStat's situation. This is also one of our first exposures to an application written on Merb, a popular Ruby on Rails competitor. For those who think Ruby is the problem, their architecture now serves 100 times the original load.

How did our fine FaceStat fellowship fare against Yahoo's onslaught? Not a lot of details of FaceStat's architecture are available, so it's not that kind of post. What interested me is that it's a timely example of Theo's traffic spike phenomenon, and I was also taken with how well the team handled the challenge. Few could do better so quickly. In fact, let's apply Theo's rubric for how to handle these situations to FaceStat:
  • Be Alert: build automated systems to detect and pinpoint the cause of these issues quickly (in less than 60 seconds). None initially, but they are building in more monitoring to better handle future situations. Better monitoring would have alerted them to the problems long before they actually were alerted. Perhaps many more potential customers might have been converted to actual customers. You can never have enough monitoring!
  • Be Prepared: understand the bottlenecks of your service systemically. As the system was relatively simple, new, and quickly changed, my assumption is they were fully aware of their system’s shortcomings, they were just busy with adding features rather than worrying about performance and scalability.
  • Perform Triage: understand the importance of the various services that make up your site. Definitely. They “started ripping out every database intensive feature” in response to the load.
  • Be Calm: any action that is not analytically driven is a waste of time and energy. They stayed amazingly calm, as can be seen from the following quote: “It’s one thing to code scalably and grow slowly under increasing load, but it’s been a blast to crazily rearchitect a live site like FaceStat in a day or two.” I’m not sure how analytically driven they were, however.

All-in-all, an impressive response to the Great Eye’s undivided attention. But not everyone was as impressed as I was. A commenter named Bernard said: Sorry, but this is a really dumb story. Given how dirt cheap things like Slicehost and Linode are, it is crazy that you launched a web app and had not already prepared a redundant, highly-scalable architecture… I’d say you were damn lucky that the disappointed users came back at all.

Commenter Will thought it was a “Nice problem to be having!” Which it is, of course; being noticed is better than being ignored. But Lukas was spot on when he lamented that being noticed too soon has a downside: After working so hard to get users to come to your site, it’s amazingly frustrating to see hundreds of thousands of people suddenly locked out.

Clearly we still don’t have the ability for developers to create scalable systems as simply as they create exploratory systems. Ed from Rackspace posted that they could help with their Auto Scale of Arrays feature. And Rackspace would be an excellent solution, but the cost would be $500/month and a $2,500 setup fee. No “let’s put on a show” startup can afford those costs.

The mode FaceStat was in is typical: We find that a Rails-like platform is invaluable for rapidly prototyping a new site, especially since we started FaceStat as a pure experiment with no idea whether people would like it or not, and with a very different feature set in mind compared to what it later became.

A pay-as-you-grow model is essential for scalability because that’s the only way you can bake scalability in from the start. And even with all the impressive advances in the industry, we still don’t have the software infrastructure to make scaling second nature.

    Information Sources

  • Scaling Fast by Lukas Biewald
  • FaceStat scales! on Dlores BLog

    Platform

  • Merb. Ruby based MVC framework that is ORM-agnostic.
  • Thin. A fast and very simple Ruby web server.
  • Slicehost. Hosting service. Able to quickly provision servers as needed.
  • Amazon’s S3. Image server. Latency is high but it handles the load.
  • Capistrano. Automated deployment.
  • Git with github. Source code control system. Supports efficient simultaneous development, quick merging and deployment.
  • God. Server monitoring and management.
  • Memcached. Application caching layer.
  • PostgreSQL

    The Stats

  • Six app servers.
  • One big database machine.

    The Architecture

  • FaceStat is a write heavy application and performs involved calculations on data.
  • S3 is used to offload the responsibility for storing images. This freed them from the massive bandwidth requirements and complexity of managing their own images.
  • Memcached offloads reads from the database to allow the database to have more time for writes.

    Lessons Learned

  • Monitor the site. The sooner you know about a problem the faster it can be fixed. Don't rely on user email or email from exception handlers or you'll never get ahead of problems.
  • Communicate with your users with an error page. A meaningful error page shows you care and that you are working on the problem. That's enough for a second chance with most people.
  • Use a cached statically generated homepage. Hard to beat that for performance.
  • Big sites might want to give a heads up when they mention smaller sites. Just a short polite email saying how your world will soon turn upside down would do.
  • High-level platform really doesn’t matter compared to overall architecture. How you handle writes, reads, caching, deployment, monitoring, etc are relatively framework independent and it's how you solve those problems that matter.
  • Ruby and Merb supported rapid prototyping to experiment and create a radically different system from the one they intended.

    Click to read more ...