Entries in Postgres (14)

Wednesday, Jan 22, 2020

Follower Clusters – 3 Major Use Cases for Syncing SQL & NoSQL Deployments

Follower clusters are a ScaleGrid feature that allows you to keep two independent database systems (of the same type) in sync. Unlike cloning or replication, this allows you to maintain an active, point-in-time copy of your production data. This extra cluster, known as a follower cluster, can be leveraged for multiple use cases, including analyzing, optimizing, and testing your application performance for MongoDB, MySQL and PostgreSQL. In this blog post, we will cover the top three scenarios for leveraging follower clusters with your application.

How Do Follower Clusters Differ From Replication?

Unlike a static clone, data is imported on a set schedule so your follower cluster is always in sync with your production cluster. Here are a few critical ways in which it differs from replication:


Wednesday, Jan 08, 2020

PostgreSQL Connection Pooling: Part 2 – PgBouncer

When it comes to connection pooling in the PostgreSQL world, PgBouncer is probably the most popular option. It’s a very simple utility that does exactly one thing – it sits between the database and the clients and speaks the PostgreSQL protocol, emulating a PostgreSQL server. A client connects to PgBouncer with the exact same syntax it would use when connecting directly to PostgreSQL – PgBouncer is essentially invisible.
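
To make the "essentially invisible" part concrete, here is a small sketch (assuming a stock PgBouncer install on its default port 6432 in front of a database named mydb – the names are placeholders): application SQL is unchanged, and PgBouncer's virtual admin database exposes pool state through its SHOW commands.

    -- Connected through the pooler (e.g. psql -h 127.0.0.1 -p 6432 mydb),
    -- ordinary SQL behaves exactly as it would against PostgreSQL itself:
    SELECT now(), current_user;

    -- Connecting to the virtual "pgbouncer" admin database instead
    -- (psql -p 6432 pgbouncer) exposes the pooler's own state:
    SHOW POOLS;   -- one row per database/user pool: active and waiting clients, server connections
    SHOW STATS;   -- per-database request counts and traffic totals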


Friday, Oct 18, 2019

PostgreSQL Connection Pooling: Part 1 – Pros & Cons

A long time ago, in a galaxy far far away, ‘threads’ were a programming novelty rarely used and seldom trusted. In that environment, the first PostgreSQL developers decided that forking a process for each connection to the database was the safest choice. It would be a shame if your database crashed, after all.
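
That process-per-connection design is easy to observe from SQL itself. A rough sketch (column names are from recent PostgreSQL releases; very old versions used procpid instead of pid): every row in pg_stat_activity corresponds to a dedicated backend process with its own operating-system PID.

    -- Each connected client is served by its own forked backend process:
    SELECT pid,            -- OS process id of the backend handling this connection
           usename,        -- connected role
           backend_start,  -- when the backend process was forked
           state           -- idle, active, idle in transaction, ...
    FROM pg_stat_activity;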

Since then, a lot of water has flowed under that bridge, but the PostgreSQL community has stuck by their original decision. It is difficult to fault their argument, as it's absolutely true that:


Monday, Sep 16, 2019

Managing High Availability in PostgreSQL – Part III: Patroni

In our previous blog posts, we discussed the capabilities and functioning of PostgreSQL Automatic Failover (PAF) by Cluster Labs and Replication Manager (repmgr) by 2ndQuadrant. In the final post of this series, we will review the last solution, Patroni by Zalando, and compare all three at the end so you can determine which high availability framework is best for your PostgreSQL hosting deployment.

Patroni for PostgreSQL


Thursday, Jun 27, 2019

2019 Open Source Database Report: Top Databases, Public Cloud vs. On-Premise, Polyglot Persistence

Ready to transition from a commercial database to open source, and want to know which databases are most popular in 2019? Wondering whether an on-premise vs. public cloud vs. hybrid cloud infrastructure is best for your database strategy? Or, considering adding a new database to your application and want to see which combinations are most popular? We found all the answers you need at the Percona Live event last month, and broke down the insights into the following free trends reports:


Wednesday, Apr 03, 2019

2019 PostgreSQL Trends Report: Private vs. Public Cloud, Migrations, Database Combinations & Top Reasons Used

PostgreSQL is an open source object-relational database system that has soared in popularity over the past 30 years thanks to its active, loyal, and growing community. For the 2nd year in a row, PostgreSQL has kept the title of #1 fastest growing database in the world according to the DBMS of the Year report by the experts at DB-Engines. So what makes PostgreSQL so special, and how is it being used today? We found the answers at the Postgres Conference in March, where we surveyed PostgreSQL users, contributors, and SQL and NoSQL database administrators alike. In this free PostgreSQL Trends Report, we break down PostgreSQL hosting use across public cloud vs. private cloud vs. hybrid cloud, most popular cloud providers, migration trends, database combinations with Postgres, and why PostgreSQL is preferred over popular RDBMS alternatives.

Private Cloud vs. Public Cloud vs. Hybrid Cloud


Monday, May 03, 2010

MocoSpace Architecture - 3 Billion Mobile Page Views a Month

This is a guest post by Jamie Hall, Co-founder & CTO of MocoSpace, describing the architecture for their mobile social network. This is a timely architecture to learn from as it combines several hot trends: it is very large, mobile, and social. What they think is especially cool about their system: how it optimizes for device/browser fragmentation on the mobile Web; their multi-tiered, read/write, local/distributed caching system; and selecting PostgreSQL over MySQL as a relational DB that can scale.

MocoSpace is a mobile social network, with 12 million members and 3 billion page views a month, which makes it one of the most highly trafficked mobile Websites in the US. Members access the site mainly from their mobile phone Web browser, ranging from high end smartphones to lower end devices, as well as from the Web. Activities on the site include customizing profiles, chat, instant messaging, music, sharing photos & videos, games, eCards and blogs. The monetization strategy is focused on advertising, on both the mobile and Web sites, as well as a virtual currency system and a handful of premium feature upgrades.

Stats


Thursday, May 28, 2009

Scaling PostgreSQL using CUDA

Combining GPU power with PostgreSQL

PostgreSQL is one of the world's leading Open Source databases and it provides enormous flexibility as well as extensibility. One of the key features of PostgreSQL is that users can define their own procedures and functions in basically any known programming language, so it is possible to write virtually any server-side code easily. Now, all this extensibility is not new. What does it have to do with scaling, then? Well, imagine a world where the data in your database and enormous computing power are tightly integrated. Imagine a world where data inside your database has direct access to hundreds of FPUs. Welcome to the world of CUDA, NVIDIA's way of making the power of graphics cards available to normal, high-performance applications.

When it comes to complex computations, databases might very well turn out to be a bottleneck. Depending on your application, it can easily happen that adding more CPU power does not improve the overall performance of your system – simply because bringing data from your database to the units that actually do the computations is far too slow (maybe because of remote calls and so on). Especially when data is flowing over a network, copying a lot of data can be limited by network latency or simply bandwidth. What if this bottleneck could be avoided?

CUDA is C / C++

Basically a CUDA program is simply a C program with some small extensions. The CUDA subsystem transforms your CUDA program into normal C code which can then be compiled and linked nicely with existing code. This also means that CUDA code can be used inside a PostgreSQL stored procedure easily. The advantages of this mechanism are obvious:

  • GPUs can do matrix and FPU-related operations hundreds of times faster than any CPU
  • the GPU is used inside the database, so no data has to be transported over slow lines
  • basically any NVIDIA graphics card can be used
  • you get enormous computing power for virtually zero cost
  • you can even build functional indexes on top of CUDA stored procedures (see the sketch at the end of this post)
  • far fewer boxes are needed because one box is much faster

How to make it work?

The goal for this simplistic example is to generate a set of random numbers on the CPU, copy them to the GPU, and make the code callable from PostgreSQL.
Here is the function to generate random numbers and copy them to the GPU:

    /* implement random generator and copy to CUDA */
    nn_precision* generate_random_numbers(int number_of_values)
    {
        nn_precision *cuda_float_p;

        /* allocate host memory and CUDA memory */
        nn_precision *host_p = (nn_precision *) pg_palloc(sizeof(nn_precision) * number_of_values);
        CUDATOOLS_SAFE_CALL( cudaMalloc( (void**) &cuda_float_p,
            sizeof(nn_precision) * number_of_values));

        /* create random numbers */
        for (int i = 0; i < number_of_values; i++)
        {
            host_p[i] = (nn_precision) drand48();
        }

        /* copy data to CUDA and return pointer to CUDA structure */
        CUDATOOLS_SAFE_CALL( cudaMemcpy(cuda_float_p, host_p,
            sizeof(nn_precision) * number_of_values,
            cudaMemcpyHostToDevice) );

        return cuda_float_p;
    }

Now we can call this function from a PostgreSQL stored procedure:

    /* import postgres internal stuff */
    #include "postgres.h"
    #include "fmgr.h"
    #include "funcapi.h"
    #include "utils/memutils.h"
    #include "utils/elog.h"

    #include "cuda_tools.h"

    PG_MODULE_MAGIC;

    /* prototypes to silence compiler */
    extern Datum test_random(PG_FUNCTION_ARGS);

    /* define function to allocate N random values (0 - 1.0)
       and put them into the CUDA device */
    PG_FUNCTION_INFO_V1(test_random);
    Datum test_random(PG_FUNCTION_ARGS)
    {
        int number = PG_GETARG_INT32(0);

        nn_precision *p = generate_random_numbers(number);
        cuda_free_array(p);

        PG_RETURN_VOID();
    }

This code can now be compiled just like any other PostgreSQL C extension. The test_random function can be called just like this:

    SELECT test_random(1000);

Of course this is just a brief introduction to see how things can practically be done. A more realistic application will need more thinking and can be integrated into the database even more closely.

More information:

  • Professional CUDA programming
  • Professional PostgreSQL services
  • The official PostgreSQL Website
  • The official CUDA site
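
As promised in the list of advantages above, here is how a functional index on top of a CUDA stored procedure would look at the SQL level – a purely hypothetical sketch (the function, table, and column names are invented, and PostgreSQL requires the indexed function to be declared IMMUTABLE):

    -- Hypothetical wrapper exposing a CUDA-backed computation to SQL;
    -- the library and symbol names are placeholders.
    -- CREATE FUNCTION cuda_score(float8[]) RETURNS float8
    --     AS 'cuda_tools', 'cuda_score' LANGUAGE C IMMUTABLE STRICT;

    -- An expression (functional) index built on top of that function,
    -- so lookups by the GPU-computed value avoid recomputing it per row:
    CREATE INDEX idx_samples_cuda_score ON samples (cuda_score(features));

    -- Queries filtering on the same expression can then use the index:
    SELECT * FROM samples WHERE cuda_score(features) > 0.9;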


Friday, Apr 24, 2009

Heroku - Simultaneously Develop and Deploy Automatically Scalable Rails Applications in the Cloud

Update 4: Heroku versus GAE & GAE/J

Update 3: Heroku has gone live! Congratulations to the team. It's difficult right now to get a feeling for the relative cost and reliability of Heroku, but it's an impressive accomplishment and a viable option for people looking for a delivery platform.

Update 2: Heroku Architecture. A great interactive presentation of the Heroku stack. Requests flow into Nginx, used as an HTTP reverse proxy. Nginx routes requests into a Varnish-based HTTP cache. Then requests are injected into an Erlang-based routing mesh that balances requests across a grid of dynos. Dynos are your application "VMs" that implement application-specific behaviors. Dynos themselves are a stack of: POSIX, Ruby VM, App Server, Rack, Middleware, Framework, Your App. Applications can access PostgreSQL. Memcached is used as an application caching layer.

Update: Aaron Worsham Interview with James Lindenbaum, CEO of Heroku. Aaron nicely sums up their goal: Heroku is looking to eliminate all the reasons companies have for not doing software projects.


Adam Wiggins of Heroku presented at the lollapalooza that was the Cloud Computing Demo Night. The idea behind Heroku is that you upload a Rails application into Heroku and it automatically deploys into EC2 and it automatically scales using behind the scenes magic. They call this "liquid scaling." You just dump your code and go. You don't have to think about SVN, databases, mongrels, load balancing, or hosting. You just concentrate on building your application. Heroku's unique feature is their web based development environment that lets you develop applications completely from their control panel. Or you can stick with your own development environment and use their API and Git to move code in and out of their system.

For website developers this is as high up the stack as it gets. With Heroku we lose that "build your first lightsaber" moment marking the transition out of apprenticeship and into mastery. Upload your code and go isn't exactly a hero's journey, but it is damn effective...

I must confess to having an inherent love of Heroku's idea because I had a similar notion many moons ago, but the trendy language of the time was Perl instead of Rails. At the time though it just didn't make sense. The economics of creating your own "cloud" for such a different model wasn't there. It's amazing the niches utility computing will seed, fertilize, and help grow. Even today when using Eclipse I really wish it was hosted in the cloud and I didn't have to deal with all its deployment headaches. Firefox based interfaces are pretty impressive these days. Why not?

Adam views their stack as:
1. Developer Tools
2. Application Management
3. Cluster Management
4. Elastic Compute Cloud

At the top level developers see a control panel that lets them edit code, deploy code, interact with the database, see logs, and so on. Your website is live from the first moment you start writing code. It's a powerful feeling to write normal code, see it run immediately, and know it will scale without further effort on your part. Now, will you be able to toss your Facebook app into the Heroku engine and immediately handle a deluge of 500 million hits a month? It will be interesting to see how far a generic scaling model can go without special tweaking by a certified scaling professional. Elastra has the same sort of issue.

Underneath, Heroku makes sure all the software components work together in Lennon-McCartney style harmony. They take care of (or will take care of) starting and stopping VMs, deploying to those VMs, billing, load balancing, scaling, storage, upgrades, failover, etc. The dynamic nature of Ruby and the development and deployment infrastructure of Rails is what makes this type of hosting possible. You don't have to worry about builds. There's a great infrastructure for installing packages and plugins. And the big hard one of database upgrades is tackled with the new migrations feature.

A major issue in the Rails world is versioning. Given the precambrian explosion of Rails tools, how does Heroku make sure all the various versions of everything work together? Heroku sees this as their big value add. They are in charge of making sure everything works together. We see a lot of companies on the web taking on the role of curator ([1], [2], [3]). A curator is a guardian or an overseer. Of curators Steve Rubel says: They acquire pieces that fit within the tone, direction and - above all - the purpose of the institution. They travel the corners of the world looking for "finds." Then, once located, clean them up and make sure they are presentable and offer the patron a high quality experience. That's the role Heroku will play for their deployable Rails environment.

With great automated power comes great restrictions. And great opportunity. Curating has a cost for developers: flexibility. The database they support is Postgres. Out of luck if you want MySQL. Want a different Ruby version or Rails version? Not if they don't support it. Want memcache? You just can't add it yourself. One forum poster wanted, for example, to use the command line version of ImageMagick but was told it wasn't installed and to use RMagick instead. Not the end of the world. And this sort of curating has to be done to keep a happy and healthy environment running, but it is something to be aware of.

The upside of curation is stuff will work. And we all know how hard it can be to get stuff to work. When I see an EC2 AMI that already has most of what I need, my heart goes pitter patter over the headaches I'll save because someone already did the heavy curation for me. A lot of the value in a service like rPath, for example, is in curation. rPath helps you build images that work, that can be deployed automatically, and can be easily upgraded. It can take a big load off your shoulders.

There's a lot of competition for Heroku. Mosso has a hosting system that can do much of what Heroku wants to do. It can automatically scale up at the webserver, data, and storage tiers. It supports a variety of frameworks, including Rails. And Mosso also says all you have to do is load and go.

3Tera is another competitor. As one user said: It lets you visually (through a web UI) create "applications" based on "appliances". There is a standard portfolio of prebuilt applications (SugarCRM, etc.) and templates for LAMP, etc. So, we build our application by taking a firewall appliance, a CentOS appliance, a gateway, a MySQL appliance, glue them together, customize them, and then create our own template. You can specify, down to the appliance level, the amount of CPU, memory, disk, and bandwidth each is assigned, which lets you scale up your capacity simply by tweaking values through the UI. We can now deploy our Rails/Java hosted offering for new customers in about 20 minutes on our grid. AppLogic has automatic failover so that if anything goes wrong, it redeploys your application to a new node in your grid and restarts it. It's not as cheap as EC2, but much more powerful. True, 3Tera won't help with your application directly, but most of the hard bits are handled.

RightScale is another company that combines curation along with load balancing, scaling, failover, and system management.

What differentiates Heroku is their web based IDE that allows you to focus solely on the application and ignore the details. Though now that they have a command line based interface as well, it's not as clear how they will differentiate themselves from other offerings.

The hosting model has a possible downside if you want to do something other than straight web hosting. Let's say you want your system to insert commercials into podcasts. That sort of large scale batch logic doesn't cleanly fit into the hosting model. A separate service accessed via something like a REST interface needs to be created. Possibly double the work. Mosso suffers from this same concern. But maybe leaving the web front end to Heroku is exactly what you want to do. That would leave you to concentrate on the back end service without worrying about the web tier. That's a good approach too.

Heroku is just getting started so everything isn't in place yet. They've been working on how to scale their own infrastructure. Next is working on scaling user applications beyond starting and stopping mongrels based on load. They aren't doing any vertical scaling of the database yet. They plan on memcaching reads, implementing read-only slaves via Slony, and using the automatic partitioning features built into Postgres 8.3. The idea is to start a little smaller with them now and grow as they grow. By the time you need to scale bigger they should have the infrastructure in place.
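
For context, the "automatic partitioning features built into Postgres 8.3" were the inheritance-plus-constraint-exclusion style of partitioning, not the declarative partitioning of much later releases. A minimal sketch of that pattern (table and column names are invented for illustration):

    -- Parent table plus one child table per range; the CHECK constraints
    -- let the planner skip partitions when constraint_exclusion is enabled.
    CREATE TABLE events (id bigint, created_at date, payload text);

    CREATE TABLE events_2009_04 (
        CHECK (created_at >= DATE '2009-04-01' AND created_at < DATE '2009-05-01')
    ) INHERITS (events);

    CREATE TABLE events_2009_05 (
        CHECK (created_at >= DATE '2009-05-01' AND created_at < DATE '2009-06-01')
    ) INHERITS (events);

    -- With constraint exclusion on, this scan only touches events_2009_04:
    SET constraint_exclusion = on;
    SELECT count(*) FROM events
    WHERE created_at >= DATE '2009-04-10' AND created_at < DATE '2009-04-20';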

One concern is that pricing isn't nailed down yet, but my gut says it will be fair. It's not clear how you will transfer an existing database over, especially from a non-Postgres database. And if you use the web IDE, I wonder how you will handle normal project stuff like continuous integration, upgrades, branching, release tracking, and bug tracking? Certainly a lot of work to do and a lot of details to work out, but I am sure it's nothing they can't handle.

Related Articles

  • Heroku Rails Podcast
  • Heroku Open Source Plugins etc

Saturday, Apr 05, 2008

Skype Plans for PostgreSQL to Scale to 1 Billion Users

Skype uses PostgreSQL as their backend database. PostgreSQL doesn't get enough run in the database world, so I was excited to see how PostgreSQL is used "as the main DB for most of [Skype's] business needs." Their approach is to use a traditional stored procedure interface for accessing data and, on top of that, layer proxy servers which hash SQL requests to a set of database servers that actually carry out queries. The result is a horizontally partitioned system that they think will scale to handle 1 billion users.

  • Skype's goal is an architecture that can handle 1 billion plus users. This level of scale isn't practically solvable with one really big computer, so our masked superhero horizontal scaling comes to the rescue.
  • Hardware is dual or quad Opterons with SCSI RAID.
  • Followed the common database progression: Start with one DB. Add new databases partitioned by functionality. Replicate read-mostly data for better read access. Then horizontally partition data across multiple nodes.
  • In a first for this blog anyway, Skype uses a traditional database architecture where all database access is encapsulated in stored procedures. This allows them to make behind the scenes performance tweaks without impacting frontend servers. And it fits in cleanly with their partitioning strategy using PL/Proxy.
  • PL/Proxy is used to scale the OLTP portion of their system by creating a horizontally partitioned cluster: - Database queries are routed by a proxy across a set of database servers. The proxy creates partitions based on a field value, typically a primary key. - For example, you could partition users across a cluster by hashing based on user name. Each user is slotted into a shard based on the hash. - Remote database calls are executed using a new PostgreSQL database language called plproxy. An example from Kristo Kaiv's blog:
    First, code to insert a user in a database:
    CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS text AS $$
    BEGIN
        PERFORM 1 FROM users WHERE username = i_username;
        IF NOT FOUND THEN
            INSERT INTO users (username) VALUES (i_username);
            RETURN 'user created';
        ELSE
            RETURN 'user already exists';
        END IF;
    END;
    $$ LANGUAGE plpgsql SECURITY DEFINER;
    
    Here's the proxy code to distribute the user insert to the correct partition:
    queries=#
    CREATE OR REPLACE FUNCTION insert_user(i_username text) RETURNS TEXT AS $$
        CLUSTER 'queries'; RUN ON hashtext(i_username);
    $$ LANGUAGE plproxy;
    
    Your SQL query looks normal:
    SELECT insert_user('username');
    
    - The result of a query is exactly the same as if it was executed on the remote database. - Currently they can route 1000-2000 requests/sec on Dual Opteron servers to a 16 partition cluster (a small routing illustration follows this list).
  • They like the PL/Proxy approach for OLTP because: - PL/Proxy servers form a scalable and uniform "DB-bus." Proxies are robust because in a redundant configuration if one fails you can just connect to another. And if the proxy tier becomes slow you can add more proxies and load balance between them. - More partitions can be added to improve performance. - Only data on a failed partition is unavailable during a failover. All other partitions operate normally.
  • PgBouncer is used as a connection pooler for PostgreSQL. PL/Proxy "somewhat wastes connections as it opens connection to each partition from each backend process" so the pooler helps reduce the number of connections.
  • Hot-standby servers are created using WAL (Write Ahead Log) shipping. It doesn't appear that these servers can be used for read-only operations.
  • More sophisticated organizations often use an OLTP database system to handle high performance transaction needs and then create separate systems for more non-transactional needs. For example, an OLAP (Online analytical processing) system is often used for handling complicated analysis and reporting problems. These differ in schema, indexing, etc. from the OLTP system. Skype also uses separate systems for the presentation layer of web applications, sending email, and printing invoices. This requires data to be moved from the OLTP to the other systems. - Initially Slony1 was used to move data to the other systems, but "as the complexity and loads grew Slony1 started to cause us greater and greater pains." - To solve this problem Skype developed their own lighter-weight queueing and replication toolkit called SkyTools. The proxy approach is interesting and is an architecture we haven't seen previously. Its power comes from the "make another level of indirection" school of problem solving, which has advantages:
  • Applications are independent of the structure of the database servers. That's encapsulated in the proxy servers.
  • Applications do not need to change in response to partition, mapping, or other changes.
  • Load balancing, failover, and read/write splitting are invisible to applications. The downsides are:
  • Reduced performance. Another hop is added and queries must be parsed to perform all the transparent magic.
  • Inability to perform joins and other database operations across partitions.
  • Added administration complexity of dealing with proxy configuration and HA for the proxy servers. It's easy to see how the advantages can outweigh the disadvantages. Without changing your application you can slip in a proxy layer and get a lot of very cool features for what seems like a low cost. If you are a MySQL user and this approach interests you then take a look at MySQL Proxy, which accomplishes something similar in a different sort of way.
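
As a rough illustration of the routing described in the list above (this assumes, as PL/Proxy requires, a power-of-two partition count – here the 16-partition cluster mentioned earlier – and that the shard is picked from the low bits of the hash): the same username always hashes to the same partition.

    -- hashtext() is the hash used in the RUN ON clause above;
    -- with 16 partitions, the low 4 bits select the shard.
    SELECT username,
           hashtext(username) & 15 AS partition_number
    FROM (VALUES ('alice'), ('bob'), ('carol')) AS t(username);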

Related Articles

  • An Unorthodox Approach to Database Design : The Coming of the Shard
  • PostgreSQProducts - Scaling infinitely with PL/Proxy
  • PL/Proxy
  • Heroku also uses PostgreSQL.
  • MySQL Proxy
  • PostgreSQL cluster: partitioning with plproxy (part I) by Kristo Kaiv.
  • PostgreSQL cluster: partitioning with plproxy (part II) by Kristo Kaiv.
  • PostgreSQL at Skype.
  • Skytools database scripting framework & PgQ by Kristo Kaiv.
  • PostgreSQL High Availability.
