More concurrency: Improved locking in PostgreSQL
If you want to build a large scale website, scaling out the webserver is not enough. It is also necessary to cleverly manage the database side. a key to high scalability is locking.
In PostgreSQL we got a couple of new cool features to reduce locking and to speed up things due to improved concurrency.
General recommendations: Before attacking locking, however, it makes sense to check what is really going on on your PostgreSQL database server. To do so I recommend to take a look at pg_stat_statements and to carefully track down bottlenecks. Here is how it works:
Improving SELECT FOR UPDATE
Imagine two people are trying to modify the same row at the same time. Each user will first SELECT the row to inspect its content and then start updating. The nasty thing is: Both users might see the old row and overwrite each other's changes. This is a classical race condition.
In real life this can have nasty consequences: Two people might book the same flight on an aircraft or people might draw more money from their accounts than they actually have in their account. This is far from desirable.
Let us return to the airline example and assume that somebody wants to book a seat on an airliner:
SELECT ...
FROM table
WHERE class = 'economy'
AND empty = true
LIMIT 1
FOR UPDATE
The trouble now is: If somebody else also tries to grab a seat, he will find the seat selected by the first person. But: This row is blocked. The SELECT FOR UPDATE of the second guy has to wait until the first guy has finished his transaction. Remember, the passanger might be happy with any seat on the airliner so there is no point waiting for a specific line.
PostgreSQL 9.5 comes to the rescue. There is a new way to obtain rows:
SELECT ...
FROM table
WHERE class = 'economy'
AND empty = true
LIMIT 1
FOR UPDATE SKIP LOCKED
The beauty here is that PostgreSQL will simply ignore locked rows and return one, which is not blocked by anybody. This makes sense because 100 users checking for a free seat concurrently will get 100 different rows. The consequence is that you are not stuck with 1 CPU but you can nicely scale out to all CPUs in the system. As conflicts cannot happen anymore, nobody has to wait on somebody else.
SELECT FOR SHARE
There is one more thing, which can make PostgreSQL provide you with more concurrency. Consider the following example:
SELECT *
FROM account AS a, currency AS c
WHERE a.currency = c.id
AND a.account = 4711
FOR UPDATE
In this case somebody wants to check his bank account. The main question now is: Which rows are locked? The answer is: The account AND the currency. Blocking an entire currency just because one person wants to draw money from the ATM is clearly not a good idea. Many people should be able to draw money at the same time. But: At this point PostgreSQL has idea which one of those two tables you want to update.
The solution is simple:
FOR UPDATE OF account FOR SHARE OF currency
By telling PostgreSQL what we are planning to do, the database can use a harmless lock on the currency table. Many people can therefore take a look at the same currency at the same time without blocking each other while the account table is still safely protected.
Concurrency is everything
Keep in mind: If you only got a single CPU, concurrency will be a problem. Therefore is essential to do things in a way that many CPUs can do their share of work at the very same time.
Reader Comments (5)
Awesome!!
Small correction: "At this point PostgreSQL has idea..." should read "...has no idea...".
Thanks for the interesting post. I found this somewhat facetiously mentioned (because the features enhance not just locking but also "lock-in" :-) in Ivan Pepelnjak's blog on ipspace.net.
@Simon This feature is also available in Oracle, so calling it lock-in is a stretch :)
@Simon @Jarred it is also available in MS SQL Server
Assuming a lot about PostgreSQL you may have to be careful with such queries if you apply an order to your select. I.e. select myrecord from my table order by oldestone
In row locking databases this can cause the database engine to lock all pages needed to ensure the order. Thus any other readers that come by using something similar to a READPAST hint will pass all locked rows, and appear to the client software as no records found, when in fact there are records, they're just all locked because the order by to the data, and the database had to ensure that order through the transaction.
On Microsoft SQL Server Top 1 can cause this as well as it implies an order to the underlying result. Not sure what PostgreSQL does with the limit 1.