Thursday
Aug 23, 2007
Postgresql on high availability websites?

I was looking at the pingdom infrastructure matrix (http://royal.pingdom.com/royalfiles/0702_infrastructure_matrix.pdf) and I saw that no sites are using Postgresql, and then I searched through highscalability.com and saw very few mentions of postgresql. Are there any examples of high-traffic sites that use postgresql? Does anyone have any experience with it? I'm having trouble finding good, recent studies of postgres (and postgres compared w/ mysql) online.
Reader Comments (10)
I've been looking and I haven't found any either. I would love to find someone using postgresql and profile them. I was excited at first about the TypePad slides because they said they were using postgresql, but then they converted to MySQL.
It doesn't make any sense to me really. By all accounts postgresql is one of the best databases on the planet, yet I find myself always using MySQL too. Safety in numbers? Go with the hot hand? All those people can't be wrong? Lots of not quite satisfactory reasons.
I used mysql by default most of the time. In fact, our site is running mysql now, but Postgres has PostGIS, which looks really appealing. All the mapping people I've talked to have great things to say about PostGIS, so we'll take the plunge on Postgres in order to get to it. Would be nice to hear more about big sites using postgres, though...
MySQL has made a virtue of performance, which isn't surprising, in my opinion, as it doesn't have many other virtues. Where data isn't critical (it would be critical for banking, accounting, etc.), and you've got the development muscle to bash your way through MySQL's idiosyncrasies, then it may well be a good candidate for sites that prefer high performance. If that is true.
My personal strategy is to start a site with some nice language/framework/database combo, like Django (Python) and Postgres, see if it grows, then re-write for scalability in faster (or more tunable) languages/databases, while optimising the original code until switchover. Currently I'm starting a site using Rails and Postgres on CentOS 5.
Having said all that, I heard some fellow say that it was MySQL's replication capabilities that forced them away from Postgres.
Hi,
My company implements middleware clustering to improve availability and scalability of open source databases like PostgreSQL and MySQL. I can't name specific sites due to client confidentiality but they are out there. Many of them are running behind corporate firewalls, so you won't see them on the Net. Here's our take on the state of availability and performance with PostgreSQL.
1. Availability. There are many solutions. You can buy a shared disk and implement application failover using a cluster manager like Heartbeat or RedHat Cluster Manager. If the database host goes down, another host mounts the disk, recovers the database, and keeps chugging. You can substitute log shipping and point-in-time-recovery for the shared disk. These features are built into PostgreSQL and widely used. You can use DRBD to replicate disk contents to another host. Finally, you can use middleware approaches like Sequoia (our open source product) to replicate updates synchronously to multiple databases. If one database host goes down, traffic is automatically redirected until the failed database can be restored.
2. Performance and scaling. Somebody already raised this issue so it seems worth commenting. On small systems running transactional applications the performance of PostgreSQL and MySQL is pretty much a wash. PostgreSQL is more sensitive to good management--if you fail to run vacuum regularly on heavily updated tables the performance may degrade significantly. Real performance differences appear as you scale.
PostgreSQL linear scaling has improved remarkably over the last couple releases. I have seen numbers that show linear scaling out to 12 cores, whereas MySQL seems to falter somewhere above 4 or so. There appear to be some latching issues that are causing this. MySQL on the other hand has a stronger scale-out capability than PostgreSQL. MySQL has long provided robust, easy-to-configure replication that puts little load on the master database, runs very quickly, and supports fan-out to a hundred servers or more. For a variety of reasons, the replication capability is less important for availability than performance, where it gives MySQL a real advantage for read-intensive applications. Our product, Sequoia, uses multiple databases with automatic load balancing to scale reads. It is one of the few products that addresses this need for PostgreSQL, where it can significantly boost application performance when performing resource intensive queries.
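To make the read-scaling idea above concrete, here is a minimal Python sketch of a router that sends writes to one primary and spreads reads round-robin over healthy replicas, skipping any host that has been marked as failed. This is not how Sequoia or MySQL replication is implemented; the class name, the host names, and the "leading SELECT means read" rule are all assumptions made for illustration.

```python
import itertools


class ReadWriteRouter:
    """Hypothetical router: writes go to the primary, reads to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self.down = set()  # replicas currently marked as failed
        self._cycle = itertools.cycle(self.replicas)

    def mark_down(self, host):
        """Stop routing reads to a failed replica."""
        self.down.add(host)

    def mark_up(self, host):
        """Resume routing reads to a restored replica."""
        self.down.discard(host)

    def route(self, sql):
        """Return the host that should run this statement."""
        # Crude classification (an assumption): a leading SELECT is a read.
        is_read = sql.lstrip().lower().startswith("select")
        if not is_read:
            return self.primary
        # Round-robin, trying each replica at most once per call.
        for _ in range(len(self.replicas)):
            host = next(self._cycle)
            if host not in self.down:
                return host
        # No healthy replica left: fall back to the primary.
        return self.primary
```

A real deployment would also have to account for replication lag and for statements such as SELECT ... FOR UPDATE, which this simple prefix check would wrongly treat as plain reads.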
My impression is that PostgreSQL is right at the tipping point in terms of wide use. The PostgreSQL team (led by Sun) recently posted SpecJAppServer results (http://www.spec.org/jAppServer2004/results/res2007q3/jAppServer2004-20070606-00065.html) that show PostgreSQL gets about 80% of the performance of Oracle at around one third of the price. It has a full-featured SQL implementation that is great for enterprise applications that need the full majesty and glory of the SQL language. PostGIS is increasingly popular for spatial applications. I talk to or get questions from customers concerning PostGIS around once a week, and traffic seems to be increasing. Finally, you are not giving up availability by choosing PostgreSQL. There is a wide range of solutions that are constantly improving.
Robert Hodges
CTO, Continuent, Inc.
I'm running a 4-node PostgreSQL setup (replicated by Slony-I). Our site has been scaling up from 0 users to over 300,000 over the past 4 years (it currently serves ~750k pages per day). PostgreSQL has been rock solid for me all the way. It is an amazing piece of software. One problem I'm running into (and I think this is PostgreSQL's biggest weakness) is the lack of a multi-master replication system (without using a middleware system such as PgPool). I haven't hit the ceiling on one master yet, and have been able to architect around the replication delay that Slony incurs, but I can see the day coming when I will hit that ceiling. Not sure what I'll do then, probably shard out the data to multiple replication clusters.
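For readers unfamiliar with sharding across several replication clusters, a minimal Python sketch of one common approach, hash-based routing by user ID, might look like this. The cluster names, the choice of user ID as the shard key, and both function names are hypothetical; nothing here describes the actual setup discussed in this comment.

```python
import hashlib

# Hypothetical cluster names; each would be its own master plus replicas.
SHARDS = ["cluster-0", "cluster-1", "cluster-2", "cluster-3"]


def shard_for(user_id, shard_count):
    """Map a user ID to a shard index in the range [0, shard_count)."""
    # Use a stable hash so every application server agrees on the mapping.
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def cluster_for(user_id):
    """Return the name of the cluster that owns this user's rows."""
    return SHARDS[shard_for(user_id, len(SHARDS))]
```

The simplicity has a cost: with plain modulo hashing, changing the number of shards remaps most users, so growing the cluster count later means migrating data or switching to a lookup table or consistent hashing.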
I think the PostgreSQL community knows this is a weakness, and I know they're working on it, but I haven't heard much lately about it. If they can provide dead-simple multi-master replication (Slony works great and is incredibly powerful, but it is NOT simple) I think we'll be over the edge of the tipping point that Robert mentioned.
Another problem I've run into is finding qualified DBAs for PostgreSQL. It seems to be a much smaller community than the MySQL community, and it's very difficult to find a gifted PostgreSQL DBA out of the box and not one that has to be trained in PostgreSQL's idiosyncrasies.
These problems aside, I'm glad we bet on PostgreSQL 3 years ago and I'm standing by it today. The fullness of the SQL implementation, the continuing performance improvements and its ability to gracefully degrade under extreme load are more than enough reasons to keep my eyes from wandering toward other systems.
I've been waiting years to find this site! Great job!
Kurt Overberg
CTO, BzzAgent Inc.
We've been running PostgreSQL for over 2 years on our mobile community / social networking site http://www.mocospace.com. We now do over 500 million pages per month. We make extensive use of a distributed caching service to offload from the database. I believe we are one of the highest volume websites to use PostgreSQL in the world. It is awesome.
Jamie Hall,
Co-founder & CTO, MocoSpace
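The comment above does not say how its caching layer works. As a rough illustration of how a cache can offload reads from the database, here is a minimal cache-aside sketch in Python. The in-process dictionary stands in for a distributed cache service, and the class name, the loader callback, and the TTL default are all assumptions.

```python
import time


class CacheAside:
    """Hypothetical cache-aside wrapper around a database loader."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db    # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        """Return the cached value, loading from the database on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # hit: the database is not touched
        value = self._load(key)      # miss or expired: query the database
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key after a write so the next read reloads it."""
        self._entries.pop(key, None)
```

With this pattern, repeated reads of the same key reach the database at most once per TTL window, at the price of serving data that may be up to one TTL stale unless writes call invalidate.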
Hi5 scaled PostgreSQL on over fifty servers to become one of the top twenty web sites in the world. More here:
http://postgresql.meetup.com/1/calendar/5808330/
http://www.hi5networks.com/blog/2007/06/postgres_users_group_meeting_a.html
I am one of three founders of Oprius, which is a hosted CRM/sales tool built on Postgres, Python, and TurboGears, and hosted on the Amazon cloud. We are just starting our scaling, as our product went live in January 08. If there is any way we can help the community with feedback as we grow, let me know.
Major sites that use PostgreSQL are:
Instagram: http://instagram-engineering.tumblr.com/
Disqus: http://justcramer.com/2010/05/30/scaling-threaded-comments-on-django-at-disqus/
Unbiasly: http://www.unbiasly.com
Yahoo: http://www.yahoo.com
PostgreSQL used to be way behind on replication, but it has caught up.
At this point in time (2014), I personally think it's difficult to justify using MySQL for any new application. The only circumstance I can think of where MySQL might be advisable is if you have very simple needs (e.g. the DB is just a backend for your simple ORM data model), and you lack DBA staff and thus need to spend very little effort on database administration.
However, sooner or later, if you have a database then you usually want to start writing interesting queries against it. MySQL is just not very good for this compared to pretty much any other major RDBMS.
In terms of programming features, I think PostgreSQL has even increased its lead over MySQL over the past decade.