PostgreSQLvsMySQL

Webs: Faemalia -:- Greatprawn -:- Playground -:- Technical -:- Tweak
Technical Web Sections: Register -:- Users -:- Changes -:- Index -:- Search -:- Statistics

Please take care in using this page for information...

This page has some parts that are out-of-date. I've given it a once over in late December, 2004, but my knowledge of PostgreSQL is getting more and more distant. I was initially a PostgreSQL fan, but now MySQL is really shaping up. Right now, if you're willing to work around its shortcomings, you can build some amazing database clusters with it.

I presently am a professional MySQL DBA, so I do have some deep knowledge of MySQL's numerous problems. Also, I have recommended to a DBA friend that they use PostgreSQL and have heard the depth of problems/concerns they have.

Overall my conclusion is that PostgreSQL is better (in most shops) than MySQL due to the numerous silent ways MySQL will allow you to corrupt your data, but MySQL has more features implemented at this time and workarounds for many of its lack of features.

My recommendations at this time would be:

Small Installations: PostgreSQL.
Need replication?: MySQL (but memorise the MySQL Gotchas page!!!).
Medium to Large Installations: MySQL, but get the support option.

No matter what, if you use MySQL, memorise the MySQL Gotchas page. You must, must, must know the ways MySQL will silently corrupt your data.

Introduction

There have been a number of debates about which is better over the years. MySQL? PostgreSQL? Some say MySQL is faster, others say PostgreSQL has transaction logging. If you do a search on the vast InterWeb about it now, that's the sort of information you'll get. Here is an example: OpenACS MySQL Page.

So MySQL has no transactions, and PostgreSQL is slow? But MySQL has had transaction logging for a long time and, in fact, its replication is based on the transaction logging. PostgreSQL has been on-par with MySQL in speed performance for years.

The purpose of this page is, then, to be at least moderately up-to-date. Please edit this page and make corrections and/or orthagonally-correct factually-backed opinions. Note that this page talks nothing of views or triggers, for example.

So, back to the question. Which is better? As with everything more complicated than an amoeba, the answer is: depends on what you define as better.

As an enterprise-level DBA and database architect, I'm concerned with which is better on a certain set of criteria:

  1. Must be a true transactional database supporting atomicity, consistency, isolation, and durability (ACID).
  2. Must support online consistent backups while the database remains in a read-write state.
  3. Must support replication for purposes of scalability and/or high availability.
  4. Must support maximum client concurrency.
  5. Must have commercial technical support of at least moderate usefulness.
  6. Must not have design flaws disallowing speedy data insert, update, delete, and select.
  7. Must have robust and complete logging of abnormal end conditions.

It is important to note what are NOT criteria in deciding the better database. Although we can argue these points here if we so desire, they are less-than-relevent in keeping an enterprise-class database functional.

  1. Distribution license. Is GPL better, or BSD? For the record, when I write software, I prefer GPL. But as for the MySQL vs. PostgreSQL debate, I don't see how it's relevant. Update December 2004. There does seem to be some weight behind the argument that developers who work for free prefer the GPL license, since it puts some weight behind the "If you use my work, which I've laboured to give you for free, you should give a little something back" sentiment. BSD license really allows evil corporations to take, take, take without returning anything of value.
  2. Community size. Well, admittedly, the SAP/DB and Firebird communities seem relatively miniscule, but once your community grows to a certain critical mass, does it matter? Sure, the MySQL community has 7 quadrillion members, but PostgreSQL community doesn't seem small... rather, comfortably-sized.

ACID: True Transactional Database

As of May, 2003, both PostgreSQL and MySQL support these things. PostgreSQL uses transactions by default, and MySQL supports both a transaction-engine based table type (InnoDB, which apparently went production in March, 2003) and transaction-free table type (MyISAM).

The InnoDB support is a little young, and there are more reports on the mailing list of database corruption than makes me comfortable, but overall there seems to be nothing lacking in MySQL except a little more battle-testing. Update December 2004. InnoDB is pretty battle tested at this point.

Online Consistent Backups

An enterprise RDBMS must be able to do online backups. And by online, I mean the database should be in read-write state, and the resultant backup should be consistent. To the novice database developer, this might seem like an impossible task, but rest assured, it is possible and is implemented in many RDBMSs. Why is this important? It's only important in 24x7 environments where there is no database "down time" for doing backups.

In MySQL, you must create a replication slave, and use it to create backups. This has the disadvantage of requiring more hardware (double the storage). The advantage is that the master does not even notice when a backup is occurring, whereas traditionally you needed to run backups during a "low period" in your database's usage.

PostgreSQL does have online (read/write) consistent backups implemented.

A nice feature (and one which is arguably a requirement of an enterprise-class RDBMS) is point-in-time recovery of a crashed database. As of 19 January 2005, both MySQL and PostgreSQL have this feature.

Another random data point. From the MySQL 4.1.0 manual (the most recent I have):

In addition to taking the binary backups described above, you should also regularly take dumps of your tables with `mysqldump'. The reason to this is that a binary file may be corrupted without you noticing it. Dumped tables are stored into text files which are human-readable and much simpler than database binary files. Seeing table corruption from dumped files is easier, and since their format is simpler, the chance for serious data corruption in them is smaller.

Scary.

Replication

This page, the gborg Postgres-R page, claims replication was to be merged into PostgreSQL 7.2, however, Tom Lane, a core developer, said that it won't be merged until 7.4 or 7.5. There is a for-pay add-on to the product by Postgres, Inc. Note that if you buy commercial support for PostgreSQL at pgsql dot com that you get a free PostgreSQL replication license (called eRserver).

Here is another PostgreSQL replication solution: USOGres, which was mentioned on the PostgreSQL administrator's list.

There are at least two free replication solutions for PostgreSQL, and there are reports of administrators successfully implementing them.

MySQL claims:

Replication -- Gamma Large server clusters using replication are in production use, with good results. Work on enhanced replication features is continuing in MySQL 4.x.

The replication in MySQL is one-way replication (master-->slaves).

Update December 2004. My personal experience is MySQL replication works very well. The fact that you need to recompile PostgreSQL with a weird third party patch is pretty bad.

Technical Support

Both PostgreSQL and MySQL offer commercial technical support. I've never called on PostgreSQL tech support, however, I have friends who run PostgreSQL in production, and they say the pgsql support is nice. I've worked with MySQL tech support now for about 2 years, and find it 500-1000% better than the best support I ever got from any of Microsoft, Sybase, Oracle, or IBM. Given that both products come with source code, this is a HUGE advantage over the proprietary equivalents that cannot be understated.

The unofficial community support is hit-and-miss. Sometimes you can get a useless answer (I myself have been the perpetrator of this problem at least once) and sometimes you can get a useful answer. Nick Fankhauser says of PostgreSQL administrator's mailing list:

PostgreSQL's Admin mailing list has consistently had a faster response with useful answers than any commercial counterpart. In particular, I fondly recall the morning that I stupidly deleted my WAL files at 2AM and received the info I needed to get my system repaired by 6AM. Can't beat that anywhere, so who even needs the commercial support? Source Code is nice, but helpful and friendly developers are better yet.

I personally do not think community support is a complete replacement for commercial support, but my own experience mirrors Nick's. Community support is generally better than commercial support unless the problem is boring and tedious.

MySQL's mailing lists are administered such that even the most newbie requests or veteran problems appear on the same mailing list. It would be nice if the MySQL mailing list was broken into two or three smaller versions, similar to the PostgreSQL mailing lists which include lists such as: Administrators, Hackers, SQL questions, etc.

Benchmarking

MySQL developers make a pretty bold claim, that PostgreSQL dies horribly when running a benchmark they wrote. That they tried to contact PostgreSQL developers, but were ignored. I somehow doubt this is the whole story. Especially in light of the user comment at the bottom about the OSDB test.

The word on the street is MySQL and PostgreSQL are pretty even on speed. MySQL, of course, says they're faster.

Some say that in datawarehousing, the MyISAM (non-logging) table support is nice for doing table loads, but it seems to me that the PostgreSQL "COPY table FROM" syntax largely negates this MySQL advantage... how much logging goes on with COPY table FROM? The answer, from the PostgreSQL administrators' mailing list is that there's very little overhead, less than starting a transaction, doing the inserts, then committing the transaction.

In MySQL, all "alter table" statements, including adding or dropping indexes, rebuild the entire table from scratch. This means there really is no such thing as an online index rebuild or add, for example, because the table is locked to disallow any changes to it for the entire time the table is being rebuilt. Further, this requires double the space of the table to be available for the alter to take place.

Pretty poor form for MySQL here. All the preceding is true as of December 2004.

Robust, Complete Logging

If the client application reports that an error occurred during processing, there should be something in the server log that explains the server's opinion of what happened.

This is something that I, as a production DBA, have taken for granted because it's so EASY for an RDBMS to implement it that all major RDBMSs I've ever used (Sybase, Oracle, DB2) allowed for it.

MySQL, unfortunately, appears to be a bit lax in this regard. One of our queries aborted with an "Error 2013" meaning somehow the MySQL daemon killed the connection during a query.

There is nothing in the error log about this, and MySQL support is curiously silent whenever I ask how to log these types of errors.

Unfortunately, I have no experience with PostgreSQL to say whether or not it suffers from this fatal flaw. Please update this page if you have concrete experience on these matters.

Update December 2004. This still rings very true with MySQL. I don't think more than a couple days ever go by that MySQL doesn't do something mysterious, and there is absolutely nothing in the logs about it. Fortunately there are workarounds, but if you don't have the official tech support, you might find these problems rather too mysterious. Some might argue that MySQL leaves the lack of robust logging and error reporting in place to encourage their monetary income.

Known long-term Stability

When you're setting up your workstation at home, or playing games, you often want the latest software available. You even try to get software before it's actually released (beta).

But when you're depending on a piece of software to keep track of millions of dollars worth of data and billions of transactions a day, you want software that has been around for a long time and has been tested well.

Curiously, the "subscription" model of software of the proprietary vendors (think Microsoft and Oracle) works against this. If Microsoft and Oracle provide you with a stable piece of software that works all the time and has all its kinks worked out, then you won't need to give them money anymore.

PostgreSQL and MySQL should both, being open source and licenced BSD and GPL respectively, be more stable. But MySQL's past is tainted by the original developers who, for years, seemed to almost shun stability for speed. Even now (February, 2005), you can still see high profile cases like this, from the Wikipedia:

After some minutes, the switch and most of our machines had rebooted. Some of our servers required additional work to get up, and a few may still be sitting there dead but can be worked around.

The sticky point is the database servers, where all the important stuff is. Although we use MySQL's transactional InnoDB tables, they can still sometimes be left in an unrecoverable state. Attempting to bring up the master database and one of the slaves immediately after the downtime showed corruption in parts of the database. We're currently running full backups of the raw data on two other database slave servers prior to attempting recovery on them (recovery alters the data).

Update 19:20 PST: We have at least one database server with intact data. When we have a second up and running, we'll be able to put the site back online in read-only mode as we continue.

Generally it's well-known that PostgreSQL is extremely careful with your data. If you posted a story like this on their mailing lists, they'd probably be in an uproar, tearing apart the code, looking for the problem.

Conclusion

MySQL has great features, is speedy, has decent tech support, and allows to build projects for 1/100th the cost of the "big boys." Friendster, TellMe?, Google, Yahoo, Technorati, and a litany of other Tier1 companies use it to build massive database clusters with amazing throughput.

PostgreSQL is a lesser-known database, most likely because there isn't a single company out there feeding press releases to the press. But it is known to be used in, for example, the .ORG domain registry. I also know it's being used in some other smaller production environments. I wouldn't hesitate to check PostgreSQL for adequacy for any new project I was building specifications for.

--
PhiloVivero
Updated - February 2005


Edit -:- Attach -:- Ref-By -:- Printable -:- More