PostgreSQL is Powerful, Confuses Newbies

Looking at the list of PostgreSQL blog entries, I saw this one,
Add auto increment column in PostgreSQL, and realized why my database of choice isn’t so popular… it confuses people.  Back in the old days, add an auto-numering field to PostgreSQL as a Primary Key was hard, you had to create the sequence and set the default.  This was a common operation, since a numeric Primary Key is often easier than looking for a natural key, and even when we have a natural key, it’s usually easier to index an integer than a string if we are going to use it as a foreign key somewhere.  The SERIAL “type” in PostgreSQL is the same as auto_increment, but it’s implemented as TYPE integer, attached SEQUENCE, and default value.

Example:
CREATE TABLE names_table (name_id SERIAL PRIMARY KEY, name varchar);

This gives you an PRIMARY KEY name_id, that will be auto incremented, and a string column name.  I use these all the time, and the PRIMARY KEY shortcut is an easy way to make a single column the primary key.  You can always use a primary key directive for composite keys in mapping tables and similar constructs.

I find it easy, because I know what a SERIAL is, but given that EVERY newcomer to PostgreSQL is looking for this feature, wouldn’t it make sense to highlight it in the documentation?  Perhaps in the examples given?  In addition, since auto increment is so common, why not support it?  SERIAL is syntactic sugar anyway, why not have it accept the MS SQL/MySQL terminology, you can throw off a notice so people learn the preferred one, but why push off testers?

MySQL Everywhere from a PostgreSQL Junky

Almost 10 years ago, we started working with Open Source databases, and looked at MySQL and PostgreSQL.  Back then, the feature list was night and day, MySQL was fast, lightning fast, on quick reads, and extremely slow on complicated joins.  PostgreSQL was much slower on the simple reads, but the referential integrity features, like delete cascades made the writes/deletes/updates much faster.  Most of the commentary at the time focused on transactions, but PostgreSQL focused on correctness.

Let me give you an example.  Assume I have my website products put into categories for a custom product display system (this was a common problem for someone building SEO sites for products back in 2001-2004).  If I wanted my products to exist in multiple categories, I created an intermediate mapping table, and everyone that uses a ORM system is probably nodding their head right now.  But here was the neat thing, in PostgreSQL, I just set the mapping table’s references to the Ids as NOT NULL ON DELETE CASCADE.  Now, if I delete a category, I don’t have to delete everything from the mapping table, it automatically deletes the entry whenever the category is deleted (and same thing for a product).

Now, the MySQL guys are thinking, lazy programmer, just delete from the mapping table when you delete, and we keep our faster database, or use the slow relational table systems if you are lazy.  That’s all correct, but assume I’m updating this 2001 site to support a new product/category tagging feature.  I want it fast, so I create a table of tags (with some descriptions for my SEO purposes), and two (or more) mapping tables.  Now on my PostgreSQL website, no big deal, I just make these maps ON DELETE CASCADE, and when I delete a category, it deletes the tags as well.  On the MySQL site, I have to dig into the code where I deleted the category, and update it to delete the tags.  This is fine if I built it, but what if that chunk was built by my summer intern who was learning programming at the time, now it’s a pile of spaghetti code, it works, but it’s ugly, and now I have to fix it.

Now, in 2009, MySQL has tables with referential integrity, and PostgreSQL is way faster.  I have a library of pl/pgsql code I have written over the years for PostgreSQL, some of which I updated for the new features in Postgresql 8.x, some of which still dates back to the limitations of Postgresql 7.3.  I have SEO efficient code so I never have to worry about HTML friendly naming schemes for URLs, and other things.  Meanwhile, MySQL has added cool features for the open source world of quick and dirty code, and PostgreSQL has tried to aim to please DBAs with awesome tuning parameters.  As a result, their is a wealth of awesome Open Source code developed in the past 8 years, but it’s all on MySQL.  Even when it’s ported to PostgreSQL, it’s using PostgreSQL as a fake MySQL, as in we don’t do ON DELETE CASCADE, we still delete from the mapping table.

And PostgresSQL fixed the speed problems, 8.3 is FAST, and the new 8.4 is probably FASTER, but we don’t take advantage of it because our code is all on MySQL.  Just another reminder that Worse Really Is Better!