PostgreSQL Cascading: Updates and Deletes

So something nice about a real database is cascading values, most commonly used to deletion, but you can use them for updates as well.  Let me give you a scenario: I track groups of data from my clients by a sub_id field.  As they add groups, their ids aren’t in a range.  If I wanted to consolidate them, I could update the sub_id (somewhere that won’t be stomped on by the sequence), but what about historical data.  Baring a cascade rule, PostgresSQL will stop the update.  A non-relational database like MySQL will just leave orphan foreign keys.  If you set Cascading, they come along for the ride.

View the following example:

CREATE TABLE A (a_id SERIAL PRIMARY KEY, letter char);
CREATE TABLE B (b_id SERIAL PRIMARY KEY,
a_id integer REFERENCES A(a_id) ON DELETE CASCADE ON UPDATE CASCADE,
letter char);

INSERT INTO A (letter) VALUES ('a'), ('b'), ('c'), ('d'), ('e');

INSERT INTO B(a_id, letter) VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e');

SELECT * FROM A;
a_id | letter
------+--------
1 | a
2 | b
3 | c
4 | d
5 | e

SELECT * FROM B;
b_id | a_id | letter
------+------+--------
1 | 1 | a
2 | 2 | b
3 | 3 | c
4 | 4 | d
5 | 5 | e

DELETE FROM A WHERE a_id > 3;
-- We delete a few fields from the bottom table, and PostgreSQL magically cascades it to B

SELECT * FROM A;
a_id | letter
------+--------
1 | a
2 | b
3 | c

SELECT * FROM B;
b_id | a_id | letter
------+------+--------
1 | 1 | a
2 | 2 | b
3 | 3 | c

UPDATE A set a_id = a_id + 3;
-- This is the example above, renumbering the code

SELECT * FROM A;
a_id | letter
------+--------
4 | a
5 | b
6 | c

SELECT * FROM B;
b_id | a_id | letter
------+------+--------
1 | 4 | a
2 | 5 | b
3 | 6 | c

So we were able to renumber A, pass the values to B, without any updates to B. Pretty cool, huh?

Primary Keys in PostgreSQL shouldn’t be a Mystery

PostgreSQL has so much power, and people think it’s much harder to use.  You can do just about anything with Alter Table in terms of Adding Columns.  The only tricky thing I’ve found is that you can’t add a NOT NULL column without a Default if there is data, since the default would be NULL.  So you can allow Nulls, fill in your data, then tag it Not Null, or give it a useful default and move on.  I saw another post of someone making PostgreSQL way more difficult than needed, creating a Primary Key later.  While there are plenty (or some) reasons to create a sequence manually (I had a need for unique integers for insertion into a third-party system once), there isn’t a need for something simply like a primary key.

Let’s setup our test table, you can obviously do all the inserts on one line, I was playing on an older server for testing.


test_database=# CREATE TABLE test_table (col_A varchar, col_B varchar);
CREATE TABLE
test_database=# INSERT INTO test_table(col_a, col_B) VALUES ('a','A');
INSERT 0 1
test_database=# INSERT INTO test_table(col_a, col_B) VALUES ('b', 'B');
INSERT 0 1
test_database=# INSERT INTO test_table(col_a, col_B) VALUES ('c', 'C');
INSERT 0 1

Now we want to add a Primary Key:

test_database=# ALTER TABLE test_table ADD COLUMN test_id SERIAL PRIMARY KEY;
NOTICE: ALTER TABLE will create implicit sequence "test_table_test_id_seq" for serial column "test_table.test_id"
NOTICE: ALTER TABLE / ADD PRIMARY KEY will create implicit index "test_table_pkey" for table "test_table"
ALTER TABLE
test_database=# SELECT * from test_table;
col_a | col_b | test_id
-------+-------+---------
a | A | 1
b | B | 2
c | C | 3
(3 rows)
test_database=# \d test_table
Table "public.test_table"
Column | Type | Modifiers
---------+-------------------+--------------------------------------------------------------
col_a | character varying |
col_b | character varying |
test_id | integer | not null default nextval('test_table_test_id_seq'::regclass)
Indexes:
"test_table_pkey" PRIMARY KEY, btree (test_id)

Look, one line, added a new column test_id, with the sequence, tagged as a primary key.  Not so hard?  Why isn’t that an obvious example under Alter Table, given that it’s something that one might want to do?

PostgreSQL is Powerful, Confuses Newbies

Looking at the list of PostgreSQL blog entries, I saw this one,
Add auto increment column in PostgreSQL, and realized why my database of choice isn’t so popular… it confuses people.  Back in the old days, add an auto-numering field to PostgreSQL as a Primary Key was hard, you had to create the sequence and set the default.  This was a common operation, since a numeric Primary Key is often easier than looking for a natural key, and even when we have a natural key, it’s usually easier to index an integer than a string if we are going to use it as a foreign key somewhere.  The SERIAL “type” in PostgreSQL is the same as auto_increment, but it’s implemented as TYPE integer, attached SEQUENCE, and default value.

Example:
CREATE TABLE names_table (name_id SERIAL PRIMARY KEY, name varchar);

This gives you an PRIMARY KEY name_id, that will be auto incremented, and a string column name.  I use these all the time, and the PRIMARY KEY shortcut is an easy way to make a single column the primary key.  You can always use a primary key directive for composite keys in mapping tables and similar constructs.

I find it easy, because I know what a SERIAL is, but given that EVERY newcomer to PostgreSQL is looking for this feature, wouldn’t it make sense to highlight it in the documentation?  Perhaps in the examples given?  In addition, since auto increment is so common, why not support it?  SERIAL is syntactic sugar anyway, why not have it accept the MS SQL/MySQL terminology, you can throw off a notice so people learn the preferred one, but why push off testers?

MySQL Everywhere from a PostgreSQL Junky

Almost 10 years ago, we started working with Open Source databases, and looked at MySQL and PostgreSQL.  Back then, the feature list was night and day, MySQL was fast, lightning fast, on quick reads, and extremely slow on complicated joins.  PostgreSQL was much slower on the simple reads, but the referential integrity features, like delete cascades made the writes/deletes/updates much faster.  Most of the commentary at the time focused on transactions, but PostgreSQL focused on correctness.

Let me give you an example.  Assume I have my website products put into categories for a custom product display system (this was a common problem for someone building SEO sites for products back in 2001-2004).  If I wanted my products to exist in multiple categories, I created an intermediate mapping table, and everyone that uses a ORM system is probably nodding their head right now.  But here was the neat thing, in PostgreSQL, I just set the mapping table’s references to the Ids as NOT NULL ON DELETE CASCADE.  Now, if I delete a category, I don’t have to delete everything from the mapping table, it automatically deletes the entry whenever the category is deleted (and same thing for a product).

Now, the MySQL guys are thinking, lazy programmer, just delete from the mapping table when you delete, and we keep our faster database, or use the slow relational table systems if you are lazy.  That’s all correct, but assume I’m updating this 2001 site to support a new product/category tagging feature.  I want it fast, so I create a table of tags (with some descriptions for my SEO purposes), and two (or more) mapping tables.  Now on my PostgreSQL website, no big deal, I just make these maps ON DELETE CASCADE, and when I delete a category, it deletes the tags as well.  On the MySQL site, I have to dig into the code where I deleted the category, and update it to delete the tags.  This is fine if I built it, but what if that chunk was built by my summer intern who was learning programming at the time, now it’s a pile of spaghetti code, it works, but it’s ugly, and now I have to fix it.

Now, in 2009, MySQL has tables with referential integrity, and PostgreSQL is way faster.  I have a library of pl/pgsql code I have written over the years for PostgreSQL, some of which I updated for the new features in Postgresql 8.x, some of which still dates back to the limitations of Postgresql 7.3.  I have SEO efficient code so I never have to worry about HTML friendly naming schemes for URLs, and other things.  Meanwhile, MySQL has added cool features for the open source world of quick and dirty code, and PostgreSQL has tried to aim to please DBAs with awesome tuning parameters.  As a result, their is a wealth of awesome Open Source code developed in the past 8 years, but it’s all on MySQL.  Even when it’s ported to PostgreSQL, it’s using PostgreSQL as a fake MySQL, as in we don’t do ON DELETE CASCADE, we still delete from the mapping table.

And PostgresSQL fixed the speed problems, 8.3 is FAST, and the new 8.4 is probably FASTER, but we don’t take advantage of it because our code is all on MySQL.  Just another reminder that Worse Really Is Better!