Recently while abusing Postgres to do things it’s not supposed to do (I may go into this particular hack at a later date), I ended up trying to convert an existing primary key index to a covering index to make sure a certain column was always easily accessible since most queries only returned this value. This kind of pattern is great for lookup-like tables where the majority of the values are pretty small, you can get a Redis-like experience out of Postgres if you really push it.
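For concreteness, the kind of table I mean looks something like this (table and column names here are hypothetical; the migrations below use placeholders):

```sql
-- A simple key/value lookup table; most queries only need "value"
CREATE TABLE redis_data (
  key   text PRIMARY KEY,
  value text NOT NULL
);

-- The typical query: with a covering index, this can be answered
-- by an index-only scan, without touching the heap at all
SELECT value FROM redis_data WHERE key = 'some-key';
```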
Enough intro, here’s the code, split into two migrations:
-- add-covering-idx-for-redis-data-value (up)

-- Create new index idx, covering value
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS <your table>_pkey_new
  ON <your table>
  USING BTREE (key)
  INCLUDE (value);
-- swap-pkey-with-covering-idx (up)

BEGIN;

-- Drop & recreate existing pkey constraint
ALTER TABLE <your table>
  DROP CONSTRAINT <your table>_pkey,
  ADD CONSTRAINT <your table>_pkey PRIMARY KEY USING INDEX <your table>_pkey_new;

COMMIT;
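Once the swap is done, it's worth confirming that Postgres actually serves the hot query with an index-only scan (again with hypothetical names; vacuuming first keeps the visibility map fresh, which index-only scans depend on):

```sql
-- Keep the visibility map up to date so index-only scans are possible
VACUUM ANALYZE redis_data;

-- The plan should report an Index Only Scan on the pkey index
EXPLAIN (ANALYZE, BUFFERS)
SELECT value FROM redis_data WHERE key = 'some-key';
```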
Why two migrations? Because Postgres can’t run CREATE INDEX CONCURRENTLY ... from inside a transaction.
While some people believe that only up migrations are needed, I’m a firm believer in writing down migrations (and using and testing them before you deploy, so changes can be rolled back with minimal breakage if necessary). So the next logical question is: how do I undo this change? Here are those migrations going the other way (note that they need to be inverted somewhat):
-- swap-pkey-with-covering-idx (down)

-- Create new index idx, *not* covering value; this is equivalent
-- to the auto-generated pkey idx
CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS <your table>_pkey_new
  ON <your table>
  USING BTREE (key);
And after that (or should I say “before”?):
-- add-covering-idx-for-redis-data-value (down)

BEGIN;

-- Drop & recreate existing pkey constraint
ALTER TABLE <your table>
  DROP CONSTRAINT <your table>_pkey,
  ADD CONSTRAINT <your table>_pkey PRIMARY KEY USING INDEX <your table>_pkey_new;

COMMIT;
Now, I have to admit I didn’t give a world of thought to all the ways this could go wrong, but these two migrations (run in quick succession) did the job.
Postgres has a hard (to change) limit of 8191 bytes per index row, so before you use this code, make sure essentially all rows are below that size. The migration may work in development and staging, but if your staging environment is not sufficiently like your production environment (yes, I got bitten by this), the migration may fail there and you’ll have to roll it back.
Particularly in this case, a code-first migration approach might have made it trivial to run two separate queries (and not worry about how the migration framework was running your SQL).
One decent way to check your row sizes up front is based on a great SO answer (query reproduced below):
SELECT sum(pg_column_size(t.*)) AS filesize, count(*) AS filerow
FROM TABLE_NAME AS t;
You can do some filtering on the sizes to get a smaller list of the problematic rows and maybe do some specific things to combat them, but in general, if you’re at risk of having a row (or the value you want to store) go over 8KB, you probably want to avoid this approach altogether.
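A sketch of that filtering might look like this (hypothetical table and column names, and an arbitrary 4096-byte threshold; tune it to your data):

```sql
-- List the largest rows first; anything near 8191 bytes is a problem
SELECT t.key, pg_column_size(t.*) AS row_bytes
FROM redis_data AS t
WHERE pg_column_size(t.*) > 4096
ORDER BY row_bytes DESC
LIMIT 20;
```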
I generally like when my migrations are only SQL scripts (as I do with @slonik/migrator), but I also see lots of benefit in being able to do code things, for example updating a relevant external store or service as you make changes on the Postgres side (which I do with TypeORM). True to my generally anti-ORM stance (they certainly have their uses, and no, you probably shouldn’t hand-write CRUD queries), I’m currently leaning in the SQL-everywhere direction for migrations.
This was a quick post; I thought it was worth a quick write-up because I did more trial and error than I was comfortable with (and looked on SO more than a real DBA would).
Hopefully someone sees this post and learns something (and avoids the max index row size footgun).