Keeping Postgres tidy with partitioning
Let’s assume we have some historical data, such as logs, jobs, actions, events, metrics or something else, that stored in the database.
=# select pg_size_pretty(pg_relation_size('history_log'));
pg_size_pretty
—————-
155 GB
=# select count(*) from history_log;
count
————
2102342910
At some point, we decide to clean old events.
=# delete from history_log where updated_at < '2018-11-01';
DELETE 1885782465
Time: 327220.719 ms (12:05.221)
The query would take twelve minutes to complete. However, during this action there is a less noticeable process that takes place – query would generate certain amount of WAL that will then need to be transferred into all standbys.
Ok, let’s check how many rows there would be in the table.
=# select count(*) from history_log;
count
———–
216560445
=# select 100 * 1885782465::bigint / 2102342910;
?column?
———-
89
It seems we deleted something around of 89% of the table, but let’s check its size.
=# select pg_size_pretty(pg_relation_size('history_log'));
pg_size_pretty
—————-
155 GB
Huh, the size hasn’t been changed?!
The thing is, Postgres never performs real deletion. It just marks rows as removed. Later on, space occupied by these „removed“ rows will be cleared by vacuum and the available space can again be used for new rows, however, this space still belongs to a table. In some rare circumstances, table can be partially truncated and the free space can be returned to the file system.
Using partitioning for storing historical data can work wonders. It would allow us to drop old data quickly, without overhead related to WAL generation, so it would immediately free up space.