Vacuum

Vacuum is Postgres’s GC to mark space for reuse, reclaim space, recycle transaction ID, update statistics for query planner. Postgres has built-in Autovacuum Daemon that we 100% want to turn on: See Routine Vacuuming. It is I/O intensive operations.

The autovacuum process takes care of several maintenance chores inside your database that you really need. If you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less. So that each individual vacuum operation has less to clean up.

—— Tuning Your PostgreSQL Server

VACUUM marks the disk space reference that this is available to use again, it did not free the space back to OS. VACUUM FULL will get the disk space back but in order to do so, it needs to lock the table.

The visibility map is bitmap with two bits (all-visible and all-frozen). Vaccum will skip all-frozen and only vacuum on all-visible.

VACUUM by default process all tables.
VACUUM ANALYZE runs VACUUM then ANALYZE.
VACUUM FULL rewrites the entire table into a new disk file with no extra space, unused space will return to the operating system. This form is VERY SLOW and requires an access exclusive lock on each table while it is being processed.

These are the things you want to monitor in order to make a data-informed decision:

Good thing about Vacuum is when you changed the configuration, you do not need to restart the database.

Imagine Vacuum is a Cleaner in the airport (Database) that has many customers (IO operations) going around. The Cleaner’s job is to keep the airport tidy and clean. But Cleaner cannot always do vacuum, because the airport has many customers go around to do important things. So when the cleaner does vacuum work, it keeps track of the cost of each vacuum work item.

When the airport is shut down, you can do a full clean up VACUUM FULL but the airport never shuts down and it would take 100% Disk IO. Cleaner is like doing this micro clean up here and there while the airport (Database) is still functional.

Formula that triggers auto vacuum:

autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number_of_tuples.

There is a window (caused by pgstat delay) on which a vacuum worker may choose a table that was already vacuumed; This is a bug in the current design. Source