Vacuum is Postgres’s GC to mark space for reuse, reclaim space, recycle transaction ID, update statistics for query planner. Postgres has built-in Autovacuum Daemon that we 100% want to turn on: See Routine Vacuuming. It is I/O intensive operations.
The autovacuum process takes care of several maintenance chores inside your database that you really need. If you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less. So that each individual vacuum operation has less to clean up.
VACUUM marks the disk space reference that this is available to use again, it did not free the space back to OS.
VACUUM FULL will get the disk space back but in order to do so, it needs to lock the table.
The visibility map is bitmap with two bits (all-visible and all-frozen). Vaccum will skip all-frozen and only vacuum on all-visible.
VACUUM by default process all tables.
VACUUM ANALYZE runs
VACUUM FULL rewrites the entire table into a new disk file with no extra space, unused space will return to the operating system. This form is VERY SLOW and requires an access exclusive lock on each table while it is being processed.
VACUUM FULLrequires exclusive lock on the table it is working on
VACUUMcreates a substantial amount of I/O traffic, which can cause poor performance for other active sessions.
TRUNCATEremoves the entire content of the table immediately, without requiring a subsequent
VACUUMFULL to reclaim the now-unused disk space. But this violates MVCC.
VACUUMevery table in every database at least once every 2 Billion transactions to avoid Transaction ID Wraparound
When will the Autovacuum Daemon vacuum?
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
When will the Autovacuum Daemon analyze?
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
These are the things you want to monitor in order to make a data-informed decision:
- Processes running Auto Vacuum
- Auto Vacuum workers per table
- Active/Idle Transaction Connections
- CPU usage
- Disk Read
- Disk Pending Write
- Disk I/O time
- Disk Free Space
- Memory: shared buffer cache
- Row: Tuple
Good thing about Vacuum is when you changed the configuration, you do not need to restart the database.
Imagine Vacuum is a Cleaner in the airport (Database) that has many customers (IO operations) going around. The Cleaner’s job is to keep the airport tidy and clean. But Cleaner cannot always do vacuum, because the airport has many customers go around to do important things. So when the cleaner does vacuum work, it keeps track of the cost of each vacuum work item.
When the airport is shut down, you can do a full clean up
VACUUM FULL but the airport never shuts down and it would take 100% Disk IO. Cleaner is like doing this micro clean up here and there while the airport (Database) is still functional.
vacuum_cost_limit— Defaults to 200. The total points of vacuum work before Cleaner takes a break. So the customers (IO operations) can use the airport (Disk).
vacuum_cost_page_hit— Defaults to 1, When Cleaner vacuums a row in memory (cache hit), it costs 1 point.
vacuum_cost_page_miss— Defaults to 10, When Cleaner vacuums a row found in the disk (cache miss), it costs 10 points.
vacuum_cost_page_dirty— Defaults to 20, When Postgres vacuums a page that was clean, when Postgres vacuums it becomes dirty and need to make it clean again.
vacuum_cost_delay— The break time for Cleaner when her vacuum work points exceeded
vacuum_cost_limit. Defaults to 0, which never lets Cleaner take a break. This value should be set a multiple of 10.
Formula that triggers auto vacuum:
autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number_of_tuples.
maintenance_work_mem— Makes autovacuum more aggressive.
There is a window (caused by pgstat delay) on which a vacuum worker may choose a table that was already vacuumed; This is a bug in the current design. Source