The autovacuum process takes care of several maintenance chores inside your database that you really need. If you think you need to turn regular vacuuming off because it's taking too much time or resources, that means you're doing it wrong. The answer to almost all vacuuming problems is to vacuum more often, not less. So that each individual vacuum operation has less to clean up.
VACUUM marks the disk space reference that this is available to use again, it did not free the space back to OS.
VACUUM FULL will get the disk space back but in order to do so, it needs to lock the table.
VACUUM FULLrequires exclusive lock on the table it is working on
VACUUMcreates a substantial amount of I/O traffic, which can cause poor performance for other active sessions.
TRUNCATEremoves the entire content of the table immediately, without requiring a subsequent
VACUUMFULL to reclaim the now-unused disk space. But this violates MVCC.
VACUUMevery table in every database at least once every 2 Billion transactions to avoid Transaction ID Wraparound
When will the Autovacuum Daemon vacuum?
vacuum threshold = vacuum base threshold + vacuum scale factor * number of tuples
When will the Autovacuum Daemon analyze?
analyze threshold = analyze base threshold + analyze scale factor * number of tuples
These are the things you want to monitor in order to make a data-informed decision:
- Processes running Auto Vacuum
- Auto Vacuum workers per table
- Active/Idle Transaction Connections
- CPU usage
- Disk Read
- Disk Pending Write
- Disk I/O time
- Disk Free Space
- Memory: shared buffer cache
- Row: Tuple
Understanding Vacuum Configs
Good thing about Vacuum is when you changed the configuration, you do not need to restart the database.
Imagine Vacuum is a Cleaner in the airport (Database) that has many customers (IO operations) going around. The Cleaner’s job is to keep the airport tidy and clean. But Cleaner cannot always do vacuum, because the airport has many customers go around to do important things. So when the cleaner does vacuum work, it keeps track of the cost of each vacuum work item.
When the airport is shut down, you can do a full clean up
VACUUM FULL but the airport never shuts down and it would take 100% Disk IO. Cleaner is like doing this micro clean up here and there while the airport (Database) is still functional.
vacuum_cost_limit— Defaults to 200. The total points of vacuum work before Cleaner takes a break. So the customers (IO operations) can use the airport (Disk).
vacuum_cost_page_hit— Defaults to 1, When Cleaner vacuums a row in memory (cache hit), it costs 1 point.
vacuum_cost_page_miss— Defaults to 10, When Cleaner vacuums a row found in the disk (cache miss), it costs 10 points.
vacuum_cost_page_dirty— Defaults to 20, When Postgres vacuums a page that was clean, when Postgres vacuums it becomes dirty and need to make it clean again.
vacuum_cost_delay— The break time for Cleaner when her vacuum work points exceeded
vacuum_cost_limit. Defaults to 0, which never lets Cleaner take a break. This value should be set a multiple of 10.
Auto Vacuum configs
There is a window (caused by pgstat delay) on which a vacuum worker may choose a table that was already vacuumed; This is a bug in the current design. Source