The database was originally called Ingres, name Postgres is coming from "Post-Ingres". Read more: A Brief History of PostgreSQL.
As of 2020 Dec 25th: PostgreSQL 13.1, 12.5, 11.10, 10.15, 9.6.20, and 9.5.24 released. Read more: PostgreSQL-releases
The Can I use X for PostgreSQL: Feature Matrix.
- Range Types
- Autocompress for Text > 2000-char long
- Data types JSON, ENUM, UUID, CIDR, MACADDR
- SET parameters
- Asynchronous Commit
- Transactional DDL
CREATE INDEX CONCURRENTLY,
INSERT ... ON CONFLICT DO UPDATE
- Row-level security
- Logical Replication
- SCRAM Auth
- Multi-column statistics
- Covering Indexes
- Hot Backup (logical and physical)
- Point in Time Recovery
- LATERAL JOIN
- Full Text Search
Snowball text parser, GIN - Inverted Search index, or GIST - high concurrency.
- GIS (Geographical Information System)
They roughly mean the same thing:
"page" in disk space. Defaults to 8 kB. Minimum unit to read/write data
"buffer" in memory
Relation = Table, Index
Tuple = Row
Project(ion) = Select
XID = transaction ID, 32-bit integer = 4096M XIDs = 4 billion XIDs
OID = Unique Object ID (of row, table, database), hidden by default
SELECT oid, relname FROM pg_class where relname = 'table';
Once you got the oid, you can find all the locks by process from that table
SELECT locktype, database, pid, relation, mode FROM pg_locks WHERE relation = 123456;
OLTP On-Line Transaction Processing
OLAP On-Line Analytical Processing
Visibility Map—Like the index of the book
Transaction IDs. They are limited and must be reused.
VACUUMremoves references to old XIDs so that we can reuse them to not run out of XIDs. Run out of XID will be fatal data loss. VACUUM cannot remove reference of an old XID when it’s still in use.
Client programs come with Postgres
Their names are generic but actually from Postgres. Postgres called them "app", Postgres Client Applications.
Basics of PSQL
The interface to use SQL to interact with Postgres database.
psql [DBNAME [USERNAME]] psql -h -p connect to specific host and port psql -c "SELECT 1 FROM users;"
\q to quit \l \db \d table \dt+ table \sf func — Show function definition \x, \x auto \? \h CREATE TABLE \timing \watch CREATE ROLE CREATE TABLESPACE CREATE DATABASE
True is one of
False is one of
Topics to Learn
- VACUUM — VERY IMPORTANT!
- Microvacuum — On-the-fly deletion (remove dead index from B-tree pages)
- VACUUM in Memory
- Typical queries runtime
- ACID of Transaction
- Atomic—Do all or nothing
- Consistent—Database transfer from a consistent state to another consistent state
- Isolated—All transactions should not affect each other
- Durable—Changes made should stay in database, even when database has issue(s).
MVCC Multi-Version Concurrency Control (No block locking)
- dead tuple — When updated a row, a new version of row is created and inserted into the table. Old version of the row is marked as invisible for future transactions (not physically removed), then it becomes dead tuple. Read more
- Select ... FROM LATERAL (Aggregate functions are not supported)
EXPLAIN see Postgres EXPLAIN Visualizer
EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS, FORMAT JSON) SELECT * FROM POSTS
- Common Table Expressions (CTE)
WITH valid_users as ( SELECT id FROM users WHERE valid = 1 ) SELECT * FROM valid_users
BEGIN; ... END/COMMIT/ROLLBACK;
- Subquery v.s. WITH
- Bulk export with COPY
- COPY to import data fast (but no replace yet as PostgreSQL 12)
- RETURNING to decide what fields to return
INSERT INTO ... ON CONFLICT UPDATE
- Random Row
TABLESAMPLE (SYSTEM, BERNOULLI)over
ORDER BY random()for sampling data
tsm_system_rowsprobably the best for the job.
- MAX by OVERR
SELECT first_value(field) OVER ( PARTITION BY table ORDER BY field), * FROM table
- LATERAL JOIN Official doc
Reuse computations after
- Calculate similarity: fuzzystrmatch or pg_similarity
- selective aggregates
- tsvector & tsquery types are important for Full Text Search
- Window Functions
- Managing Freezing in PostgreSQL
- Terminate Postgres Process pg_terminate_backend or pg_cancel_backend
kill -9, postgres will need to recover, during recover you cannot access postgres.
- Explicit Locking
AS () ... SELECT ... FROM result;
- bloat — When a table has so many dead tuples, its size grows more than it actually needs is called bloat. When index points to dead tuples instead of tuple, it is bloat. Bloated index results in more unnecessary I/O fetches. Slowing down index lookup and scanning.
pg_repackfor table bloat
Create a temp table as target table.
COPYtarget table data to temp table. Create indexes on temp table. Then swap temp table and target table. In the process, pg_repack will free unused spaces to Operating System.
- Slony, PostGIS, JDBC
- CLOG buffers
- Reuse execution plans
- Explaining the Postgres Query Optimizer
- Incremental sorting (PostgreSQL 13)
- AWR: pg_statsinfo (by NTT)
- ASH: pgsentinel
R = S + WResponse time = Service time + Wait time
- Wait Events
Usually in this format:
<not-recursive> UNION [ALL] <recursive>
work_mem— used in ORDER BY, DISTINCT
maintenance_work_mem— used in VACUUM, REINDEX, CREATE INDEX
Added since PostgreSQL 9.6+, thriving since 10+. This is not silver bullet.
Parallel Query does use more CPU, IO, Memory. If you have spare CPU, IO, Memory to splurge, go go go!
Configs we can change for Parallel Query.
0 means disabled Parallel Query; defaults to 2 (enabled).
defaults to 8. Max number of workers for System.
defaults to 8. Max number of workers to use for Parallel Query.
max_parallel_workers set to 2, but you have
max_parallel_workers_per_gather set to 6, then you still only get 2 workers.
defaults to 1000. Cost to start parallel process.
defaults to 0.1. Cost to process a row.
defaults to 8MB. How big a table is to start using Parallel Query (bigger than 8MB).
defaults to 512kb.
Don’t enable this. This hurts OLTP.
max_worker_processes > max_parallel_workers > max_parallel_workers_per_gather
PostgreSQL query table in these ways and its parallel equivalent you can benefit when you enable Parallel Query:
- Sequential Scan
- Index Scan
- Index-only Scans — When the selected field and condition matched the indexed conditioned.
- Index Only Scan Backward
- Bitmap Index Scan (when SQL’s WHERE has
- Bitmap Heap Scan (when SQL’s WHERE has
- Parallel Sequential Scan
- Parallel Index Scan
- Parallel Index-only Scan
- Parallel Bitmap Index Scan
- Parallel Bitmap Heap Scan
Aggregate functions can also run in parallel (COUNT, SUM, MIN, MAX).
- Parallel index creation:
CREATE INDEX CONCURRENTLY
- GIN index can make
LIKEqueries run fast even if the
%is not in the end
\x auto(auto format output nicely)
\dxList all extensions
\dListing all relations
\dtListing all tables
\d usersDescribe users tables
\pset format csvOutput as CSV, delimiter as tab:
\pset csv_fieldsep '\t'
\timingShow query execution time
- Prefer jsonb over json
- Should not use PREPARE TRANSACTION
- Index: Partial Index
- If index not supports btree, use include
- PostgreSQL Wiki
- PostgreSQL Memory Components
- Understanding caching in Postgres - An in-depth guide
- Mailing List: psql-hackers
- View PostgreSQL Source Code: doxygen — tells you all callers of a given c function
- PG Casts
- PostgreSQL at low level: stay curious!
- The Internals of PostgreSQL
- Improving Postgres Connection Scalability: Snapshots
- 10 Things I Hate About PostgreSQL
- B-tree indexes being as much as 40% smaller on PostgreSQL 12.
Consultancy / Commercial Companies
- EnterpriseDB (EDB)
- 2ndQuadrant (acquired by EnterpriseDB)
- Citus Data (acquired by Microsoft)
Commercial Postgres solutions
- TimescaleDB for timeline analysis
- PipelineDB for dashboard, realtime monitoring
- CitusDB for OLAP, OLTP, large data analysis
- Crunchy Data for managed Postgres
Please see PostgreSQL derived databases for more.