Related configs and their default values:
seq_page_cost 1.0
cpu_tuple_cost 0.01
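You can confirm what your server is actually using via pg_settings (current_setting() works too):
SELECT name, setting
FROM pg_settings
WHERE name IN ('seq_page_cost', 'cpu_tuple_cost');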
Given the following posts table:
\d posts
                                   Table "public.posts"
   Column   |            Type             | Collation | Nullable |              Default
------------+-----------------------------+-----------+----------+-----------------------------------
 id         | integer                     |           | not null | nextval('posts_id_seq'::regclass)
 title      | character varying           |           | not null |
 body       | text                        |           | not null |
 created_at | timestamp without time zone |           | not null |
 updated_at | timestamp without time zone |           | not null |
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)
EXPLAIN (ANALYZE ON, BUFFERS ON)
SELECT * FROM posts;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Seq Scan on posts (cost=0.00..1.21 rows=21 width=319) (actual time=0.006..0.009 rows=21 loops=1)
Buffers: shared hit=1
Planning Time: 0.042 ms
Execution Time: 0.025 ms
SELECT
relpages,
current_setting('seq_page_cost') AS seq_page_cost,
relpages *
current_setting('seq_page_cost')::decimal AS page_cost,
reltuples,
current_setting('cpu_tuple_cost') AS cpu_tuple_cost,
reltuples *
current_setting('cpu_tuple_cost')::decimal AS tuple_cost
FROM pg_class
WHERE relname='posts';
relpages | seq_page_cost | page_cost | reltuples | cpu_tuple_cost | tuple_cost
----------+---------------+-----------+-----------+----------------+------------
1 | 1 | 1 | 21 | 0.01 | 0.21
Seq Scan Cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
= page_cost + tuple_cost
= 1 * 1 + 21 * 0.01
= 1 + 0.21
= 1.21
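The formula is easy to sanity-check: if we double seq_page_cost for the current session, the estimate should move exactly as predicted, to 2 * 1 + 21 * 0.01 = 2.21:
SET seq_page_cost = 2.0;      -- session-local change
EXPLAIN SELECT * FROM posts;  -- should now report cost=0.00..2.21
RESET seq_page_cost;          -- back to the default of 1.0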
CREATE TABLE example_users(
  id integer,
  name character varying(255),
  created_at timestamp without time zone DEFAULT clock_timestamp()
);
-- Insert 100M users with ids 1..100000000 and names test_1, test_2, ...
INSERT INTO example_users(id, name)
SELECT n, 'test_'||n FROM generate_series(1, 100000000) n;
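One caveat before reading costs out of the catalog: relpages and reltuples in pg_class are only refreshed by VACUUM, ANALYZE, and a few DDL commands, so run ANALYZE after a bulk load like this one:
ANALYZE example_users;  -- refresh the statistics the cost formulas rely on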
The table is now about 5.6 GB:
SELECT pg_size_pretty(pg_relation_size('example_users')) as size;
size
---------
5737 MB
PostgreSQL has had great parallel seq scan support since version 10, so let's take a look. First, let's see how fast the query runs with parallel seq scan turned off:
SET max_parallel_workers_per_gather = 0;
EXPLAIN ANALYZE SELECT * from example_users where name = 'test_1';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
Seq Scan on example_users (cost=0.00..1984425.90 rows=1 width=25) (actual time=1471.182..7638.921 rows=1 loops=1)
Filter: ((name)::text = 'test_1'::text)
Rows Removed by Filter: 99999999
Planning Time: 0.039 ms
Execution Time: 7638.933 ms
Let’s take a look at the costs:
SELECT
relpages,
current_setting('seq_page_cost') AS seq_page_cost,
relpages *
current_setting('seq_page_cost')::decimal AS page_cost,
reltuples,
current_setting('cpu_tuple_cost') AS cpu_tuple_cost,
reltuples *
current_setting('cpu_tuple_cost')::decimal AS tuple_cost
FROM pg_class
WHERE relname = 'example_users';
relpages | seq_page_cost | page_cost | reltuples | cpu_tuple_cost | tuple_cost
----------+---------------+-----------+---------------+----------------+------------
734311 | 1 | 734311 | 1.0000919e+08 | 0.01 | 1000091.92
Seq Scan Cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
= page_cost + tuple_cost
= 734311 * 1 + 100009190 * 0.01
= 734311 + 1000091.90
= 1734402.90
That is noticeably lower than the 1984425.90 in the plan above. The gap comes from the WHERE clause: evaluating the filter costs cpu_operator_cost (default 0.0025) per row, and 100009190 * 0.0025 = 250022.98, which brings the total to ~1984425.9, matching the plan, as the query below confirms.
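The full estimate can be reproduced straight from the catalog. cpu_operator_cost is a stock setting (default 0.0025), so this should match the plan's 1984425.90 up to reltuples rounding:
SELECT
  relpages * current_setting('seq_page_cost')::decimal
  + reltuples * current_setting('cpu_tuple_cost')::decimal
  + reltuples * current_setting('cpu_operator_cost')::decimal  -- one filter comparison per row
  AS seq_scan_cost_with_filter
FROM pg_class
WHERE relname = 'example_users';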
A handful of settings govern parallel query behavior. Default values (PostgreSQL 12.3):
SHOW max_worker_processes;
max_worker_processes
----------------------
8
SHOW max_parallel_workers_per_gather;
max_parallel_workers_per_gather
---------------------------------
2
SHOW max_parallel_workers;
max_parallel_workers
----------------------
8
SHOW parallel_tuple_cost;
parallel_tuple_cost
---------------------
0.1
SHOW parallel_setup_cost;
parallel_setup_cost
---------------------
1000
SHOW min_parallel_table_scan_size;
min_parallel_table_scan_size
------------------------------
8MB
SHOW min_parallel_index_scan_size;
min_parallel_index_scan_size
------------------------------
512kB
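Rather than running seven separate SHOW commands, pg_settings can list them all in one query (the LIKE pattern also catches a couple of parallel-related settings not covered here):
SELECT name, setting, unit
FROM pg_settings
WHERE name LIKE '%parallel%'
   OR name = 'max_worker_processes'
ORDER BY name;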
Let's see how PostgreSQL 12.3 finds the user with name test_1, with max_parallel_workers_per_gather at its default of 2:
SET max_parallel_workers_per_gather = 2;
EXPLAIN ANALYZE SELECT * from example_users where name = 'test_1';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..1256192.31 rows=1 width=25) (actual time=1800.358..5981.957 rows=1 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on example_users (cost=0.00..1255192.21 rows=1 width=25) (actual time=4383.325..5676.442 rows=0 loops=3)
Filter: ((name)::text = 'test_1'::text)
Rows Removed by Filter: 33333333
Planning Time: 0.041 ms
Execution Time: 5981.991 ms
About 20-30% faster than the non-parallel scan. (Note rows=0 and loops=3 on the Parallel Seq Scan node: the leader process participates alongside the 2 workers, and EXPLAIN ANALYZE averages actual row counts across loops, so 1 row over 3 loops rounds down to 0.)
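The cost numbers can be reconstructed as well. Here is a sketch, assuming default cost settings and the divisor logic in PostgreSQL's get_parallel_divisor(): with 2 planned workers the CPU cost is divided by 2.4 (2 workers plus a 0.4 contribution from the participating leader), the page cost is not divided, and Gather adds parallel_setup_cost plus parallel_tuple_cost per row it forwards:
SELECT
  relpages * 1.0                        -- page cost is not divided among workers
  + reltuples * (0.01 + 0.0025) / 2.4   -- (cpu_tuple_cost + cpu_operator_cost) / divisor
    AS parallel_seq_scan_cost,          -- ~1255192, the inner node's total
  relpages * 1.0
  + reltuples * (0.01 + 0.0025) / 2.4
  + 1000 + 0.1 * 1                      -- parallel_setup_cost + parallel_tuple_cost * 1 row
    AS gather_cost                      -- ~1256192, the Gather node's total
FROM pg_class
WHERE relname = 'example_users';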
Now let's allow up to 8 workers, since I'm on an 8-core MacBook Pro:
SET max_parallel_workers_per_gather = 8;
It only launches 6 workers (more on why after the plan), but we are now ~5x faster than the 2-worker run!
EXPLAIN ANALYZE SELECT * from example_users where name = 'test_1';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..943663.58 rows=1 width=25) (actual time=1214.798..1219.644 rows=1 loops=1)
Workers Planned: 6
Workers Launched: 6
-> Parallel Seq Scan on example_users (cost=0.00..942663.48 rows=1 width=25) (actual time=1070.600..1208.928 rows=0 loops=7)
Filter: ((name)::text = 'test_1'::text)
Rows Removed by Filter: 14285714
Planning Time: 0.040 ms
Execution Time: 1219.660 ms
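Why only 6 workers? The planner derives the worker count from the table size, not from max_parallel_workers_per_gather alone: one worker once the table reaches min_parallel_table_scan_size, plus one more each time the size triples. The SQL below mirrors that heuristic (it lives in compute_parallel_worker(); this is a sketch, not the planner's actual code):
SELECT 1 + floor(log(3,
         pg_relation_size('example_users')::numeric
         / pg_size_bytes(current_setting('min_parallel_table_scan_size'))))
       AS planned_workers;  -- 1 + floor(log3(5737MB / 8MB)) = 6
max_parallel_workers_per_gather only caps this number, which is why the earlier run with the default cap of 2 planned just 2 workers.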