Postgres: SeqScan

Related configs and its default value:

seq_page_cost 1.0
cpu_tuple_cost 0.01

Given posts table:

\d posts
                                         Table "public.posts"
   Column    |            Type             | Collation | Nullable |              Default
-------------+-----------------------------+-----------+----------+-----------------------------------
 id          | integer                     |           | not null | nextval('posts_id_seq'::regclass)
 title       | character varying           |           | not null |
 body        | text                        |           | not null |
 created_at  | timestamp without time zone |           | not null |
 updated_at  | timestamp without time zone |           | not null |
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)
EXPLAIN (ANALYZE ON, BUFFERS ON)
SELECT * FROM posts;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Seq Scan on posts  (cost=0.00..1.21 rows=21 width=319) (actual time=0.006..0.009 rows=21 loops=1)
   Buffers: shared hit=1
 Planning Time: 0.042 ms
 Execution Time: 0.025 ms

SELECT
  relpages,
  current_setting('seq_page_cost') AS seq_page_cost,
  relpages *
  current_setting('seq_page_cost')::decimal AS page_cost,
  reltuples,
  current_setting('cpu_tuple_cost') AS cpu_tuple_cost,
  reltuples *
  current_setting('cpu_tuple_cost')::decimal AS tuple_cost
FROM pg_class
WHERE relname='posts';

 relpages | seq_page_cost | page_cost | reltuples | cpu_tuple_cost | tuple_cost
----------+---------------+-----------+-----------+----------------+------------
        1 | 1             |         1 |        21 | 0.01           |       0.21
Seq Scan Cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
              = page_cost + tuple_cost
              = 1 * 1 + 21 * 0.01
              = 1 + 0.21
              = 1.21

100M data Seq Scan

CREATE TABLE example_users(id integer, name character varying(255), created_at timestamp without time zone default clock_timestamp());

-- Insert 100M users of id 1 and name test_1...
INSERT INTO example_users(id, name)
SELECT n, 'test_'||n FROM generate_series(1, 100000000) n;

The table now is 5.6GB big:

SELECT pg_size_pretty(pg_relation_size('example_users')) as size;

  size
---------
 5737 MB

Since PostsgreSQL 10, great parallel seq scan support, let’s take a look. First, let’s see how fast when we turn off parallel seq scan:

SET max_parallel_workers_per_gather = 0;

EXPLAIN ANALYZE SELECT * from example_users where name = 'test_1';
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Seq Scan on example_users  (cost=0.00..1984425.90 rows=1 width=25) (actual time=1471.182..7638.921 rows=1 loops=1)
   Filter: ((name)::text = 'test_1'::text)
   Rows Removed by Filter: 99999999
 Planning Time: 0.039 ms
 Execution Time: 7638.933 ms

Let’s take a look at the costs:

SELECT
  relpages,
  current_setting('seq_page_cost') AS seq_page_cost,
  relpages *
  current_setting('seq_page_cost')::decimal AS page_cost,
  reltuples,
  current_setting('cpu_tuple_cost') AS cpu_tuple_cost,
  reltuples *
  current_setting('cpu_tuple_cost')::decimal AS tuple_cost
FROM pg_class
WHERE relname = 'example_users';

 relpages | seq_page_cost | page_cost |   reltuples   | cpu_tuple_cost | tuple_cost
----------+---------------+-----------+---------------+----------------+------------
   734311 | 1             |    734311 | 1.0000919e+08 | 0.01           | 1000091.92
Seq Scan Cost = relpages * seq_page_cost + reltuples * cpu_tuple_cost
              = page_cost + tuple_cost
              = 734311 * 1 + 100009190 * 0.01
              = 734311 + 1000091.90
              = 1734402.9

Parallel Related Settings

SHOW max_worker_processes;
SHOW max_parallel_workers_per_gather;
SHOW max_parallel_workers;
SHOW parallel_tuple_cost;
SHOW parallel_setup_cost;
SHOW min_parallel_table_scan_size;
SHOW min_parallel_index_scan_size;

Default values (PostgreSQL 12.3):

SHOW max_worker_processes;
 max_worker_processes
----------------------
 8

SHOW max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 2

SHOW max_parallel_workers;
 max_parallel_workers
----------------------
 8

SHOW parallel_tuple_cost;
 parallel_tuple_cost
---------------------
 0.1

SHOW parallel_setup_cost;
 parallel_setup_cost
---------------------
 1000

SHOW min_parallel_table_scan_size;
 min_parallel_table_scan_size
------------------------------
 8MB

SHOW min_parallel_index_scan_size;
 min_parallel_index_scan_size
------------------------------
 512kB

Let’s see how Postgres 12.3 finds user with name test_1 with default’s max_parallel_workers_per_gather sets to 2

SET max_parallel_workers_per_gather = 2;

EXPLAIN ANALYZE SELECT * from example_users where name = 'test_1';
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..1256192.31 rows=1 width=25) (actual time=1800.358..5981.957 rows=1 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   ->  Parallel Seq Scan on example_users  (cost=0.00..1255192.21 rows=1 width=25) (actual time=4383.325..5676.442 rows=0 loops=3)
         Filter: ((name)::text = 'test_1'::text)
         Rows Removed by Filter: 33333333
 Planning Time: 0.041 ms
 Execution Time: 5981.991 ms

About 20-30% faster than parallel.

Now set to use at most 8 workers because I’m on a 8-core MacBook Pro:

SET max_parallel_workers_per_gather = 8;

It only uses 6 workers, but we are now ~ 5x faster!!

EXPLAIN ANALYZE SELECT * from example_users where name = 'test_1';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..943663.58 rows=1 width=25) (actual time=1214.798..1219.644 rows=1 loops=1)
   Workers Planned: 6
   Workers Launched: 6
   ->  Parallel Seq Scan on example_users  (cost=0.00..942663.48 rows=1 width=25) (actual time=1070.600..1208.928 rows=0 loops=7)
         Filter: ((name)::text = 'test_1'::text)
         Rows Removed by Filter: 14285714
 Planning Time: 0.040 ms
 Execution Time: 1219.660 ms