Using PostgreSQL what is the difference between a smallint and a bool for storing boolean?

Question

What is the difference between the smallint type and the bool type for storing boolean values?

This question arose in the comments to a question on Geographic Information Systems Stack Exchange.

Erwin Brandstetter · Accepted Answer · 2021-09-10 01:39:22Z

29

Always store boolean data as boolean. I can only imagine exotic exceptions.
Just to address the storage angle in addition to what you posted as answer:

boolean requires 1 byte on disk, smallint requires 2. But that's not the whole story.

smallint (like other integer types and unlike boolean) also has special needs for alignment padding. It can only start at an even offset from the start of the tuple data. So one padding byte is added whenever the preceding data ends at an odd offset.

In the worst case, when mixed with types that require 8-byte alignment like bigint or timestamp / timestamptz, a smallint column can cost 8 bytes more than a boolean:

SELECT pg_column_size(row("char" 'a', FALSE   )) AS char_bool
     , pg_column_size(row("char" 'a', int2 '1')) AS char_int2
     , pg_column_size(row(text 'abcdef', TRUE    , now())) AS text7_bool_ts
     , pg_column_size(row(text 'abcdef', int2 '1', now())) AS text7_int2_ts;  -- worst case

 char_bool | char_int2 | text7_bool_ts | text7_int2_ts
-----------+-----------+---------------+---------------
        26 |        28 |            40 |            48
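The four sizes above can be reproduced with simple offset arithmetic. The following Python sketch is only a model, not PostgreSQL code: the 24-byte row header, the per-type sizes and alignments, and the names `TYPES` and `row_size` are assumptions chosen for illustration. It assumes the short text value is stored with a 1-byte length header and needs no alignment.

```python
# Illustrative model only: the sizes and alignments are assumptions based on
# the explanation above, not values queried from a PostgreSQL server.
ROW_HEADER = 24  # assumed header size for these small rows

# type name -> (size in bytes, required alignment in bytes)
TYPES = {
    '"char"': (1, 1),
    "bool": (1, 1),
    "int2": (2, 2),
    "text7": (7, 1),        # 'abcdef' plus an assumed 1-byte length header
    "timestamptz": (8, 8),
}

def row_size(*columns):
    """Return the modelled size in bytes of a row with the given column types."""
    offset = 0
    for name in columns:
        size, align = TYPES[name]
        offset += -offset % align   # padding up to the next aligned offset
        offset += size
    return ROW_HEADER + offset

print(row_size('"char"', "bool"))                # 26
print(row_size('"char"', "int2"))                # 28
print(row_size("text7", "bool", "timestamptz"))  # 40
print(row_size("text7", "int2", "timestamptz"))  # 48
```

In the last case the int2 first needs 1 padding byte to start at offset 8, and the timestamptz then needs 6 more to start at offset 16, which accounts for the 8-byte difference.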

Details:

If you have many boolean NOT NULL values and want to optimize space on disk:

Should I use the PostgreSQL bit string?

edited Sep 10, 2021 at 1:39

answered Dec 1, 2016 at 4:59

Erwin Brandstetter


It seems strange that an ENUM is 4 bytes, rather than just 1 (assuming it has fewer than 256 values).

– hayd, commented Jun 19, 2019 at 22:22

@hayd: Enums can have more than 256 values. You could build your own lookup table with a 1-byte PK column to squeeze out more: dba.stackexchange.com/a/159916/3684

– Erwin Brandstetter, commented Jun 20, 2019 at 19:28

Thanks, I suspect the majority of enums in the wild are a lot smaller, but this makes sense once you know enums are defined in pg_enum. It's interesting/surprising that the pg_enum table uses real/float4 for enumsortorder.

– hayd, commented Jun 20, 2019 at 20:48


Evan Carroll (last edited by Paul White) · Accepted Answer · 2016-12-01 05:20:20Z

4

First, the size of a smallint is two bytes which is twice the size of a bool:

SELECT
  pg_column_size(1::bool) AS "bool"
  , pg_column_size(1::smallint) AS "si";

 bool | si
------+----
    1 |  2

Let's create a small table with 1000 rows and check it out:

CREATE TABLE foo AS
  SELECT
    CASE WHEN x>0.5 THEN 1::smallint ELSE 0::smallint END AS si,
    CASE WHEN x>0.5 THEN true ELSE false END AS b
  FROM ( SELECT random() AS x FROM generate_series(1,1e3) ) AS t;

CREATE INDEX si ON foo (si);
CREATE INDEX b ON foo (b);

Now we can see that the table takes up 40 kB:

test=# \dt+ foo
            List of relations
 Schema | Name | Type  | Size 
--------+------+-------+-------
 public | foo  | table | 40 kB

Both indexes are the same size (40 kB). And all self-joins (returning 500050 rows in my case), whether using a seq scan or an index scan (with SET enable_seqscan=off), complete in the same amount of time:

EXPLAIN ANALYZE SELECT * FROM foo JOIN foo AS f2 USING (b);
EXPLAIN ANALYZE SELECT * FROM foo JOIN foo AS f2 USING (si);

So really the benefits of bool are that:

the column is one byte smaller
the table shows bool rather than smallint, which is more descriptive
it eliminates room for error, such as accidentally storing 3
it's likely to work a bit better with third-party libraries
it permits slightly more terse SQL: WHERE is_empty rather than WHERE is_empty = 1

edited Dec 1, 2016 at 5:20

Paul White♦


answered Nov 30, 2016 at 21:56

Evan Carroll


1

The index disk usage will be identical I think for a single int2 vs a single boolean index. But try an (int4, int2, int2, int2) vs an (int4, bool, bool, bool) index and you'll see a difference.

– ypercubeᵀᴹ, commented Nov 30, 2016 at 23:27

1

@ypercubeᵀᴹ Feel free to update/improve the answer with that other stuff about int4, int2 (etc). Sounds like a good contribution. I know a little bit about that but I'm approaching the limits of my knowledge there. I was just trying to answer a question on gis.se and I figured I'd write it down here since no one else had.

– Evan Carroll, commented Nov 30, 2016 at 23:33

I assume at some point you cross the page size and then every row becomes two pages, whereas if you used bool you may not hit that limit as quickly? Sounds like a rare occurrence though, and I think it only matters if you pack them next to each other. I.e., bool int2 bool int2 is the same size as int2 int2 int2 int2, but bool bool int2 int2 is smaller than both, iirc.

– Evan Carroll, commented Nov 30, 2016 at 23:39

3

An int4-int2-int2-int2 index needs 10 bytes per row (which is larger than 8). int4-bool-bool-bool needs 7 bytes (<= 8). I'm not entirely sure, but I think the packing is in multiples of 8. Or it could be page splits, as you say. The order of columns also plays a role, at least in tables. Not sure about indexes.

– ypercubeᵀᴹ, commented Nov 30, 2016 at 23:43

1

@ypercubeᵀᴹ: All basically true. For btree indexes as well. And mostly for other index types, too. (The storage principles are the same, just some special index features.)

– Erwin Brandstetter, commented Dec 1, 2016 at 5:08
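
The byte counts mentioned in the comments above can be checked with a small model. This Python sketch is not PostgreSQL code: the 8-byte index tuple header, the 8-byte maximum alignment (typical for 64-bit builds), and the names `data_size` and `index_tuple_size` are assumptions made for illustration.

```python
# Illustrative model only: the constants below are assumptions, not values
# read from a PostgreSQL server.
INDEX_TUPLE_HEADER = 8   # assumed size of a btree index tuple header
MAXALIGN = 8             # assumed maximum alignment (typical 64-bit build)

# type name -> (size in bytes, required alignment in bytes)
TYPES = {"bool": (1, 1), "int2": (2, 2), "int4": (4, 4)}

def data_size(*columns):
    """Bytes used by the column data, including alignment padding."""
    offset = 0
    for name in columns:
        size, align = TYPES[name]
        offset += -offset % align   # padding up to the next aligned offset
        offset += size
    return offset

def index_tuple_size(*columns):
    """Header plus data, rounded up to the next multiple of MAXALIGN."""
    total = INDEX_TUPLE_HEADER + data_size(*columns)
    return total + (-total % MAXALIGN)

# Multicolumn index: 10 vs 7 data bytes, 24 vs 16 bytes per modelled tuple.
print(data_size("int4", "int2", "int2", "int2"),
      index_tuple_size("int4", "int2", "int2", "int2"))   # 10 24
print(data_size("int4", "bool", "bool", "bool"),
      index_tuple_size("int4", "bool", "bool", "bool"))   # 7 16

# Column order: interleaving bool and int2 wastes the savings on padding.
print(data_size("bool", "int2", "bool", "int2"))  # 8
print(data_size("int2", "int2", "int2", "int2"))  # 8
print(data_size("bool", "bool", "int2", "int2"))  # 6
```

Under these assumptions, the index with three boolean columns fits in 16 bytes per tuple while the one with three int2 columns rounds up to 24, which is the kind of difference the comments describe.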
