4

Purely relational databases (SQL) seem to suffer to nowadays from the lack of a good solution for search/indexing hierarchies. Here a systematic review of the subject: https://stackoverflow.com/q/4048151

What is surprising is that no "Lineage Column" solution makes use of a bit string column, in particular adopting this column as the primary key. Maybe this article, which called the strategy "Sort Path", suggests something along those lines... But I didn't see anything more explicit.


Illustrating

Fruits can form a stable (static) taxonomy for the owner of a grocery store. Below, each fruit in the grocery store's inventory received a bitString identifier. Apples were labeled with the prefix 0, oranges with the prefix 1. Within the set of apples, green apples receive the prefix 01.

enter image description here

When using SQL bitstring the "ORDER BY" use the lexicographic order. For example ordering all bitstrings, from 1 to 4 bits:

 00       apple "0"
 000      apple "00"
 0000     apple "000"
 01       green apple ""
 010      green apple "0"
 011      green apple "1"
 10       orange "0"
 101      orange "01"

As we can see, the lexicographical order is grouping taxons through intervals ("all apples" set, green apples subset, oranges set).

BitString labels are not numbers, but can be counters using successor function, so is valid as Primary Keys. For example the next green apple will be 0011 and the next orange 110.

In a finite tree, suppose 5 bits, we can define the function MaxCount5bits(x), to check the last child. For example for oranges MaxCount5bits('1') is 11111, for green apples MaxCount5bits('01') is 01111. That is, using standard bitString operators, MaxCount5bits(x) = substring(x || '11111',1,5).

The fruit table can be something like CREATE TABLE fruit (id bitString PRIMARY KEY, info text), so we can do any tree-search operation:

  • find all oranges: SELECT * FROM fruit WHERE id BETWEEN '1' AND MaxCount5bits('1')
  • find all green apples: SELECT * FROM fruit WHERE id BETWEEN '01' AND MaxCount5bits('01')
  • find all apples: SELECT * FROM fruit WHERE id BETWEEN '0' AND MaxCount5bits('0')
  • find the parent of the apple 01: no search is necessary, it is the prefix 0, that is something as substring(x,1,length(x)-1).
  • find the parent of the orange 101: 10 or 1 (because it is a two-bit-prefix taxonomy).
  • ... reduced the taxonomic analysis to a simple "check BETWEEN at b-tree" operation, and "check ID prefix" operation.

And, since it is indexed as a primary key, it is fast (the search engine uses a b-tree).

Question

In view of the illustrated, the question is: is it the faster solution?
That is, best performance in any SQL-database that supports bitstrings and its lexicographical order b-tree indexation.

Seems also the the most economical solution: all operations are ID-based, no extra column, and with all integrity (reference and primary key) guaranteed.


PS: perhaps a more useful variant of the model is to adopt the first bits as taxonomic control and the remaining bits as a serial-integer counter. In the illustration, there are 2 taxonomic bits.

2 Answers 2

2

Let's compare, using PostgreSQL v16 in a Notebook with Intel i7:
  LTREE module   vs   "VARBIT as pure bit strings".

Sources: https://GIST.github.com/ppKrauss/226fd3fb0dca0a6c6d4f875cb93aa2db

Testing that is the same logic

As the same as SQL Server's IsDescendantOf. At GIST all other @Charlieface examples.

CREATE TABLE treetest.fruit (
    id          bigint  NOT NULL PRIMARY KEY,
    binpath     varbit  NOT NULL,
    charpath    ltree   NOT NULL, 
    info        text
);
--find all apples:
SELECT * FROM treetest.fruit 
-- WHERE  charpath <@ '0'::ltree;
WHERE binpath BETWEEN b'0' AND treetest.MaxBits(b'0');
id binpath charpath info
4 00 0.0 apple "0"
8 000 0.0.0 apple "00"
16 0000 0.0.0.0 apple "000"
5 01 0.1 green apple ""
10 010 0.1.0 green apple "0"
11 011 0.1.1 green apple "1"

Benchmark

Comparing performances by psql's \timing command. The function treetest.generate_vbit_series(m) generates all bit strings, from empty to m bits. Using m in {16, 20, 22). This and all other functions in the cited GIST. Example populating a table, see all at GIST:

CREATE TABLE treetest.fruit_vbit (
    id          bigint  NOT NULL PRIMARY KEY,
    binpath     varbit  NOT NULL
);
INSERT INTO treetest.fruit_vbit
 SELECT treetest.vBit_to_hiddenBig(x), x 
 FROM treetest.generate_vbit_series(22) t(x)
; -- 8388606 rows for 22 bits.
  -- 2097150 rows for 20 bits; 131070 rows for 16 bits.

CPU time

Here the main performance test, repeated for each m value (14 to 22).

SELECT COUNT(*) n_t1_r1 FROM treetest.fruit_ltree
WHERE  charpath <@ '1.0'::ltree;

SELECT COUNT(*) n_t1_r2 FROM treetest.fruit_vbit
WHERE binpath BETWEEN b'10' AND treetest.MaxBits(b'10');

------
SELECT COUNT(*) n_t2_r1 FROM treetest.fruit_ltree
WHERE  charpath <@ '1.0.1.1.0.1.0.1.1.0.0.1'::ltree;

SELECT COUNT(*) n_t2_r2 FROM treetest.fruit_vbit
WHERE binpath BETWEEN b'101101011001' AND treetest.MaxBits(b'101101011001');
m bits test n result LTree vBit
14 1 n_t1=8191 14 ms 9 ms
14 2 n_t2=7 12 ms 8 ms
16 1 n_t1=32767 34 ms 27 ms
16 2 n_t2=31 21 ms 10 ms
22 1 n_t1=2097151 1500 ms 300 ms
22 2 n_t2=2047 900 ms 700 ms
--- INDEXING:
CREATE INDEX fruit_ltree_idx1 ON treetest.fruit_ltree USING GIST (charpath);
CREATE UNIQUE INDEX fruit_vbit_idx1 ON treetest.fruit_vbit (binpath);
m bits test n result LTree vBit
22 1 n_t1=2097151 1080 ms 260 ms
22 2 n_t2=2047 0.78 ms 0.53 ms

Disc usage

relname size size_bytes
fruit_ltree_idx1 3380 MB 3544358912
fruit_vbit_idx1 180 MB 188440576
 
fruit_ltree 1720 MB 1803550720
fruit_vbit 354 MB 371458048
 
fruit_vbit_pkey 252 MB 264232960
fruit_ltree_pkey 252 MB 264232960

CONCLUSION: vbit indexed is the best, for performance and disc usage. vbit was 4 times faster in large retrieve (1.5 faster in small). The vbit index use 18 times less disc, and the vbit column consumes 4 times less.

1

SQL Server already has this with the hierarchyid type, many of whose functions are understood by the optimizer and converted into seeks. Postgres seems to have similar functionality using the ltree module, although I have no experience with it.

CREATE TABLE Fruit (
    id hierarchyid PRIMARY KEY,
    info varchar(100) NOT NULL
);

INSERT Fruit (id, info)
VALUES
 ('/0/0/',  'apple "0"'),
 ('/0/0/0/',   'apple "00"'),
 ('/0/0/0/0/',    'apple "000"'),
 ('/0/1/',  'green apple ""'),
 ('/0/1/0/',   'green apple "0"'),
 ('/0/1/1/',   'green apple "1"'),
 ('/1/0/',  'orange "0"'),
 ('/1/0/1/',   'orange "01"');
--find all oranges:
SELECT *
FROM fruit
WHERE id.IsDescendantOf('/1/') = 1;
  
--find all green apples:
SELECT *
FROM fruit
WHERE id.IsDescendantOf('/0/1/') = 1;
  
--find all apples:
SELECT *
FROM fruit
WHERE id.IsDescendantOf('/0/') = 1;
  
--find the parent of the apple 01:
SELECT id.GetAncestor(1) parent
FROM fruit
WHERE id = '/0/1/';
  
--find the parent of the orange 101:
SELECT id.GetAncestor(1) parent
FROM fruit
WHERE id = '/1/0/1/';  

You can see from this fiddle that these are all seeks.


A more correct solution would store the whole tree including the missing root rows. And add a self-referencing foreign key so that an item can only exist if its parent also does.

CREATE TABLE Fruit (
    id hierarchyid PRIMARY KEY,
    info varchar(100) NOT NULL,
    parentId AS (id.GetAncestor(1)) PERSISTED REFERENCES Fruit (id),
    INDEX ix_parentId (parentId, id) INCLUDE (info)
);

INSERT Fruit (id, info)
VALUES
 ('/', 'root'),
 ('/0/', 'all apples'),
 ('/0/0/',  'apple "0"'),
 ('/0/0/0/',   'apple "00"'),
 ('/0/0/0/0/',    'apple "000"'),
 ('/0/1/',  'green apple ""'),
 ('/0/1/0/',   'green apple "0"'),
 ('/0/1/1/',   'green apple "1"'),
 ('/1/',   'all oranges'),
 ('/1/0/',  'orange "0"'),
 ('/1/0/1/',   'orange "01"');
1
  • Thanks @Charlieface, good concrete examples (!), and I am reusing them at Wiki answer. You showed what hierarchies article named "Sort Path" (and this answer named "Lineage Column" or "Materialized Path" or "Path Enumeration"). Yes, LTREE has same functionality... But the question was "is it the faster solution?", so we need a benchmark: I do something also in the Wiki answer, and you can edit there. Commented Mar 7 at 23:27

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.