What does 'fully determines' mean in the context of postgres functional dependency statistics?

Question

I am following the example of how to use 'create statistics' in postgresql from this page.

CREATE TABLE tbl (
                     col1 int,
                     col2 int
);

INSERT INTO tbl SELECT i/10000, i/100000
FROM generate_series (1,10000000) s(i);

CREATE STATISTICS s1 (dependencies) on col1, col2 from tbl;
ANALYZE tbl;

SELECT stxname, stxkeys, stxdependencies
FROM pg_statistic_ext
WHERE stxname = 's1';

returns:

stxname,stxkeys,stxdependencies
s1,1 2,{"1 => 2": 1.000000}

However, I'm confused by this line

Looking at this, we can see that Postgres realizes that col1 fully determines col2 and therefore has a coefficient of 1 to capture that information.

In the example there are many records in col2 that are repeated. So how does col1 'fully determine' col2? What does this mean?

Laurenz Albe · Accepted Answer · 2019-08-07 09:42:39Z

5

It means that if col1 is identical for two rows, then col2 will also be identical.

In other words, if the functional dependency were perfect, you could remove col2 from the table and use a lookup table that maps col1 to col2. But don't forget that these are only statistical data from a sample of the rows, so even if you see a dependency of 1 in the extended statistics, that is no proof that it is really always so (only very often).

edited Aug 7, 2019 at 9:42

answered Aug 6, 2019 at 14:55

Laurenz Albe

63k4 gold badges58 silver badges94 bronze badges

Add a comment |

Stack Exchange Network

What does 'fully determines' mean in the context of postgres functional dependency statistics?

1 Answer 1

Your Answer

Hot Network Questions

What does 'fully determines' mean in the context of postgres functional dependency statistics?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions