10

Let's say I have the following PostgreSQL table called products:

CREATE TABLE IF NOT EXISTS mytable (
    id serial NOT NULL PRIMARY KEY,
    label VARCHAR(50) NOT NULL,
    info jsonb NOT NULL,
    created_at timestamp NOT NULL DEFAULT now()
);

Here is some test data. Note my actual table has millions of records.

INSERT INTO products (label, info) VALUES ('a', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('a', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('c', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('c', '[1, 2, 3]');
INSERT INTO products (label, info) VALUES ('b', '[1, 2, 3]');

I want to write a query that grabs distinct labels and orders the records by the created_at field. My first instinct would be to write the following query:

SELECT DISTINCT ON (label) * FROM products ORDER BY created_at DESC;

However, this fails with the following error:

ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions

Looks like I can fix this using SQL subqueries:

SELECT * FROM (
    SELECT DISTINCT ON (label) * FROM products
) AS subquery ORDER BY created_at DESC;

which generates the following expected result:

 id | label |   info    |         created_at
----+-------+-----------+----------------------------
  5 | b     | [1, 2, 3] | 2022-11-14 03:32:23.245669
  3 | c     | [1, 2, 3] | 2022-11-14 03:32:23.242813
  1 | a     | [1, 2, 3] | 2022-11-14 03:32:23.239791

Is this the best way to fix this issue? Or is there a faster way to query this data? Note, I mention above how my actual table has millions of records so I want to come up with the best query possible.

1
  • Your last query selects an arbitrary row for each label, then sorts that arbitrary selection by created_at. If that is what you want to do, then that query is probably the best way to do it. Commented Nov 14, 2022 at 13:51

2 Answers 2

13

Leading ORDER BY expressions must cover all DISTINCT ON expressions before you can append more (just rephrasing that error msg):

SELECT DISTINCT ON (label) * FROM products ORDER BY label, created_at DESC;

See:

It's unclear whether you want created_at ASC or created_at DESC: you show the latter, but the results you call "expected" match the former - because your fix does not do what you seem to think it's doing.

To sort resulting (distinct) rows by created_at DESC, you have to run an outer SELECT with a different sort order:

SELECT *
FROM  (
   SELECT DISTINCT ON (label) *
   FROM   products
   ORDER  BY label, created_at DESC
   ) sub
ORDER  BY created_at DESC;

See:

For a big table, be sure to have an index on (label, created_at) or (label, created_at DESC), respectively.

my actual table has millions of records so I want to come up with the best query possible.

There may be (much) faster solutions depending on undisclosed details. Most importantly:

  • How many distinct "labels" for how many rows in total?
  • Is there a separate table with one row per (relevant) distinct label?
  • Do you actually need SELECT *, or is SELECT label, created_at all you need?
Sign up to request clarification or add additional context in comments.

2 Comments

The query you proposed generates the wrong result. It's not ordered by created_at in descending order.
Consider the additions. Careful not to confound two distinct questions: which row to pick per label, and how to sort resulting (distinct) rows.
0

Technically you can re-write your query, replace SELECT DISTINCT ON with an aggregate function like MAX and GROUP BY. Then you'll be able to group and sort at any order you like.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.