0

I have a PostgreSQL query for a time series graph with an inner join. A list of tags is used as a where condition for the inner join. I use union all to get a result table with data from two cases: a query with tags in the list and a query with an empty tag list. I want rows with null values from the first select before the union all if the tag list is empty but PostgreSQL returns SELECT 0 i.e. no rows at all. I want null rows because without them I can't programmatically separate the cases.

How can I get a result with null rows? Or is there a way of obtaining the result only from the select before union all if the tag list is not empty and the result only from the select after the union all if the tag list is empty? If it is not possible to achieve any of this, then suggestions about how to extract the data of the two cases from the current query or a modification of it are very welcome. There is no need for handling cases where a restaurant has multiple tags: either a restaurant has one tag or it does not have any tags.

Examples of the query, tables with dummy values and comments here: db<>fiddle.

-- I want null values from this first SELECT before UNION ALL unless
-- a more elegant/efficient solution is possible. But I get 'SELECT 0'.
SELECT 
  f.days AS days,
  SUM (f.waste_kgs) AS waste,
  SUM (f.prepared_kgs) AS prepared,
  f.waste_type_id AS waste_id,
  f.restaurant_id AS restaurant
FROM food f
INNER JOIN tags t
  ON f.restaurant_id = t.restaurant_id
WHERE t.name = ANY (array[]::text[]) -- Empty tag list as an argument.
  AND f.days BETWEEN '2000-1-1' and '2099-1-1'
  AND f.restaurant_id = ANY (array[1, 2, 3])
GROUP BY days, restaurant, waste_id 
-- Separately running the first SELECT returns
-- days     waste   prepared    waste_id    restaurant
-- SELECT 0

UNION ALL 
-- The same query without the INNER JOIN and tag list argument. 
SELECT
  f.days AS days,
  SUM (f.waste_kgs) AS waste,
  SUM (f.prepared_kgs) AS prepared,
  f.waste_type_id AS waste_id,
  f.restaurant_id AS restaurant
FROM food f
WHERE f.days BETWEEN '2000-1-1' and '2099-1-1'
  AND f.restaurant_id = ANY (array[1, 2, 3])
GROUP BY days, restaurant, waste_id;

-- The whole query returns
-- days       waste     prepared    waste_id    restaurant
-- 2023-01-01   2.5        16.0           1              1
-- 2023-01-02   8.6         7.3           1              2
-- 2023-01-03   10.5        1.8           1              3
-- 2023-01-03   0.8         0.0           2              3
-- SELECT 4
-- No null values to differentiate the empty tag list argument case.

Here are the tables and values from db<>fiddle used in the examples.

CREATE TABLE restaurants (
  id int PRIMARY KEY,
  name text,
  type text
);
CREATE TABLE food (
  id int PRIMARY KEY,
  restaurant_id int REFERENCES restaurants (id),
  waste_type_id smallint NOT NULL,
  product_id int NOT NULL,
  waste_kgs decimal NOT NULL, 
  prepared_kgs decimal NOT NULL,
  customers smallint NOT NULL,
  days date NOT NULL
);
CREATE TABLE tags (
  id int PRIMARY KEY,
  restaurant_id int REFERENCES restaurants (id),
  name text
);
INSERT INTO restaurants VALUES
(1, 'restaurant_1'),
(2, 'restaurant_2'),
(3, 'restaurant_3');
INSERT INTO food VALUES 
(1, 1, 1, 1, 1.7, 8.0, 96, '2023-1-1'),
(2, 1, 1, 10, 0.5, 7.0, 96, '2023-1-1'),
(3, 1, 1, 15, 0.3, 1.0, 96, '2023-1-1'),
(4, 2, 1, 12, 7.0, 0.8, 39, '2023-1-2'),
(5, 2, 1, 10, 1.1, 5.0, 39, '2023-1-2'),
(6, 2, 1, 11, 0.5, 1.5, 39, '2023-1-2'),
(7, 3, 1, 8, 10.0, 0.3, 97, '2023-1-3'),
(8, 3, 2, 17, 0.8, 0.0, 97, '2023-1-3'),
(9, 3, 1, 11, 0.5, 1.5, 39, '2023-1-3');
INSERT INTO tags VALUES
(1, 1, 'tag_1');

The following query from the same tables with an empty tag list gives a table with null values from the inner join so I can use it with union all on itself - like I attempt to do with the query above - and separate the results easily. It is demonstrated in db<>fiddle, too.

-- The first row of the result displays calculations from the tagged restaurants. 
-- In this case it has null values because tag list is empty.
-- The second row of the result displays calculations from all the restaurants without
-- considering any tags.
SELECT Total, Total - Drinks AS "Without drinks", Drinks   
FROM (
  SELECT 
  SUM (f.prepared_kgs) 
  FILTER (
    WHERE f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Total,
  SUM (f.prepared_kgs)
  FILTER (
    WHERE (f.product_id = 10 OR f.product_id = 17) 
    AND f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Drinks
  FROM food f
  INNER JOIN tags t 
    ON t.restaurant_id = f.restaurant_id 
  WHERE t.name = ANY (array[]::text[]) -- Empty tag list as an argument.  
)
UNION ALL
-- The same query without the INNER JOIN and tag list argument.
SELECT Total, Total - Drinks AS "Without drinks", Drinks   
FROM (
  SELECT 
  SUM (f.prepared_kgs) 
  FILTER (
    WHERE f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Total,
  SUM (f.prepared_kgs)
  FILTER (
    WHERE (f.product_id = 10 OR f.product_id = 17) 
    AND f.days BETWEEN '2000-1-1' AND '2099-1-1'
    AND f.restaurant_id = ANY (array[1, 2, 3])
  ) AS Drinks
  FROM food f
);
-- If tag list had elements, my program uses the first row.
-- Otherwise it uses the second row. Time series query returns 
-- rows for each day but the logic I try to use is the same.
total   Without drinks  drinks
 null             null    null
 25.1             13.1    12.0
SELECT 2
9
  • 1
    "There is no need for handling cases where a restaurant has multiple tags: either a restaurant has one tag or it does not have any tags" - I wouldn't trust that unless the tags table has a UNIQUE constraint on the restaurant_id column Commented Dec 30, 2023 at 11:43
  • 1
    It sounds a bit like you should use LEFT JOIN tags t ON f.restaurant_id = t.restaurant_id AND t.name = ANY (array[]::text[]) instead of an INNER JOIN but I'm not quite sure what exactly your desired result is. What do you mean by "result with NULL rows"? Normally you don't get any result rows from nothing (SUM is an exception to the rule), so you'll have to filter for at least something and I'm not sure in which columns you expect a NULL value then. Commented Dec 30, 2023 at 11:58
  • 2
    UNIQUE (restaurant_id) means each restaurant can only appear once. If it can only appear once, it can't have multiple tags. UNIQUE (restaurant_id, name) means each restaurant can appear multiple times, but each occurence must have a different tag from the other occurences (can have multiple tags, but no tag more than once). Commented Dec 30, 2023 at 12:55
  • 1
    I think this is actually a XY problem: "without them I can't programmatically separate the cases." - show us the code that is processing the results, I think we would be able to actually help you with that Commented Dec 30, 2023 at 15:44
  • 1
    Unless the model constrains each restaurant to at most one tag, the query must be structured to accommodate multiple tags per restaurant. The database model is deficient. If a tag is unconstrained text, then eliminate the tags table and add a tag column to restaurants. If tags are restricted to a set of names, then remove the reference to restaurants from tags and either add a reference to tags to restaurants if each restaurant can have only one tag, or create a restaurants_tags associative (bridge) table. Commented Dec 30, 2023 at 17:46

1 Answer 1

1

how to extract the data of the two cases from the current query

SELECT 1 AS query, columns... FROM query1
UNION ALL    
SELECT 2 AS query, columns... FROM query2

The result will contain a column "query" with value 1 or 2 depending on which branch of the UNION ALL the row came from.

However in your case I think there's a better (and faster) solution.

The first query does an aggregate on a subset of the second. Here's a simplified version, because I'm too lazy to copy all your columns and tables:

SELECT group, sum(qty) FROM table WHERE key IN (...) GROUP BY group
UNION ALL
SELECT group, sum(qty) FROM table GROUP BY group

"key in (...)" is your tag condition, but it could be any boolean condition. If you do instead:

SELECT group, key IN (...) AS flag, sum(qty) FROM table GROUP BY group, flag

Then the result will contain the same data, organized differently: one row for the sum where the condition is satisfied, and one where it is not. To get the total sums, you have to add the sums for these two rows in the application.

You could also use ROLLUP, in which case postgres will do the addition.

This makes for a much simpler query, but if your application code expects a specific format and you can't change it, maybe it won't work. Although you could put the above query in a materialized CTE and reuse it twice to format the results in the way you need.

IMO the database's job is to return data as quickly as possible, and choosing the format in which data is returned is part of the optimization. In other words, formatting is not the database's job, but the application's job. So as suggested, I would move some of the logic into the application.

Sign up to request clarification or add additional context in comments.

2 Comments

Yes, I solved the problem on the application side. Thank you anyway. I'll study your answer and if it does solve the problem, I'll accept it.
I saw in the comments you used two queries, I was about to suggest that too. Nothing wrong with that, unless it's slow. In this case my suggestion should be faster, as it avoids one query.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.