2

How to use two aggregate functions (for example double string_agg or just sum), but ensure that the results do not contain duplicates caused by the other aggregate function (caused by second join)? I use PostgreSQL.

Example

I have three tables:

create table boxes
(
    id   bigserial primary key,
    name varchar(255)
);

create table animals
(
    id     bigserial primary key,
    name   varchar(255),
    age    numeric,
    box_id bigint constraint animals_boxes_id references boxes
);

create table vegetables
(
    id     bigserial primary key,
    name   varchar(255),
    weight numeric,
    box_id bigint constraint vegatables_box_id references boxes
);

Some input data:

insert into boxes (name) values ('First box');
insert into animals (box_id, name, age) values (1, 'Cat', 2);
insert into animals (box_id, name, age) values (1, 'Cat', 3);
insert into animals (box_id, name, age) values (1, 'Dog', 5);
insert into vegetables (box_id, name, weight) values (1, 'Tomato', 20);
insert into vegetables (box_id, name, weight) values (1, 'Cucumber', 30);
insert into vegetables (box_id, name, weight) values (1, 'Potato', 50);

And I want to get animal names in boxes:

select b.name                                 as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names
from boxes as b
         left join animals a on b.id = a.box_id
group by b.name;

It works:

box_name animal_names
First box Cat, Cat, Dog

But I also want to get vegetable names. But it doesn't work:

select b.name                                 as box_name,
       string_agg(a.name, ', ' order by a.id) as animal_names,
       string_agg(v.name, ', ' order by v.id) as vegatable_names
from boxes as b
         left join animals a on b.id = a.box_id
         left join vegetables v on b.id = v.box_id
group by b.name;

It produces duplicates in animal names and vegetable names:

box_name animal_names vegatable_names
First box Cat, Cat, Cat, Cat, Cat, Cat, Dog, Dog, Dog Tomato, Tomato, Tomato, Cucumber, Cucumber, Cucumber, Potato, Potato, Potato

Result should be:

box_name animal_names vegatable_names
First box Cat, Cat, Dog Tomato, Cucumber, Potato

I can't just add distinct to remove duplicates because:

  • Names in tables can duplicates (two animals with name Cat). If I use distinct it produces Cat, Dog instead of Cat, Cat, Dog.
  • I use order by in string_agg (it cause ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list when added distinct). Even if I remove order by (string_agg(distinct a.name, ', ')) I can't use it because first point.

Additional info

It applies to all aggregates function: string_agg, array_agg, json_object_agg and even sum.

Sum age of animals:

select sum(a.age)
from boxes as b
         left join animals a on b.id = a.box_id
         -- left join vegetables v on b.id = v.box_id
group by b.name;

Without second join it calculates correctly (10) but with it wrong (30) - due to duplicates.

0

2 Answers 2

3

The basic problem is explained here:

For a small selection, aggregation per row is typically faster.
With LATERAL subqueries (more versatile):

SELECT b.name AS box_name, a.*, v.*
FROM   boxes b
LEFT  JOIN LATERAL (
   SELECT string_agg(a.name, ', ' ORDER BY a.id) AS animal_names
   FROM   animals a
   WHERE  a.box_id = b.id
   ) a ON true
LEFT  JOIN LATERAL (
   SELECT string_agg(v.name, ', ' ORDER BY v.id) AS vegetable_names
   FROM   vegetables v
   WHERE  v.box_id = b.id
   ) v ON true;

Or with correlated subqueries (simpler, often a bit faster):

SELECT b.name AS box_name
    , (SELECT string_agg(a.name, ', ' ORDER BY a.id)
       FROM   animals a
       WHERE  a.box_id = b.id)  AS animal_names
    , (SELECT string_agg(v.name, ', ' ORDER BY v.id)
       FROM   vegetables v
       WHERE  v.box_id = b.id) AS vegetable_names
FROM   boxes b;

See:

While aggregating the whole table, this is faster:

SELECT b.name AS box_name, a.animal_names, v.vegetable_names
FROM   boxes b
LEFT   JOIN (
   SELECT box_id, string_agg(a.name, ', ') AS animal_names   
   FROM  (
      SELECT box_id, id, name
      FROM   animals a
      ORDER  BY box_id, id
      ) a
   GROUP  BY 1
   ) a ON a.box_id = b.id
LEFT   JOIN (
   SELECT box_id, string_agg(v.name, ', ') AS vegetable_names 
   FROM  (
      SELECT box_id, id, name
      FROM   vegetables v
      ORDER  BY box_id, id
      ) v
   GROUP  BY 1   
   ) v ON v.box_id = b.id;

fiddle

Note how I sort in a subquery, which is typically faster than doing a per-aggregate sort. An optional optimization.

Aside: about varchar(255) in your test setup:

Sign up to request clarification or add additional context in comments.

Comments

-1

try to join the vegetables table to the animals table

SELECT b.name AS box_name
    ,string_agg(a.name, ', ' ORDER BY a.id) AS animal_names
    ,string_agg(v.name, ', ' ORDER BY v.id) AS vegetable_names
FROM boxes AS b
LEFT JOIN animals a ON b.id = a.box_id
LEFT JOIN vegetables v ON v.id = a.id
GROUP BY b.name;

output:

box_name animal_names vegetable_names
First box Cat, Cat, Dog Tomato, Cucumber, Potato

fiddle here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.