Multiple array_agg() calls in a single query

Question

I'm trying to accomplish something with my query, but it's not really working. My application used to run on MongoDB, so it expects arrays in a single field. We have now had to switch to Postgres, and I don't want to change my application code, so that v1 keeps working.

To get arrays in one field in Postgres, I used the array_agg() function, and this has worked fine so far. However, I'm now at a point where I need a second array in another field, from a different table.

For example:

I have employees. Each employee has multiple addresses and multiple working days.

SELECT name, age, array_agg(ad.street) FROM employees e 
JOIN address ad ON e.id = ad.employeeid
GROUP BY name, age

Now this worked fine for me, this would result in for example:

| name  | age | array_agg(ad.street)     |
| peter | 25  | {1st street, 2nd street} |

Now I want to join another table for working days so I do:

SELECT name, age, array_agg(ad.street), array_agg(wd.day) FROM employees e 
JOIN address ad ON e.id = ad.employeeid 
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY name, age

This results in:

| peter | 25 | {1st street, 1st street, 1st street, 1st street, 1st street, 2nd street, 2nd street, 2nd street, 2nd street, 2nd street} | {Monday,Tuesday,Wednesday,Thursday,Friday,Monday,Tuesday,Wednesday,Thursday,Friday} |

But I need it to result:

| peter | 25 | {1st street, 2nd street} | {Monday,Tuesday,Wednesday,Thursday,Friday} |

I understand it has to do with my joins: because of the multiple joins, the rows multiply. But I don't know how to get the result I need. Can anyone give me a tip?

Erwin Brandstetter · Accepted Answer · 2024-12-11 05:15:12Z

DISTINCT is often applied to repair queries that are rotten from the inside, and that's often expensive and / or incorrect. Don't multiply rows to begin with, then you don't have to fold unwanted duplicates at the end.

Joining to multiple n-tables ("has many") multiplies rows in the result set. That's effectively a CROSS JOIN or Cartesian product by proxy. See:

Two SQL LEFT JOINS produce incorrect result
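
The multiplication is easy to reproduce. The following is only a sketch, assuming an in-memory SQLite database driven from Python's sqlite3 module instead of Postgres, with sample rows that mirror the question (one employee, two addresses, five working days). The join semantics are the same in both systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE address     (employeeid INTEGER, street TEXT);
    CREATE TABLE workingdays (employeeid INTEGER, day TEXT);
    INSERT INTO employees VALUES (1, 'peter', 25);
    INSERT INTO address VALUES (1, '1st street'), (1, '2nd street');
    INSERT INTO workingdays VALUES
        (1, 'Monday'), (1, 'Tuesday'), (1, 'Wednesday'), (1, 'Thursday'), (1, 'Friday');
""")

# Joining both n-tables at once pairs every address with every working day.
joined_rows = conn.execute("""
    SELECT count(*)
    FROM   employees e
    JOIN   address     ad ON ad.employeeid = e.id
    JOIN   workingdays wd ON wd.employeeid = e.id
""").fetchone()[0]

print(joined_rows)  # 2 addresses x 5 days = 10 rows for one employee
```

Any aggregate computed over those 10 rows sees each street 5 times and each day twice, which is exactly the output shown in the question.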

There are various ways to avoid this mistake.

Aggregate first, join later

Technically, the query works as long as you join to one table with multiple rows at a time before you aggregate:

SELECT e.id, e.name, e.age, e.streets, array_agg(wd.day) AS days
FROM  (
   SELECT e.id, e.name, e.age, array_agg(ad.street) AS streets
   FROM   employees e 
   JOIN   address  ad ON ad.employeeid = e.id
   GROUP  BY e.id  -- PK covers whole row
   ) e
JOIN   workingdays wd ON wd.employeeid = e.id
GROUP  BY e.id, e.name, e.age, e.streets;  -- subquery has no PK, list all columns

It's best to include the primary key id and GROUP BY it, because name and age are not necessarily unique. Else you might merge employees by mistake.

But it's better to aggregate in a subquery before the join. That's superior when there are no selective WHERE conditions on employees:

SELECT e.id, e.name, e.age, ad.streets, array_agg(wd.day) AS days
FROM   employees e 
JOIN  (
   SELECT employeeid, array_agg(street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN   workingdays wd ON e.id = wd.employeeid
GROUP  BY e.id, ad.streets;

Or aggregate both:

SELECT name, age, ad.streets, wd.days
FROM   employees e 
JOIN  (
   SELECT employeeid, array_agg(street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN  (
   SELECT employeeid, array_agg(day) AS days
   FROM   workingdays
   GROUP  BY 1
   ) wd ON wd.employeeid = e.id;

The last one is typically faster if you retrieve all or most of the rows in the base tables.

Note that using JOIN and not LEFT JOIN removes employees from the result that have no row in address or none in workingdays. That may or may not be intended. Switch to LEFT JOIN to retain all employees in the result.
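
A minimal sketch of that difference, again assuming SQLite through Python's sqlite3 rather than Postgres. group_concat() stands in for array_agg() here, and the second employee 'mary' (no address rows) is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE address     (employeeid INTEGER, street TEXT);
    CREATE TABLE workingdays (employeeid INTEGER, day TEXT);
    INSERT INTO employees VALUES (1, 'peter', 25), (2, 'mary', 30);
    INSERT INTO address VALUES (1, '1st street'), (1, '2nd street');  -- none for mary
    INSERT INTO workingdays VALUES (1, 'Monday'), (1, 'Tuesday'), (2, 'Friday');
""")

# "Aggregate both" query; {join} is filled in with JOIN or LEFT JOIN below.
# group_concat() is SQLite's stand-in for Postgres array_agg() in this sketch.
QUERY = """
    SELECT e.name, ad.streets, wd.days
    FROM   employees e
    {join} (
       SELECT employeeid, group_concat(street) AS streets
       FROM   address
       GROUP  BY employeeid
       ) ad ON ad.employeeid = e.id
    {join} (
       SELECT employeeid, group_concat(day) AS days
       FROM   workingdays
       GROUP  BY employeeid
       ) wd ON wd.employeeid = e.id
    ORDER  BY e.id
"""

inner_names = [row[0] for row in conn.execute(QUERY.format(join="JOIN"))]
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()
left_names = [row[0] for row in left_rows]

print(inner_names)  # mary is dropped: she has no row in address
print(left_names)   # mary is kept, with NULL (None) for streets
```

With LEFT JOIN, the missing aggregate comes back as NULL, so the application has to be prepared for a NULL in place of an array.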

Correlated subqueries / `JOIN LATERAL`

For selective filters on employees, consider correlated subqueries instead:

SELECT name, age
    , (SELECT array_agg(street) FROM address WHERE employeeid = e.id) AS streets
    , (SELECT array_agg(day) FROM workingdays WHERE employeeid = e.id) AS days
FROM   employees e
WHERE  e.name = 'peter';  -- very selective

Or LATERAL subqueries:

SELECT e.name, e.age, a.streets, w.days
FROM   employees e
CROSS  JOIN LATERAL (
   SELECT ARRAY(
      SELECT street
      FROM   address
      WHERE  employeeid = e.id
      )
   ) a(streets)
CROSS  JOIN LATERAL (
   SELECT ARRAY(
      SELECT day
      FROM   workingdays
      WHERE  employeeid = e.id
      )
   ) w(days)
WHERE  e.name = 'peter';  -- very selective

The last two queries retain all qualifying employees in the result.

Hi thank you, very clear explanation. I can continue thanks to you :)

Dave Koston · Accepted Answer · 2014-12-23 15:27:24Z

Whenever you need values that aren't repeated, use DISTINCT, like so:

SELECT name, age, array_agg(DISTINCT ad.street), array_agg(DISTINCT wd.day) FROM employees e 
JOIN address ad ON e.id = ad.employeeid 
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY name, age

2 Comments

user1391281 Over a year ago

Thank you, this is true, and in my example it would work, but in my case the values can sometimes be the same. Instead of an address it's actually the status of a product, which can be IN_USE or FREE. So it's possible that both products are FREE, and if I use DISTINCT I would get only 1 value instead of the 2 I need. (The 3rd join can also have duplicates.)

Erwin Brandstetter Over a year ago

This is incorrect in an expensive way, don't use it.
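
The objection raised in these comments can be reproduced with a small sketch. It assumes SQLite through Python's sqlite3 rather than Postgres, group_concat() as a stand-in for array_agg(), and a hypothetical product table with a status column modelled on the asker's description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product     (employeeid INTEGER, status TEXT);  -- hypothetical table
    CREATE TABLE workingdays (employeeid INTEGER, day TEXT);
    INSERT INTO employees VALUES (1, 'peter');
    INSERT INTO product VALUES (1, 'FREE'), (1, 'FREE');  -- two products, same status
    INSERT INTO workingdays VALUES (1, 'Monday'), (1, 'Tuesday');
""")

# DISTINCT after the double join: the two FREE products collapse into one value.
distinct_statuses = conn.execute("""
    SELECT group_concat(DISTINCT p.status)
    FROM   employees e
    JOIN   product     p  ON p.employeeid  = e.id
    JOIN   workingdays wd ON wd.employeeid = e.id
    GROUP  BY e.id
""").fetchone()[0]

# Aggregating before the join keeps one value per product row.
pre_aggregated = conn.execute("""
    SELECT p.statuses
    FROM   employees e
    JOIN  (
       SELECT employeeid, group_concat(status) AS statuses
       FROM   product
       GROUP  BY employeeid
       ) p ON p.employeeid = e.id
""").fetchone()[0]

print(distinct_statuses)  # one value: a legitimate duplicate was lost
print(pre_aggregated)     # two values: one per product
```

DISTINCT cannot tell duplicates created by the join apart from duplicates that exist in the data, so it removes both kinds.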