
Let's say I have a simple table agg_test with 3 columns: id, column_1 and column_2. Example dataset:

id|column_1|column_2
--------------------
 1|       1|       1
 2|       1|       2
 3|       1|       3
 4|       1|       4
 5|       2|       1
 6|       3|       2
 7|       4|       3
 8|       4|       4
 9|       5|       3
10|       5|       4 
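
To reproduce the dataset above, a minimal setup sketch could look like this. The column types are an assumption (plain integers); the question does not state them.

```sql
-- Assumed schema: all three columns are plain integers (not stated in the question).
CREATE TABLE agg_test (
  id       integer PRIMARY KEY,
  column_1 integer NOT NULL,
  column_2 integer NOT NULL
);

-- The ten rows shown in the table above.
INSERT INTO agg_test (id, column_1, column_2) VALUES
  (1, 1, 1), (2, 1, 2), (3, 1, 3), (4, 1, 4),
  (5, 2, 1), (6, 3, 2), (7, 4, 3), (8, 4, 4),
  (9, 5, 3), (10, 5, 4);
```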

A query like this (with self join):

SELECT
  a1.column_1,
  a2.column_1,
  ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1

Will produce a result like this:

column_1|column_1|array_agg
---------------------------
       1|       2|      {1}
       1|       3|      {2}
       1|       4|    {3,4}
       1|       5|    {3,4}

We can see that values 4 and 5 from the joined table have the same result in the last column. Is it possible to group the results by that array, e.g.:

column_1|column_1|array_agg
---------------------------
       1|     {2}|      {1}
       1|     {3}|      {2}
       1|   {4,5}|    {3,4}

Thanks for any answers. If anything isn't clear or can be presented in a better way - tell me in the comments and I'll try to make this question as readable as I can.

2 Answers


PostgreSQL can group by an array value, so one approach is to wrap your query in a subquery and group its output by the array:

select col1, array_agg(col2), ar
from (SELECT a1.column_1 as col1, a2.column_1 as col2,
             ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2) as ar
      FROM agg_test a1 JOIN
           agg_test a2
           ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
      WHERE a1.column_1 = 1
      GROUP BY a1.column_1, a2.column_1
     ) t
group by col1, ar

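The alternative is to convert the array values into a string, for example with array_to_string, and group by that string. (array_dims would not work here, since it returns only the dimensions of the array, not its values.)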
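
A sketch of that string-key variant, assuming array_to_string with a comma delimiter is used to render the array as text, since it encodes the element values. The aliases ar_key and col2_agg are illustrative and not part of the original query.

```sql
-- Group by a text rendering of the array instead of the array itself.
SELECT col1,
       array_agg(col2 ORDER BY col2) AS col2_agg,  -- illustrative alias
       ar_key
FROM (SELECT a1.column_1 AS col1,
             a2.column_1 AS col2,
             -- Turn the aggregated array into text such as '3,4'.
             array_to_string(
               array_agg(DISTINCT a1.column_2 ORDER BY a1.column_2), ','
             ) AS ar_key
      FROM agg_test a1
      JOIN agg_test a2
        ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
      WHERE a1.column_1 = 1
      GROUP BY a1.column_1, a2.column_1
     ) t
GROUP BY col1, ar_key;
```

The last column is then text rather than an array, so this is mainly useful where grouping by the array itself is not an option.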


You could also try something like this:

SELECT DISTINCT
  a1.column_1,
  ARRAY_AGG(a2.column_1) OVER (
    PARTITION BY
      a1.column_1,
      ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
  ) AS "a2.column_1 agg",
  ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
;

(The parts that differ from the query in your question are the DISTINCT after SELECT and the window ARRAY_AGG that replaces the plain a2.column_1 column.)

The above uses a window ARRAY_AGG to combine the values of a2.column_1 alongside the other ARRAY_AGG, using the latter's result as one of the partitioning criteria. Without the DISTINCT, it would produce two {4,5} rows for your example, so DISTINCT is needed to eliminate the duplicates.

Here's a SQL Fiddle demo: http://sqlfiddle.com/#!1/df5c3/4

Note, though, that the window ARRAY_AGG cannot take an ORDER BY inside its argument list the way its "normal" counterpart can. That means the order of a2.column_1 values in the list would be indeterminate, although in the linked demo it does happen to match the one in your expected output.
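
If a deterministic order matters, one possible workaround is to put the ORDER BY in the OVER clause and spell out a frame that covers the whole partition. This is a sketch and not something tested in the linked demo. It assumes that the window aggregate reads its rows in the window's sort order; without the explicit frame, the ORDER BY would turn the aggregate into a running one.

```sql
SELECT DISTINCT
  a1.column_1,
  ARRAY_AGG(a2.column_1) OVER (
    PARTITION BY
      a1.column_1,
      ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
    -- Sort the rows of each partition, then widen the frame to the
    -- whole partition so every row sees the complete, sorted list.
    ORDER BY a2.column_1
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  ) AS "a2.column_1 agg",
  ARRAY_AGG(DISTINCT a1.column_2 ORDER BY a1.column_2)
FROM agg_test a1
JOIN agg_test a2 ON a1.column_2 = a2.column_2 AND a1.column_1 <> a2.column_1
WHERE a1.column_1 = 1
GROUP BY a1.column_1, a2.column_1
;
```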
