I have a query that needs to join a large number of tables on a single column, so that rows are combined whenever rows from any of the tables match on that column. An example:

A
----------
id | a_value
----------
1  | foo
2  | bar

B
----------
id | b_value
----------
2  | cad
3  | qud

C
----------
id | c_value
----------
1  | fiz
4  | buz

D
----------
id | d_value
----------
5  | sas
6  | tos

SELECT id, a_value, b_value, c_value, d_value FROM <join A, B, C, D by id>

should return a result set like this:

results
------------------------------------------
id | a_value | b_value | c_value | d_value
------------------------------------------
1  | foo     | null    | fiz     | null
2  | bar     | cad     | null    | null
3  | null    | qud     | null    | null
4  | null    | null    | buz     | null
5  | null    | null    | null    | sas
6  | null    | null    | null    | tos

You could write the joins like this:

A FULL JOIN B ON A.id = B.id
FULL JOIN C ON A.id = C.id OR B.id = C.id
FULL JOIN D ON A.id = D.id OR B.id = D.id OR C.id = D.id

but that seems absurd, and would grow out of control rapidly as the number of tables increases (joining n tables in this manner requires n*(n-1)/2 conditions). There has to be a better way. Does anyone have any ideas?

1 Answer

There are three approaches to doing what you want. You've already explored the full outer join option, and found it wanting. By the way, you can somewhat simplify it to:

A FULL JOIN
B
ON A.id = B.id FULL JOIN
C
ON C.id = coalesce(A.id, B.id) FULL JOIN
D
ON D.id = coalesce(A.id, B.id, C.id)
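
As a rough, self-contained sketch of that chain: the query below inlines the question's sample rows as CTEs and adds a select list. The CTE-with-VALUES syntax is an assumption (PostgreSQL-style; other databases may need a different way to build the sample data), and it also assumes each id appears at most once per table. Note that the output id has to be a coalesce as well, since any single table's id is NULL on rows that table did not contribute.

```sql
-- Sample data from the question, inlined as CTEs.
-- Assumption: PostgreSQL-style "WITH name (cols) AS (VALUES ...)" syntax.
WITH a (id, a_value) AS (VALUES (1, 'foo'), (2, 'bar')),
     b (id, b_value) AS (VALUES (2, 'cad'), (3, 'qud')),
     c (id, c_value) AS (VALUES (1, 'fiz'), (4, 'buz')),
     d (id, d_value) AS (VALUES (5, 'sas'), (6, 'tos'))
SELECT coalesce(a.id, b.id, c.id, d.id) AS id,  -- first non-NULL id on the row
       a_value, b_value, c_value, d_value
FROM a
FULL JOIN b ON b.id = a.id
-- Each later table is matched against whichever earlier table supplied an id.
FULL JOIN c ON c.id = coalesce(a.id, b.id)
FULL JOIN d ON d.id = coalesce(a.id, b.id, c.id)
ORDER BY 1;  -- order by the first output column (the coalesced id)
```

With the sample data this should give one row per id from 1 to 6, matching the result set in the question. Each additional table adds one join condition rather than one per earlier table.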

The second way has two subparts. If you have a table of all ids, then great. Just use left join:

AllIds ai left outer join
A
on ai.id = A.id left outer join
B
on ai.id = B.id . . .

You can make one, if you don't have one:

(select id from a union
 select id from b union
 select id from c union
 select id from d
) AllIds left outer join
. . .
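
Spelled out for the four tables in the question, the second option might look like the sketch below. The all_ids CTE name and the inlined sample data are illustrative assumptions (PostgreSQL-style syntax), not something from your schema, and the sketch assumes each id appears at most once per table.

```sql
-- Sample data from the question, inlined as CTEs.
-- Assumption: PostgreSQL-style "WITH name (cols) AS (VALUES ...)" syntax.
WITH a (id, a_value) AS (VALUES (1, 'foo'), (2, 'bar')),
     b (id, b_value) AS (VALUES (2, 'cad'), (3, 'qud')),
     c (id, c_value) AS (VALUES (1, 'fiz'), (4, 'buz')),
     d (id, d_value) AS (VALUES (5, 'sas'), (6, 'tos')),
     -- Hypothetical helper: every id that occurs in any table.
     -- UNION (not UNION ALL) removes duplicates, so each id appears once.
     all_ids AS (
         SELECT id FROM a UNION
         SELECT id FROM b UNION
         SELECT id FROM c UNION
         SELECT id FROM d
     )
SELECT ai.id, a_value, b_value, c_value, d_value
FROM all_ids ai
LEFT JOIN a ON a.id = ai.id
LEFT JOIN b ON b.id = ai.id
LEFT JOIN c ON c.id = ai.id
LEFT JOIN d ON d.id = ai.id
ORDER BY ai.id;
```

Every join condition compares against ai.id only, so adding a table means one more branch in the union and one more left join, and each join can use an index on that table's id column if one exists.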

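The third way is the union all way. Each subquery pads the other tables' value columns with NULL, and because max() ignores NULLs, grouping by id collapses the stacked rows into one row per id: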

select id, max(a_value) as a_value, max(b_value) as b_value,
       max(c_value) as c_value, max(d_value) as d_value
from (select a.id, a_value, NULL as b_value, NULL as c_value, NULL as d_value
      from a
      union all
      select b.id, NULL, b_value, NULL, NULL
      from b
      union all
      select c.id, NULL, NULL, c_value, NULL
      from c
      union all
      select d.id, NULL, NULL, NULL, d_value
      from d
     ) t
group by id;

These have different performance characteristics depending on the tables, indexes, and database. In practice, I have often used the second option on large tables.

1 Comment

I'm often using the second option as well.
