SQL: how to join a large number of tables on a single column

Question

I have a query where I need to join a large number of tables on a single column, where records should be joined when any records from any tables match on that column. An example:

A
----------
id | a_value
----------
1  | foo
2  | bar

B
----------
id | b_value
----------
2  | cad
3  | qud

C
----------
id | c_value
----------
1  | fiz
4  | buz

D
----------
id | d_value
----------
5  | sas
6  | tos

SELECT id, a_value, b_value, c_value, d_value FROM <join A, B, C, D by id>

should return a result set like this:

results
------------------------------------------
id | a_value | b_value | c_value | d_value
------------------------------------------
1  | foo     | null    | fiz     | null
2  | bar     | cad     | null    | null
3  | null    | qud     | null    | null
4  | null    | null    | buz     | null
5  | null    | null    | null    | sas
6  | null    | null    | null    | tos

You could write the joins like this:

A FULL JOIN B ON A.id = B.id
FULL JOIN C ON A.id = C.id OR B.id = C.id
FULL JOIN D ON A.id = D.id OR B.id = D.id OR C.id = D.id

but that seems absurd, and would grow out of control rapidly as the number of tables increases (joining n tables in this manner requires n*(n-1)/2 conditions). There has to be a better way. Does anyone have any ideas?

Gordon Linoff · Accepted Answer · 2014-01-29 21:29:47Z


There are three approaches to doing what you want. You've already explored the full outer join option, and found it wanting. By the way, you can somewhat simplify it to:

A FULL JOIN
B
ON A.id = B.id FULL JOIN
C
ON C.id = coalesce(A.id, B.id) FULL JOIN
D
ON D.id = coalesce(A.id, B.id, C.id)

The second way has two subparts. If you have a table of all ids, then great. Just use left join:

AllIds ai left outer join
A
on ai.id = A.id left outer join
B
on ai.id = B.id . . .

You can make one, if you don't have one:

(select id from a union
 select id from b union
 select id from c union
 select id from d
) AllIds left outer join
. . .
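
To make the second option concrete, here is a minimal runnable sketch using the sample tables from the question. It assumes SQLite through Python's sqlite3 module purely as a convenient stand-in database; the alias ai and the lowercase table names are illustrative choices, and other databases may need small syntax changes.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_value TEXT);
    CREATE TABLE b (id INTEGER, b_value TEXT);
    CREATE TABLE c (id INTEGER, c_value TEXT);
    CREATE TABLE d (id INTEGER, d_value TEXT);
    INSERT INTO a VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO b VALUES (2, 'cad'), (3, 'qud');
    INSERT INTO c VALUES (1, 'fiz'), (4, 'buz');
    INSERT INTO d VALUES (5, 'sas'), (6, 'tos');
""")

# Build the list of all ids with UNION (which removes duplicates),
# then LEFT JOIN every table to it. Each extra table adds exactly
# one UNION branch and one join condition.
query = """
    SELECT ai.id, a.a_value, b.b_value, c.c_value, d.d_value
    FROM (SELECT id FROM a UNION
          SELECT id FROM b UNION
          SELECT id FROM c UNION
          SELECT id FROM d) AS ai
    LEFT OUTER JOIN a ON ai.id = a.id
    LEFT OUTER JOIN b ON ai.id = b.id
    LEFT OUTER JOIN c ON ai.id = c.id
    LEFT OUTER JOIN d ON ai.id = d.id
    ORDER BY ai.id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # SQL NULL is shown as None
```

With this shape, joining n tables needs n join conditions rather than n*(n-1)/2, because every table is matched only against the id list.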

The third way is the union all way:

select id, max(a_value) as a_value, max(b_value) as b_value,
       max(c_value) as c_value, max(d_value) as d_value
from (select a.id, a_value, NULL as b_value, NULL as c_value, NULL as d_value
      from a
      union all
      select b.id, NULL, b_value, NULL, NULL
      from b
      union all
      select c.id, NULL, NULL, c_value, NULL
      from c
      union all
      select d.id, NULL, NULL, NULL, d_value
      from d
     ) t
group by id;
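
The union all query can be tried against the question's sample data with a sketch like the one below. As an assumption for illustration, it again uses an in-memory SQLite database through Python's sqlite3 module; the query itself is the one above with an ORDER BY added so the output order is predictable.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, a_value TEXT);
    CREATE TABLE b (id INTEGER, b_value TEXT);
    CREATE TABLE c (id INTEGER, c_value TEXT);
    CREATE TABLE d (id INTEGER, d_value TEXT);
    INSERT INTO a VALUES (1, 'foo'), (2, 'bar');
    INSERT INTO b VALUES (2, 'cad'), (3, 'qud');
    INSERT INTO c VALUES (1, 'fiz'), (4, 'buz');
    INSERT INTO d VALUES (5, 'sas'), (6, 'tos');
""")

# Stack every table into one wide row set, padding the columns a table
# does not own with NULL, then collapse to one row per id. MAX ignores
# NULLs, so it picks up the single non-NULL value in each column.
query = """
    SELECT id, MAX(a_value) AS a_value, MAX(b_value) AS b_value,
           MAX(c_value) AS c_value, MAX(d_value) AS d_value
    FROM (SELECT a.id, a_value, NULL AS b_value, NULL AS c_value, NULL AS d_value
          FROM a
          UNION ALL
          SELECT b.id, NULL, b_value, NULL, NULL
          FROM b
          UNION ALL
          SELECT c.id, NULL, NULL, c_value, NULL
          FROM c
          UNION ALL
          SELECT d.id, NULL, NULL, NULL, d_value
          FROM d
         ) t
    GROUP BY id
    ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)  # SQL NULL is shown as None
```

One difference worth keeping in mind: if a table can hold several rows for the same id, MAX keeps only one value per column, whereas the join-based options return one row per matching combination.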

These have different performance characteristics depending on the tables, indexes, and database. In practice, I have often used the second option on large tables.


1 Comment

Ronnis Over a year ago

I'm often using the second option as well.
