
I have an rttmm table with conversation_id and duration fields. I have a query with two subqueries in its FROM clause, one of which is never referenced. I expected it to be semantically equivalent to the same query with the unused subquery removed, but it behaves very differently. Here's the query in question:

select
  sum(subq2.dur) as res
from
  (
    select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
    from rttmm
    group by rttmm.conversation_id
  ) as subq1,
  (
    select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
    from rttmm
    group by rttmm.conversation_id
  ) as subq2

and here's what I would expect it to be equivalent to (just removing the subq1):

select
  sum(subq2.dur) as res
from
  (
    select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
    from rttmm
    group by rttmm.conversation_id
  ) as subq2

Turns out it's not the same at all. What is the proper understanding of the first query here?

  • The first one will perform a cross join. (If the first subquery returns 4 rows and the second returns 3 rows, the cross join returns 12 rows.) Commented Jan 28, 2020 at 15:00
  • Tip of today: Always use modern, explicit JOIN syntax. Easier to write (without errors), easier to read (and maintain), and easier to convert to outer join if needed. Commented Jan 28, 2020 at 15:02
  • Remember, too, even if you do not SELECT from a source in FROM or JOIN, all underlying tables are being used some way! Commented Jan 28, 2020 at 15:20
  • @jarlh thanks! I don't need any JOIN in the final code; I basically needed to run two queries separately and combine their results. Ended up doing two queries and combining the results outside SQL. Commented Jan 28, 2020 at 15:29
  • (IMHO) While splitting this into two queries and combining the results externally worked in this case, I would caution you that this is a very poor habit to get into. SQL is designed for set processing and is quite good at it. Suppose that instead of 4 rows and 3 rows, your result sets were 4M rows and 3M rows. SQL, while taking quite a while, would handle the resulting 12T rows. Would your external process? Start thinking in terms of sets when dealing with SQL. In this case, just do the appropriate join. Commented Jan 28, 2020 at 20:28

1 Answer


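The first query uses the ancient SQL-89 comma join syntax, which cross-joins the two subqueries, whereas the second query simply selects from a single subquery. Because every row of subq2 is paired with every row of subq1, each subq2.dur value is counted once per subq1 row, so the sum is multiplied by the number of rows in subq1.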

In simple words, the difference is:

select * from table1, table2 vs select * from table1

which is equivalent to

select * from table1 cross join table2 vs select * from table1
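
To see the effect on the actual queries, here is a minimal sketch that runs them against an in-memory SQLite database from Python. The sample rows, the integer column types, and the variable names are assumptions made up for illustration; only the rttmm table, its two columns, and the subquery text come from the question. Other database engines should treat the comma as a cross join in the same way, but this sketch only exercises SQLite.

```python
import sqlite3

# Hypothetical sample data: 3 conversations, total duration 42.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rttmm (conversation_id INTEGER, duration INTEGER)")
conn.executemany(
    "INSERT INTO rttmm VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5), (3, 7)],
)

# The subquery from the question: one row per conversation.
SUBQ = """
    select sum(rttmm.duration) as dur, rttmm.conversation_id as conv_id
    from rttmm
    group by rttmm.conversation_id
"""

def scalar(sql):
    """Run a query and return the first column of its first row."""
    return conn.execute(sql).fetchone()[0]

# First query from the question: comma-separated subqueries (implicit cross join).
comma_join = scalar(
    f"select sum(subq2.dur) from ({SUBQ}) as subq1, ({SUBQ}) as subq2"
)

# Same query written with an explicit cross join.
cross_join = scalar(
    f"select sum(subq2.dur) from ({SUBQ}) as subq1 cross join ({SUBQ}) as subq2"
)

# Second query from the question: subq1 removed.
single = scalar(f"select sum(subq2.dur) from ({SUBQ}) as subq2")

# Number of rows subq1 produces (one per conversation).
subq1_rows = scalar(f"select count(*) from ({SUBQ}) as subq1")

print("comma join:", comma_join)
print("cross join:", cross_join)
print("single subquery:", single)
print("rows in subq1:", subq1_rows)
```

With this sample data the single-subquery total is 42, while both the comma form and the explicit cross join give 126, which is 42 multiplied by the 3 rows that subq1 returns.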


1 Comment

Thank you! I don't think I want any kind of JOIN here; I ended up just running two queries and combining the results on the server.
