I have repeatedly run into a situation where a stored procedure must read from one table or another depending on a parameter. For example, suppose we need to pull data from either an archived table or an online table, and then join it to a number of other tables. I can think of three ways to accomplish this:

  1. Use an if condition and store result in a temp table and then join temp table to other tables
  2. Use an if condition and grab data either from archive table or online table and join other tables. The entire query will be duplicated except for the part of archive table or online table.
  3. Use a union subquery

Query for Approach 1

create table #archivedOrOnline (Id int);
declare @archivedData as bit = 1;
if (@archivedData = 1)
begin
    insert into #archivedOrOnline
    select 
        at.Id
    from 
        dbo.ArchivedTable at
end
else
begin
    insert into #archivedOrOnline
    select 
        ot.Id
    from
        dbo.OnlineTable ot
end

select 
    *
from
    #archivedOrOnline ao
    inner join dbo.AnotherTable at on ao.Id = at.Id;
    -- Lots more joins and subqueries irrespective of @archivedData

Query for Approach 2

declare @archivedData as bit = 1;
if (@archivedData = 1)
begin
    select 
        *
    from 
        dbo.ArchivedTable at
        inner join dbo.AnotherTable another on at.Id = another.Id
        -- Lots more joins and subqueries irrespective of @archivedData
end
else
begin
    select 
        *
    from
        dbo.OnlineTable ot
        inner join dbo.AnotherTable at on ot.Id = at.Id
        -- Lots more joins and subqueries irrespective of @archivedData
end

Query for Approach 3

declare @archivedData as bit = 1;
select 
    *
from
    (
        select 
            ot.Id
        from 
            dbo.OnlineTable ot
        where
            @archivedData = 0
        union
        select 
            at.Id
        from
            dbo.ArchivedTable at
        where
            @archivedData = 1
    ) archiveOrOnline
    inner join dbo.AnotherTable at on at.Id = archiveOrOnline.Id;
    -- Lots more joins and subqueries irrespective of @archivedData

My question is which approach to choose, or whether there is a better one. Approach 2 duplicates a lot of code; the other two approaches avoid the duplication. I have the query plans, but my ability to make decisions from a query plan is limited. My default is to pick the approach that removes code duplication and to switch to another only if a performance problem shows up.

2 Answers

Your approach 3 can work fine. You should definitely use UNION ALL rather than UNION, though, so that SQL Server does not add operations to remove duplicates from the combined result.

For the best chance of success with approach 3, add an OPTION (RECOMPILE) hint so that SQL Server simplifies out the unneeded table reference at compile time, at the expense of recompiling the statement on each execution.

If the query is executed too frequently for that to be attractive, you may still get an acceptable plan without the hint: one that uses filters with startup predicates so that only the relevant table is accessed at run time. However, this more generic plan can suffer from poor cardinality estimates, may limit the optimisations available, and could end up worse than the plan you would get from option 2.
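As a minimal sketch, approach 3 with both suggestions applied (UNION ALL and the recompile hint) might look like the following. The table names come from the question; the aliases `art` and `another` are chosen here only for illustration, and `@archivedData` would normally be a stored procedure parameter rather than a local variable.

```sql
-- Sketch only: table names are taken from the question, aliases are illustrative.
declare @archivedData as bit = 1;

select
    another.*
from
    (
        select ot.Id
        from dbo.OnlineTable ot
        where @archivedData = 0
        union all   -- keeps every row; no duplicate-removal step is added
        select art.Id
        from dbo.ArchivedTable art
        where @archivedData = 1
    ) archiveOrOnline
    inner join dbo.AnotherTable another on another.Id = archiveOrOnline.Id
    -- further joins that do not depend on @archivedData would go here
option (recompile); -- compile the statement for the current value of @archivedData
```

With the hint in place, the optimizer can treat `@archivedData` as a known value at compile time, which is what allows the branch whose predicate is false to be dropped from the plan.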

7 Comments

UNION ALL is not needed because there will never be duplicates: the query reads from either the online table or the archived table, never both. Could you please elaborate on this point: "but you may have problems with cardinality estimates"? Also, have you used approach 3?
It is needed so that SQL Server doesn't add unneeded operators to remove duplicates. It may be able to figure out that duplicates aren't possible, but don't risk it. The correct semantics for your query is UNION ALL.
And regarding the cardinality estimates: to give a simple example, suppose OnlineTable has 1,000 rows and ArchivedTable has 10,000,000 rows. When SQL Server compiles a plan in which the branch taken is not known until run time, how many rows should it estimate coming out of the derived table into the join on AnotherTable?
@MartinSmith Funny, Not 10 minutes ago I had this same conversation with a client.
Sorry @MartinSmith but I still do not know why the number of estimated rows makes a difference. I guess I can look more into it. I almost understand what you mean: If the branch is not known until runtime, how would it estimate the # of rows coming out of archiveOrOnline. Yes, it is not possible to know this in this case.

If you don't mind extra unused columns in your results, you can represent such "IF"s with additional join conditions.

SELECT stuff
FROM MainTable AS m
LEFT JOIN ArchiveTable AS a ON @archivedData = 1 AND m.id = a.id
LEFT JOIN OnlineTable AS o ON @archivedData <> 1 AND m.id = o.id
;

If the Archive and Online tables have the same fields, you can even avoid extra result fields with select expressions like COALESCE(a.field1, o.field1) AS field1


If later joins depend on values from ArchiveTable or OnlineTable, this can be simplified by performing the core joins in a subquery (at least some COALESCE expressions will still be necessary):

SELECT stuff
FROM (
    SELECT m.stuff, a.stuff, o.stuff
       , COALESCE(a.field1, o.field1) AS xValue
       , COALESCE(a.field2, o.field2) AS yValue
       , COALESCE(a.field3, o.field3) AS zValue
    FROM MainTable AS m
    LEFT JOIN ArchiveTable AS a ON @archivedData = 1 AND m.id = a.id
    LEFT JOIN OnlineTable AS o ON @archivedData <> 1 AND m.id = o.id
) AS coreQuery
INNER JOIN xTable AS x ON x.something = coreQuery.xValue
INNER JOIN yTable AS y ON y.something = coreQuery.yValue
INNER JOIN zTable AS z ON z.something = coreQuery.zValue
;

If there are criteria that narrow down the MainTable rows to be used, the WHERE clause for them should go inside the subquery to minimize the amount of Archive/Online data carried out of the subquery.


If the Archive/Online table is actually the "main" table, the question's option 3 should work, but I would suggest putting any filtering criteria relevant to those tables in their UNIONed subqueries.
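A rough sketch of that suggestion, assuming a hypothetical `CustomerId` column on both tables and a hypothetical `@customerId` filter value (neither appears in the question):

```sql
-- Sketch only: CustomerId and @customerId are hypothetical and stand in for
-- whatever filter actually applies to the archive/online rows.
declare @archivedData as bit = 1;
declare @customerId as int = 42;

select
    another.*
from
    (
        select ot.Id
        from dbo.OnlineTable ot
        where @archivedData = 0
          and ot.CustomerId = @customerId   -- filter applied inside the branch
        union all
        select art.Id
        from dbo.ArchivedTable art
        where @archivedData = 1
          and art.CustomerId = @customerId  -- same filter, repeated per branch
    ) archiveOrOnline
    inner join dbo.AnotherTable another on another.Id = archiveOrOnline.Id;
```

Repeating the filter in each branch is a small amount of duplication, but it keeps the derived table from producing rows that the outer query would only discard later.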


If there are no filtering criteria on whatever table is "main", I would consider just maintaining two queries (or building one dynamically) so that the subqueries these approaches necessitate are not needed and will not interfere with index use.
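One way the "building one dynamically" option could be sketched is with `sp_executesql`. The table names come from the question; the variable names `@source` and `@sql` and the alias `src` are illustrative only.

```sql
-- Sketch only: the source table is picked from a fixed pair of names, so no
-- user-supplied text is ever concatenated into the statement.
declare @archivedData as bit = 1;

declare @source as nvarchar(200) =
    case when @archivedData = 1
         then N'dbo.ArchivedTable'
         else N'dbo.OnlineTable'
    end;

declare @sql as nvarchar(max) = N'
select another.*
from ' + @source + N' src
    inner join dbo.AnotherTable another on another.Id = src.Id;';
    -- further joins that do not depend on @archivedData would be appended here

exec sys.sp_executesql @sql;
```

Only one copy of the query text has to be maintained, and because the two generated statements differ, each can be compiled and cached with its own plan. The trade-off is that the query now lives in a string, which is harder to read and debug.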

6 Comments

This approach will not work because you won't know which table to use for the subsequent joins: a or o.
@CodingYoshi "Lots more joins and subqueries irrespective of @archivedData" suggests to me that the later joins do not depend on which table was used. Even if they do, you could write ...LEFT JOIN SomeTable AS s ON (@archivedData = 1 AND a.x = s.x) OR (@archivedData <> 1 AND o.x = s.x), or LEFT JOIN SomeTable AS s ON s.x = CASE WHEN @archivedData = 1 THEN a.x ELSE o.x END. (Not saying those kinds of joins would be fast, just possible.)
I would have to write (@archivedData = 1 AND a.x = s.x) OR (@archivedData <> 1 AND o.x = s.x) everywhere. I have a lot of code in that part and this can get pretty ugly fast. My goal is to remove exactly this sort of thing: code duplication.
@CodingYoshi yes, if you have later joins dependent on which table was used, this is not an optimal solution; but you could make a subquery using (SELECT ... COALESCE ... FROM m LEFT JOIN a ON ... LEFT JOIN o ON...) AS b and then LEFT JOIN s ON b.x = s.x
A problem you might encounter with your Option 3 is that the subquery it uses may end up drawing upon all the records in the larger table when criteria on the main table could have narrowed its scope.