Table t1 contains a series of basic data and is unique on Id.

Table t2 contains a large amount of time series data that I need to scope down to just a subset. I am only interested in somevalue and yetanothervalue, and I am struggling to find the cleanest way to do that in this context.

The query below runs, but I have used MAX incorrectly. I am studying the MySQL docs on greatest-n-per-group and trying to get that solved.

I am interested in the WHERE usage and its efficiency: what is the best pattern for adding those WHERE clauses?

select t1.*,
    t2.lastdate as lastdate
    from t1
    left join
    ( select Id,
            max(LastDate) as lastdate
            from t2table
            where
            somecolumn like '%somevalue%'
            group by Id
    ) t2
    on t1.Id = t2.Id
    where yetanothercolumn = "yetanothervalue";

Also, any links to docs or other threads and examples would be appreciated.

  • If you like, consider following this simple two-step course of action: 1. If you have not already done so, provide proper DDLs (and/or an sqlfiddle) so that we can more easily replicate the problem. 2. If you have not already done so, provide a desired result set that corresponds with the information provided in step 1. Commented Aug 21, 2015 at 13:38
  • I don't understand, it looks correct, what is the problem exactly? Commented Aug 21, 2015 at 13:42
  • not sure why the left join and not just a join. focus on indexes in place Commented Aug 21, 2015 at 13:42

1 Answer

Your query is reasonable:

select t1.*,
       t2.lastdate as lastdate
from t1 left join
     (select Id, max(LastDate) as lastdate
      from t2table
      where somecolumn like '%somevalue%'
      group by Id
     ) t2
     on t1.Id = t2.Id
where yetanothercolumn = 'yetanothervalue';

However, it does unnecessary work on table 2 for ids that are not in the final result set. So, under many circumstances, a correlated subquery will be faster:

select t1.*,
       (select max(LastDate)
        from t2table t2
        where t2.Id = t1.Id and t2.somecolumn like '%somevalue%'
       ) as lastdate
from t1 
where yetanothercolumn = 'yetanothervalue';

For performance, you want indexes on t1(yetanothercolumn) and t2table(id, somecolumn, LastDate).
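
As a rough sketch, those two indexes could be created like this in MySQL. The index names are made up, and the CREATE TABLE statements only guess at column types so that the snippet can run on an empty database; only the table and column names come from the question. If somecolumn is a TEXT column, MySQL would need a prefix length on it in the index definition.

```sql
-- Hypothetical minimal schema; the column types are assumptions.
CREATE TABLE IF NOT EXISTS t1 (
    Id               INT PRIMARY KEY,   -- the question says t1 is unique on Id
    yetanothercolumn VARCHAR(64)
);

CREATE TABLE IF NOT EXISTS t2table (
    Id         INT,
    somecolumn VARCHAR(255),
    LastDate   DATETIME
);

-- Lets the outer query find the matching t1 rows without a full scan.
CREATE INDEX idx_t1_yetanothercolumn ON t1 (yetanothercolumn);

-- Id comes first so each correlated lookup can seek to one Id's rows;
-- somecolumn and LastDate are included so the subquery can be answered
-- from the index alone.
CREATE INDEX idx_t2table_id_somecolumn_lastdate
    ON t2table (Id, somecolumn, LastDate);

-- Check which indexes the optimizer actually picks.
EXPLAIN
SELECT t1.*,
       (SELECT MAX(LastDate)
        FROM t2table t2
        WHERE t2.Id = t1.Id AND t2.somecolumn LIKE '%somevalue%'
       ) AS lastdate
FROM t1
WHERE yetanothercolumn = 'yetanothervalue';
```

The leading wildcard in LIKE '%somevalue%' rules out a range scan on somecolumn, so in this index it only helps as a covering column; the Id prefix is what narrows each lookup.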

3 Comments

That said, it will probably be slower than an uncorrelated one ;-)
@Strawberry, not necessarily slower. If t1 has 200 records with "yetanothervalue", but t2 has 500k records matching the LIKE condition, you are only going to correlate on the 200 records, not the 500k.
I'd like to see the stats for that.
