I have two tables that I'm trying to query. The first has a list of meters. The second has the data for those meters. I want to get the newest reading for each meter.

Originally, this was in a group by statement, but it ended up processing all 7 million rows in our database, and took a little over a second. A subquery and a number of other ways of writing it had the same result.

I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table.

Ultimately, this is what I wrote, which performs in about 20 milliseconds. It seems like SQL should be smart enough to perform the "group by" query in the same time.

Declare @Meters Table
(
    MeterId Integer,
    LastValue float,
    LastTimestamp DateTime
)

Declare MeterCursor Cursor For
Select Id
From MeterDataConfiguration

Declare @MeterId Int

Open MeterCursor
Fetch Next From MeterCursor Into @MeterId

While @@FETCH_STATUS =  0
Begin
    Declare @LastValue float  -- float, to match the LastValue column in @Meters
    Declare @LastTimestamp DateTime

    Select @LastValue = mr.DataValue, @LastTimestamp = mr.EndTime
    From MeterRecording mr
    Where mr.MeterDataConfigurationId = @MeterId
        And mr.EndTime = (Select MAX(EndTime) from MeterRecording mr2 Where mr2.MeterDataConfigurationId = @MeterId)

    Insert Into @Meters
    Select @MeterId, @LastValue, @LastTimestamp

    Fetch Next From MeterCursor Into @MeterId   
End

Close MeterCursor
Deallocate MeterCursor

Select *
From @Meters

Here is an example of the same query that performs horribly:

select mdc.id, mr.EndTime
from MeterDataConfiguration mdc
inner join MeterRecording mr on
    mr.MeterDataConfigurationId = mdc.Id
    and mr.EndTime = (select MAX(EndTime) from MeterRecording mr2 where MeterDataConfigurationId = mdc.Id)
    I think you would get better performance if you just clustered on mr.MeterDataConfigurationId and put a nonclustered covering index on that field and mr.endtime. Commented Nov 5, 2010 at 20:30

3 Answers

You can try a CTE (Common Table Expression) using ROW_NUMBER:

;WITH Readings AS
(
    SELECT 
       mdc.id, mr.EndTime, 
       ROW_NUMBER() OVER(PARTITION BY mdc.id ORDER BY mr.EndTime DESC) AS 'RowID'
    FROM dbo.MeterDataConfiguration mdc
    INNER JOIN dbo.MeterRecording mr ON mr.MeterDataConfigurationId = mdc.Id
)
SELECT 
   ID, EndTime, RowID
FROM
   Readings
WHERE
   RowID = 1

This creates "partitions" of data, one for each mdc.id, and numbers the rows in each partition sequentially, descending on mr.EndTime. For each partition, the most recent reading is the row with RowID = 1.
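
To make the partitioning concrete, here is a minimal sketch run against a table variable. The sample rows are made up for illustration and are not from the question. It also selects DataValue, on the assumption that the reading itself is wanted alongside the timestamp.

```sql
-- Hypothetical sample data standing in for MeterRecording.
DECLARE @MeterRecording TABLE
(
    MeterDataConfigurationId INT,
    DataValue FLOAT,
    EndTime DATETIME
);

INSERT INTO @MeterRecording (MeterDataConfigurationId, DataValue, EndTime)
VALUES (1, 10.5, '2010-11-01T00:00:00'),
       (1, 11.0, '2010-11-02T00:00:00'),
       (2, 7.25, '2010-11-01T00:00:00');

-- Number the rows of each meter from newest to oldest, then keep row 1.
WITH Readings AS
(
    SELECT MeterDataConfigurationId, DataValue, EndTime,
           ROW_NUMBER() OVER (PARTITION BY MeterDataConfigurationId
                              ORDER BY EndTime DESC) AS RowID
    FROM @MeterRecording
)
SELECT MeterDataConfigurationId, DataValue, EndTime
FROM Readings
WHERE RowID = 1;
```

With these rows, meter 1 returns the 2010-11-02 reading (11.0) and meter 2 returns its only reading (7.25).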

Of course, to get decent performance, you need appropriate indices on:

  • mr.MeterDataConfigurationId since it's a foreign key into MeterDataConfiguration, right?
  • mr.EndTime since you do an ORDER BY on it
  • mdc.Id which I assume is a primary key, so it's indexed already

Update: sorry, I missed this tidbit:

I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table.

Quite honestly: I would toss that. Don't you have some other unique ID on the MeterRecordings table that would be suitable as a clustering index? An INT IDENTITY ID or something?

If you have a compound index on (EndTime, MeterDataConfigurationId), it can't be used for both purposes, ordering on EndTime and joining on MeterDataConfigurationId. One of them will not be doable. Pity!
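
As a rough sketch of the alternative, assuming a surrogate identity key is acceptable: cluster on the identity column and add a nonclustered index that leads with MeterDataConfigurationId, so an equality match on the meter is followed by rows already ordered by EndTime. The temp table, the Id column, the column types, and the index name are all assumptions made for illustration, not taken from the real schema.

```sql
-- Assumed shape of the recordings table; types are guesses based on the question's code.
CREATE TABLE #MeterRecording
(
    Id INT IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,  -- hypothetical surrogate key
    MeterDataConfigurationId INT NOT NULL,
    EndTime DATETIME NOT NULL,
    DataValue FLOAT NULL
);

-- Meter id first (join/equality), EndTime second (ordering);
-- DataValue is included so the index covers the query.
CREATE NONCLUSTERED INDEX IX_MeterRecording_Meter_EndTime
    ON #MeterRecording (MeterDataConfigurationId, EndTime DESC)
    INCLUDE (DataValue);
```

Whether this helps depends on how the table is queried elsewhere, so it is worth comparing execution plans before and after changing anything.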

8 Comments

He mentions he has a covering clustered index, which I think is part of his problem: "I have a clustered index that covers the EndTime and the MeterDataConfigurationId columns in the MeterRecordings table."
That query took 9 seconds. The majority of the time was in a clustered index seek, but still involved 7 million rows.
My index is set up based on how the data is most commonly queried. I can investigate changing it, but I have to verify the impact on my previous queries. It's such an odd situation. I can get any one value so quickly, but asking SQL to do the looping for me kills it.
@Jason Young - are you sure you are accounting for query caching in rating your cursor query? Did it run that fast the first time or on subsequent attempts?
@Jason Young - FWIW, most of the time a compound clustered index is less useful than a clustered index and a covering nonclustered index.

How does this query perform? It reads everything in MeterRecording and ignores the list in MeterDataConfiguration. If that is not safe, MeterDataConfiguration can be joined to this query to restrict the output.

SELECT Id, DataValue, EndTime
FROM (
select mr.MeterDataConfigurationId as Id,
       mr.DataValue,
       mr.EndTime, 
       RANK() OVER(PARTITION BY mr.MeterDataConfigurationId 
                   ORDER BY mr.EndTime DESC) as r
from MeterRecording mr) as M
WHERE M.r = 1

2 Comments

18 seconds! It ends up scanning every row. Clustered Index Scan.
@Jason What happens if you add a "Where mr.MeterDataConfigurationId IN (SELECT Id FROM MeterDataConfiguration)" to the inner query?

I would go with marc's answer, but if you ever need to use cursors again (you should try to avoid them) and you need to process a lot of records, I would suggest creating a temp table (or table variable) that has all the columns from the table you are reading plus an auto-generated identity column (IDENTITY(1,1)), and then using a WHILE loop to read from it. Basically, increment an int variable (call it @id) inside the loop and do:

select @col1Value = column1, @col2Value = column2, ... from @temp_table where id = @id

This behaves just like a cursor, but I find it to be much faster.
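
A minimal sketch of that loop, assuming a single MeterId column is all that needs to be carried. The @Work table, its sample values, and the PRINT placeholder are hypothetical and only there to keep the example self-contained.

```sql
-- Work table with an identity column to drive the loop.
DECLARE @Work TABLE
(
    Id INT IDENTITY(1,1) PRIMARY KEY,
    MeterId INT
);

-- Hypothetical meter ids; in practice this would be filled from the real table.
INSERT INTO @Work (MeterId)
VALUES (101), (102), (103);

DECLARE @id INT = 1, @max INT, @MeterId INT;
SELECT @max = MAX(Id) FROM @Work;

-- Walk the rows by identity value instead of opening a cursor.
WHILE @id <= @max
BEGIN
    SELECT @MeterId = MeterId FROM @Work WHERE Id = @id;

    -- Per-row work goes here; PRINT is only a placeholder.
    PRINT @MeterId;

    SET @id = @id + 1;
END
```

If @Work is empty, @max is NULL and the loop body never runs.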
