
I have a table that I will populate with values from an expensive calculation (XQuery against an immutable XML column). To speed up deployment to production I have precalculated the values on a test server and saved them to a file with BCP.

My script is as follows:

-- Lots of other work, including modifying OtherTable

CREATE TABLE FOO (...)
GO

BULK INSERT FOO
FROM 'C:\foo.dat';
GO

-- rerun from here after the break

INSERT INTO FOO 
  (ID, TotalQuantity)
SELECT 
    e.ID, 
    SUM(e.TotalQuantity) AS TotalQuantity
FROM (SELECT 
        o.ID,
        h.n.value('TotalQuantity[1]/.', 'int') AS TotalQuantity
      FROM dbo.OtherTable o
          CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
      WHERE o.ID NOT IN (SELECT DISTINCT ID FROM FOO)
) AS e
GROUP BY e.ID

When I run the script in Management Studio the first two statements complete within seconds, but the last statement takes 4 hours to complete. Since no rows have been added to OtherTable since my foo.dat was computed, Management Studio reports (0 row(s) affected).

If I cancel the query execution after a couple of minutes, select just the last query, and run it separately, it completes within 5 seconds.

Notable facts:

  • OtherTable contains 200k rows, and the data in XmlColumn is fairly large; total table size is ~3 GB
  • The FOO table gets 1.3M rows

What could possibly make the difference?
Management Studio has implicit transactions turned off. As far as I can understand, each statement then runs in its own transaction.

Update:
If I first select and run the script up to -- rerun from here after the break, then select and run just the last query, it is still slow until I cancel execution and try again. This at least rules out any effect of running "together" with the previous code in the script, and boils it down to the same query being slow on first execution and fast on the second (with all other conditions the same).
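One way to test whether the first-run slowness is tied to plan compilation is to flush the plan cache and rerun the statement. This is a hypothetical diagnostic sketch, not part of the original script, and only safe on a test server:

```sql
-- Test/dev server only: this flushes ALL cached plans on the instance
-- and forces a recompile for every subsequent query.
DBCC FREEPROCCACHE;
GO

-- Now rerun the INSERT ... SELECT. If it is slow again after the flush,
-- the first-vs-second-execution difference is plan-related.
```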

  • Can you see any differences in the execution plans? With the last statement taking 4 hours, you can look at the estimated plans instead of the actual (at least for a start). Commented Jan 4, 2012 at 8:12
  • "If I cancel the query execution after a couple of minutes and selects just the last query and run that separately it completes within 5 seconds." - are you running the select on its own, inserting the results into an empty foo or inserting the results into an already-populated foo? Does foo get 1.3M rows mostly from the BCP process or from the insert from OtherTable? Commented Jan 4, 2012 at 8:49
  • @MarkBannister, I'm running the select with the tables populated. I'm just continuing the same script from the point where I pressed cancel. All 1.3M rows come from the bulk insert. (That's what (0 row(s) affected) indicates). Commented Jan 4, 2012 at 9:08

3 Answers


Probably different execution plans. See Slow in the Application, Fast in SSMS? Understanding Performance Mysteries.
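To check whether the two executions really did get different plans, one option is to query the plan-cache DMVs (assumes VIEW SERVER STATE permission; the LIKE filter is just an illustrative way to locate the statement):

```sql
-- Pull the cached plan and execution stats for the slow statement,
-- so the first and second executions can be compared.
SELECT qs.execution_count,
       qs.total_elapsed_time / 1000 AS total_elapsed_ms,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
    CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
    CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE st.text LIKE '%INSERT INTO FOO%';
```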


3 Comments

I used fulltablescan.com/index.php?/archives/… to get the execution plan of the slow-running query, but as far as I can see it uses the same execution plan the second time I run the query. Unfortunately I cannot get the execution count of each part (I'm interested in the number of XPath evaluations) unless I wait 4 hours. I'll try to let the query run overnight to get the full execution plan.
Besides, why would the execution plan change between two identical invocations from Management Studio with no other activity in between?
"why would the execution plan change between two identical invocations": stats.

Could it possibly be related to the statistics being completely wrong on the newly created Foo table? If SQL Server automatically updates the statistics when it first runs the query, the second run would have its execution plan created from up-to-date statistics.

What if you check the statistics right after the bulk insert (with the STATS_DATE function) and then check them again after having cancelled the long-running query? Did the stats get updated, even though the query was cancelled?

In that case, an UPDATE STATISTICS on Foo right after the bulk insert could help.
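A sketch of that check, using the sys.stats catalog view (WITH FULLSCAN is optional; a sampled update may be enough):

```sql
-- When were the statistics on FOO last updated?
SELECT s.name,
       STATS_DATE(s.object_id, s.stats_id) AS last_updated
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID('dbo.FOO');

-- If they are NULL or stale right after the bulk insert,
-- refresh them before running the big INSERT ... SELECT:
UPDATE STATISTICS dbo.FOO WITH FULLSCAN;
```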

1 Comment

The STATS_DATE for the PK of the FOO table is NULL after the query executes. Sounds like a possible fix. However, I managed to get reasonable execution times by rewriting the query to a LEFT OUTER JOIN.

Not sure exactly why it helped, but I rewrote the last query to use a left outer join instead, and suddenly the execution time dropped to 15 milliseconds.

INSERT INTO FOO 
  (ID, TotalQuantity)
SELECT 
    e.ID, 
    SUM(e.TotalQuantity) AS TotalQuantity
FROM (SELECT 
        o.ID,
        h.n.value('TotalQuantity[1]/.', 'int') AS TotalQuantity
      FROM dbo.OtherTable o
          LEFT OUTER JOIN FOO f ON o.ID = f.ID
          CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
      WHERE f.ID IS NULL
) AS e
GROUP BY e.ID
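For comparison, here is a NOT EXISTS variant (a sketch only; it typically compiles to the same anti-semi-join plan as the LEFT JOIN ... IS NULL form, and some people find the intent clearer):

```sql
INSERT INTO FOO (ID, TotalQuantity)
SELECT e.ID, SUM(e.TotalQuantity) AS TotalQuantity
FROM (SELECT o.ID,
             h.n.value('TotalQuantity[1]/.', 'int') AS TotalQuantity
      FROM dbo.OtherTable AS o
          CROSS APPLY o.XmlColumn.nodes('(item/.../salesorder/)') h(n)
      WHERE NOT EXISTS (SELECT 1 FROM FOO f WHERE f.ID = o.ID)
) AS e
GROUP BY e.ID;
```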
