I have two tables, XMLtable and filterTable.

I need all the ID values from XMLtable where the XML in Col_X contains a MyElement element whose content matches a value in filterColumn of filterTable.

The XML in Col_X may contain multiple MyElement elements per row, and I want that ID if ANY of those elements matches ANY of the values in filterColumn.

The problem is that those columns are actually of varchar(max) datatype, and the table itself is huge (like 50GB huge). So this query needs to be as optimized as possible.

Here's an example of where I am now. It merely returns the rows where the first matching element equals one of the values I'm looking for. Because of a plethora of different error messages, I haven't been able to change it to compare against all of the same-named elements, which is what I want.

SELECT ID, 
   CAST(Col_X AS XML).value('(//*[local-name()=''MyElement''])[1]', N'varchar(25)') 
FROM  XMLtable

...and then compare the results to filterTable. This already takes 5+ minutes.

What I'm trying to achieve is something like:

SELECT ID
FROM XMLtable
WHERE CAST(Col_X AS XML).query('(//*[local-name()=''MyElement''])') 
   IN (SELECT filterColumn FROM filterTable)

The only way I can currently achieve this is to use the LIKE operator, which takes like a thousand times longer.

Now, obviously it's not an option to start changing the datatypes of the columns or anything else. This is what I have to work with. :)

  • Can you show us a sample XML and explain what you want to extract from it? Commented Sep 28, 2012 at 12:47
  • I'd have to make one up, as obviously I can't give the real XML here. It's anywhere from dozens to hundreds of rows of various elements where MyElement may appear between 0 and X times. I want to find the IDs of the rows where it appears and where any one of those instances contains any of the values in the other table. Commented Sep 28, 2012 at 13:17

3 Answers

Try this:

SELECT
  ID,
  MyElementValue
FROM 
  (
    SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
    FROM XMLTable
      CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
      CROSS APPLY X.Col_X.nodes('(//*[local-name()="MyElement"])') as T2(myE)
  ) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)

and this:

SELECT
  ID,
  MyElementValue
FROM 
  (
    SELECT ID, myE.value('(./text())[1]', N'VARCHAR(25)') AS 'MyElementValue'
    FROM XMLTable
      CROSS APPLY (SELECT CAST(Col_X AS XML)) as X(Col_X)
      CROSS APPLY X.Col_X.nodes('//MyElement') as T2(myE)
  ) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)

Update

I think you are experiencing what is described in "Compute Scalars, Expressions and Execution Plan Performance": the cast to XML is deferred to each call to the value function. To test this, change the datatype of Col_X to XML.

If that is not an option you could query the rows you need from XMLTable into a temporary table that has an XML column and then do the query above against the temporary table without the need to cast to XML.

CREATE TABLE #XMLTable
(
  ID int,
  Col_X xml
)

INSERT INTO #XMLTable(ID, Col_X)
SELECT ID, Col_X
FROM XMLTable

SELECT
  ID,
  MyElementValue
FROM 
  (
    SELECT ID, myE.value('(./text())[1]', N'varchar(25)') AS 'MyElementValue'
    FROM #XMLTable
      CROSS APPLY Col_X.nodes('//MyElement') as T2(myE)
  ) T1
WHERE MyElementValue IN (SELECT filterColumn FROM filterTable)


DROP TABLE #XMLTable

4 Comments

Hey, thanks a lot! I'll be sure to check this out next time we return to this script or when I have the time. The second query won't work, as all of our XML documents use namespaces on each element, and the namespaces may vary depending on the base document. That's why I used local-name there. But other than that I'm always interested in further optimization, so thanks! :)
@Kahn - The important change is that I use (./text())[1] to fetch the values instead of just '.'. It has a huge impact on the estimated cost in the query plan. You have to test whether it has the same effect on execution time. The restructuring of the query could help a bit too, because it only fetches the value for MyElementValue once instead of twice.
Oh right, didn't know that. I'm new to xpath so I had no idea about the intricacies of how it works. So the explanation is appreciated. :)
Got back to this now when we had a need for more XML/Xpath queries in SQL Server. Unfortunately I don't seem to get any more performance out of this than the other variation down below.
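
To illustrate the difference between '.' and (./text())[1] discussed in the comments above, here is a minimal sketch against a made-up XML variable. The document content and the names @doc, WholeStringValue and FirstTextNode are hypothetical and are not taken from the question's data.

```sql
-- Hypothetical sample document: MyElement holds its own text plus a nested child.
DECLARE @doc xml = N'<root><MyElement>abc<child>def</child></MyElement></root>';

SELECT
  -- '.' asks for the string value of the whole element,
  -- which concatenates the text of all descendant nodes.
  T.e.value('.', 'varchar(25)') AS WholeStringValue,
  -- (./text())[1] asks only for the first text node directly under the element.
  T.e.value('(./text())[1]', 'varchar(25)') AS FirstTextNode
FROM @doc.nodes('//MyElement') AS T(e);
```

When MyElement contains only plain text, both expressions should return the same value, so any difference would show up only in the plan and execution time, which need to be measured on the real data.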

You could try something like this. It does at least functionally do what you want, I believe. You'll have to explore its performance with your data set empirically.

SELECT ID
FROM 
( 
   SELECT xt.ID, CAST(xt.Col_X AS XML) [content] FROM XMLTable AS xt
) AS src
INNER JOIN FilterTable AS f
ON f.filterColumn IN 
(
   SELECT 
      elt.value('.', 'varchar(25)')     
   FROM src.content.nodes('//MyElement') AS T(elt)
)

2 Comments

Hey, thanks for this. I got it working, but the performance was poor: it ran for 2.5 hours before I had to cancel it. It did give me ideas on how to proceed, though, so many thanks!
I found that adding an additional JOIN condition based on using CHARINDEX to do a simple string search for the filter value (in the original varchar column) helped the optimizer to restrict XML parsing to a much smaller population of rows. This resulted in an order of magnitude speeding up of the query in my tests.
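
The CHARINDEX idea from the comment above could look roughly like the following. This is only a sketch of one possible reading of that comment: the table and column names come from the question, while the aliases (src, f, x, doc, elt) and the exact placement of the prefilter are assumptions.

```sql
-- Sketch: use a plain string search on the raw varchar(max) column as a join
-- condition, then confirm the match with a real XML comparison.
SELECT DISTINCT src.ID
FROM XMLTable AS src
INNER JOIN filterTable AS f
   -- Cheap prefilter: the filter value must occur somewhere in the raw text.
   ON CHARINDEX(f.filterColumn, src.Col_X) > 0
CROSS APPLY (SELECT CAST(src.Col_X AS XML)) AS x(doc)
CROSS APPLY x.doc.nodes('//*[local-name()="MyElement"]') AS T(elt)
-- Exact check: some MyElement must actually contain that filter value.
WHERE elt.value('(./text())[1]', 'varchar(25)') = f.filterColumn;
```

Two caveats apply. The optimizer is free to choose the evaluation order, so the prefilter is a hint rather than a guarantee that fewer rows get parsed. Also, a filter value containing characters that XML stores as entities (such as & or <) would not be found by the raw string search, so such rows would be missed.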

I finally got this working, and with far better performance than I expected. Below is the script that finally produced the correct result in 5 - 6 minutes.

SELECT ID, myE.value('.', N'VARCHAR(25)') AS 'MyElementValue'
FROM (SELECT ID, CAST(Col_X AS XML) AS Col_X
    FROM XMLTable) T1
CROSS APPLY Col_X.nodes('(//*[local-name()=''MyElement''])') T2(myE)
WHERE myE.value('.', N'varchar(25)') IN (SELECT filterColumn FROM filterTable)

Thanks for the help though, people!

1 Comment

It would be interesting to know if adding a CHARINDEX-based join (see my comment on my own answer) gives you any further significant improvement.
