3

I am writing a T-SQL function to convert XML data into a JSON string. My goal is to automatically group repeated child elements with the same tag name into a JSON array, rather than creating separate key-value pairs.

The built-in FOR JSON PATH in SQL Server handles this automatically for repeated table rows, but my recursive function is based on FOR XML PATH, which doesn't handle the grouping as needed.

Example 1: multiple simple elements

<root>
    <Book>Book1</Book>
    <TransactionId>abc123</TransactionId>
    <TransactionId>abc456</TransactionId>
    <Publisher>Amazon</Publisher>
    <PublisherId>1</PublisherId>
    <UserId>9457</UserId>
</root>
[
    {
        "OrderRef": "Book1",
        "TransactionId": [ "abc123", "abc456" ],
        "Publisher": "Amazon",
        "PublisherId": 1,
        "UserId": 9457
    }
]

Example 2: multiple complex elements

<root>
    <Book>Book1</Book>
    <TransactionId>abc123</TransactionId>
    <TransactionId>abc456</TransactionId>
    <Publisher>Amazon</Publisher>
    <Edition>
        <Name>Ed1</Name>
        <Color>Red</Color>
        <Price>100</Price>
    </Edition>
    <Edition>
        <Name>Ed2</Name>
        <Color>Blue</Color>
        <Price>200</Price>
    </Edition>
    <PublisherId>1</PublisherId>
    <UserId>1234</UserId>
</root>
[
    {
        "OrderRef": "Book1",
        "TransactionId": [ "abc123", "abc456" ],
        "Publisher": "Amazon",
        "Edition": [
          {
            "Name": "Ed1",
            "Color": "Red",
            "Price": 100
          },
          {
            "Name": "Ed2",
            "Color": "Blue",
            "Price": 200
          }
        ],
        "PublisherId": 1,
        "UserId": 1234
    }
]

My current function uses recursion with FOR XML PATH, but it incorrectly handles repeated elements by outputting them as separate key-value pairs.

CREATE FUNCTION [dbo].[Func_XmlToJson](@XmlData xml)  
RETURNS nvarchar(max)  
AS  
BEGIN  
    DECLARE @m nvarchar(max)  

    SELECT @m= '[' + STUFF((SELECT theline 
                            FROM
                                (SELECT ','+' {' + STUFF((SELECT ',"'+coalesce(b.c.value('local-name(.)', 'NVARCHAR(255)'),'')+'":'+  
                      case when b.c.value('count(*)','int')=0   
                      then  '"'+isnull(replace(b.c.value('text()[1]','NVARCHAR(MAX)'),'\','\\'),'')+'"'  
                      else dbo.Func_XmlToJson(b.c.query('.'))  
                      end  
                 FROM x.a.nodes('*') b(c)  
                 FOR XML PATH(''),TYPE).value('(./text())[1]','NVARCHAR(MAX)')  
               ,1,1,'')+'}'  
          FROM @XmlData.nodes('/*') x(a)  
       ) JSON(theLine)  
       FOR XML PATH(''),TYPE).value('.','NVARCHAR(MAX)')  
      ,1,1,'')+']'  
   RETURN @m  
END

How can I modify this T-SQL function to correctly identify and group repeated elements into JSON arrays, instead of creating duplicate keys?

2 Answers 2

2

You need to group up first by the local-name(.) value and check the count, then group again.

STRING_AGG is available to you, and will be much faster and easier than FOR XML aggregation. You should also use STRING_ESCAPE to correctly escape JSON.

It would in any case be faster to recurse only on nodes which have child nodes, instead of recursing even on text nodes. So you would parse out a text() node if any, and use ISNULL to fallback to the recursion.

I've also added in changes based on your comments regarding handling nulls and the json:Array attribute.

CREATE OR ALTER FUNCTION dbo.XmlToJson(@XmlData xml)  
RETURNS nvarchar(max)
WITH RETURNS NULL ON NULL INPUT
AS  
BEGIN  
    DECLARE @m nvarchar(max);

    WITH XMLNAMESPACES (
        N'http://james.newtonking.com/projects/json' AS json
    )
    SELECT @m = '{' + STRING_AGG(
      '"' + STRING_ESCAPE(name, 'json') + '":' + value,
      ','
    ) + '}'
    FROM (
        SELECT
            v.name,
            CONCAT(
              CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN '[' END,
              STRING_AGG(
                ISNULL(
                  '"' + STRING_ESCAPE(x.a.value('text()[1]', 'nvarchar(max)'), 'json') + '"',
                  dbo.XmlToJson(x.a.query('./*'))
                ),
                ','
              ),
              CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN ']' END
           ) AS value
        FROM @XmlData.nodes('./*') x(a)
        CROSS APPLY (SELECT
            x.a.value('local-name(.)', 'nvarchar(4000)') AS name,
            x.a.value('xs:int(xs:boolean(@json:Array))', 'int') AS isArray
        ) v
        GROUP BY
            v.name
    ) grouped;
   SET @m = ISNULL(@m, 'null');
   RETURN @m;
END;

db<>fiddle

SQL Server 2025, which is in RC1 beta-release as of date of writing, will have JSON_OBJECTAGG and JSON_ARRAYAGG, which would make this significantly simpler.

CREATE OR ALTER FUNCTION dbo.XmlToJson(@XmlData xml)  
RETURNS nvarchar(max)  
WITH RETURNS NULL ON NULL INPUT
AS
BEGIN  
    DECLARE @m nvarchar(max);

    WITH XMLNAMESPACES (
        N'http://james.newtonking.com/projects/json' AS json
    )
    SELECT @m = JSON_OBJECTAGG(name : JSON_QUERY(value))
    FROM (
        SELECT
            v.name,
            CASE WHEN COUNT(*) > 1 OR MAX(isArray) = 1 THEN
              JSON_ARRAYAGG(value)
            ELSE
              '"' + STRING_ESCAPE(value, 'json') + '"'
            END AS value
        FROM @XmlData.nodes('./*') x(a)
        CROSS APPLY (SELECT
            x.a.value('local-name(.)', 'nvarchar(4000)') AS name,
            ISNULL(
              x.a.value('text()[1]', 'nvarchar(max)'),
              dbo.XmlToJson(x.a.query('./*'))
            ) AS value,
            x.a.value('xs:int(xs:boolean(@json:Array))', 'int') AS isArray
        ) v
        GROUP BY
            v.name
    ) grouped;
   SET @m = ISNULL(@m, 'null');
   RETURN @m; 
END;
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks! That works fine. But, I tried few other scenarios. I am attempting to convert the following XML into JSON, but the json:Array="true" attributes are being ignored, and single-element nodes are not converted into arrays. Added SQL Fiddle for your reference. Also, The self-closing <Release /> tag is not being converted to "Release": null
OK see changes, but in future you should make a new question as we can only go off what was in the original question.
Sure. Thanks! That really helped.
2
With FOR XML PATH

As FOR XML PATH won't accept a GROUP BY, you'll have to add layers of subqueries, alternating subqueries that FOR XML PATH and subqueries that GROUP BY (this would of course be simplified if you used STRING_AGG() instead, but you probably have your reasons).

  • After some pain with subqueries, I rewrote your query from subqueries to CTE, which (in my opinion) give a more intelligible way of reading the query with its sequenced steps… but adds a lot of verticality
  • as pointed by @Charlieface, using the dedicated string_escape() will handle more cases than replace(…,'\','\\') (already a good start)
  • a CROSS APPLY allows to remove one subquery / CTE, by computing your x and b tables in one pass
    (there probably is a way to simplify even more, by accessing /*/* or doing only 1 level deep in each call to Func_XmlToJson, but this was hard enough that I'm content to at least have a result, so optimizing could do another question on SO)
  • thus element outputs (non-grouped) rows for each XML direct subelement
  • then elgr will do the grouping by localname, concatenating all subelements with the same name, with [ and ] added only for a count of 2 or more (to get ["abc123","abc456"], but only "Book1" without brackets)
  • we finally concat those possibly array subelements to construct the final JSON object
  • Note that the GROUP BY will reorder the elements by alphabetical order, but hey, we're in JSON so this is semantically correct.
CREATE FUNCTION [dbo].[Func_XmlToJson](@XmlData xml)  
RETURNS nvarchar(max)  
AS  
BEGIN  
    DECLARE @m nvarchar(max);
    WITH
        element AS
        (
            SELECT
                coalesce(b.c.value('local-name(.)', 'NVARCHAR(255)'),'') localname,
                case when b.c.value('count(*)','int')=0   
                then  '"'+isnull(string_escape(b.c.value('text()[1]','NVARCHAR(MAX)'),'json'),'')+'"'  
                else dbo.Func_XmlToJson(b.c.query('.'))  
                end contents
            FROM @XmlData.nodes('/*') x(a)
            CROSS APPLY a.nodes('*') b(c)
        ),
        elgr AS
        (
            SELECT
                localname,
                CASE
                    WHEN count(*) = 1 THEN MAX(contents)
                    ELSE
                    '['
                    +
                    STUFF
                    (
                        (
                            SELECT ','+contents
                            FROM element eall
                            WHERE eall.localname = element.localname
                            FOR XML PATH(''), TYPE
                        ).value('.', 'NVARCHAR(MAX)'),
                        1,1,''
                    )
                    +
                    ']'
                    END
                contents
            FROM element
            GROUP BY localname
        )
        SELECT @m='{' + STUFF((SELECT ',"'+localname+'":'+  
                  contents FROM elgr
             FOR XML PATH(''), TYPE).value('(./text())[1]','NVARCHAR(MAX)')  
           ,1,1,'')+'}'
      ;
   RETURN @m;
END;

Which on your example returns:

{
    "Book": "Book1",
    "Edition": [
        {
            "Color": "Red",
            "Name": "Ed1",
            "Price": "100"
        },
        {
            "Color": "Blue",
            "Name": "Ed2",
            "Price": "200"
        }
    ],
    "Publisher": "Amazon",
    "PublisherId": "1",
    "TransactionId": [
        "abc123",
        "abc456"
    ],
    "UserId": "1234"
}

(as seen on this SQL Server 2014 fiddle)

With STRING_AGG

You didn't give your reasons for using FOR XML PATH as a way to STRING_AGG, so I'm supposing that you are limited to an old (< 2017) version of SQL Server.
So, for your information, here is what you could do in SQL Server 2017: same structure, but way simpler (and obvious) concatenating over the GROUP BY:

-- With STRING_AGG instead of the 2 STUFF(…) occurrences:
CREATE FUNCTION [dbo].[Func_XmlToJson](@XmlData xml)  
RETURNS nvarchar(max)  
AS  
BEGIN  
    DECLARE @m nvarchar(max);
    WITH
        element AS
        (
            SELECT
                coalesce(b.c.value('local-name(.)', 'NVARCHAR(255)'),'') localname,
                case when b.c.value('count(*)','int')=0   
                then  '"'+isnull(string_escape(b.c.value('text()[1]','NVARCHAR(MAX)'),'json'),'')+'"'  
                else dbo.Func_XmlToJson(b.c.query('.'))  
                end contents
            FROM @XmlData.nodes('/*') x(a)
            CROSS APPLY a.nodes('*') b(c)
        ),
        elgr AS
        (
            SELECT
                localname,
                CASE
                    WHEN count(*) = 1 THEN MAX(contents)
                    ELSE '['+STRING_AGG(contents, ',')+']'
                END contents
            FROM element
            GROUP BY localname
        )
        SELECT @m='{'+STRING_AGG('"'+localname+'":'+contents, ',')+'}'
        FROM elgr
   ;
   RETURN @m;
END;

(last function definition in this fiddle running SQL Server 2017)

8 Comments

I tried the function provided by you. But the function is incorrectly returning the Edition field as a nested array ([ [{}], [{}] ]) instead of an array of objects ([ {}, {} ]). How can I prevent the extra nesting? The output I got: { "Book": "Book1", "Edition": [ [ { "Name": "Ed1", "Color": "Red", "Price": "100" } ], [ { "Name": "Ed2", "Color": "Blue", "Price": "200" } ] ], "Publisher": "Amazon", "PublisherId": "1", "TransactionId": [ "abc123", "abc456" ], "UserId
Hum, can you reproduce it by fiddling with the SQL Server 2014 fiddle I provided (paste the function you tried to overwrite one of the fiddle, paste your example data into the next call block, adapt the version of SQL Server, run, then copy the resulting URL and paste it as a new comment here)? By the way you still didn't tell which version of SQL Server you use: maybe it's due to your version of SQL Server? I couldn't test anything older than SQL Server 2014 (which is the oldest version proposed by db<>fiddle).
I use SQL Server 20.2.1
I tried both the functions from the fiddle in my SQL Server and i am still facing the same issue. The result I get: {"Book":"Book1","Edition":[[ {"Name":"Ed1","Color":"Red","Price":"100"}],[ {"Name":"Ed2","Color":"Blue","Price":"200"}]],"Publisher":"Amazon","PublisherId":"1","TransactionId":["abc123","abc456"],"UserId":"1234"}
SQL Server 2025 is version 17 (sqlserverbuilds.blogspot.com). 20 is probably the version of SSMS serving as your interface to the server: could you SELECT @@version; to get the server's version?
It is Microsoft SQL Server 2019.
Is it with, or without, spaces around brackets? And to diagnose: as the only place where the query adds [ is after the CASE WHEN count(*) = 1, could you change the '[' to '['+count(*) so that we know how many children eldr detected?
Missing STRING_ESCAPE to correctly escape JSON strings

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.