I have created a table-valued function to try to replicate the STRING_SPLIT function that was introduced in SQL Server 2016.

So far I have got this:

CREATE FUNCTION MySplit
    (@delimited NVARCHAR(MAX), @delimiter NVARCHAR(100)) 
RETURNS @t TABLE
(
-- Id column can be commented out, not required for SQL splitting string
  id INT IDENTITY(1,1), -- I use this column for numbering split parts
  val NVARCHAR(MAX)
)
AS
BEGIN
    DECLARE @xml XML
    SET @xml = N'<root><r>' + replace(@delimited,@delimiter,'</r><r>') + '</r></root>'

    INSERT INTO @t(val)
        SELECT
            r.value('.','varchar(max)') AS item
        FROM
            @xml.nodes('//root/r') AS records(r)

    RETURN
END
GO

It works, but it fails to split the string if any part of it contains an ampersand [ & ].

I have found hundreds of examples of splitting a string, but none seem to deal with special characters.

So using this:

select * 
from MySplit('Test1,Test2,Test3', ',') 

works ok, but

select * 
from MySplit('Test1 & Test4,Test2,Test3', ',') 

does not. It fails with

XML parsing: line 1, character 17, illegal name character.

What have I done wrong?

UPDATE

Firstly, thanks to @marcs for showing me the error of my ways in writing this question.

Secondly, thanks to everyone for the help below, especially @PanagiotisKanavos and @MatBailie.

As this is throwaway code for migrating data from an old system to a new one, I have chosen to use @MatBailie's solution: quick and very dirty, but perfect for this task.

In the future, though, I will move towards @PanagiotisKanavos's solution.

3
  • Replace all '&' with '&amp;', just like in normal use of html/xml? Then, once split, convert it back again? Commented Feb 15, 2018 at 16:10
  • More robust implementations of this "trick" use FOR XML to get around this type of issue... Commented Feb 15, 2018 at 16:14
  • @MatBailie FOR XML is used for string aggregation. In this case there is only a single string that gets converted to XML by replacing the delimiter with '</r><r>'. The only solution here is to escape invalid characters. Commented Feb 15, 2018 at 16:30

3 Answers

2

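Edit your function to replace every & with &amp; before building the XML. This removes the error. The failure happens because & is a reserved character in XML: it starts an entity reference such as &amp;, so a bare & followed by a space is not well-formed and the parser reports an illegal name character.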

1
CREATE FUNCTION [dbo].[split_stringss]
(
    @delimited NVARCHAR(MAX),
    @delimiter NVARCHAR(100)
)
RETURNS @t TABLE (id INT IDENTITY(1,1), val NVARCHAR(MAX))
AS
BEGIN
    DECLARE @xml XML
    DECLARE @escaped NVARCHAR(MAX)

    -- Escape ampersands first so the text can be parsed as XML
    SET @escaped = REPLACE(@delimited, N'&', N'&amp;')

    -- Turn each delimiter into an element boundary
    SET @xml = N'<t>' + REPLACE(@escaped, @delimiter, N'</t><t>') + N'</t>'

    -- .value() decodes &amp; back to & when reading each node;
    -- nvarchar(max) keeps Unicode characters intact
    INSERT INTO @t(val)
    SELECT r.value('.', 'nvarchar(max)') AS item
    FROM @xml.nodes('/t') AS records(r)

    RETURN
END

1 Comment

While this code snippet may solve the question, including an explanation really helps to improve the quality of your post. Remember that you are answering the question for readers in the future, and those people might not know the reasons for your code suggestion.
0

First of all, SQL Server 2016 introduced the STRING_SPLIT table-valued function (TVF). You can write CROSS APPLY STRING_SPLIT(thatField, ',') AS items.
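As a rough illustration of STRING_SPLIT, here is a minimal sketch using the sample string from the question. The table dbo.SomeTable and its id column are placeholder names, not something from the question. STRING_SPLIT needs database compatibility level 130 or higher, accepts only a single-character separator, and returns a single column named value.

```sql
-- Requires SQL Server 2016+ and database compatibility level 130 or higher.
-- The ampersand needs no special handling here because no XML is involved.
SELECT value
FROM STRING_SPLIT(N'Test1 & Test4,Test2,Test3', N',');

-- Splitting a column per row. dbo.SomeTable and its id column are
-- hypothetical names used only for illustration.
SELECT t.id, items.value
FROM dbo.SomeTable AS t
CROSS APPLY STRING_SPLIT(t.thatField, N',') AS items;
```

Note that the 2016 version does not guarantee the order of the returned rows, so it is not a drop-in replacement if you rely on the id numbering from the XML function.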

In previous versions you still need to create a custom splitting function. There are various techniques. The fastest solution is to use a SQLCLR function.

In some cases, the second fastest is what you used - convert the text to XML and select the nodes. A well known problem with this splitting technique is that illegal XML characters will break it, as you found out. That's why Aaron Bertrand doesn't consider this a generic splitter.

You can replace invalid characters with their encoded values, e.g. & with &amp;, but you have to be certain that your text will never contain such encodings.
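If you stay with the XML technique, one way to avoid listing every replacement by hand is to let SQL Server do the encoding with FOR XML PATH before splitting. The following is only a sketch, with the variable names borrowed from the question's function. It assumes the delimiter itself contains none of the characters that get encoded (&, < or >), otherwise the REPLACE would no longer find it.

```sql
DECLARE @delimited NVARCHAR(MAX) = N'Test1 & Test4,<Test2>,Test3';
DECLARE @delimiter NVARCHAR(100) = N',';

-- FOR XML PATH('') returns the text with &, < and > entity-encoded.
DECLARE @escaped NVARCHAR(MAX) =
    (SELECT @delimited AS [*] FOR XML PATH(''));

-- Build the XML fragment from the encoded text.
DECLARE @xml XML =
    CAST(N'<r>' + REPLACE(@escaped, @delimiter, N'</r><r>') + N'</r>' AS XML);

-- .value() decodes the entities again when reading each node.
SELECT r.value('.', 'nvarchar(max)') AS item
FROM @xml.nodes('/r') AS records(r);
```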

Perhaps you should investigate different techniques, like the Moden function, which can be faster in many situations:

CREATE FUNCTION dbo.SplitStrings_Moden
(
   @List NVARCHAR(MAX),
   @Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING AS
RETURN
  WITH E1(N)        AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                         UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 
                         UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1),
       E2(N)        AS (SELECT 1 FROM E1 a, E1 b),
       E4(N)        AS (SELECT 1 FROM E2 a, E2 b),
       E42(N)       AS (SELECT 1 FROM E4 a, E2 b),
       cteTally(N)  AS (SELECT 0 UNION ALL SELECT TOP (DATALENGTH(ISNULL(@List,1))) 
                         ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E42),
       cteStart(N1) AS (SELECT t.N+1 FROM cteTally t
                         WHERE (SUBSTRING(@List,t.N,1) = @Delimiter OR t.N = 0))
  SELECT Item = SUBSTRING(@List, s.N1, ISNULL(NULLIF(CHARINDEX(@Delimiter,@List,s.N1),0)-s.N1,8000))
    FROM cteStart s;

Personally I created and use a SQLCLR UDF.

Another option is to avoid splitting altogether and pass table-valued parameters from the client to the server. Or use a microORM like Dapper that can construct an IN (...) clause from a list of values, eg:

var products=connection.Query<Product>("select * from products where id in @ids",new {ids=myIdArray});

An ORM like EF that supports LINQ can also generate an IN clause :

var products = from product in dbContext.Products
               where myIdArray.Contains(product.Id)
               select product;
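For the table-valued parameter option mentioned above, the server side could look roughly like the sketch below. The type name dbo.IdList and the procedure name dbo.GetProductsByIds are hypothetical, and the products table with an id column is assumed from the Dapper example.

```sql
-- dbo.IdList and dbo.GetProductsByIds are hypothetical names for illustration.
CREATE TYPE dbo.IdList AS TABLE (id INT NOT NULL PRIMARY KEY);
GO

CREATE PROCEDURE dbo.GetProductsByIds
    @ids dbo.IdList READONLY   -- table-valued parameters must be READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- Join against the parameter instead of splitting a delimited string.
    SELECT p.*
    FROM products AS p
    JOIN @ids AS i ON i.id = p.id;
END
GO

-- Calling it from T-SQL. A client application would pass the rows
-- as a structured parameter instead.
DECLARE @ids dbo.IdList;
INSERT INTO @ids (id) VALUES (1), (2), (3);
EXEC dbo.GetProductsByIds @ids = @ids;
```

Because the values arrive as typed rows, no delimiter or escaping problem can occur in the first place.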
