2

My nvarchar value contains a control character - Start of Text (0x0002 / STX). I would like to include this in my XML value. Now I know XML is not tolerant of control characters so the only way I can come up with is to include it as  (normally it should be  but XML does not like this). A side effect of doing this the way I am doing it is that normal characters get double-escaped, e.g. & becomes &.

Is there an easy way to include this character, and any other possible control characters in my XML value without having normal characters double-escaped? It doesn't have to even be as , I'd be just as happy with 0x0002, for example.

DECLARE @Data table (String nvarchar(max));
 
INSERT INTO @Data (String)
VALUES ('& hi');

--I'd like this to output <Test>&amp;#x02;&amp hi</Test> or similar
SELECT CAST ((SELECT (SELECT String AS [*] FOR XML PATH('')) 
              FROM @Data 
              FOR XML PATH('Test')) AS XML)
4
  • You can select any valid character as the escape one, but you then need to escape it if it appears literally in the string. Commented Aug 28 at 22:55
  • 1
    Does it need to be of the xml type or can it remain nvarchar? If the latter, it's pretty simple, your code is overcomplicated dbfiddle.uk/w-2AMwgh the former remains not allowed, as noted in the docs learn.microsoft.com/en-us/sql/relational-databases/xml/… Commented Aug 28 at 23:14
  • @Charlieface Sadly the actual target column for the data from this example is an xml type column Commented Aug 29 at 5:15
  • 1
    You could probably convert text to varbinary and store that, something like: dbfiddle.uk/519Wft6D dbfiddle.uk/y38DJ5hh But of course, the code needs to handle that Commented Aug 29 at 9:08

2 Answers 2

1

While SQL Server can generate XML v1.1 conformant XML very easily:

SELECT '&' + CHAR(2) + '& hi' AS Test
FOR XML PATH('');

you can't store that in the xml type, which only allows XML v1.0 characters. By the way, instead of doing CAST, you just add , TYPE to the FOR XML to get it in the xml type.

So you would need to just use REPLACE to replace all disallowed characters.

DECLARE @Data table (String nvarchar(max));
 
INSERT INTO @Data (String)
VALUES ('&' + CHAR(2) + '& hi');

SELECT
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(
  REPLACE(String,
    CHAR(0), '0x0000'),
    CHAR(1), '0x0001'),
    CHAR(2), '0x0002'),
    CHAR(3), '0x0003'),
    CHAR(4), '0x0004'),
    CHAR(5), '0x0005'),
    CHAR(6), '0x0006'),
    CHAR(7), '0x0007'),
    CHAR(8), '0x0008'),
    CHAR(11), '0x000B'),
    CHAR(12), '0x000C'),
    CHAR(14), '0x000E'),
    CHAR(15), '0x000F'),
    CHAR(16), '0x0010'),
    CHAR(17), '0x0011'),
    CHAR(18), '0x0012'),
    CHAR(19), '0x0013'),
    CHAR(20), '0x0014'),
    CHAR(21), '0x0015'),
    CHAR(22), '0x0016'),
    CHAR(23), '0x0017'),
    CHAR(24), '0x0018'),
    CHAR(25), '0x0019'),
    CHAR(26), '0x001A'),
    CHAR(27), '0x001B'),
    CHAR(28), '0x001C'),
    CHAR(29), '0x001D'),
    CHAR(30), '0x001E'),
    CHAR(31), '0x001F')
 AS Test
FROM @Data
FOR XML PATH(''), TYPE;

db<>fiddle

Sign up to request clarification or add additional context in comments.

Comments

0

XML 1.1 allows the Unicode codepoint U+0002 provided it is escaped as &#x02;. But you probably don't want to be reliant on XML 1.1, which isn't universally supported. XML 1.0 doesn't allow this codepoint whether it's escaped or not.

One solution sometimes used if these characters are very occasional is to represent them as processing instructions, for example <?char x02?>. Of course both the sender and the recipient have to understand the convention.

If you're sending binary data that could contain anything, then a better solution is to encode the entire binary data stream in base64.

2 Comments

The issue is we have a pre-existing table in SQL Server with an XML column. The XML in this column contains arbitrary textual data and has to be able to accept any possible input.
I'm afraid that's a contradiction in terms. If it's XML then it can't contain arbitrary Unicode codepoints, and if it contains arbitrary Unicode codepoints then it's not XML.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.