0

I have the following problem. I need to write an application in C# that will read a given XML and prepare data for me to load into the database. In XML, the structure of which I have no influence, the main data is placed in CDATA. I have verified that the structure of this data is ordered in the correct XML structure.

I've searched hundreds of posts and can't find any solution from them. Below is the XML file from which I need to extract the data from the CDATA section. Maybe one of you can help me?

<Docs>
  <Doc>
  <Content>
    <![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="1" Description="Desc1"></Header>
        <Poss>
          <Pos Id="1" Name="Pos1"></Pos>
          <Pos Id="2" Name="Pos2"></Pos>
        </Poss>
      </Doc>
    ]]>
  </Content>
  </Doc>
  <Doc>
  <Content>
    <![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="2" Description="Desc2"></Header>
        <Poss>
          <Pos Id="3" Name="Pos3"></Pos>
          <Pos Id="4" Name="Pos4"></Pos>
        </Poss> 
      </Doc>
    ]]>
  </Content>
  </Doc>
</Docs>

For me, the most important are the fields contained in the Content section and I have to load them as data into the database.

6
  • Does this answer your question? How does one parse XML files? Commented Jan 23, 2023 at 13:37
  • I understand that I have to parse text from CDATA to xml right? Commented Jan 23, 2023 at 14:11
  • 1
    Is your data in CDATA a valid XML? Cause I see the <Poss> doesn't have closing tag. Commented Jan 23, 2023 at 14:31
  • Can you call a stored procedure (SP) from the c#? It will pass your XML as a parameter to the stored procedure. The SP with shred that XML parameter, and insert data into an appropriate target table. Commented Jan 23, 2023 at 16:57
  • By the way, the XML sample is not well-formed. The <Poss> XML element doesn't have a closing tag. It needs to be fixed. Commented Jan 23, 2023 at 17:09

2 Answers 2

2

To extract the data from the CData part,

  1. Construct the classes.
public class Doc
{
    public Header Header { get; set; }
    
    [XmlArrayItem(typeof(Pos), ElementName = "Pos")]
    public List<Pos> Poss { get; set; }
}

public class Header
{
    [XmlAttribute]
    public int DocNumber { get; set; }
    
    [XmlAttribute]
    public string Description { get; set; }
}

public class Pos
{
    [XmlAttribute]
    public int Id { get; set; }
    
    [XmlAttribute]
    public string Name { get; set; }
}
  1. Implement the extraction logic.

    2.1. Read the XML string as XDocument via XDocument.Parse().

    2.2. Select the DescendantNodes for the XPath of "/Docs/Doc/Content".

    2.3. Convert the nodes to XCData type.

    2.4. With XmlSerializer to deserialize the value of XCData to Doc type.

using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;
using System.Xml.Serialization;
using System.IO;

XmlSerializer xmlSerializer = new XmlSerializer(typeof(Doc)); 
XDocument xDoc = XDocument.Parse(xml);
        
var cdataSections = xDoc.XPathSelectElements("/Docs/Doc/Content")
    .DescendantNodes()
    .OfType<XCData>()
    .Select(x => (Doc)xmlSerializer.Deserialize(new StringReader(x.Value)))
    .ToList();

Demo @ .NET Fiddle

Sign up to request clarification or add additional context in comments.

Comments

1

Here is implementation based on a stored procedure with the XML parameter like in my comments.

I had to remove the <Poss> XML element to make CData section a well-formed XML.

SQL

DECLARE @xml XML = 
N'<Docs>
    <Doc>
        <Content><![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="1" Description="Desc1"></Header>
          <Pos Id="1" Name="Pos1"></Pos>
          <Pos Id="2" Name="Pos2"></Pos>
      </Doc>
    ]]>
        </Content>
    </Doc>
    <Doc>
        <Content><![CDATA[
      <Doc xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <Header DocNumber="2" Description="Desc2"></Header>
          <Pos Id="3" Name="Pos3"></Pos>
          <Pos Id="4" Name="Pos4"></Pos>
      </Doc>
    ]]>
        </Content>
    </Doc>
</Docs>';

--INSERT INTO <targetTable>
SELECT h.value('(Header/@DocNumber)[1]', 'INT') AS DocNumber
    , h.value('(Header/@Description)[1]', 'VARCHAR(256)') AS DocDescription
    , d.value('@Id', 'INT') AS posID
    , d.value('@Name', 'VARCHAR(256)') AS posName
FROM @xml.nodes('/Docs/Doc/Content/text()') AS t(c)
CROSS APPLY (SELECT TRY_CAST(c.query('.').value('.', 'NVARCHAR(MAX)') AS XML)) AS t1(x)
CROSS APPLY x.nodes('/Doc') AS t2(h)
CROSS APPLY h.nodes('Pos') AS t3(d);

Output

DocNumber DocDescription posID posName
2 Desc2 3 Pos3
2 Desc2 4 Pos4
1 Desc1 1 Pos1
1 Desc1 2 Pos2

2 Comments

Thanks @Yitzhak Khabinsky it is exactly what I mean.
@jeffers, glad to hear that the proposed solution is working for you. Please don't forget to mark it as answer.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.