
I want to remove all empty nodes from an XML file. A node counts as empty whether it is written as

<Node/>    or    <Node></Node>

and in either case it should be deleted from the XML. For example, given this input:

<Root type="1">
<A></A>
<B>
    <B1>
        <B12/>
        <B13/>
    </B1>
    <B2>
        123
        <B21></B21>
    </B2>
   <B3 type="3">
       <B4/>
   </B3>
</B>
<C/>
</Root>

Expected output:

<Root type="1">
<B>
    <B2>
        123
    </B2>
    <B3 type="3">
    </B3>
</B>
</Root>

Delete B1 because all nodes under it are empty and it has no attributes.

Do not delete B2, because it has the value 123, but delete its empty child.

Do not delete B3, because it has an attribute, but delete its empty child.

I am doing this in SQL, but if it can be done in C# as well, I can call a C# script from SSIS. SQL is preferred.

3 Answers

It can be done easily with regular expressions:

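// Requires: using System.Text.RegularExpressions;
string xml = @"<Root type=""1"">
                <A></A>
                <B>
                    <B1>
                        <B12/>
                        <B13/>
                    </B1>
                    <B2>
                        123
                        <B21></B21>
                    </B2>
                    <B3 type=""3"">
                        <B4/>
                    </B3>
                </B>
                <C/>
                </Root>";

// Pass 1: remove self-closing elements such as <B12/>.
// Note: this pattern also matches self-closing elements that carry attributes.
xml = Regex.Replace(xml, @"<.+?/>", "");
// Pass 2: remove open/close pairs with only whitespace between them, such as <A></A>.
// B3 survives because its opening tag includes an attribute, so \1 cannot match </B3>.
xml = Regex.Replace(xml, @"<(.+?)>\s*</\1>", "");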
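The two replacements above run once, so a parent nested more than one level above its empty descendants can be left behind. A minimal sketch of one way to handle that is to repeat both replacements until the string stops changing. The class name `RegexXmlCleaner` and method name `RemoveEmptyElements` are hypothetical, and the tighter patterns are an assumption: they match only tags that consist of a bare element name, so elements with attributes are kept.

```csharp
using System.Text.RegularExpressions;

public static class RegexXmlCleaner
{
    // Hypothetical helper, not part of the answer above.
    // Repeats both replacements until a full pass changes nothing.
    public static string RemoveEmptyElements(string xml)
    {
        string previous;
        do
        {
            previous = xml;
            // <Name/> with no attributes.
            xml = Regex.Replace(xml, @"<[\w:.-]+\s*/>", "");
            // <Name></Name> with no attributes and only whitespace inside.
            xml = Regex.Replace(xml, @"<([\w:.-]+)\s*>\s*</\1\s*>", "");
        } while (xml != previous);

        return xml;
    }
}
```

Calling `RegexXmlCleaner.RemoveEmptyElements(xml)` on the sample keeps Root, B, B2 and B3 and drops the rest, but it leaves the surrounding whitespace behind and knows nothing about comments, CDATA sections or processing instructions. For anything beyond a quick cleanup, a parser-based approach is likely the safer choice.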

2 Comments

Do I need to replace all " with "" ?
Yes. In a C# verbatim string (@"..."), a single " terminates the string, so each literal " inside it must be written as "".

One way to do this in C# is with LINQ to XML:

var x = XElement.Parse(@"<Root type=""1"">
                            <A></A>
                            <B>
                                <B1>
                                    <B12/>
                                    <B13/>
                                </B1>
                                <B2>
                                    123
                                    <B21></B21>
                                </B2>
                               <B3 type=""3"">
                                   <B4/>
                               </B3>
                            </B>
                            <C/>
                            </Root>");

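// Requires: using System.Linq; using System.Xml.Linq;
// Reverse() buffers the descendants and visits children before their parents,
// so a parent such as B1 is already childless by the time it is checked.
foreach (XElement child in x.Descendants().Reverse())
{
    if (!child.HasElements && string.IsNullOrEmpty(child.Value) && !child.HasAttributes)
        child.Remove();
}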
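For reuse, for example from an SSIS script task, the same loop can be wrapped in a method. This is only a sketch: the names `XmlCleaner` and `RemoveEmptyElements` are hypothetical, and it swaps `IsNullOrEmpty` for `IsNullOrWhiteSpace` on the assumption that elements holding only whitespace should also count as empty (which matters if the document was loaded with whitespace preserved).

```csharp
using System.Linq;
using System.Xml.Linq;

public static class XmlCleaner
{
    // Hypothetical helper: removes every element that has no attributes,
    // no child elements, and no non-whitespace text. Returns the same root.
    public static XElement RemoveEmptyElements(XElement root)
    {
        // ToList() takes a snapshot, so removing elements while looping is safe.
        // Reverse() puts children before their parents, so a parent whose
        // children were all removed is seen as empty when its turn comes.
        foreach (XElement element in root.Descendants().Reverse().ToList())
        {
            if (!element.HasElements
                && !element.HasAttributes
                && string.IsNullOrWhiteSpace(element.Value))
            {
                element.Remove();
            }
        }

        return root;
    }
}
```

Usage would look like `XmlCleaner.RemoveEmptyElements(XElement.Parse(xmlText))`. The root element itself is never removed, because `Descendants()` does not include it.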

2 Comments

Do I need to replace all " with "" ?
@KMittal Yes.

The simplest way to do this in SQL Server is with the xml modify() method:

-- @xml is assumed to be an xml variable that already holds the document.
-- One delete pass removes only the elements that are empty at that moment.
-- A parent such as B1 becomes empty afterwards, so repeat until nothing matches.
WHILE @xml.exist('//*[not(node()) and not(@*)]') = 1
BEGIN
    -- Optional: list the elements this pass is about to delete.
    SELECT @xml.query('//*[not(node()) and not(@*)]');

    SET @xml.modify('delete //*[not(node()) and not(@*)]');
END;

The same XPath predicate used in query() also lets me select all the nodes that are being deleted.
