
I am getting this HTML string from a system:

<h1>PDF Attachment</h1>
<h1 style="color: rgb(51, 51, 51); text-align: center;">
  <p style="font-size: 14px; font-weight: 400; text-align: justify;">Your service detail are following</p><p style="font-size: 14px; font-weight: 400; text-align: justify;">
    <table>
      <tr><td></td></tr>
    </table>&nbsp;
    <br>
 </p>
 <p style="font-size: 14px; font-weight: 400; text-align: justify;">  
   <br>
 </p>
</h1>

I have two h1 tags in this string. I want to remove the h1 tag that contains the table tag, while keeping its contents.

How can I remove it programmatically?

  • Probably off topic, but I must warn you that due to the errors in this HTML, the table would not be inside the paragraph when you view this in a browser. Commented Jun 8, 2020 at 6:26

1 Answer

You can use HtmlAgilityPack:

var content = @"<h1>PDF Attachment</h1><h1 style=""color: rgb(51, 51, 51); text-align: center;""><p style=""font-size: 14px; font-weight: 400; text-align: justify;"">Your service detail are following</p><p style=""font-size: 14px; font-weight: 400; text-align: justify;""><table><tr><td></td></tr></table>&nbsp;<br></p><p style=""font-size: 14px; font-weight: 400; text-align: justify;""><br></p></h1>";

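// Requires: using System.Linq; using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(content);

// Find the first top-level <h1> that contains a <table> at any depth.
// SelectNodes returns null when nothing matches, hence the ?. guard.
var h1NeedsToRemove = htmlDoc.DocumentNode.SelectNodes("/h1")
    ?.FirstOrDefault(i => i.Descendants("table").Any());
if (h1NeedsToRemove != null)
{
    // Detach the <h1>, then re-attach its children to the document root.
    var childNodesOfH1 = h1NeedsToRemove.ChildNodes;
    h1NeedsToRemove.Remove();
    htmlDoc.DocumentNode.AppendChildren(childNodesOfH1);
}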

It will give you the desired output:

<h1>PDF Attachment</h1>
<p style="font-size: 14px; font-weight: 400; text-align: justify;">Your service detail are following</p><p style="font-size: 14px; font-weight: 400; text-align: justify;">
    <table>
      <tr><td></td></tr>
    </table>&nbsp;
    <br>
 </p>
 <p style="font-size: 14px; font-weight: 400; text-align: justify;">  
   <br>
 </p>
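
The answer's snippet re-attaches the children at the end of the document, which works here because the h1 is the last element. As a minimal sketch of a variant, the code below unwraps every h1 that contains a table and leaves the children where the heading was. The names `HeadingUnwrapper` and `UnwrapHeadingsWithTables` are hypothetical, and it assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class HeadingUnwrapper
{
    // Hypothetical helper: removes each <h1> that contains a <table> at any
    // depth, keeping the heading's children at the heading's position.
    public static string UnwrapHeadingsWithTables(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var headings = doc.DocumentNode.SelectNodes("//h1");
        if (headings == null)
        {
            return html;
        }

        // ToList() snapshots the matches so the tree can be changed while looping.
        var targets = headings.Where(h => h.Descendants("table").Any()).ToList();
        foreach (var h1 in targets)
        {
            // Move each child in front of the heading, preserving order.
            foreach (var child in h1.ChildNodes.ToList())
            {
                h1.ParentNode.InsertBefore(child, h1);
            }
            // The heading's children now live under its parent; drop the heading.
            h1.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `HeadingUnwrapper.UnwrapHeadingsWithTables(content)` with the string from the answer should leave the first h1 untouched and drop only the h1 wrapper around the table; how the p tags are serialized still depends on HtmlAgilityPack's own parsing rules.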

5 Comments

But I need this output: <h1>PDF Attachment</h1> <p style="font-size: 14px; font-weight: 400; text-align: justify;">Your service detail are following</p><p style="font-size: 14px; font-weight: 400; text-align: justify;"> <table> <tr><td></td></tr> </table>&nbsp; <br> </p> <p style="font-size: 14px; font-weight: 400; text-align: justify;"> <br> </p>
Yes, it worked. Thanks :) One more question: how can I select multiple nodes, such as h1 or h2?
You can use ToList() instead of FirstOrDefault().
It removed the closing tag of "p" automatically :(
I mean something like htmlDoc.DocumentNode.SelectNodes("/h1,/h2"). I want both the h1 and h2 nodes.
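
One possible way to select both element types, offered as a sketch and not as the answerer's method: XPath has no comma syntax, but its union operator `|` combines two node sets, and HtmlAgilityPack's `SelectNodes` accepts XPath expressions. The names `HeadingSelector` and `SelectH1AndH2` are hypothetical, and the HtmlAgilityPack NuGet package is assumed to be referenced.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class HeadingSelector
{
    // Hypothetical helper: returns every <h1> and <h2> found in the HTML.
    public static List<HtmlNode> SelectH1AndH2(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "|" is the XPath union operator: nodes matching either path are returned.
        var nodes = doc.DocumentNode.SelectNodes("//h1 | //h2");

        // SelectNodes returns null when nothing matches, so normalize to an empty list.
        return nodes == null ? new List<HtmlNode>() : nodes.ToList();
    }
}
```

The returned list can then be filtered with the same `Where(...)` condition used in the answer.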
