I'm reading HTML with the purpose of extracting only the contents of <body> from it.

The following markup is generated by a DevExpress RichEditControl:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /><title>
        </title>
        <style type="text/css">
            .cs95E872D0{text-align:left;text-indent:0pt;margin:0pt 0pt 0pt 0pt}
            .csCF6BBF71{color:#000000;background-color:transparent;font-family:Times New Roman;font-size:12pt;font-weight:normal;font-style:normal;}
        </style>
    </head>
    <body>
        <p class="cs95E872D0"><span class="csCF6BBF71">Content goes here</span></p></body>
</html>

Following the example from this answer on how to read the document, I wrote the following function:

private string ParseHtml(string html)
{
    XDocument doc = XDocument.Parse(html);
    return doc.Elements("html").Single().Element("body").Value;
}

It seems like this should work in theory, but in practice the LINQ query returns no results for .Elements("html").

Am I way off the mark here? How can I read the html document and extract what I need?

1 Answer

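This is probably because the elements belong to the XHTML namespace declared by the xmlns attribute on <html>, so you need to qualify the element names with that namespace: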

private string ParseHtml(string html)
{
    XNamespace xmlns = "http://www.w3.org/1999/xhtml";

    XDocument doc = XDocument.Parse(html);
    return doc.Element(xmlns + "html").Element(xmlns + "body").Value;
}

Or:

return doc.Descendants(xmlns+"body").Single().Value;
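
To see the difference between the two lookups, here is a minimal sketch. The class name NamespaceDemo and the trimmed-down sample markup are assumptions made for illustration (the DOCTYPE and styles from the question are left out); only the XDocument calls come from the code above.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical sample: a shortened version of the markup in the question,
    // keeping the xmlns declaration that causes the problem.
    private const string Sample =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">" +
        "<head><title></title></head>" +
        "<body><p><span>Content goes here</span></p></body>" +
        "</html>";

    // Looks up "html" without a namespace, as in the question.
    public static int CountUnqualified()
    {
        XDocument doc = XDocument.Parse(Sample);
        return doc.Elements("html").Count();
    }

    // Looks up "html" and "body" qualified with the XHTML namespace.
    public static string BodyText()
    {
        XNamespace xhtml = "http://www.w3.org/1999/xhtml";
        XDocument doc = XDocument.Parse(Sample);
        return doc.Element(xhtml + "html").Element(xhtml + "body").Value;
    }

    public static void Main()
    {
        Console.WriteLine(CountUnqualified()); // prints 0
        Console.WriteLine(BodyText());         // prints "Content goes here"
    }
}
```

Note that .Value returns only the concatenated text of the element, not its inner markup.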

HTML Agility Pack is also a good way to parse HTML.


1 Comment

Just to add: XNamespace has a GetName method for this, and XName has a static Get method as well. Rather than hard-coding the namespace, doc.Root.GetDefaultNamespace() will give you "http://www.w3.org/1999/xhtml", and it also works when the root element has no namespace.
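
A rough sketch of what this comment describes, assuming the root element is <html> and <body> is its direct child. The class name DefaultNamespaceDemo and the sample strings are made up for illustration.

```csharp
using System;
using System.Xml.Linq;

public static class DefaultNamespaceDemo
{
    // Reads the text of <body> without hard-coding the namespace.
    public static string BodyText(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        // Returns the namespace declared via xmlns="..." on the root,
        // or XNamespace.None when no default namespace is declared.
        XNamespace ns = doc.Root.GetDefaultNamespace();

        // ns.GetName("body") builds the same XName as ns + "body".
        XElement body = doc.Root.Element(ns.GetName("body"));
        return body?.Value;
    }

    public static void Main()
    {
        // Hypothetical inputs: one with the XHTML namespace, one without.
        string withNs = "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body><p>Hello</p></body></html>";
        string withoutNs = "<html><body><p>Hello</p></body></html>";

        Console.WriteLine(BodyText(withNs));    // prints "Hello"
        Console.WriteLine(BodyText(withoutNs)); // prints "Hello"
    }
}
```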
