
I have the following scenario:

<a href="test.com">Some text <b>is bolded</b> some is <b>not</b></a>

Now, how do I get the "test.com" part (the href) and the text of the anchor, without the <b> tags around the bolded parts?

  • Are you looking to extract "Some text is bolded some is not" (the text of the anchor without formatting markup) or "Some text some is" (content within markup removed)? Commented Sep 22, 2011 at 20:51
  • I need to extract the link (the anchor's href) and the whole text without the formatting markup, that is, "Some text is bolded some is not". Commented Sep 22, 2011 at 20:55

1 Answer


Assuming the following markup:

<html>
<head>
    <title>Test</title>
</head>
<body>
    <a href="test.com">Some text <b>is bolded</b> some is <b>not</b></a>
</body>
</html>

Using the Html Agility Pack, you could do the following:

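using System;
using HtmlAgilityPack; // HtmlDocument, SelectSingleNode and GetElementbyId come from the Html Agility Pack

class Program
{
    static void Main()
    {
        var doc = new HtmlDocument();
        doc.Load("test.html"); // parse the file shown above
        // XPath: the first <a> element anywhere in the document
        var anchor = doc.DocumentNode.SelectSingleNode("//a");
        Console.WriteLine(anchor.Attributes["href"].Value);
        // InnerText joins the text of all descendant nodes, so the <b> tags are dropped
        Console.WriteLine(anchor.InnerText);
    }
}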

prints:

test.com
Some text is bolded some is not

Of course, you will probably want to make the lookup more specific, either by adjusting the XPath selector passed to SelectSingleNode or by giving the anchor you are trying to fetch a unique id or class name and selecting on that:

// assuming <a href="test.com" id="foo">Some text <b>is bolded</b> some is <b>not</b></a>
var anchor = doc.GetElementbyId("foo");
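If you cannot add an id to the anchor, the XPath selector can match on an attribute instead. The sketch below is a minimal illustration under a few assumptions: the HTML is already held in a string (so it uses LoadHtml rather than Load), the anchor can be identified by its href value, and the class name AnchorExtractor and method Extract are hypothetical names chosen for this example. It also uses GetAttributeValue so a missing href does not throw, and HtmlEntity.DeEntitize to turn entities such as &amp; in the text back into characters.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only
public static class AnchorExtractor
{
    // Returns the href and the markup-free text of the first node matched by xpath,
    // or null when nothing matches.
    public static (string Href, string Text)? Extract(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse from a string instead of a file

        HtmlNode anchor = doc.DocumentNode.SelectSingleNode(xpath);
        if (anchor == null)
            return null; // SelectSingleNode returns null when there is no match

        // GetAttributeValue falls back to "" when the attribute is missing
        string href = anchor.GetAttributeValue("href", "");
        // InnerText concatenates the text of all descendants, dropping the <b> tags;
        // DeEntitize decodes entities such as &amp; that InnerText leaves as-is
        string text = HtmlEntity.DeEntitize(anchor.InnerText);
        return (href, text);
    }

    public static void Main()
    {
        const string html =
            "<a href=\"other.com\">Other</a>" +
            "<a href=\"test.com\">Some text <b>is bolded</b> some is <b>not</b></a>";

        // Assumed selector: match the anchor by its href value rather than by an id
        var result = Extract(html, "//a[@href='test.com']");
        if (result.HasValue)
        {
            Console.WriteLine(result.Value.Href);
            Console.WriteLine(result.Value.Text);
        }
    }
}
```

Matching on href only works if that value is unique in the page; a class name (for example //a[@class='...']) or the anchor's position in a known parent element are other selectors you could try.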

1 Comment

Exactly what I needed. I did a bit of a hack on the HTML to get the text I wanted: I stripped out the link and passed it to the LoadHtml method of HtmlDocument, which did the trick. Unfortunately I couldn't use GetElementbyId, so the hack did the job.
