I am using regex to parse HTML, but some articles say that HtmlAgilityPack is much easier. My main question is how to parse the HTML in this sample (from Twitter).
This is the HTML code:
<p class="js-tweet-text tweet-text"> What an awesome day! Adventure nanaman kahapon <a href="http" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b><strong>ondoy</strong></b></a> <a href="https://twitter.com/search?q=%23eurotel&src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>eurotel</b></a> <a href="https://twitter.com/search?q=%23retail&src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>retail</b></a> <a href="https://twitter.com/search?q=%23family&src=hash" data-query-source="hashtag_click" class="twitter-hashtag pretty-link js-nav" dir="ltr"><s>#</s><b>family</b></a></p>
I want the output to look like this:
"What an awesome day! Adventure nanaman kahapon #ondoy #eurotel #retail #family"
How do I parse that HTML? I am using regex now, but the result still contains the inner markup, such as the <a href="..."> tags.
This is my regex code:
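using System.Net;                      // WebClient
using System.Text.RegularExpressions;  // Regex, MatchCollection
using System.Windows.Forms;            // MessageBox

// Download the page and capture the inner HTML of each tweet paragraph
WebClient web = new WebClient();
string html = web.DownloadString(filename);
MatchCollection m1 = Regex.Matches(html, "<p class=\"js-tweet-text tweet-text\">\\s*(.+?)\\s*</p>", RegexOptions.Singleline);
foreach (Match m in m1)
{
    // Group 1 still contains the nested <a>, <s>, <b> and <strong> markup
    MessageBox.Show(m.Groups[1].Value);
}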
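A minimal sketch of one way to get the desired output while keeping the regex approach above (it does not use HtmlAgilityPack): after capturing the paragraph's inner HTML, remove every tag, decode entities, and collapse whitespace. It assumes the paragraph markup looks exactly like the sample (same class attribute, no ">" inside attribute values). The class name TweetTextExtractor and the method Extract are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using System.Net;                       // WebUtility.HtmlDecode
using System.Text.RegularExpressions;

// Hypothetical helper; the class and method names are illustrative only.
public static class TweetTextExtractor
{
    // Same paragraph pattern as the question; group 1 is the paragraph's inner HTML.
    static readonly Regex TweetParagraph = new Regex(
        "<p class=\"js-tweet-text tweet-text\">\\s*(.+?)\\s*</p>",
        RegexOptions.Singleline);

    // Naive tag matcher: assumes no '>' appears inside attribute values.
    static readonly Regex AnyTag = new Regex("<[^>]+>");

    static readonly Regex Whitespace = new Regex(@"\s+");

    public static List<string> Extract(string html)
    {
        var texts = new List<string>();
        foreach (Match m in TweetParagraph.Matches(html))
        {
            // Drop every tag (<a ...>, <s>, <b>, <strong>) but keep the text between them,
            // so "<s>#</s><b>ondoy</b>" becomes "#ondoy".
            string text = AnyTag.Replace(m.Groups[1].Value, "");

            // Turn entities such as &amp; back into characters.
            text = WebUtility.HtmlDecode(text);

            // Collapse newlines and repeated spaces, then trim the ends.
            text = Whitespace.Replace(text, " ").Trim();
            texts.Add(text);
        }
        return texts;
    }
}
```

To plug it into the code above, pass the string returned by web.DownloadString(filename) to TweetTextExtractor.Extract and call MessageBox.Show on each result. For the sample paragraph this yields the hashtags as plain text ("#ondoy", "#eurotel", ...) with the href attributes gone. The pattern still depends on the exact class attribute text, which is the kind of fragility an HTML parser such as HtmlAgilityPack is meant to avoid.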