I'm trying to get the inner text from this page, http://www.hurriyet.com.tr/yazarlar/22933964.asp, with HtmlAgilityPack. The HTML structure is:

<div class="detailText">
<span class="yzrArticleDate">30 Mart 2014</span>
<h1 class="yazarArticleTitle">31 Mart sabahı için acil ihtiyaç listesi</h1>
<p></p><p><p  >Akıl.<br  />Sağduyu.<br  />Barış.<br  />
Özgürlük.<br  />Kardeşlik.<br  />Vicdan.<br  />Huzur.............

and my current code is:

string htmlContent = getsource(s);
HtmlAgilityPack.HtmlDocument document = new HtmlAgilityPack.HtmlDocument();
document.LoadHtml(htmlContent);
var noa =document.DocumentNode.SelectSingleNode("*//div[@class='detailText']").InnerText;

The problem is that the result also includes the date and the heading, i.e. "30 Mart 2014" and "31 Mart sabahı için acil ihtiyaç listesi".

I want only the part that begins with

<p></p><p><p  >Akıl.<br  />

I tried several variations:

var noa =document.DocumentNode.SelectSingleNode("*//div[@class='detailText']").InnerHtml;     
var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailText']").NextSibling.NextSibling.InnerText;
var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailText']").LastSibling.InnerText;

My second question: once I manage to extract this text, I'll face a character encoding problem. How can I fix that?

1 Answer

The easiest solution would be to remove the nodes you don't want and then get InnerHtml/InnerText, as covered in "remove html node from htmldocument :HTMLAgilityPack".

// Select the container, then remove the child nodes you don't want.
var noa = document.DocumentNode.SelectSingleNode("*//div[@class='detailText']");
noa.RemoveChild(noa.SelectSingleNode("span"));
// remove the rest too...
var result = noa.InnerText;
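A slightly fuller version of the same idea, as a minimal sketch rather than a definitive implementation: it removes both the date span and the title heading before reading InnerText. The class and method names (ArticleText, ExtractBody) are made up for illustration, and the XPath assumes the class names shown in the question's HTML.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class ArticleText
{
    // Hypothetical helper: returns the text of div.detailText
    // without its date <span> and title <h1>.
    public static string ExtractBody(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        var container = document.DocumentNode
            .SelectSingleNode("//div[@class='detailText']");
        if (container == null)
            return null; // SelectSingleNode returns null when nothing matches

        // SelectNodes also returns null (not an empty collection) on no match.
        var unwanted = container.SelectNodes(
            "span[@class='yzrArticleDate'] | h1[@class='yazarArticleTitle']");
        if (unwanted != null)
        {
            // Copy to a list so removing nodes cannot disturb the enumeration.
            foreach (var node in unwanted.ToList())
                node.Remove();
        }

        // Decode entities such as &nbsp; and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(container.InnerText).Trim();
    }
}
```

Calling ArticleText.ExtractBody(htmlContent) should then give only the article body. Note that InnerText drops the br elements, so the sentences come back without line breaks between them.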

There should be no encoding problem unless the site reports an invalid encoding, since C# strings are Unicode (UTF-16).
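If the text does come out garbled, one common cause is that the response bytes were decoded with the wrong charset before they ever became a C# string. The sketch below downloads the raw bytes and decodes them with an explicitly named charset. The names PageSource, Decode and Download are hypothetical, and the charset to pass is an assumption you would need to confirm from the page's Content-Type header or meta tag ("windows-1254" would only be a guess for a Turkish site). On .NET Core and .NET 5+, code-page encodings such as windows-1254 are only available after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).

```csharp
using System;
using System.Net.Http;
using System.Text;

public static class PageSource
{
    private static readonly HttpClient Client = new HttpClient();

    // Decodes raw response bytes with an explicitly named charset.
    public static string Decode(byte[] raw, string charsetName)
    {
        return Encoding.GetEncoding(charsetName).GetString(raw);
    }

    // Downloads the page as bytes (no implicit decoding) and then
    // decodes them with the charset the caller believes is correct.
    public static string Download(string url, string charsetName)
    {
        byte[] raw = Client.GetByteArrayAsync(url).GetAwaiter().GetResult();
        return Decode(raw, charsetName);
    }
}
```

The resulting string can be passed to document.LoadHtml as before. Decoding the same bytes with a mismatched charset (for example UTF-8 bytes read as iso-8859-1) is what typically turns characters like "ğ" or "ı" into unreadable sequences.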

1 Comment

Sadly, there is: i.imgur.com/ZzuTr11.png. Is it caused by the method I use to get the page source? i.imgur.com/JzllHsU.png
