I have some HTML to parse (see below):

<div id="mailbox" class="div-w div-m-0">
    <h2 class="h-line">InBox</h2>
    <div id="mailbox-table">
        <table id="maillist">
            <tr>
                <th>From</th>
                <th>Subject</th>
                <th>Date</th>
            </tr>
            <tr onclick="location='readmail.html?mid=welcome'" style="font-weight: bold;">
                <td>[email protected]</td>
                <td>
                    <a href="readmail.html?mid=welcome">Hi, Welcome</a>
                </td>
                <td>
                    <span title="2016-02-16 13:23:50 UTC">just now</span>
                </td>
            </tr>
            <tr onclick="location='readmail.html?mid=T0wM6P'" style="font-weight: bold;">
                <td>[email protected]</td>
                <td>
                    <a href="readmail.html?mid=T0wM6P">sa</a>
                </td>
                <td>
                    <span title="2016-02-16 13:24:04">just now</span>
                </td>
            </tr>
        </table>
    </div>
</div>

I need to extract the links from the onclick attributes of the <tr> tags and the email addresses from the <td> tags.

So far I have managed to get only the first occurrence of the email/link from my HTML.

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);

Could someone show me how to do this properly? Basically, I want to extract all the email addresses and links from the HTML that are in those tags.

foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    Console.WriteLine(att.Value);
}

EDIT: I need to store the parsed values in pairs in a list of class instances: the link to the email and the sender's email address.

public class ClassMailBox
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}
  • I've also tried HtmlAgilityPack but it doesn't support XPath well. Commented Feb 16, 2016 at 14:18
  • Did you try the CssPath feature? Commented Feb 16, 2016 at 14:18
  • @Tagyoureit I tried your code and it prints both tr items: location='readmail.html?mid=welcome' and location='readmail.html?mid=T0wM6P'. I'm using .NET 4.5 and HtmlAgilityPack 1.4.9. Can you please check that the HTML you get in the responseFromServer variable is complete? Thanks. Commented Feb 16, 2016 at 14:23
  • Yes, you are correct, I was parsing outdated HTML. Next question: how do I get the sender's email address? Commented Feb 16, 2016 at 14:38
  • OK, I was able to get the emails by creating a second XPath that selects the first td child. Do you want a single XPath query for both td and tr, or would you prefer one XPath for the tr elements and another one for the td, which is what I recommend? Commented Feb 16, 2016 at 14:40

1 Answer

You can write the following code:

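// Requires: using System.Collections.Generic; using HtmlAgilityPack;
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(responseFromServer);

// List that collects one entry per mail row.
List<ClassMailBox> classMailBoxes = new List<ClassMailBox>();

// First pass: create one ClassMailBox per tr that has an onclick attribute.
foreach (HtmlNode link in doc.DocumentNode.SelectNodes("//tr[@onclick]"))
{
    HtmlAttribute att = link.Attributes["onclick"];
    ClassMailBox classMailbox = new ClassMailBox() { LinkToMail = att.Value };
    classMailBoxes.Add(classMailbox);
}

int currentPosition = 0;

// Second pass: fill in the sender from the first td of each of those rows.
foreach (HtmlNode tableDef in doc.DocumentNode.SelectNodes("//tr[@onclick]/td[1]"))
{
    classMailBoxes[currentPosition].From = tableDef.InnerText;
    currentPosition++;
}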

To keep this code simple, I'm assuming some things:

  1. The email is always in the first td inside the tr that has an onclick attribute
  2. Every tr with an onclick attribute contains an email

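If those conditions don't hold, this code won't work: it could throw exceptions (for example, an ArgumentOutOfRangeException from the list indexer, or a NullReferenceException if SelectNodes finds no matches and returns null), or it could pair links with the wrong email addresses.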
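
If the two assumptions might not hold, one option is to read both values from the same row, so a link can never be paired with another row's email. The following is only a sketch of that idea, not part of the original answer: the MailBoxParser class and its Parse method are hypothetical names introduced for illustration, and the ClassMailBox class is repeated from the question so the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public class ClassMailBox
{
    public string From { get; set; }
    public string LinkToMail { get; set; }
}

// Hypothetical helper class, introduced for illustration only.
public static class MailBoxParser
{
    public static List<ClassMailBox> Parse(string html)
    {
        var result = new List<ClassMailBox>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//tr[@onclick]");
        if (rows == null)
        {
            return result;
        }

        foreach (HtmlNode row in rows)
        {
            // Relative XPath: the first td child of this particular row.
            HtmlNode firstCell = row.SelectSingleNode("./td[1]");
            if (firstCell == null)
            {
                continue; // Skip rows without a td instead of mispairing values.
            }

            result.Add(new ClassMailBox
            {
                // Decode HTML entities and trim surrounding whitespace.
                From = HtmlEntity.DeEntitize(firstCell.InnerText).Trim(),
                LinkToMail = row.GetAttributeValue("onclick", string.Empty)
            });
        }

        return result;
    }
}
```

It could be called as `List<ClassMailBox> mails = MailBoxParser.Parse(responseFromServer);`. Because each pair is built inside a single loop iteration, no position counter is needed, and a row that lacks a td is skipped rather than shifting the later pairs.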

1 Comment

Yes, it works like a charm. Thank you for your time! And your assumptions (1 & 2) are correct.
