I'm trying to extract the text inside the comment in the following HTML snippet (this is only part of the page):
<table id="datalist1" cellspacing="0" border="0" style="border-width:1px;border-style:solid;width:100%;border-collapse:collapse;">
<tr>
<td style="font-size:7pt;">
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr align="left">
<td width="50%" class="subhead1">
<!-- <b>IE CODE : 0514026049</b> --> ' I want text inside this comment
</td>
<td rowspan="9" valign="top">
<span id="datalist1_ctl00_lbl_p"></span>
</td>
</tr>
I am trying the following approach:
1) Get the XPath of the element.
2) Read the web page.
3) Go to the comment node.
4) Extract the text in the comment.
library(rvest)
library(xml2)
url <- 'http://agriexchange.apeda.gov.in/ExportersDirectory/exporters_list.aspx?letter=Z'
webpage <- read_html(url)
# XPath of the comment element I want to grab:
# //*[@id="datalist1"]/tbody/tr[1]/td/table/tbody/tr[1]/td[1]/comment()
webpage %>%
  html_nodes(xpath = '//*[@id="datalist1"]/tbody/tr[1]/td/table/tbody/tr[1]/td[1]/comment()') %>%
  html_text()
# character(0)   <- this is the output
But the above code returns an empty character vector (character(0)). Since I have never used XPath before, I don't know whether this is even the correct way to go about it.
I'll have to run this for all the comment elements. In short, my question is: how do I extract comments from HTML code?
Exclude tbody from the XPath (/table/tbody/tr[1] --> /table//tr[1]), as it can be added to the DOM by the browser.
tbody wasn't there. I'll try to use it without tbody.
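
A minimal sketch of the tbody-free XPath, run against an inline copy of the snippet from the question rather than the live page (so the string `html` and the variable names are illustrative assumptions, and the real page may be structured differently). It uses `//` in place of the `tbody` steps, selects the comment node with `comment()`, and then strips the `<b>` tags, since the comment body is returned as plain text rather than parsed markup.

```r
library(xml2)

# Inline copy of the snippet from the question (assumption: the live page
# has the same structure around the comment).
html <- '<table id="datalist1"><tr><td style="font-size:7pt;">
<table width="100%"><tr align="left">
<td width="50%" class="subhead1">
<!-- <b>IE CODE : 0514026049</b> -->
</td>
<td rowspan="9" valign="top"><span id="datalist1_ctl00_lbl_p"></span></td>
</tr></table>
</td></tr></table>'

page <- read_html(html)

# No tbody in the path: browsers add it to the DOM, the raw source lacks it.
comments <- xml_find_all(
  page,
  '//*[@id="datalist1"]//td[@class="subhead1"]/comment()'
)

# The comment content is a plain string; trim it and drop the inner tags.
raw_text <- trimws(xml_text(comments))
ie_code  <- gsub("<[^>]+>", "", raw_text)
print(ie_code)
```

To collect every comment on a page instead of one, the same call with the XPath `//comment()` should return all comment nodes.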