I have downloaded my Facebook data. It contains an .htm file with all my contacts. I would like to read it into R and create a contact.csv.
The usual structure is:
<tr><td>Firstname Lastname</td><td><span class="meta"><ul><li>contact: [email protected]</li><li>contact: +123456789</li></ul></span></td></tr>
but some contacts are missing the phone number:
<tr><td>Firstname Lastname</td><td><span class="meta"><ul><li>contact: [email protected]</li></ul></span></td></tr>
while others are missing the email:
<tr><td>Firstname Lastname</td><td><span class="meta"><ul><li>contact: +123456789</li></ul></span></td></tr>
The CSV should have the structure: Firstname Lastname; email; phone number
I have tried:
library(rvest)
library(stringr)
html <- read_html("contact_info.htm")
p_nodes <- html %>% html_nodes('tr')
p_nodes_text <- p_nodes %>% html_text()
write.csv(p_nodes_text, "contact.csv")
This creates the CSV, but unfortunately it merges the names with "contact:", does not create separate columns, and does not put "NA" where a phone number or an email is missing.
How could I enhance my code to accomplish this? Thanks
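
One possible direction, as a rough sketch rather than a confirmed solution: handle each tr on its own, take the name from the first td, collect the li items, and decide per item whether it is an email or a phone number. The sample HTML, the names in it, the helper parse_row and the "contains @ means email" rule are all assumptions made for illustration; html_elements() and html_element() are the newer rvest (1.0.0 or later) names for html_nodes() and html_node().

```r
library(rvest)

# Hypothetical sample mirroring the three row shapes described above;
# for the real file use read_html("contact_info.htm") instead.
sample_html <- '<table>
<tr><td>Anna Example</td><td><span class="meta"><ul><li>contact: anna@example.com</li><li>contact: +123456789</li></ul></span></td></tr>
<tr><td>Ben Example</td><td><span class="meta"><ul><li>contact: ben@example.com</li></ul></span></td></tr>
<tr><td>Cara Example</td><td><span class="meta"><ul><li>contact: +987654321</li></ul></span></td></tr>
</table>'

html <- read_html(sample_html)
rows <- html_elements(html, "tr")

# Hypothetical helper: turn one <tr> into a one-row data frame.
parse_row <- function(row) {
  # html_element() returns the first matching <td>, which holds the name.
  name <- html_text(html_element(row, "td"), trim = TRUE)

  # All <li> entries of this row, with the leading "contact:" label removed.
  items <- html_text(html_elements(row, "li"), trim = TRUE)
  items <- sub("^contact:\\s*", "", items)

  # Assumption: an entry containing "@" is an email, anything else a phone number.
  is_email <- grepl("@", items, fixed = TRUE)

  # Indexing an empty vector with [1] yields NA, which covers missing values.
  data.frame(
    name  = name,
    email = items[is_email][1],
    phone = items[!is_email][1],
    stringsAsFactors = FALSE
  )
}

contacts <- do.call(rbind, lapply(rows, parse_row))

# write.csv2() uses ";" as the separator, matching the desired layout.
write.csv2(contacts, "contact.csv", row.names = FALSE)
```

Working row by row keeps each name paired with its own contact entries, so a missing email or phone number ends up as NA in that row instead of shifting the values of later rows.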