Please help me.
I am new to web scraping in R. I want to collect the download links of the data tables on this page (http://burkinafaso.opendataforafrica.org/). My project is to make these data more accessible.
Here is the website: http://burkinafaso.opendataforafrica.org/
On the Donnée page there is a list of sectors, for example Agriculture: 43 tables, Public Aid: 7 tables, ...
When I click on Agriculture, I get the dataset list: https://drive.google.com/open?id=1cInWz62HjbcpgJ00rK-8Q-0p71mC59hq
- I want to get the list of these titles.
- For each title, I want to get the download link of the dataset.
I tried the code below to inspect the structure of the site, but I cannot find anything in it that would let me extract these links.
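# rvest provides read_html() and the html_*() helpers used below;
# RCurl and XML are not needed for this code
library(rvest)
# Parse the home page
URL <- "http://burkinafaso.opendataforafrica.org/"
pg <- read_html(URL)
# The root <html> node normally has two children: <head> and <body>
p <- html_children(pg)[1]   # <head>
pp <- html_children(pg)[2]  # <body>
# Print the tag hierarchy of each part
html_structure(p)
html_structure(pp)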
# Same inspection for the Agriculture topic page (rvest is already loaded above)
URL <- "http://burkinafaso.opendataforafrica.org/data/#topic=Agriculture"
pg <- read_html(URL)
p <- html_children(pg)[1]   # <head>
pp <- html_children(pg)[2]  # <body>
html_structure(p)
html_structure(pp)
For example, I tried the code below to extract the links in <a> tags, but it does not return the different download links.
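URL <- "http://burkinafaso.opendataforafrica.org/data/#topic=Agriculture"
pg <- read_html(URL)
# Collect the href attribute of every <a> tag in the parsed page
all.url <- html_attr(html_nodes(pg, "a"), "href")
# Store the links in a one-column data frame
all.url <- as.data.frame(all.url)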
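One thing I noticed while testing: as far as I understand, the part of the URL after "#" is a fragment that is handled in the browser and is not sent to the server. If that is right, read_html() receives the same HTML for /data/ whatever topic I put after "#", which may be why the dataset links never appear. A minimal base R sketch of that split (the variable names requested and fragment are only for illustration):

```r
# The address I pass to read_html()
url <- "http://burkinafaso.opendataforafrica.org/data/#topic=Agriculture"

# Split the URL at "#": the first part is what an HTTP client requests,
# the second part (the fragment) stays on the client side
parts <- strsplit(url, "#", fixed = TRUE)[[1]]
requested <- parts[1]
fragment <- if (length(parts) > 1) parts[2] else NA_character_

print(requested)
print(fragment)
```

So the topic selection seems to be applied after the page has loaded, possibly by JavaScript, but that is only my assumption.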
As a result, I expect, for each item (sector), the list of tables and their download links. For example:
For Public Aid (7):
label | links
Aide extérieure par secteur de 1995 à 2006 (en millions de FCFA) | download link
Aide extérieure par type (en millions de FCFA) | download link
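To make the expected output concrete, here is a rough sketch of the data frame I am after. It does not touch the real site: the HTML snippet, the "dataset" class and the "/hypothetical/download/..." paths are all made up for illustration, because I do not know the real markup.

```r
library(rvest)

# Hypothetical stand-in for the dataset list (NOT the real page markup)
html <- '
<ul>
  <li><a class="dataset" href="/hypothetical/download/1">Aide extérieure par secteur de 1995 à 2006 (en millions de FCFA)</a></li>
  <li><a class="dataset" href="/hypothetical/download/2">Aide extérieure par type (en millions de FCFA)</a></li>
</ul>'

# Parse the literal HTML string, stating its encoding explicitly
pg <- read_html(html, encoding = "UTF-8")

# Select the (hypothetical) dataset links
nodes <- html_nodes(pg, "a.dataset")

# One row per dataset: its title and its download link
result <- data.frame(
  label = html_text(nodes, trim = TRUE),
  links = html_attr(nodes, "href"),
  stringsAsFactors = FALSE
)

print(result)
```

What I am missing is how to get the equivalent of nodes from the real page for each sector.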
Please help me.



