I am trying to scrape some data from http://seednet.gov.in/SeedVarieties/Varietydetail.aspx.

After selecting options in the three drop-down menus in sequence ("Crop group", "Crop" and "Variety name") and clicking the "Show features" button, the page offers an option to export the result to CSV. I want to download all such CSV files for a crop.

I can extract all the options in the first drop-down as follows.

library(rvest)
library(httr)
library(tidyverse)

pg <- read_html("http://seednet.gov.in/SeedVarieties/Varietydetail.aspx")

cropgp_nodes <- html_nodes(pg, "select[id='_ctl0_ContentPlaceHolder1_ddlgroup'] option")
crpgps <- tibble(crpgp = html_text(cropgp_nodes),
                 value = html_attr(cropgp_nodes, "value"))
crpgps
# A tibble: 24 x 2
   crpgp                  value                
   <chr>                  <chr>                
 1 --Select Crop Group--  --Select Crop Group--
 2 CEREALS                A01                  
 3 MILLETS                A02                  
 4 PULSES                 A03                  
 5 OILSEEDS               A04                  
 6 FIBRE CROPS            A05                  
 7 FORAGE CROPS           A06                  
 8 SUGAR CROPS            A07                  
 9 STARCH CROPS           A08                  
10 NARCOTICS(OTHER CROPS) A09                  
# ... with 14 more rows

However, because the menus are populated sequentially, the second drop-down has no options until a crop group is selected, so I cannot get its options:

html_nodes(pg, "select[id='_ctl0_ContentPlaceHolder1_ddlCrop'] option")
{xml_nodeset (0)}

How can I scrape the data in this case?

1 Answer

One option is to use RSelenium, which starts a Selenium server and drives a real browser, so the dependent drop-downs are populated as they would be for a user.

library(RSelenium)
library(XML)

-connect to the Selenium driver

rD <- rsDriver()
remDr <- rD[["client"]]
remDr$navigate("http://seednet.gov.in/SeedVarieties/Varietydetail.aspx")

-loop through the 'crpgp' values already extracted, send each one as keys to the first drop-down, and extract the corresponding 'crop' options

v1 <- crpgps$crpgp[-1]  # crop group labels, minus the placeholder
lst <- vector("list", length(v1))
for (i in seq_along(lst)) {
  # select the crop group by typing its label into the first drop-down
  remDr$findElement("id", "_ctl0_ContentPlaceHolder1_ddlgroup")$sendKeysToElement(list(v1[i]))
  # read the HTML of the crop drop-down and parse it
  elem <- remDr$findElement(using = "id", value = "_ctl0_ContentPlaceHolder1_ddlCrop")
  elemtxt <- elem$getElementAttribute("outerHTML")[[1]]
  elemxml <- htmlTreeParse(elemtxt, useInternalNodes = TRUE)
  # option labels and values, dropping the first (placeholder) option
  key <- xpathSApply(elemxml, "//body//option", xmlValue)[-1]
  value <- unlist(xpathSApply(elemxml, "//body//option", xmlAttrs)[-1])
  # keep the result unless only the placeholder is present
  if (!(length(value) == 1 && "--Select Crop--" %in% value)) {
    lst[[i]] <- data.frame(key, value, stringsAsFactors = FALSE)
  }
}

res <- do.call(rbind, lst)

-output

dim(res)
#[1] 181  2
head(res)
#                                   key value
#1                         BARLEY (JAU) A0101
#2                         PADDY (DHAN) A0102
#3                            TRITICALE A0103
#4                        WHEAT (GEHON) A0104
#5 BANYARD MILLET (KUNDIRAIVALLI/SAWAN) A0201
#6                  BUCK WHEAT (KASPAT) A0202
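
The loop above picks each crop group by typing its visible label into the drop-down. If that proves unreliable (for example, when two labels share a prefix), one alternative is to click the option by its `value` attribute instead. The sketch below is only an illustration: `option_xpath` and `select_by_value` are hypothetical helper names, and the two-second pause is an assumption about how long the dependent drop-down takes to reload.

```r
# Hypothetical helper: XPath for one <option> of a <select>, matched by its value attribute.
option_xpath <- function(select_id, value) {
  sprintf("//select[@id='%s']/option[@value='%s']", select_id, value)
}

# Hypothetical helper: click that option in a live RSelenium session (`remDr`).
select_by_value <- function(remDr, select_id, value, wait = 2) {
  remDr$findElement(using = "xpath", value = option_xpath(select_id, value))$clickElement()
  Sys.sleep(wait)  # assumed pause so the dependent drop-down can reload
  invisible(NULL)
}

# The XPath for the CEREALS group (value "A01" in the question's output):
xp <- option_xpath("_ctl0_ContentPlaceHolder1_ddlgroup", "A01")
xp
# [1] "//select[@id='_ctl0_ContentPlaceHolder1_ddlgroup']/option[@value='A01']"
```

With an open session, `select_by_value(remDr, "_ctl0_ContentPlaceHolder1_ddlgroup", "A01")` would stand in for the `sendKeysToElement()` call inside the loop.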

-close the connection and stop the server afterwards

remDr$close()
rD[["server"]]$stop() 
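
The option-parsing step can also be written with rvest, which the question already loads, so that the same function can be reused for each drop-down in turn. This is a sketch under assumptions: `read_options` is a hypothetical helper, and `demo_html` is a hand-written stand-in for the rendered page so the code can be tried without a browser.

```r
library(rvest)

# Hypothetical helper: list the <option> entries of one <select> in an HTML string.
read_options <- function(html, select_id) {
  nodes <- html_nodes(read_html(html),
                      sprintf("select[id='%s'] option", select_id))
  data.frame(key = html_text(nodes),
             value = html_attr(nodes, "value"),
             stringsAsFactors = FALSE)
}

# Static stand-in for the rendered page (assumed markup, not copied from the site).
demo_html <- '<select id="_ctl0_ContentPlaceHolder1_ddlCrop">
  <option value="--Select Crop--">--Select Crop--</option>
  <option value="A0101">BARLEY (JAU)</option>
  <option value="A0102">PADDY (DHAN)</option>
</select>'

crops <- read_options(demo_html, "_ctl0_ContentPlaceHolder1_ddlCrop")
crops[-1, ]  # drop the placeholder row
```

In a live session, run before the connection is closed, the first argument would be the rendered source, `remDr$getPageSource()[[1]]`, taken after a crop group has been selected.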