
I used the code below on one website and it returned a perfect result. I was looking for the keyword "Emaar", which is pasted at the end of the query:

library(httr)
library(jsonlite)



query<-"https://www.googleapis.com/customsearch/v1?key=AIzaSyA0KdZHRkAjmoxKL14eEXp2vnI4Yg_po38&cx=006431301429107149113:as7yqcm2qc8&q=Emaar"

result11 <- content(GET(query))
print(result11)
result11_JSON <- toJSON(result11)
result11_JSON <- fromJSON(result11_JSON)
result11_df <- as.data.frame(result11_JSON)
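As a side note, the toJSON()/fromJSON() round trip can be skipped: content() already returns a parsed R list, and the search hits live under $items. A minimal sketch of pulling those out directly, using a mocked response in place of content(GET(query)) (the field names mirror the Custom Search API, but the data here is made up):

```r
# Mocked parsed response standing in for content(GET(query));
# made-up data, real Custom Search field names
result11 <- list(
  kind = "customsearch#search",
  items = list(
    list(title = "Emaar Properties", link = "https://www.emaar.com"),
    list(title = "Emaar - Wikipedia", link = "https://en.wikipedia.org/wiki/Emaar")
  )
)

# Each element of $items is one search hit; bind them into one data frame
result11_df <- do.call(rbind, lapply(result11$items, as.data.frame))
print(result11_df)
```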

Now I want to apply the same function over a data frame containing keywords.

So I made the following testing .csv file:

     Company Name
[1]  ADES International Holding Ltd
[2]  Emirates REIT (CEIC) Limited
[3]  POLARCUS LIMITED

called it Testing Website Extraction.csv

code used:

test_companies <- read.csv("... \\Testing Website Extraction.csv")

# removing spaces and adding "+" signs, then pasting the query before each keyword (the query already has my unique Google key and search engine ID)
test_companies$plus <- gsub(" ", "+", test_companies$Company.Name)


query <- "https://www.googleapis.com/customsearch/v1?key=AIzaSyCmD6FRaonSmZWrjwX6JJgYMfDSwlR1z0Y&cx=006431301429107149113:as7yqcm2qc8&q="

test_companies$plus <- paste0(query, test_companies$plus)

a <- test_companies$plus
length(a)
function_webs_search <- function(web_search) {content(GET(web_search))}



result <- lapply(as.character(a), function_webs_search)
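A note on the URL building step: gsub-ing spaces for "+" works for these particular names, but other characters (parentheses, ampersands, commas) would break the URL. Base R's URLencode() handles the general case. A sketch, with placeholder credentials in the base query:

```r
# Placeholder key/cx values; substitute your own credentials
base_query <- "https://www.googleapis.com/customsearch/v1?key=YOUR_KEY&cx=YOUR_CX&q="

companies <- c("ADES International Holding Ltd",
               "Emirates REIT (CEIC) Limited",
               "POLARCUS LIMITED")

# URLencode(reserved = TRUE) percent-encodes spaces, parentheses, etc.
urls <- paste0(base_query,
               vapply(companies, URLencode, character(1), reserved = TRUE))
urls[2]
```

Note that "Emirates REIT (CEIC) Limited" becomes "Emirates%20REIT%20%28CEIC%29%20Limited", which the plain space-to-plus substitution would not handle.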

The result here is a list of length 3 (one per search term), and each element contains the same sublists: url (list[2]), queries (list[2]), ... items (list[10]), with the same lengths for every search term. My issue is applying the remainder of the code.

When I run:
result_JSON <- toJSON(result)
result_JSON <- as.list(fromJSON(result_JSON))

I get a list of 6 lists, each containing sublists,

and putting it into a tidy data frame where the results are stacked under each other (not side by side) is proving to be difficult.

Also note that I tried pulling each of the 3 lists out of "result" one by one, but that is a lot of manual labor if I have a longer list of keywords.

The expected end result should contain 30 observations of 37 variables (for each search term, 10 observations of 37 variables, all stacked underneath each other).
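Stacking a list of per-term data frames underneath each other is exactly what do.call(rbind, ...) does (or plyr::rbind.fill() when the columns differ between terms). A sketch using mocked per-company data frames as stand-ins for the parsed API output:

```r
# Three mocked per-company result sets (stand-ins for the parsed API output)
result_dfs <- list(
  data.frame(term = "ADES",          title = paste("hit", 1:10)),
  data.frame(term = "Emirates REIT", title = paste("hit", 1:10)),
  data.frame(term = "POLARCUS",      title = paste("hit", 1:10))
)

# Stack them underneath each other: 30 rows, one 10-row block per term
stacked <- do.call(rbind, result_dfs)
nrow(stacked)  # 30
```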

Things I have tried unsuccessfully:

These work to flatten the list:
#do.call(c , result)
#all.equal(listofvectors, res, check.attributes = FALSE)
#unlist(result, recursive = FALSE)
# for (i in 1:length(result))  {listofvectors <- c(listofvectors, result[[i]])}
#rbind()
#rbind.fill()

Even after flattening, I don't know how to organize the results into a tidy final output that a non-R user can interact with.

Any help here would be greatly appreciated,

I am here in case anything is not clear about my question,

Always happy to learn more about R so please bear with me as I am just starting to catch up.

All the best and thanks in advance!

  • There's a lot of content from the URL call, and when converting to a data.frame a lot of items are recycled (shorter lengths recycled to match the longest). Not sure what you are trying to achieve with what is effectively fromJSON(toJSON(content(GET(url)))). Due to recycling, the data.frame object will have a lot of incorrect information. If you want to extract something specific from the query output, you should analyse the structure a bit more. Commented Aug 13, 2018 at 21:29
  • Hello, I am trying to extract the first 10 results for each company and then rbind them all together into one data frame (30 observations total, 10 per company, with 37 variables). Commented Aug 17, 2018 at 11:38
  • 1
    Please reduce the problem to a minimal reproducible example. In particular, if the query returns the expected result, it is not the problem and can be replaced with hardcoded data. Further, if the JSON-decoding is not the problem, it can be removed as well. Commented Sep 17, 2018 at 9:14

1 Answer


Basically, what I did was extract only the columns I need from the list of data frames; below is the final code:

library(httr)
library(jsonlite)
library(tidyr)
library(stringr)
library(purrr)
library(plyr)


test_companies <- read.csv("c:\\users\\... Companies Without Websites List.csv")

test_companies$plus <- gsub(" ", "+", test_companies$Company.Name)


query <- "https://www.googleapis.com/customsearch/v1?key=AIzaSyCmD6FRaonSmZWrjwX6JJgYMfDSwlR1z0Y&cx=006431301429107149113:as7yqcm2qc8&q="

test_companies$plus <- paste0(query, test_companies$plus)

a <- test_companies$plus
length(a)
function_webs_search <- function(web_search) {content(GET(web_search))}



result <- lapply(as.character(a), function_webs_search)

function_toJSONall <- function(all) {toJSON(all)}

a <- lapply(result, function_toJSONall)


function_fromJSONall <- function(all) {fromJSON(all)}

b <- lapply(a, function_fromJSONall)


function_dataframe <- function(all) {as.data.frame(all)}

c <- lapply(b, function_dataframe)

# keep only the columns I need (note: `c` shadows base::c here, so a different name would be safer)
function_column <- function(all) {all[, 15:30]}

result_final <- lapply(c, function_column)

# bind the filtered data frames underneath each other
results_df <- rbind.fill(result_final)
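The four separate lapply() passes above can also be collapsed into one helper applied per URL. A hedged sketch of the same logic: fetch_mock() is a made-up stand-in for content(GET(url)) so the example runs offline; swap it for the real call, and adjust the kept columns as needed.

```r
# Stand-in for content(GET(url)); returns a made-up parsed response
fetch_mock <- function(url) {
  list(items = list(list(title = paste("top hit for", url),
                         link  = "https://example.com")))
}

# One pass per URL: fetch, flatten the hits, keep only the wanted columns
search_to_df <- function(url, cols = c("title", "link")) {
  hits <- fetch_mock(url)$items
  df <- do.call(rbind, lapply(hits, as.data.frame))
  df[, intersect(cols, names(df)), drop = FALSE]
}

urls <- c("q=ADES", "q=Emirates+REIT", "q=POLARCUS")
results_df <- do.call(rbind, lapply(urls, search_to_df))
nrow(results_df)  # 3
```

With the real API each URL yields up to 10 hits, so three URLs would stack into the 30-row data frame described in the question.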