1

I have a very messy JSON file (lists inside of lists) that I'm trying to convert to an R dataframe (part of the reason to convert the file is that I need to export it into a .csv file). Here is a sample of the data (https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=0). I tried this solution (Parse nested JSON to Data Frame in R), but that got rid of many of my columns. Below is the code I have so far:

    library("twitteR")
    library ("streamR")
    library("rjson")
    json_file <- "20140909-20141010_10zdfxhqf0_2014_09_09_01_00_activities.json"
    json_data <- fromJSON(file=json_file) #convert to r list 
    str (json_data) #list of 16 objects 

    #unlist elements
    tweets.i <- lapply(json_data, function(x){ unlist(x)})
    tweets <- do.call("rbind", tweets.i)
    tweets <- as.data.frame(tweets)

    library(plyr)
    tweets <- rbind.fill(lapply(tweets.i, 
                           function(x) do.call("data.frame", as.list(x))
    ))

Anyone have a way to convert the file to an R dataframe without losing all the info? I'm open to using Python to do this work to, I just don't have the expertise to figure out how to code it.

2 Answers 2

2

This is not very efficient, but it may work for you:

download.file("https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1", destfile = tf <- tempfile(fileext = ".json"))
txt <- readLines(tf)
library(jsonlite)
library(plyr)
df <- do.call(plyr::rbind.fill, lapply(txt[txt != ""], function(x) as.data.frame(t(unlist(fromJSON(x))))))
Sign up to request clarification or add additional context in comments.

Comments

2

I like the answer provided above as a really quick way to get everything. You could try tidyjson, but it also will not be efficient since it requires pre-knowledge of the structure. listviewer::jsonedit might help visualize what you are working with.

#devtools::install_github("timelyportfolio/listviewer")
library(listviewer)
jsonedit(readLines(
   "https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1"
 )[2])

Perhaps a data.frame really isn't the best structure, but it really depends on what you are trying to accomplish.

This is just a sample to hopefully show you how it might look.

library(tidyjson)
library(dplyr)

json <- readLines(
  "https://www.dropbox.com/s/ikb4znhpaavyc9z/20140909-20141010_10zdfxhqf0_2014_10_09_23_50_activities.json?dl=1"
)

json %>%
  {
    Filter(
      function(x){return (nchar(x) != 0)}
      ,.
    )
  } %>%  
  as.tbl_json() %>%
  spread_values(
    id = jstring("id")
    ,objectType = jstring("objectType")
    ,link = jstring("link")
    ,body = jstring("body")
    ,favoritesCount = jstring("favoritesCount")
    ,twitter_filter_level = jstring("twitter_filter_level")
    ,twitter_lang = jstring("twitter_lang")
    ,retweetCount = jnumber("retweetCount")
    ,verb = jstring("verb")
    ,postedTime = jstring("postedTime")    
    # from actor object in the JSON
    ,actor_objectType = jstring("actor","objectType")
    ,actor_id = jstring("actor","id")
    ,actor_link = jstring("actor","link")
    ,actor_displayName = jstring("actor","displayName")
    ,actor_image = jstring("actor","image")
    ,actor_summary = jstring("actor","summary")
    ,actor_friendsCount = jnumber("actor","friendsCount")
    ,actor_followersCount = jnumber("actor","followersCount")
  ) %>%
  # careful once you enter you can't go back up
  enter_object("actor","links") %>%
  gather_array( ) %>%
  spread_values(
    actor_href = jstring("href")
  )

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.