R - Nested list to (wide) dataframe

Question

I extracted some data via the Crunchbase API, which gave me a large nested list with the following structure (it contains many more nested lists at several levels; I show only the part that is currently relevant to me):

> str(x[[1]])
$ uuid         : chr "5f9957b0841251e6e439d757XXXXXX"
$ relationships: List of 27
..$ websites: List of 3
.. ..$ cardinality: chr "OneToMany"
.. ..$ items      :'data.frame':    4 obs. of  7 variables:
.. .. ..$ properties.website_type: chr [1:4] "homepage" "facebook" "twitter" "linkedin"
.. .. ..$ properties.url         : chr [1:4] "http://www.example.com" "https://www.facebook.com/example" "http://twitter.com/example" "http://www.linkedin.com/company/example"

Consider the following minimal example:

x <- list()
x[[1]] <- list(uuid = "123", 
           relationships = list(websites = list(items =  list(
                                                properties.website_type = c("homepage", "facebook", "twitter", "linkedin"), 
                                                properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com") ) )  ) )
x[[2]] <- list(uuid = "987", 
           relationships = list(websites = list(items =  list(
             properties.website_type = c("homepage", "facebook", "twitter" ), 
             properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com") ) )  ) )

Now, I would like to create a dataframe with the following column structure:

> x.df
uuid          web.url  web.facebook        web.twitter        web.linkedin
1  123 www.example1.com www.fbex1.com www.twitterex1.com www.linkedinex1.com
2  987 www.example2.com www.fbex2.com www.twitterex2.com                <NA>

Meaning: I would like to have every uuid (a unique firm identifier) in a single column, followed by the URLs of the different platforms (Facebook, Twitter, ...). I tried many combinations of lapply(), spread(), and bind_rows(), but didn't manage to make anything work. Any help would be appreciated.

Please make a minimal example instead of linking to a 1000-line file behind a link that may break at any time. See how to make a reproducible example.
– Calum You, Commented Jun 25, 2018 at 21:40

Prem · Accepted Answer · 2018-07-03 06:04:51Z

A dplyr/tidyr approach could be:

library(dplyr)
library(tidyr)

#convert list to dataframe in long format
df <- do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))

#final result
df1 <- df %>%
  spread(relationships.websites.items.properties.website_type, relationships.websites.items.properties.url)

which gives

  uuid      facebook         homepage            linkedin            twitter
1  123 www.fbex1.com www.example1.com www.linkedinex1.com www.twitterex1.com
2  987 www.fbex2.com www.example2.com                <NA> www.twitterex2.com
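To get the column names requested in the question (web.url, web.facebook, ...), one possible variation uses pivot_wider(), which supersedes spread() in tidyr 1.0.0 and later. This is only a sketch and not part of the original answer: the short column names type and url and the relabelling of "homepage" to "url" are assumptions made for illustration.

```r
library(tidyr)

# Same structure as the minimal example in the question
x <- list(
  list(uuid = "123", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
    properties.url = c("www.example1.com", "www.fbex1.com",
                       "www.twitterex1.com", "www.linkedinex1.com"))))),
  list(uuid = "987", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter"),
    properties.url = c("www.example2.com", "www.fbex2.com",
                       "www.twitterex2.com")))))
)

# Long format: one row per (uuid, website type)
long <- do.call(rbind, lapply(x, data.frame, stringsAsFactors = FALSE))
names(long) <- c("uuid", "type", "url")  # assumed short names

# Relabel "homepage" so that the resulting column is called web.url
long$type[long$type == "homepage"] <- "url"

# Wide format: one row per uuid, one web.* column per website type
wide <- pivot_wider(long, names_from = type, values_from = url,
                    names_prefix = "web.")
```

Columns appear in the order in which the types first occur in the data, so wide should have the columns uuid, web.url, web.facebook, web.twitter and web.linkedin, with NA where a firm has no entry for a type.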

Sample data:

x <- list(structure(list(uuid = "123", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = c("homepage", 
    "facebook", "twitter", "linkedin"), properties.url = c("www.example1.com", 
    "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com"
    )), .Names = c("properties.website_type", "properties.url"
    ))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")), structure(list(uuid = "987", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = c("homepage", 
    "facebook", "twitter"), properties.url = c("www.example2.com", 
    "www.fbex2.com", "www.twitterex2.com")), .Names = c("properties.website_type", 
    "properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")))

Update: To fix the following error

Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0

you need to remove the corrupted elements from the input data, i.e. those where properties.website_type has a value but properties.url is NULL. Run this chunk of code as a pre-processing step before executing the main solution:

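# Flag elements whose url vector is missing entirely
bad <- vapply(x, function(k) is.null(k$relationships$websites$items$properties.url), logical(1))
# Logical subsetting is safe even when nothing is flagged (x[-integer(0)] would drop every element)
x <- x[!bad]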
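As an alternative to dropping those elements, the missing URLs could be filled with NA so that the firm still appears in the result, which is what the asker mentions in the comments. The sketch below assumes that properties.url is missing entirely (NULL); if only some URLs are missing, the values cannot be aligned with their types and such elements would still have to be removed. The helper name fill_missing_urls is hypothetical.

```r
# Small test input: the second element has a website type but no url
x <- list(
  list(uuid = "123", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
    properties.url = c("www.example1.com", "www.fbex1.com",
                       "www.twitterex1.com", "www.linkedinex1.com"))))),
  list(uuid = "987", relationships = list(websites = list(items = list(
    properties.website_type = "homepage",
    properties.url = NULL))))
)

# Hypothetical helper: replace an entirely missing url vector with NAs,
# one NA per website type, so that data.frame() sees equal lengths
fill_missing_urls <- function(el) {
  items <- el$relationships$websites$items
  if (is.null(items$properties.url)) {
    n <- length(items$properties.website_type)
    el$relationships$websites$items$properties.url <- rep(NA_character_, n)
  }
  el
}

x_filled <- lapply(x, fill_missing_urls)

# Same long-format conversion as in the main solution
df <- do.call(rbind, lapply(x_filled, data.frame, stringsAsFactors = FALSE))
```

After this step, df can be passed to spread() as in the main solution; the firm without a URL keeps its row, with NA in the corresponding column.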

Sample data to test this pre-processing step:

x <- list(structure(list(uuid = "123", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = c("homepage", 
    "facebook", "twitter", "linkedin"), properties.url = c("www.example1.com", 
    "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com"
    )), .Names = c("properties.website_type", "properties.url"
    ))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")), structure(list(uuid = "987", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = "homepage", 
        properties.url = NULL), .Names = c("properties.website_type", 
    "properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")), structure(list(uuid = "345", relationships = structure(list(
    websites = structure(list(items = structure(list(properties.website_type = "homepage", 
        properties.url = NULL), .Names = c("properties.website_type", 
    "properties.url"))), .Names = "items")), .Names = "websites")), .Names = c("uuid", 
"relationships")))

edited Jul 3, 2018 at 6:04

answered Jun 26, 2018 at 7:14

Prem


10 Comments

Daniel S. Hain Over a year ago

Great, that generally seems to be what I need. Runs perfectly with the example. However, when I try it with my full dataset, I always get an error message: "Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0" Any idea what the problem could be?

Prem Over a year ago

You probably have an element in your data in which the number of values in $relationships$websites$items$properties.website_type and $relationships$websites$items$properties.url does not match, which is why data.frame throws this error. So first you need to decide how you want to handle such cases, i.e. website_type is present but url is missing.

Daniel S. Hain Over a year ago

Indeed, you are probably right about that! I didn't consider that case. If the url is missing, it should ideally be an NA.

Prem Over a year ago

I think you are missing a point here. Consider this example and let me know the desired output -

x <- structure(list(uuid = "123", relationships = structure(list(
    websites = structure(list(items = structure(list(
        properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
        properties.url = c("www.example1.com", "www.fbex1.com", "www.linkedinex1.com")),
        .Names = c("properties.website_type", "properties.url"))),
        .Names = "items")), .Names = "websites")),
    .Names = c("uuid", "relationships"))

Here twitter has no url in this example and gives the same error.

Daniel S. Hain Over a year ago

Hello again. Sorry, I was in transit for some time and am now catching up. I think I understand the problem correctly. To keep it simple, I actually only want to consider the homepage, facebook, twitter and linkedin entries, even though there might be more types of URLs in that part of the list. The desired output structure would then be exactly like the one I posted in the original question. As for what happens when the number of values in url and website_type does not match, I have no strong preference. I think these cases are very rare, and they could also be deleted altogether.
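The requirements in this comment (keep only the four website types, drop elements whose type and url counts differ) could be sketched in base R as follows. This is an illustration rather than the accepted solution: the helper to_row, the lookup vector wanted, and the extra "instagram" entry in the test data are assumptions.

```r
# Test input: a full record, a record with an extra type, and a corrupted record
x <- list(
  list(uuid = "123", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
    properties.url = c("www.example1.com", "www.fbex1.com",
                       "www.twitterex1.com", "www.linkedinex1.com"))))),
  list(uuid = "987", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter", "instagram"),
    properties.url = c("www.example2.com", "www.fbex2.com",
                       "www.twitterex2.com", "www.instaex2.com"))))),
  list(uuid = "345", relationships = list(websites = list(items = list(
    properties.website_type = "homepage",
    properties.url = NULL))))
)

# Output column name -> website type to look up (assumed mapping)
wanted <- c(web.url = "homepage", web.facebook = "facebook",
            web.twitter = "twitter", web.linkedin = "linkedin")

# Hypothetical helper: build one wide row per list element
to_row <- function(el) {
  items <- el$relationships$websites$items
  types <- items$properties.website_type
  urls  <- as.character(items$properties.url)
  # Skip elements where types and urls cannot be paired up
  if (length(types) != length(urls)) return(NULL)
  # match() returns NA for absent types, so the picked url becomes NA
  picked <- urls[match(wanted, types)]
  data.frame(uuid = el$uuid, as.list(setNames(picked, names(wanted))),
             stringsAsFactors = FALSE)
}

rows <- Filter(Negate(is.null), lapply(x, to_row))
x.df <- do.call(rbind, rows)
```

Because the columns are fixed by wanted, every row has the same four web.* columns regardless of which types a firm actually has, and unlisted types such as "instagram" are ignored.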


Lulliter · Accepted Answer · 2022-04-26 14:36:01Z

I know this is a clunkier solution, but it helped me see the process step by step (running str(x_df) after each step to inspect the result):

library(tidyverse)

# Using your example
x <- list()
x[[1]] <- list(uuid = "123",
                    relationships = list(websites = list(items =  list(
                        properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
                        properties.url = c("www.example1.com", "www.fbex1.com", "www.twitterex1.com", "www.linkedinex1.com") ) )  ) )
x[[2]] <- list(uuid = "987",
                    relationships = list(websites = list(items =  list(
                        properties.website_type = c("homepage", "facebook", "twitter" ),
                        properties.url = c("www.example2.com", "www.fbex2.com", "www.twitterex2.com") ) )  ) )

 

# --- Iterations of unnest:
x_df <- x %>% tibble::as_tibble_col( .) %>%  
    tidyr::unnest_wider(col = "value")  %>% 
    tidyr::unnest_longer(col = "relationships")   %>%  
    tidyr::unnest_wider(col = "relationships")  %>%  
    tidyr::unnest_wider(col =  "items")  %>%  
    tidyr::unnest_longer(col = c("properties.website_type", "properties.url")) %>% 
# --- Lastly, group by id: 
    group_by(uuid) %>% 
    tidyr::pivot_wider(data = ., 
                             names_from = properties.website_type, 
                             values_from = c("properties.url"))
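If the path to the nested fields is known in advance, the chain of unnest calls could probably be shortened with tidyr::hoist(), which plucks named components out of a list-column. The following is a sketch under that assumption, not the answerer's method; the column names type and url are chosen here for illustration.

```r
library(tibble)
library(tidyr)

x <- list(
  list(uuid = "123", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter", "linkedin"),
    properties.url = c("www.example1.com", "www.fbex1.com",
                       "www.twitterex1.com", "www.linkedinex1.com"))))),
  list(uuid = "987", relationships = list(websites = list(items = list(
    properties.website_type = c("homepage", "facebook", "twitter"),
    properties.url = c("www.example2.com", "www.fbex2.com",
                       "www.twitterex2.com")))))
)

# One row per firm, with the raw nested record in a list-column
nested <- tibble(value = x)

# Pull the three fields of interest straight out of the nested records
picked <- hoist(nested, value,
  uuid = "uuid",
  type = list("relationships", "websites", "items", "properties.website_type"),
  url  = list("relationships", "websites", "items", "properties.url")
)

# Expand the parallel type/url vectors into rows, then spread into columns
long <- unnest(picked[c("uuid", "type", "url")], cols = c(type, url))
x_df <- pivot_wider(long, names_from = type, values_from = url)
```

Selecting only uuid, type and url before unnest() keeps any leftover list-column out of the result, so x_df has one row per uuid and one column per website type.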
