
When you request web data through R, you often work with JSON or XML in which fields that have no value are simply omitted. Sometimes a record has no data at all and comes through as an empty list at that index. I see these as two separate problems. Below is the solution I currently use, though I suspect there are better ones. To start, here is a deliberately messy, made-up list that is missing field names (as the XML or JSON source would omit them) and missing whole entries (also on purpose).

(messy_list <- list(list(x = 2, y = 3), 
                   list(), 
                   list(y = 4),
                   list(x = 5)))

Here is how I currently get it to a state I would call "solved":

library(plyr)
messy_list_no_empties <- lapply(messy_list, function(x) if(length(x) == 0) {list(NA, NA)} else x)

ldply(messy_list_no_empties, data.frame)[,1:2]

The end result is what I am looking for, but I would like to find a more elegant way to deal with this problem.

  • Do you really want to keep rows that are entirely NA? Commented May 3, 2017 at 21:22
  • Yeah. I need them in the same index. Commented May 3, 2017 at 21:23

2 Answers


With purrr::map_df,

library(purrr)

messy_list <- list(list(x = 2, y = 3), 
                   list(), 
                   list(y = 4),
                   list(x = 5))

messy_list %>% map_df(~list(x = .x$x %||% NA, 
                            y = .x$y %||% NA))
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2     3
#> 2    NA    NA
#> 3    NA     4
#> 4     5    NA

map_df iterates over the list like lapply and row-binds the results into a data frame (a tibble). The function (in purrr's formula form) builds a list with an x and a y element, using the existing value where there is one. Where there isn't, the subsetting returns NULL, which %||% replaces with the value on its right, NA.
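To see the fallback in isolation, here is a small sketch of how %||% behaves on a present and a missing field. The variable names `empty` and `partial` are made up for illustration.

```r
library(purrr)

empty   <- list()        # an element with no fields at all
partial <- list(x = 5)   # an element that only has x

# Subsetting a missing field with $ gives NULL rather than an error
is.null(empty$x)

# %||% keeps the left-hand side unless it is NULL
empty$x   %||% NA   # falls back to NA
partial$x %||% NA   # keeps the existing value
partial$y %||% NA   # falls back to NA
```

Because every element ends up with both an x and a y, each one contributes exactly one row, which is what keeps the empty element at its original index.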

In mostly-equivalent base R,

as.data.frame(do.call(rbind, 
                      lapply(messy_list, function(.x){
                          list(x = ifelse(is.null(.x$x), NA, .x$x), 
                               y = ifelse(is.null(.x$y), NA, .x$y))
                      })))
#>    x  y
#> 1  2  3
#> 2 NA NA
#> 3 NA  4
#> 4  5 NA

Note that the base approach won't handle mixed types well. To handle them, coerce everything to character (rbind probably will anyway, so just add stringsAsFactors = FALSE to as.data.frame) and then lapply type.convert over the columns.
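As a rough sketch of that suggestion, the version below coerces each field to character explicitly, binds the rows, and then lets type.convert pick a type per column. The names `rows`, `char_df`, and `result` are only illustrative, and the explicit as.character step is an assumption about how you might do the coercion.

```r
messy_list <- list(list(x = 2, y = 3),
                   list(),
                   list(y = 4),
                   list(x = 5))

# Build one named character vector per element, using NA for missing fields
rows <- lapply(messy_list, function(el) {
  c(x = if (is.null(el$x)) NA_character_ else as.character(el$x),
    y = if (is.null(el$y)) NA_character_ else as.character(el$y))
})

# rbind gives a character matrix; keep the columns as character, not factor
char_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Convert each column from character back to the most specific type it fits
result <- as.data.frame(lapply(char_df, type.convert, as.is = TRUE))
result
```

With as.is = TRUE, columns that cannot be parsed as numbers or logicals stay character instead of becoming factors.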


2 Comments

This looks about exactly what I was looking for. I'll wait a couple hours and accept your answer if a better one doesn't show up.
No need to wait. Mark it with the checkmark; you can still change it later.

Your method is already pretty compact, but if you're looking for other methods, one way might be to use rbindlist from data.table:

library(data.table)
new_list <- lapply(messy_list, function(x) if(identical(x,list())){list(x = NA)} else {x})

rbindlist(new_list, fill = TRUE, use.names = TRUE)
#    x  y
#1:  2  3
#2: NA NA
#3: NA  4
#4:  5 NA

Note that we need the lapply so that rbindlist doesn't drop the rows that are empty.
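A quick way to check this is to compare row counts with and without the padding step. This is only a sketch; `dropped`, `padded`, and `kept` are illustrative names.

```r
library(data.table)

messy_list <- list(list(x = 2, y = 3),
                   list(),
                   list(y = 4),
                   list(x = 5))

# Without padding: the empty element contributes no row
dropped <- rbindlist(messy_list, fill = TRUE, use.names = TRUE)

# With padding: give empty elements a single NA field so they keep their row
padded <- lapply(messy_list, function(el) if (length(el) == 0) list(x = NA) else el)
kept   <- rbindlist(padded, fill = TRUE, use.names = TRUE)

nrow(dropped) < nrow(kept)          # the unpadded result is shorter
nrow(kept) == length(messy_list)    # the padded result has one row per element
```

Padding with just x is enough because fill = TRUE supplies NA for any column a row lacks.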
