Parsing incomplete lists into data frames with two different problems

Question

If you request web data through R, you often work with JSON or XML in which a field is simply omitted when it has no value. Sometimes there isn't any data at all, and that index comes out as an empty list. So I see this as two different problems. I'm sharing the solution I currently use, but I know there are better ones out there. For starters, here is a very messy, fake list I created that is missing fields (on purpose, mimicking the XML/JSON behavior) AND missing whole indexes (also on purpose).

(messy_list <- list(list(x = 2, y = 3), 
                   list(), 
                   list(y = 4),
                   list(x = 5)))

Now, here is how I break it down to what I would say is "solved".

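library(plyr)
# Replace each empty element with a placeholder so it still produces a row
messy_list_no_empties <- lapply(messy_list, function(x) if (length(x) == 0) {list(NA, NA)} else x)

# ldply() row-binds the per-element data frames, filling missing columns with NA;
# the unnamed placeholder adds extra columns, so keep only the first two (x and y)
ldply(messy_list_no_empties, data.frame)[, 1:2]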

The end result is what I am looking for but I would like to find a more elegant way to deal with this problem.

Do you really want to keep rows that are entirely NA? – Mike H., May 3, 2017 at 21:22

Yeah. I need them in the same index. – cylondude, May 3, 2017 at 21:23

alistaire · Accepted Answer · 2017-05-03 21:33:30Z

2

With purrr::map_df,

library(purrr)

messy_list <- list(list(x = 2, y = 3), 
                   list(), 
                   list(y = 4),
                   list(x = 5))

messy_list %>% map_df(~list(x = .x$x %||% NA, 
                            y = .x$y %||% NA))
#> # A tibble: 4 × 2
#>       x     y
#>   <dbl> <dbl>
#> 1     2     3
#> 2    NA    NA
#> 3    NA     4
#> 4     5    NA

map_df iterates over the list like lapply and row-binds the results into a data frame (it uses dplyr::bind_rows for that step, so dplyr needs to be installed). The function, written in purrr's formula shorthand, assembles a list with an x and a y element, taking the existing values where they are present. Where they are not, the subsetting returns NULL, which %||% replaces with the value after it, NA.
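To see the fallback step in isolation, here is a small sketch: `$` on a list returns NULL for a name that isn't present (including on an empty element), and purrr's `%||%` swaps that NULL for the right-hand value. The variable names `empty_el`, `partial_el`, `x_val` and `y_val` are illustrative only.

```r
library(purrr)

empty_el <- list()          # like messy_list[[2]]
partial_el <- list(y = 4)   # like messy_list[[3]]

# Subsetting a name that isn't there gives NULL rather than an error
is.null(empty_el$x)

# %||% returns its left side unless that is NULL, otherwise its right side
x_val <- partial_el$x %||% NA   # NA, because x is missing
y_val <- partial_el$y %||% NA   # 4, because y is present
```

This is why the formula in the answer never fails on the empty or partial elements: every lookup either finds a value or falls back to NA.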

In mostly-equivalent base R,

as.data.frame(do.call(rbind, 
                      lapply(messy_list, function(.x){
                          list(x = ifelse(is.null(.x$x), NA, .x$x), 
                               y = ifelse(is.null(.x$y), NA, .x$y))
                      })))
#>    x  y
#> 1  2  3
#> 2 NA NA
#> 3 NA  4
#> 4  5 NA

Note that the base approach won't handle mixed types well. To handle them, coerce everything to character (rbind probably will anyway, so just add stringsAsFactors = FALSE to as.data.frame) and then apply type.convert to each column with lapply.
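A related caveat: `rbind` on a list of lists yields a matrix of mode list, so the data frame built above ends up with list columns rather than ordinary numeric vectors. If all the fields are known to be numeric, a base R sketch that builds each column directly with `vapply` avoids that. The helper name `get_field` is hypothetical, and the sketch assumes every present value is a single number.

```r
messy_list <- list(list(x = 2, y = 3),
                   list(),
                   list(y = 4),
                   list(x = 5))

# Hypothetical helper: pull one named field from every element,
# using NA_real_ where the element is empty or lacks that field
get_field <- function(lst, name) {
  vapply(lst, function(el) {
    val <- el[[name]]                    # [[ returns NULL for a missing name
    if (is.null(val)) NA_real_ else val
  }, numeric(1))                         # each result must be a single number
}

df <- data.frame(x = get_field(messy_list, "x"),
                 y = get_field(messy_list, "y"))
df
```

Because `vapply` checks that each call returns exactly one number, a malformed element fails loudly instead of silently producing a list column.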

edited May 3, 2017 at 21:33

answered May 3, 2017 at 21:26

alistaire



2 Comments

cylondude Over a year ago

This looks about exactly what I was looking for. I'll wait a couple hours and accept your answer if a better one doesn't show up.

IRTFM Over a year ago

No need to wait. Mark it with checkmark and later you can still change it.

Mike H. · Accepted Answer · 2017-05-03 21:28:00Z

2

Your method is already pretty compact, but if you're looking for other methods, one way might be to use rbindlist from data.table:

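library(data.table)
# Replace empty elements with a one-column placeholder so each becomes an all-NA row
new_list <- lapply(messy_list, function(x) if (identical(x, list())) {list(x = NA)} else {x})

# fill = TRUE pads missing columns with NA; use.names = TRUE matches columns by name
rbindlist(new_list, fill = TRUE, use.names = TRUE)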
#    x  y
#1:  2  3
#2: NA NA
#3: NA  4
#4:  5 NA

Note that the lapply step is needed because rbindlist skips empty list elements; without it, the empty rows would be dropped.
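A quick way to confirm why the placeholder matters is to call `rbindlist` on the raw list. As a sketch: the empty element is skipped, so the result has fewer rows than the input and the row positions no longer line up with the list indexes.

```r
library(data.table)

messy_list <- list(list(x = 2, y = 3),
                   list(),
                   list(y = 4),
                   list(x = 5))

# Without the lapply() placeholder step, the empty second element is skipped
dropped <- rbindlist(messy_list, fill = TRUE, use.names = TRUE)
nrow(dropped)   # 3, not 4
```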

answered May 3, 2017 at 21:28

Mike H.
