This comes up a lot when dealing with API's.
Most of the time, to do real analysis, I'd like to get my dataset tidy, but typically, this requires a solution for each type of tree, rather than something more general.
I figured it would be nice to have one function that generates tidy data (albeit with a ton of NA's in deeply nested trees with many different factor levels.
I have a hackish solution which follows, using unlist(..., recursive = FALSE) + a naming convention,
But I'd like to see if someone here might have a better solution to tidy these kinds of list structures.
#####################
# Some Test Data
aNestedTree =
list(a = 1,
b = 2,
c = list(
a = list(1:5),
b = 2,
c = list(
a = 1,
d = 3,
e = list())),
d = list(
y = 3,
z = 2
))
############################################################
# Run through the list and rename all list elements,
# We unlist once at time, adding "__" at each unlist step
# until the object is no longer a list
renameVars <- function(lst, sep = '__') {
if(is.list(lst)) {
names(lst) <- paste0(names(lst),sep)
renameVars(unlist(lst, recursive = FALSE),sep = sep)
} else {
lst
}
}
res <- renameVars(aNestedTree)
We can check the output and see that we have a strangely named object, But there's a method to this madness.
> res
a________ b________ c__.a____1__ c__.a____2__ c__.a____3__
1 2 1 2 3
c__.a____4__ c__.a____5__ c__.b______ c__.c__.a____ c__.c__.d____
4 5 2 1 3
d__.y______ d__.z______
3 2
Now I put this in a data.table, so I can shape it.
library(data.table)
dt <- data.table(values = res, name = names(res))
# Use some regex to split that name up, along with data.table's tstrsplit
# function to separate them into as many columns as there are nests
> dt[,paste0('V',seq_along(s <- tstrsplit(dt$name,'[__]+(\\.|)'))) := s]
> dt
values name V1 V2 V3
1: 1 a________ a NA NA
2: 2 b________ b NA NA
3: 1 c__.a____1__ c a 1
4: 2 c__.a____2__ c a 2
5: 3 c__.a____3__ c a 3
6: 4 c__.a____4__ c a 4
7: 5 c__.a____5__ c a 5
8: 2 c__.b______ c b NA
9: 1 c__.c__.a____ c c a
10: 3 c__.c__.d____ c c d
11: 3 d__.y______ d y NA
12: 2 d__.z______ d z NA
I can then filter for the factor combinations that I want (Or dcast/spread). (Though I'm effectively breaking apart tables at the lowest level if they exist)
I thought about going through bind.c and pulling out the do_unlistto make a function with a flexible naming convention via Rcpp, but my C++ is rusty, so I figured I'd post here before I do anything drastic.
data.tree? data.tree intro data.tree application And this question