
I have a nested list; for some indices, some variables are missing.

[[1]]
    sk   ques   pval 
  "10" "sfsf" "0.05" 

[[2]]
    sk   ques   pval   diff 
 "24" "wwww" "0.11"  "0.3" 

[[3]]
    sk   ques   pval   diff    imp 
  "24" "wwww" "0.11"  "0.3"    "2" 

How can I convert this to a data frame where, for the first row, data$diff[1] is NA? The case above would give a data frame with 5 variables and 3 observations.

The number of variables in the data frame should be the number of unique names across the list elements, and values missing from an element should be filled with NA.

Thank you,

EDIT: data format

list(structure(c("10", "sfsf", "0.05"), .Names = c("sk", "ques", 
"pval")), structure(c("24", "wwww", "0.11", "0.3"), .Names = c("sk", 
"ques", "pval", "diff")), structure(c("24", "wwww", "0.11", "0.3", 
"2"), .Names = c("sk", "ques", "pval", "diff", "imp")))
  • Inside each list element, are those vectors or data frames? They look like named vectors. Could you please post the output of dput(head(list, 3))? Commented Nov 26, 2014 at 16:21
  • Good catch @RichardScriven. I had assumed they were proper data.frames. You can still use rbind.fill (from the plyr package) if you convert each element first: rbind.fill(lapply(mydata, function(x) as.data.frame(t(x)))) Commented Nov 26, 2014 at 16:32
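For comparison, here is a base-R sketch of the same name-matching idea, with no plyr dependency. It assumes each element is a named character vector, as in the dput output above; `all_names` and `mat` are illustrative names only. Each vector is indexed by the union of all names, so any name an element lacks comes back as NA, and values are placed by name rather than by position.

```r
# Sample data: named character vectors with differing sets of names
lst <- list(
  c(sk = "10", ques = "sfsf", pval = "0.05"),
  c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3"),
  c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3", imp = "2")
)

# Union of all names, in order of first appearance
all_names <- unique(unlist(lapply(lst, names)))

# Index each vector by the full set of names; absent names give NA.
# unname() drops the NA names that indexing by a missing name produces.
mat <- t(vapply(lst, function(x) unname(x[all_names]),
                character(length(all_names))))
colnames(mat) <- all_names

res <- as.data.frame(mat, stringsAsFactors = FALSE)
res
```

Because values are matched by name, this sketch does not require the shorter elements' names to be a prefix of the longest element's names.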

1 Answer


We get the length of each list element ('indx') by looping with sapply; in R 3.2.0 and later, lengths() can replace the sapply(.., length) step. We then set every element's length to the maximum in 'indx' with `length<-`, which pads the shorter elements with NA at the end. Finally, we rbind the elements, convert the result to a data.frame, and take the column names from the longest element. This works because each shorter element's names are a prefix of the longest element's names, as in the example.
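To see what the `length<-` step does, here is a minimal sketch on the first element alone (`x` and `padded` are illustrative names). Calling the replacement function directly returns an extended copy padded with NA, while the existing names stay attached to the existing positions.

```r
x <- c(sk = "10", ques = "sfsf", pval = "0.05")

# `length<-` is the function behind `length(x) <- n`; called directly,
# it returns a copy of x extended to length 5, padded with NA.
# x itself is left unchanged.
padded <- `length<-`(x, 5)
padded

# lapply(lst, `length<-`, max(indx)) applies the same step to every element
```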

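 # Length of each list element
 indx <- sapply(lst, length)
 #indx <- lengths(lst)   # R >= 3.2.0
 # Pad every element with NA up to the longest length, then stack as rows
 res <- as.data.frame(do.call(rbind,lapply(lst, `length<-`,
                          max(indx))))

 # Use the names of the longest element as column names
 colnames(res) <- names(lst[[which.max(indx)]])
 res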
 # sk ques pval diff  imp
 #1 10 sfsf 0.05 <NA> <NA>
 #2 24 wwww 0.11  0.3 <NA>
 #3 24 wwww 0.11  0.3    2
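The padding approach relies on every shorter element's names matching the first names of the longest element, in the same order. A minimal sketch that checks this before padding, assuming R >= 3.2.0 for lengths(); `pad_to_df` is a hypothetical helper name, not part of the answer:

```r
# Sample data (same values as the question's dput output)
lst <- list(
  c(sk = "10", ques = "sfsf", pval = "0.05"),
  c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3"),
  c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3", imp = "2")
)

# Hypothetical helper: pad by position, but only after checking that
# positional padding lines the values up with the right names
pad_to_df <- function(lst) {
  indx <- lengths(lst)  # requires R >= 3.2.0
  longest <- names(lst[[which.max(indx)]])
  # Each element's names must equal the first names of the longest element
  is_prefix <- vapply(lst,
                      function(x) identical(names(x), longest[seq_along(x)]),
                      logical(1))
  if (!all(is_prefix)) {
    stop("element names are not a prefix of the longest element's names")
  }
  res <- as.data.frame(do.call(rbind, lapply(lst, `length<-`, max(indx))),
                       stringsAsFactors = FALSE)
  colnames(res) <- longest
  res
}

pad_to_df(lst)
```

If the check fails, matching by name (for example with plyr's rbind.fill, as suggested in the comments under the question) is the safer route.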

data

 lst <- list(structure(c("10", "sfsf", "0.05"), .Names = c("sk", "ques", 
 "pval")), structure(c("24", "wwww", "0.11", "0.3"), .Names = c("sk", 
 "ques", "pval", "diff")), structure(c("24", "wwww", "0.11", "0.3", 
 "2"), .Names = c("sk", "ques", "pval", "diff", "imp")))

Comments

When I try this solution, I get "Error in `row.names<-.data.frame`(`*tmp*`, value = value) : duplicate 'row.names' are not allowed. In addition: Warning message: non-unique value when setting 'row.names': ‘1’" at the line where res is defined. I'm not sure why that is happening.
@jessi If you assign duplicated row names to a data.frame, it will not work, because a data.frame can only have unique row names. Duplicate row names are fine for a matrix, i.e. without the as.data.frame.
@akrun, FYI: I tested this today, and my resulting data frame already has column names without calling colnames(res) <- names(lst[[which.max(indx)]])
@MartinSöderström Based on the example in my post, before the last line of code is run, 'V4' and 'V5' are the names of the 4th and 5th columns, and colnames(res) <- replaces them.
@akrun What if it is a nested list? [[1]] sk ques pval [[act]] "10" "sfsf" "0.05" "time"
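Regarding the duplicate 'row.names' error reported in the comments: one possible cause (an assumption, since the commenter's data isn't shown) is a list whose elements have repeated names, because do.call(rbind, ...) turns the list names into row names. Dropping the list names with unname() before stacking avoids that. A minimal sketch with a hypothetical list whose elements are all named "1":

```r
# Hypothetical input: every element carries the list name "1"
lst <- list(
  `1` = c(sk = "10", ques = "sfsf", pval = "0.05"),
  `1` = c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3"),
  `1` = c(sk = "24", ques = "wwww", pval = "0.11", diff = "0.3", imp = "2")
)
indx <- lengths(lst)

# unname() removes the list names, so rbind() assigns no row names
# and as.data.frame() falls back to automatic row names 1, 2, 3
res <- as.data.frame(do.call(rbind, lapply(unname(lst), `length<-`, max(indx))),
                     stringsAsFactors = FALSE)
colnames(res) <- names(lst[[which.max(indx)]])
res
```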
