R: Creating a data frame from a list with missing values

Question

I have a list here that looks like this:

head(h)
[[1]]
[1] "gene=dnaA"             "locus_tag=CD630_00010" "location=1..1320"     

[[2]]
character(0)

[[3]]
[1] "locus_tag=CD630_05950"   "location=719777..720313"

[[4]]
[1] "gene=dnrA"             "locus_tag=CD630_00010" "location=50..1320"

I'm having trouble manipulating this list into a data.frame with three columns. For rows with missing gene info, I want to fill in "gene=unnamed", and I want to drop the empty elements entirely, giving a matrix like this:

     [,1]           [,2]                    [,3]
[1,] "gene=dnaA"    "locus_tag=CD630_00010" "location=1..1320"
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA"    "locus_tag=CD630_00010" "location=50..1320"

This is what I have right now, but I get an error about missing values in the gene column. Any suggestions?

  h <- data.frame(h[lapply(h,length)>0])
  h <- t(h)
  rownames(h) <- NULL

mpalanco · Accepted Answer · 2015-07-23 06:53:10Z

# Data

l <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

# Manipulation

n <- sapply(l, length)                  # length of each list element
seq.max <- seq_len(max(n))              # indices 1..3
df <- t(sapply(l, "[", i = seq.max))    # short elements are padded with NA on the right
df <- t(apply(df, 1, function(x) {
  c(x[is.na(x)], x[!is.na(x)])}))       # move the NAs to the front of each row
df <- df[rowSums(!is.na(df)) > 0, ]     # drop rows that are entirely NA
df[is.na(df)] <- "gene=unnamed"         # label the missing gene

Output:

     [,1]           [,2]                    [,3]                     
[1,] "gene=dnaA"    "locus_tag=CD630_00010" "location=1..1320"       
[2,] "gene=unnamed" "locus_tag=CD630_05950" "location=719777..720313"
[3,] "gene=dnrA"    "locus_tag=CD630_00010" "location=50..1320"      
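
Because every element carries its own key= prefix, the values could also be placed by key rather than by position, which avoids guessing which column is missing. This is only a sketch in base R: it assumes gene, locus_tag and location are the only keys that occur, and fill_by_key is a hypothetical helper name, not part of the answer above.

```r
# Same sample data as in the question (the second element is empty)
l <- list(c("gene=dnaA", "locus_tag=CD630_00010", "location=1..1320"),
          character(0),
          c("locus_tag=CD630_05950", "location=719777..720313"),
          c("gene=dnrA", "locus_tag=CD630_00010", "location=50..1320"))

keys <- c("gene", "locus_tag", "location")  # assumed complete set of keys

# Hypothetical helper: one slot per key, NA where the key is absent
fill_by_key <- function(x, keys) {
  found <- sub("=.*$", "", x)    # text before the first "="
  x[match(keys, found)]          # NA for keys that did not occur
}

l <- l[lengths(l) > 0]           # drop empty elements first
m <- t(vapply(l, fill_by_key, character(length(keys)), keys = keys))
m[is.na(m[, 1]), 1] <- "gene=unnamed"  # label rows without a gene
m
```

With this approach a missing locus_tag or location would stay NA in its own column instead of shifting the other values.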

Rorschach · Accepted Answer · 2015-07-23 06:06:50Z

There are several ways to bind lists of unequal lengths: see bind_rows from dplyr, rbind.fill from plyr, or rbindlist from data.table. Here is an approach using base R:

## Sample data
h <- list(letters[1:3],
          character(0),
          letters[4:5])

out <- do.call(rbind, lapply(h, `length<-`, 3))  # fix lengths and make matrix
out <- out[rowSums(!is.na(out))>0, ]             # remove empty rows
out[is.na(out)] <- "gene=unnamed"                # rename NA

data.frame(out)
#   X1 X2           X3
# 1  a  b            c
# 2  d  e gene=unnamed

8 Comments

alki Over a year ago

In your answer, everything seems to be pushed to the left when you are fixing the number of columns. How would you push everything to the right if you want the NA values to be in X1?

Rorschach Over a year ago

@Chani yes, that is a problem because the lists aren't named, so it is ambiguous which column they belong to when there are missing values. To always push right try do.call(rbind, lapply(h, function(x) rev(`length<-`(x, 3))))

alki Over a year ago

I tried looking into rbindlist, as it is much faster on large lists. I'm trying rbindlist(lapply(h, function(x) rev(`length<-`(x, 3)))), but I keep getting the error "Item 1 of list input is not a data.frame, data.table or list". When I check the class of lapply(h, function(x) rev(`length<-`(x, 3))), it returns list.

Rorschach Over a year ago

yea it should be fast. try rbindlist(lapply(h, function(x) as.list(rev(`length<-`(x, 3)))))

alki Over a year ago

Haha, now it completely reverses the order of the columns. It now becomes location | locus | gene instead of gene | locus | location while correctly pushing everything to the right
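
The reversed column order in the last comment comes from rev(), which flips the values as well as moving the NAs. One possible way around it is to pad on the left instead of reversing. The sketch below uses base R only; pad_left is a hypothetical helper, and it assumes no element is longer than n.

```r
## Sample data from the answer above
h <- list(letters[1:3],
          character(0),
          letters[4:5])

# Hypothetical helper: prepend NAs until x has length n (assumes length(x) <= n)
pad_left <- function(x, n) c(rep(NA_character_, n - length(x)), x)

out <- do.call(rbind, lapply(h, pad_left, n = 3))     # NAs on the left, order kept
out <- out[rowSums(!is.na(out)) > 0, , drop = FALSE]  # remove empty rows
out[is.na(out)] <- "gene=unnamed"                     # rename NA
out
```

For rbindlist, the padded vectors would still need to be wrapped in as.list(), as suggested in the earlier comment.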
