
I'm trying to explore a large dataset, both with data frames and with charts. I'd like to analyze the distribution of each variable by different metrics (e.g., sum(x), sum(x*y)) and for different sub-populations. I have 4 sub-populations, 2 metrics, and many variables.

In order to accomplish that, I've made a list structure such as this:

$variable1
...$metric1     <--- that's a df.
...$metric2
$variable2
...$metric1
...$metric2

Inside each of the data frames (e.g., list$variable1$metric1), I've calculated the distribution of variable1's unique values for each of the four population groups (one group per column). It looks like this:

$variable1$metric1
        unique_values med_all med_some_not_all med_at_least_some med_none
1 (1) 12-17 Years Old      NA               NA                NA       NA
2 (2) 18-25 Years Old   0.278            0.317             0.278    0.317
3 (3) 26-34 Years Old   0.225            0.228             0.225    0.228
4     (4) 35 or Older   0.497            0.456             0.497    0.456


$variable1$metric2
        unique_values med_all med_some_not_all med_at_least_some med_none
1 (1) 12-17 Years Old      NA               NA                NA       NA
2 (2) 18-25 Years Old   0.544            0.406             0.544    0.406
3 (3) 26-34 Years Old   0.197            0.310             0.197    0.310
4     (4) 35 or Older   0.259            0.284             0.259    0.284

What I'm trying to figure out is a good way to loop through the list of lists (probably melting the DFs in the process) and then output a ton of bar charts. In this case, the natural plot format would be, for each dataframe, a stacked bar chart with one stacked bar for each sub-population, grouping by the variable's unique values.

But I'm not familiar with iterated plotting, so I've hit a dead end. How might I plot from that list structure? Alternatively, is there a better structure in which I should be storing this information?
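A minimal base-R sketch of the iterated plotting, assuming the nested structure above. It rebuilds the two printed data frames under hypothetical names (`dists`, `make_df`, `ages`), then loops over variables and metrics and draws one stacked bar chart per data frame. `barplot()` on a matrix draws one bar per column and stacks that column's rows, so sub-populations become bars and unique values become segments. Treating NA as 0 is an assumption about what a missing share means in these tables.

```r
# Hypothetical rebuild of the question's nested list; values copied from the printed tables
ages <- c("(1) 12-17 Years Old", "(2) 18-25 Years Old",
          "(3) 26-34 Years Old", "(4) 35 or Older")
make_df <- function(all, some_not_all, at_least_some, none) {
  data.frame(unique_values = ages, med_all = all, med_some_not_all = some_not_all,
             med_at_least_some = at_least_some, med_none = none)
}
dists <- list(variable1 = list(
  metric1 = make_df(c(NA, 0.278, 0.225, 0.497), c(NA, 0.317, 0.228, 0.456),
                    c(NA, 0.278, 0.225, 0.497), c(NA, 0.317, 0.228, 0.456)),
  metric2 = make_df(c(NA, 0.544, 0.197, 0.259), c(NA, 0.406, 0.310, 0.284),
                    c(NA, 0.544, 0.197, 0.259), c(NA, 0.406, 0.310, 0.284))
))

# Write every chart to one multi-page PDF in a temporary directory
out <- file.path(tempdir(), "distributions.pdf")
pdf(out)
for (v in names(dists)) {
  for (m in names(dists[[v]])) {
    df <- dists[[v]][[m]]
    # Rows become stack segments (unique values), columns become bars (sub-populations)
    mat <- as.matrix(df[, -1])
    rownames(mat) <- df$unique_values
    # NA would propagate through the stacked heights, so treat it as 0 (assumption)
    mat[is.na(mat)] <- 0
    barplot(mat, main = paste(v, m, sep = " / "), cex.names = 0.7,
            legend.text = TRUE, args.legend = list(x = "topright", cex = 0.7))
  }
}
invisible(dev.off())
```

Adding more variables or metrics to `dists` adds pages without changing the loop.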

  • You need a good handle on list subsetting to work with a list of lists. Depending on the structure of your data frames, it may be helpful to combine them. Commented Jun 14, 2015 at 19:34
  • If you want to make small multiples plots, I recommend the compactr package Commented Jun 14, 2015 at 19:53

3 Answers


I find nested lists to be pretty tricky to work with, so I would combine them all into a single data frame that labels the name of the variable and the name of the metric:

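# Toy nested list: variable -> metric -> data frame
lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)), b = data.frame(matrix(6:9, 2))), beta = list(c = data.frame(matrix(11:14, 2))))
# Inner level: stack each variable's metric data frames, adding a metric column
level1 <- lapply(lst, function(x) do.call(rbind, lapply(names(x), function(y) {x[[y]]$metric <- y; x[[y]]})))
# Outer level: stack the variables, adding a variable column
dat <- do.call(rbind, lapply(names(level1), function(x) {level1[[x]]$variable <- x; level1[[x]]}))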
dat
#   X1 X2 metric variable
# 1  1  3      a    alpha
# 2  2  4      a    alpha
# 3  6  8      b    alpha
# 4  7  9      b    alpha
# 5 11 13      c     beta
# 6 12 14      c     beta

Now you can use standard tools for manipulating a single data frame to perform your data analysis.
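For instance, once everything is in `dat`, base R's `aggregate()` can compute per-variable, per-metric summaries in one call, and `split()` gives back one piece per combination to loop over. This is a sketch on the toy data above; `X1` and `X2` stand in for the real columns, and `sums` and `pieces` are hypothetical names.

```r
# Toy nested list and the combining step from this answer
lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)), b = data.frame(matrix(6:9, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))
level1 <- lapply(lst, function(x) do.call(rbind, lapply(names(x), function(y) {
  x[[y]]$metric <- y
  x[[y]]
})))
dat <- do.call(rbind, lapply(names(level1), function(x) {
  level1[[x]]$variable <- x
  level1[[x]]
}))

# Column sums for every (variable, metric) pair in one call
sums <- aggregate(cbind(X1, X2) ~ variable + metric, data = dat, FUN = sum)
sums

# One data frame per (variable, metric) pair, e.g. for a plotting loop;
# drop = TRUE skips combinations that never occur (such as beta with metric a)
pieces <- split(dat, list(dat$variable, dat$metric), drop = TRUE)
names(pieces)
```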


Here's a start:

lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)), b = data.frame(matrix(6:11, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))

lst
$alpha
$alpha$a
  X1 X2
1  1  3
2  2  4

$alpha$b
  X1 X2 X3
1  6  8 10
2  7  9 11


$beta
$beta$c
  X1 X2
1 11 13
2 12 14

#We can subset by number or by name
lst[['alpha']]
$a
  X1 X2
1  1  3
2  2  4

$b
  X1 X2 X3
1  6  8 10
2  7  9 11

lst[[1]]
$a
  X1 X2
1  1  3
2  2  4

$b
  X1 X2 X3
1  6  8 10
2  7  9 11
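The difference between single and double brackets is the part of list subsetting that usually trips people up with nested lists. A short sketch on the same toy `lst`; `one`, `inner`, `df`, and `same` are hypothetical names used only for illustration.

```r
lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)), b = data.frame(matrix(6:11, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))

one   <- lst["alpha"]           # single brackets: a list of length 1 that still wraps alpha
inner <- lst[["alpha"]]         # double brackets: the alpha list itself
df    <- lst[["alpha"]][["a"]]  # chain [[ ]] to reach a data frame
same  <- lst$alpha$a            # $ works like [[ ]] with a literal name

names(one)    # "alpha"
names(inner)  # "a" "b"
```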

#The dollar sign naming convention reminds us that we are looking at a list.
#Let's sum the columns of both data frames in the alpha list
lapply(lst[['alpha']], colSums)
$a
X1 X2 
 3  7 

$b
X1 X2 X3 
13 17 21 

Let's try to find the sum of each column of each data frame:

lapply(lst, colSums)
Error in FUN(X[[i]], ...) : 
  'x' must be an array of at least two dimensions

What happened? Each element of lst is itself a list of data frames, and R correctly refuses to run an array function on a list. colSums needs a data frame, matrix, or other array with at least two dimensions, so we have to nest one lapply call inside another. The logic can get complicated:
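If the nesting depth varies, a small recursive function avoids hand-writing one lapply per level. This is a sketch; `apply_to_dfs` is a hypothetical helper, not a base R function. It gives the same result as the nested lapply for this two-level list.

```r
lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)), b = data.frame(matrix(6:11, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))

# Hypothetical helper: apply f to every data frame, however deeply it is nested.
# The is.data.frame() check must come first because a data frame is itself a list.
apply_to_dfs <- function(x, f, ...) {
  if (is.data.frame(x)) f(x, ...) else lapply(x, apply_to_dfs, f = f, ...)
}

res <- apply_to_dfs(lst, colSums)
res$beta$c
```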

lapply(lst, function(x) lapply(x, colSums))
$alpha
$alpha$a
X1 X2 
 3  7 

$alpha$b
X1 X2 X3 
13 17 21 


$beta
$beta$c
X1 X2 
23 27 

We can use rbind to put data.frames together:

rbind(lst$alpha$a, lst$beta$c)
  X1 X2
1  1  3
2  2  4
3 11 13
4 12 14

Be careful not to do it the way you might first try (I've done it many times):

do.call(rbind, lst)
      a      b     
alpha List,2 List,3
beta  List,2 List,2

That isn't the result you're looking for. Also make sure the data frames have the same number of columns with the same names:

do.call(rbind, lst[[1]])
Error in rbind(deparse.level, ...) : 
  numbers of columns of arguments do not match

R is refusing to combine data frames that have 2 columns in one (alpha$a) and three columns in the other (alpha$b).
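Matching column counts are not enough on their own: `rbind()` on data frames matches columns by name rather than by position. A small sketch with hypothetical data frames `df1`, `df2`, and `df3`:

```r
df1 <- data.frame(X1 = 1:2, X2 = 3:4)
df2 <- data.frame(X2 = 13:14, X1 = 11:12)  # same names, different column order
both <- rbind(df1, df2)
both$X1  # 1 2 11 12: columns are matched by name, not position

df3 <- data.frame(Y1 = 5:6, Y2 = 7:8)      # same column count, different names
res <- tryCatch(rbind(df1, df3), error = function(e) e)
conditionMessage(res)
```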

I changed lst so that alpha$b has two columns like the others (calling the result lst2) and combined them:

lst2 <- lst
lst2$alpha$b <- data.frame(matrix(6:11, ncol = 2))
bind1 <- lapply(lst2, function(x) do.call(rbind, x))
bind1
$alpha
    X1 X2
a.1  1  3
a.2  2  4
b.1  6  9
b.2  7 10
b.3  8 11

$beta
    X1 X2
c.1 11 13
c.2 12 14

That combines the elements of each list. Now I can combine the outer list to make one big data frame.

do.call(rbind, bind1)
          X1 X2
alpha.a.1  1  3
alpha.a.2  2  4
alpha.b.1  6  9
alpha.b.2  7 10
alpha.b.3  8 11
beta.c.1  11 13
beta.c.2  12 14
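The row names of the combined data frame still record where each row came from, so they can be turned back into proper variable and metric columns (the labeled layout the first answer builds directly). This sketch assumes the list names contain no dots; otherwise `strsplit()` on "." would split them incorrectly. `big` and `parts` are hypothetical names.

```r
# Rebuilt to match the bind1 output above (alpha$b: two columns, three rows)
lst2 <- list(alpha = list(a = data.frame(matrix(1:4, 2)),
                          b = data.frame(matrix(6:11, ncol = 2))),
             beta  = list(c = data.frame(matrix(11:14, 2))))
big <- do.call(rbind, lapply(lst2, function(x) do.call(rbind, x)))

# Row names look like "alpha.a.1": variable, metric, row number
parts <- do.call(rbind, strsplit(rownames(big), ".", fixed = TRUE))
big$variable <- parts[, 1]
big$metric   <- parts[, 2]
rownames(big) <- NULL
big
```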


Here's a strategy based on melting the list recursively:

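library(reshape2)
library(ggplot2)

lst <- list(alpha = list(a = data.frame(matrix(1:4, 2)),
                         b = data.frame(matrix(6:11, 2))),
            beta  = list(c = data.frame(matrix(11:14, 2))))

# melt() recurses into the nested list; columns L1 and L2 hold the outer and inner list names
m <- melt(lst, id = 1:2)
# One panel per (outer name, inner name) pair
ggplot(m, aes(X1, X2)) + geom_bar(stat = "identity") + facet_grid(L1 ~ L2)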
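If reshape2 isn't available, a similar long format for one of the question's per-metric data frames can be sketched in base R with `stack()`, which turns the sub-population columns into a `values` column plus an `ind` factor naming the column each value came from. The object name `age_metric1` is hypothetical; the numbers are copied from variable1$metric1 in the question.

```r
age_metric1 <- data.frame(
  unique_values     = c("(1) 12-17 Years Old", "(2) 18-25 Years Old",
                        "(3) 26-34 Years Old", "(4) 35 or Older"),
  med_all           = c(NA, 0.278, 0.225, 0.497),
  med_some_not_all  = c(NA, 0.317, 0.228, 0.456),
  med_at_least_some = c(NA, 0.278, 0.225, 0.497),
  med_none          = c(NA, 0.317, 0.228, 0.456)
)

# stack() goes column by column, so recycling unique_values once per column
# lines each value up with its row label
long <- cbind(unique_values = age_metric1$unique_values, stack(age_metric1[-1]))
names(long)[names(long) == "ind"] <- "population"
head(long)
```

The resulting columns (unique_values, values, population) are the shape a faceted or stacked ggplot call expects.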
