10

I am looking for a way to automate some diagrams in R using a FOR loop:

dflist <- c("dataframe1", "dataframe2", "dataframe3", "dataframe4")

for (i in dflist) {
  plot(i$var1, i$var2)
}

All dataframes have the same variables, i.e. var1, var2.

It seems for loops are not the most elegant solution here, but I don't understand how to use the apply functions for diagrams.

EDIT:

My original example using mean() didn't help in the original question, so I changed it to a plot function.

3
  • Using a for loop is fine. Just put the actual data.frames in a list and not just their names in a vector. To be more readable you could also change the loop content to plot(var2~var1, data=i). However, you might want to save the plots (read ?pdf) or put several plots on one graph page (read ?par). Commented May 23, 2013 at 13:52
  • Although I agree with Roland that for loops are fine, this example with a list of data.frame is a really good fit for lapply. Commented May 24, 2013 at 7:07
  • @arumbay I would also check out facetting in the ggplot2 package to create groups of plots. Commented May 24, 2013 at 7:10
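
As a rough illustration of the first comment (real data frames in a list, plots saved via ?pdf), here is a minimal sketch. The sample data, the list names, and the output file name are made up for the example; only var1 and var2 come from the question.

```r
# Hypothetical sample data: two data frames sharing the columns var1 and var2
set.seed(1)
dflist <- list(
  dataframe1 = data.frame(var1 = rnorm(20), var2 = rnorm(20)),
  dataframe2 = data.frame(var1 = runif(20), var2 = runif(20))
)

# Open a PDF device; each plot() call starts a new page in the file
out_file <- file.path(tempdir(), "scatterplots.pdf")  # assumed file name
pdf(out_file)
for (df in dflist) {
  plot(var2 ~ var1, data = df)
}
invisible(dev.off())  # close the device so the file is written to disk
```

The list holds the data frames themselves, so df inside the loop is a data frame and not a character string.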

5 Answers 5

16

To further add to Beasterfield's answer, it seems like you want to do some number of complex operations on each of the data frames.

It is possible to have complex functions within an apply statement. So where you now have:

for (i in dflist) {
  # Do some complex things
}

This can be translated to:

lapply(dflist, function(df) {
  # Do some complex operations on each data frame, df
  # More steps

  # Make sure the last thing is NULL. The last statement within the function will be
  # returned to lapply, which will try to combine these as a list across all data frames.
  # You don't actually care about this, you just want to run the function.
  NULL
})

A more concrete example using plot:

# Assuming each data frame has columns x and y holding the point coordinates
lapply(dflist, function(df) {
  x2 <- df$x^2
  log_y <- log(df$y)
  plot(x2, log_y)  # plot the transformed values computed above
  NULL
})

You can also write complex functions which take multiple arguments:

lapply(dflist, function(df, arg1, arg2) {
  # Do something on each data.frame, df
  # arg1 == 1, arg2 == 2 (see next line)
}, 1, 2) # extra arguments are passed in here
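
To make the extra-argument pattern more tangible, here is a small sketch. The helper scatter() and its arguments point_col and point_pch are names invented for this example, and the data is random; passing the extra arguments by name is optional but tends to be easier to read.

```r
# Hypothetical sample data with columns x and y
set.seed(1)
dflist <- list(
  data.frame(x = runif(10), y = runif(10)),
  data.frame(x = runif(10), y = runif(10))
)

# scatter(), point_col and point_pch are made-up names for this sketch
scatter <- function(df, point_col, point_pch) {
  plot(df$x, df$y, col = point_col, pch = point_pch)
  NULL  # return NULL so lapply() has nothing meaningful to collect
}

# Everything after the function is forwarded to each call of scatter()
result <- lapply(dflist, scatter, point_col = "blue", point_pch = 19)
```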

Hope this helps you out!


1 Comment

Thank you, that was very helpful and helped me to better understand the principle behind the apply functions!
6

Concerning your actual question, you should learn how to access cells, rows and columns of data.frames, matrices or lists. From your code I guess you want to access the j-th column of the data.frame i, so it should read:

mean( i[,j] )
# or
mean( i[[ j ]] )

The $ operator can only be used if you want to access a particular variable in your data.frame by its literal name, e.g. i$var1. Additionally, it is less performant than accessing by [, ] or [[]].
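
A short sketch of the difference, using a made-up data frame df and a column name stored in the variable j:

```r
# Hypothetical data frame for illustration
df <- data.frame(var1 = c(1, 2, 3), var2 = c(10, 20, 30))
j <- "var2"  # column name held in a variable

by_dollar  <- df$var2   # literal column name
by_double  <- df[[j]]   # column name taken from the variable j
by_bracket <- df[, j]   # same column via matrix-style indexing
missing    <- df$j      # NULL: `$` looks for a column literally called "j"
```

So inside a loop over column names, [[ ]] or [, ] is the form that works.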

However, although it's not wrong, using for loops is not very R'ish. You should read about vectorized functions and the apply family. So your code could be easily rewritten as:

set.seed(42)
dflist <- vector( "list", 5 )
for( i in 1:5 ){
  dflist[[i]] <- data.frame( A = rnorm(100), B = rnorm(100), C = rnorm(100) )
}
varlist <- c("A", "B")

lapply( dflist, function(x){ colMeans(x[varlist]) } )
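
If one compact table is preferred over a list of results, sapply() may be worth a look. This is only a sketch on the same kind of random data as above; it relies on every call returning a vector of the same length, in which case sapply() simplifies the result to a matrix.

```r
# Same kind of sample data as above, built with lapply() instead of a loop
set.seed(42)
dflist <- lapply(1:5, function(i) {
  data.frame(A = rnorm(100), B = rnorm(100), C = rnorm(100))
})
varlist <- c("A", "B")

# One column per data frame, one row per variable in varlist
means <- sapply(dflist, function(x) colMeans(x[varlist]))
```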

1 Comment

Thanks - I feared my mean() example would be too simple. I'm looking for a way to automatically generate scatterplots referring to a set of dataframes (see changes in the example above); I guess this is also possible using apply functions?
2
set.seed(42)
dflist <- list(data.frame(x=runif(10),y=rnorm(10)),
               data.frame(x=rnorm(10),y=runif(10)))

par(mfrow=c(1,2))
for (i in dflist) {
  plot(y~x, data=i)
}
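
A possible variation, in case each panel should carry a title: give the list names and loop over those. The names first and second are made up for this sketch.

```r
set.seed(42)
# "first" and "second" are hypothetical labels for the two data frames
dflist <- list(first  = data.frame(x = runif(10), y = rnorm(10)),
               second = data.frame(x = rnorm(10), y = runif(10)))

old_par <- par(mfrow = c(1, 2))  # two panels side by side
for (nm in names(dflist)) {
  # look up the data frame by name and use the name as the plot title
  plot(y ~ x, data = dflist[[nm]], main = nm)
}
par(old_par)  # restore the previous layout
```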


2

Using the example of @Roland, I wanted to show you the ggplot2 equivalent. First we have to change the dataset a bit.

Here is the original data:

> dflist
[[1]]
           x           y
1  0.9148060 -0.10612452
2  0.9370754  1.51152200
3  0.2861395 -0.09465904
4  0.8304476  2.01842371
5  0.6417455 -0.06271410
6  0.5190959  1.30486965
7  0.7365883  2.28664539
8  0.1346666 -1.38886070
9  0.6569923 -0.27878877
10 0.7050648 -0.13332134

[[2]]
            x          y
1   0.6359504 0.33342721
2  -0.2842529 0.34674825
3  -2.6564554 0.39848541
4  -2.4404669 0.78469278
5   1.3201133 0.03893649
6  -0.3066386 0.74879539
7  -1.7813084 0.67727683
8  -0.1719174 0.17126433
9   1.2146747 0.26108796
10  1.8951935 0.51441293

Then put the data into one data.frame, with an id column (L1) that records which list element each row came from:

require(reshape2)
one_df = melt(dflist, id.vars = c("x","y"))
> one_df
            x           y L1
1   0.9148060 -0.10612452  1
2   0.9370754  1.51152200  1
3   0.2861395 -0.09465904  1
4   0.8304476  2.01842371  1
5   0.6417455 -0.06271410  1
6   0.5190959  1.30486965  1
7   0.7365883  2.28664539  1
8   0.1346666 -1.38886070  1
9   0.6569923 -0.27878877  1
10  0.7050648 -0.13332134  1
11  0.6359504  0.33342721  2
12 -0.2842529  0.34674825  2
13 -2.6564554  0.39848541  2
14 -2.4404669  0.78469278  2
15  1.3201133  0.03893649  2
16 -0.3066386  0.74879539  2
17 -1.7813084  0.67727683  2
18 -0.1719174  0.17126433  2
19  1.2146747  0.26108796  2
20  1.8951935  0.51441293  2
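
If reshape2 is not installed, roughly the same stacked data frame can be put together in base R. This is only a sketch of an alternative, not what melt() does internally; the id column is called L1 here simply to match the output above.

```r
set.seed(42)
dflist <- list(data.frame(x = runif(10), y = rnorm(10)),
               data.frame(x = rnorm(10), y = runif(10)))

# Tag each data frame with its position in the list ...
tagged <- Map(function(df, id) cbind(df, L1 = id), dflist, seq_along(dflist))
# ... then stack the tagged data frames on top of each other
one_df <- do.call(rbind, tagged)
```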

and make the plot:

require(ggplot2)
ggplot(one_df, aes(x = x, y = y)) + geom_point() + facet_wrap(~ L1)


0

Based on Scott Ritchi's solution, this would be a reproducible example that also hides the feedback message from lapply:

# split dataframe by condition on cars hp
f <- function() trunc(signif(mtcars$hp, 2) / 100)
dflist <- lapply(unique(f()), function(x) subset(mtcars, f() == x ))

This splits the mtcars dataframe into subsets based on a classification of the hp variable (0 for hp lower than 100, 1 for those in the 100s, 2 for the 200s, and so on).
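
As a side note, base R's split() can probably do the same grouping in one call. A minimal sketch, with hp_class as a made-up variable name; note that split() orders the groups by class value (0, 1, 2, 3) rather than by first appearance in the data.

```r
# Same classification as above, stored in a vector (hp_class is a made-up name)
hp_class <- trunc(signif(mtcars$hp, 2) / 100)

# split() returns a named list with one data frame per class
dflist <- split(mtcars, hp_class)

# Number of cars in each class
group_sizes <- sapply(dflist, nrow)
```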

And, plot it:

# use invisible to prevent the feedback message from lapply
invisible(
  lapply(dflist, function(df) {
    x2 <- df$mpg^2
    log_y <- log(df$hp)
    plot(x2, log_y)
    NULL
  })
)

invisible() suppresses the output that lapply() would otherwise print:

16 
9 
6 
1 
[[1]]
NULL

[[2]]
NULL

[[3]]
NULL

[[4]]
NULL
