I would like to combine multiple dataframes, as output of a function, into one big dataframe in R.

I am simulating data within a function, e.g.:

set.seed(123)

x <- function(){
return( data.frame( matrix(rnorm(10, 1, .5), ncol=2) ) )
}

I would like to run multiple simulations and tie the dataframes together.

Attempt

set.seed(123)

x_improved <- function(sim_nr){
  df <- data.frame( matrix(rnorm(10, 1, .5), ncol=2) )  # simulate data
  sim_nr <- rep(sim_nr, nrow(df))                       # add reference number
  df <- cbind(df, sim_nr)                               # bind columns
  return(df)
}

list_dataframes <- lapply(c(1,2,3), x_improved)         # create list of dataframes

df <- do.call("rbind", list_dataframes)                 # convert list to dataframe

The code above does what I want; see "Expected output" below.

Expected output:

> df
          X1        X2 sim_nr
1  0.4660881 0.1566533      1
2  0.8910125 1.4188935      1
3  0.4869978 1.0766866      1
4  0.6355544 0.4309315      1
5  0.6874804 1.6269075      1
6  1.2132321 1.3443201      2
7  0.8524643 1.2769588      2
8  1.4475628 0.9690441      2
9  1.4390667 0.8470187      2
10 1.4107905 0.8097645      2
11 0.6526465 0.4384457      3
12 0.8960414 0.7985576      3
13 0.3673018 0.7666723      3
14 2.0844780 1.3899826      3
15 1.6039810 0.9583155      3

Question:

Is this the proper (or R) way to address this problem? Are there more efficient (or convenient) solutions?

4 Answers

3

Another approach would be to use an array, which can perform better if you need to do a lot of grouping operations.

set.seed(123)
replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))
, , 1

          [,1]      [,2]
[1,] 0.7197622 1.8575325
[2,] 0.8849113 1.2304581
[3,] 1.7793542 0.3674694
[4,] 1.0352542 0.6565736
[5,] 1.0646439 0.7771690

, , 2

          [,1]       [,2]
[1,] 1.6120409 1.89345657
[2,] 1.1799069 1.24892524
[3,] 1.2003857 0.01669142
[4,] 1.0553414 1.35067795
[5,] 0.7220794 0.76360430

, , 3

          [,1]      [,2]
[1,] 0.4660881 0.1566533
[2,] 0.8910125 1.4188935
[3,] 0.4869978 1.0766866
[4,] 0.6355544 0.4309315
[5,] 0.6874804 1.6269075
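
As a rough illustration of what per-simulation operations on such an array could look like, here is a minimal sketch using base R's apply over the third dimension. The names n_sim, sims, col_means and sim_means are made up for this example and are not part of the original answer.

```r
set.seed(123)
n_sim <- 3  # hypothetical name: number of simulations

# 5 x 2 x n_sim array; the third dimension indexes the simulation
sims <- replicate(n_sim, matrix(rnorm(10, 1, 0.5), ncol = 2))

# Mean of each column within each simulation: a 2 x n_sim matrix
col_means <- apply(sims, c(2, 3), mean)

# Overall mean per simulation: a vector of length n_sim
sim_means <- apply(sims, 3, mean)

# A single simulation is just a slice of the array
first_sim <- sims[, , 1]
```

Whether this beats a grouped data.frame depends on the operations involved, so it is worth timing on your own data.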

Or, if you want a data.frame, it is often faster to do all of your rnorm simulations at once. Note that even with the same seed, the result is not an exact match for the loop-based output: the matrix is filled column by column, so the draws land in a different order.

set.seed(123)
n_sim <- 3
data.frame(matrix(rnorm(10 * n_sim, 1, 0.5), ncol = 2),
           sim_nr = rep(seq_len(n_sim), each = 5))
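
To make the ordering difference concrete, the sketch below replaces the random draws with the integers 1 to 30, so you can see where each draw ends up. The names draws, n_row, df_once and df_loop_sim1 are hypothetical and only used for illustration.

```r
n_sim <- 3
n_row <- 5                              # rows per simulation
draws <- seq_len(2 * n_row * n_sim)     # stand-in for the rnorm output: 1..30

# All draws at once: the matrix is filled column by column
df_once <- data.frame(matrix(draws, ncol = 2),
                      sim_nr = rep(seq_len(n_sim), each = n_row))

# One simulation at a time: the first simulation consumes draws 1..10
df_loop_sim1 <- data.frame(matrix(draws[1:10], ncol = 2))

df_once[df_once$sim_nr == 1, ]   # X1 holds draws 1-5, X2 holds draws 16-20
df_loop_sim1                     # X1 holds draws 1-5, X2 holds draws 6-10
```

The values come from the same distribution either way; only their assignment to rows and simulations changes.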

2

Using the purrr package:

purrr::map_df(c(1,2,3), ~data.frame(matrix(rnorm(10, 1, .5), ncol=2)), .id='sim_nr') 
# Using the x() function, it would be
purrr::map_df(c(1,2,3), ~x() , .id='sim_nr')
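
For reference, newer purrr releases mark map_df as superseded. A possible equivalent, assuming purrr 1.0.0 or later is installed, is map followed by list_rbind. The sketch below redefines x from the question so that it is self-contained; sims and df are names chosen for this example.

```r
library(purrr)

set.seed(123)

# Simulation function from the question
x <- function() {
  data.frame(matrix(rnorm(10, 1, .5), ncol = 2))
}

# Build a list of data frames, then row-bind them;
# names_to records which list element each row came from
sims <- map(1:3, ~ x())
df <- list_rbind(sims, names_to = "sim_nr")
```

Because the list is unnamed, sim_nr holds the position of each element in the list.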

2

One way to improve it, at least in terms of line count, is to use transform, which turns x_improved into a one-liner:

set.seed(123)
x_improved <- function(sim_nr){
   transform(data.frame(matrix(rnorm(10, 1, .5), ncol=2)), sim_nr = sim_nr)
}

do.call(rbind, lapply(1:3, x_improved))


#          X1         X2 sim_nr
#1  0.7197622 1.85753249      1
#2  0.8849113 1.23045810      1
#3  1.7793542 0.36746938      1
#4  1.0352542 0.65657357      1
#5  1.0646439 0.77716901      1
#6  1.6120409 1.89345657      2
#7  1.1799069 1.24892524      2
#8  1.2003857 0.01669142      2
#9  1.0553414 1.35067795      2
#10 0.7220794 0.76360430      2
#11 0.4660881 0.15665334      3
#12 0.8910125 1.41889352      3
#13 0.4869978 1.07668656      3
#14 0.6355544 0.43093153      3
#15 0.6874804 1.62690746      3

Or, depending on your use case, you could construct the whole dataframe at once.

num <- 1:3
transform(data.frame(matrix(rnorm(10 * length(num), 1,.5), ncol=2)), 
          sim_nr = rep(num, each = 10/2))

0

The simplest solution is to use rbindlist from the data.table package:

> library(data.table)
> rbindlist(list_dataframes)

You can of course apply it to your list_dataframes either outside or inside of a "for" loop.
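
If the simulation function does not add the reference number itself, rbindlist can add it through its idcol argument. A minimal sketch, assuming data.table is installed and using the plain x function from the question (dt and df are names chosen for this example):

```r
library(data.table)

set.seed(123)

# Simulation function from the question (no sim_nr column)
x <- function() {
  data.frame(matrix(rnorm(10, 1, .5), ncol = 2))
}

list_dataframes <- lapply(1:3, function(i) x())

# idcol adds a column recording which list element each row came from
dt <- rbindlist(list_dataframes, idcol = "sim_nr")

# rbindlist returns a data.table; convert if a plain data.frame is needed
df <- as.data.frame(dt)
```

Note that the result is a data.table, which prints and subsets slightly differently from a base data.frame.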
