Populating a data frame in R in a loop

Question

I am trying to populate a data frame from within a for loop in R. The column names are generated dynamically within the loop, and the values of some loop variables are used to fill the data frame. For instance, the name of the current column could be a variable name held as a string in the loop, and the column could take the value of the current iterator.

I tried to create an empty data frame outside the loop, like this

d = data.frame()

But I can't really do anything with it. The moment I try to populate it, I run into an error:

 d[1] = c(1,2)
Error in `[<-.data.frame`(`*tmp*`, 1, value = c(1, 2)) : 
  replacement has 2 rows, data has 0

What would be a good way to achieve this? Please let me know if I wasn't clear.

Populate a list instead of a data.frame and turn it into a data.frame after the loop.
– Roland, Commented Nov 18, 2012 at 17:16
Thanks Roland, I am a n00b, can you please elaborate more? How do I declare the list, and how do I convert it?
– ganesh reddy, Commented Nov 18, 2012 at 17:20

Roland · Accepted Answer · 2012-11-18 17:51:27Z

61

It is often preferable to avoid loops and use vectorized functions. If that is not possible there are two approaches:

1. Preallocate your data.frame. This is not recommended because indexing is slow for data.frames.
2. Use another data structure in the loop and transform into a data.frame afterwards. A list is very useful here.
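
For comparison, the first approach (preallocation) might look like the minimal sketch below. It assumes the number of iterations is known in advance; the names `n`, `df_pre`, `index` and `square` are made up for illustration.

```r
n <- 5  # number of iterations, assumed known in advance

# Preallocate a data frame with n rows and typed columns
df_pre <- data.frame(index = integer(n), square = numeric(n))

for (i in seq_len(n)) {
  df_pre$index[i] <- i      # fill one cell per column
  df_pre$square[i] <- i^2
}

df_pre
```

This works, but every assignment goes through data.frame indexing, which is the slow part mentioned above.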

Example to illustrate the general approach:

mylist <- list() #create an empty list

for (i in 1:5) {
  vec <- numeric(5) #preallocate a numeric vector
  for (j in 1:5) { #fill the vector
    vec[j] <- i^j 
  }
  mylist[[i]] <- vec #put all vectors in the list
}
df <- do.call("rbind",mylist) #combine all vectors into a matrix

In this example it is not necessary to use a list; you could preallocate a matrix. However, if you do not know how many iterations your loop will need, you should use a list.
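
A minimal sketch of the unknown-iterations case, with column names generated inside the loop as in the question. The stopping rule and the names `cols`, `col_name` and `result` are hypothetical, chosen only for illustration.

```r
cols <- list()   # one list element per future column
k <- 0           # loop counter; its final value is not known in advance

# Hypothetical stopping rule: continue while the next square is below 20
while ((k + 1)^2 < 20) {
  k <- k + 1
  col_name <- paste0("iter_", k)   # column name generated inside the loop
  cols[[col_name]] <- c(k, k^2)    # values derived from the loop variable
}

# All list elements have the same length, so they become columns
result <- as.data.frame(cols)
result
```

Because each element is stored under its name, the list names carry over as the column names of the data frame.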

Finally here is a vectorized alternative to the example loop:

outer(1:5,1:5,function(i,j) i^j)

As you can see, it's simpler and also more efficient.

edited Nov 18, 2012 at 17:51

answered Nov 18, 2012 at 17:32

Roland


6 Comments

thelatemail Over a year ago

You can simplify your vectorised version even more like: outer(1:5,1:5,"^")

Stackuser Over a year ago

Hi Roland, if we don't know the number of iterations or the number of columns (using the example you gave above), how should we write the for loop or function? I like your example and would appreciate an explanation in the same style so that I can adapt it for big data.

Roland Over a year ago

@Stackuser This approach is suitable for medium-sized data. For big data you should use something more efficient. Not knowing the number of columns is probably the result of a design flaw.

Stackuser Over a year ago

Thanks @Roland! Can you recommend a reference? I have 37 time series, each with 30 years of data (1110 in total), and I don't know the number of iterations. I am building a model, getting prediction errors, and extracting all coefficients, p-values, prediction errors (observed - predicted value), etc. I also want to know how to subset any section of the data, e.g. if I want to write a loop (function) that takes every 21st or 19th value of this time series, makes a prediction, and stores the results somewhere. Thanks!

Roland Over a year ago

That's not big data. And sorry, but I don't do free consulting.


Seb · Answer · 2012-11-18 17:30:57Z

53

You could do it like this:

iterations <- 10
variables <- 2

# preallocate a matrix with one row per iteration
output <- matrix(ncol = variables, nrow = iterations)

for (i in 1:iterations) {
  output[i, ] <- runif(variables) # one random number per column
}

output

and then turn it into a data.frame

 output <- data.frame(output)
 class(output)

What this does:

1. Create a matrix with rows and columns according to the expected size.
2. Insert 2 random numbers into each row of the matrix.
3. Convert the matrix into a data.frame after the loop has finished.
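
A variant of the same idea that also names the columns up front, so the resulting data frame does not fall back to the default `X1`, `X2` names. This is only a sketch; the column names `u1` and `u2` and the name `output_df` are hypothetical.

```r
iterations <- 10
variables <- 2

# Preallocate with NA and attach column names (names are hypothetical)
output <- matrix(NA_real_, nrow = iterations, ncol = variables,
                 dimnames = list(NULL, c("u1", "u2")))

set.seed(1)  # only so the sketch is reproducible
for (i in seq_len(iterations)) {
  output[i, ] <- runif(variables)  # fill one row per iteration
}

# Convert once, after the loop; column names are kept
output_df <- as.data.frame(output)
str(output_df)
```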

answered Nov 18, 2012 at 17:30

Seb


Chidi · Answer · 2018-10-11 10:53:25Z

22

This works too.

df = NULL
for (k in 1:10)
    {
       x = 1
       y = 2
       z = 3
       df = rbind(df, data.frame(x,y,z))
     }

The output will look like this (the same row is repeated 10 times, once per iteration; only the first is shown):

df #enter

x y z #col names
1 2 3

edited Oct 11, 2018 at 10:53

answered Oct 10, 2018 at 19:58

Chidi


1 Comment

Danny Over a year ago

This works great for small cases and is the simplest in my opinion, but be aware that it becomes prohibitively slow and resource intensive for large or even medium datasets because it is actually rebuilding the entire data.frame every time you add a row.
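
One way to avoid the repeated rebuilding is to collect the per-iteration data frames in a list and bind them once at the end. A minimal sketch, with made-up values and the hypothetical names `rows` and `df_all`:

```r
rows <- vector("list", 10)   # preallocated list, one slot per iteration

for (k in seq_along(rows)) {
  # build a one-row data frame per iteration (values are illustrative)
  rows[[k]] <- data.frame(x = k, y = 2 * k, z = 3 * k)
}

# a single rbind over all pieces instead of one rbind per iteration
df_all <- do.call(rbind, rows)
df_all
```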

Bruno Gomes · Answer · 2019-07-31 20:18:47Z

1

Thanks Notable1, this works for me with tidytextr. It creates a data frame with the file names in one column and the content in the other.

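# read_pdf() must be loaded from the PDF-reading package in use (not shown in the original)
diretorio <- "D:/base"
# list.files() takes a regular expression, so match a literal ".PDF" suffix;
# full.names = TRUE returns paths that can be read from any working directory
arquivos <- list.files(diretorio, pattern = "\\.PDF$", full.names = TRUE)

df = NULL
for (k in seq_along(arquivos)) {
      nome = basename(arquivos[k])              # file name for the first column
      print(nome)
      Sys.sleep(1)
      dados = read_pdf(arquivos[k], ocr = TRUE) # extracted content of this file
      print(dados)
      Sys.sleep(1)
      df = rbind(df, data.frame(nome, dados))   # append this file's rows
      Sys.sleep(1)
}
Encoding(df$text) <- "UTF-8"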

answered Jul 31, 2019 at 20:18

Bruno Gomes


scs76 · Answer · 2017-08-24 17:08:27Z

I had a case where I needed to use a data frame within a for loop. In this case it was efficient enough; however, keep in mind that the database was small and the iterations in the loop were very simple. But maybe the code could be useful for someone with similar conditions.

The purpose of the for loop was to use the raster extract function across five locations (i.e. Tokyo, New York, São Paulo, Seoul and Mexico City), each of which had its own raster grids. I had a spatial point database with more than 1000 observations spread across the 5 locations, and I needed to extract information from 10 different raster grids (two grids per location). For the subsequent analysis, I needed not only the raster values but also the unique ID of each observation.

Preparing the spatial data included the following tasks:

Import points shapefile with the readOGR function (rgdal package)
Import raster files with the raster function (raster package)
Stack grids from the same location into one file, with the function stack (raster package)

Here is the for loop code, using a data frame:

1. Add stacked rasters per location into a list

raslist <- list(LOC1,LOC2,LOC3,LOC4,LOC5)

2. Create an empty dataframe, this will be the output file

TB <- data.frame(VAR1=double(),VAR2=double(),ID=character())

3. Set up for loop function

L1 <- seq(1,5,1) # the location ID is a numeric variable with values from 1 to 5 

for (i in 1:length(L1)) {
  dat=subset(points,LOCATION==i) # select corresponding points for location [i] 
  t=data.frame(extract(raslist[[i]],dat),dat$ID) # run extract function with points & raster stack for location [i]
  names(t)=c("VAR1","VAR2","ID") 
  TB=rbind(TB,t)
}

symkly · Answer · 2021-11-14 09:58:13Z

0

I was looking for the same thing, and the following may be useful as well.

a <- vector("list", 3) # one slot per iteration
for(i in 1:3){a[[i]] <- data.frame(x= rnorm(2), y= runif(2))}
a
do.call(rbind, a) # same result as rbind(a[[1]], a[[2]], a[[3]])

answered Nov 14, 2021 at 9:58

symkly

