I am trying to define an empty df outside a for loop and then fill the rows/columns from inside the loop, something like this:

df <- data.frame()
for (fl in files) {
  dt <- read.table(fl, header = FALSE,
                   col.names = c("year", "month", "value"),
                   colClasses = c("character", "character", "numeric"))
  t <- aggregate(value ~ year, dt, sum)
  df$year <- t$year
  df$value <- t$value * someFunction()
}

Now, there are various ways to create an empty df in R:

df <- data.frame()

# or another method
df <- data.frame(Month=character(), 
                 Value=character(), 
                 stringsAsFactors=FALSE) 

# or another method
df <- data.frame(matrix(nrow = 0, ncol = 2))

But when I assign values to the data frame, the following error is produced:

df$Month <- month.abb

Error in `$<-.data.frame`(`*tmp*`, File, value = c("Jan", "Feb", "Mar",  : 
  replacement has 12 rows, data has 0

I don't know what I am doing wrong, or what misconception I might have, but I couldn't find my way around this. Can anyone explain it to me?

P.S.: df <- data.frame(matrix(nrow = 100, ncol = 2)) works, but I don't know if it's a good idea because my df will have a varying number of rows.

  • Can you show the broader aspect of what you're trying to achieve? There is bound to be a more appropriate way of doing things. Commented Jun 23, 2018 at 9:21
  • @RomanLuštrik question edited Commented Jun 23, 2018 at 9:27
  • In your loop you're overwriting the df at each iteration. Commented Jun 23, 2018 at 9:48
  • Yes, I will deal with that later on; for now I just need to figure out a way to fill in data without getting the error I mentioned. @Moody_Mudskipper Commented Jun 23, 2018 at 9:58

4 Answers

2

You need to add the values to a list in the for loop, and then you can bind the rows together as a data frame. Something like this:

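myList <- list()

# seq_along() is safer than 1:length() when the input could be empty
for (m in seq_along(month.abb)) {
  myList[[m]] <- month.abb[m]
}

# bind the collected elements into rows of a single data frame
df <- as.data.frame(do.call(rbind, myList))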
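As a rough sketch of how the same list-then-bind idea might be adapted to the file-reading loop in the question: the two temporary files and `scale_factor` below are hypothetical stand-ins (for the asker's `files` and `someFunction()`, neither of which is shown in the question), so treat this as an illustration under those assumptions rather than the asker's actual pipeline.

```r
# Hypothetical input: two small files standing in for the real `files`.
files <- c(tempfile(fileext = ".txt"), tempfile(fileext = ".txt"))
writeLines(c("2001 01 1.5", "2001 02 2.5", "2002 01 4"), files[1])
writeLines(c("2003 01 10", "2003 02 20"), files[2])

scale_factor <- 2  # placeholder for the question's someFunction()

# One list slot per file; each slot holds that file's aggregated data frame.
results <- vector("list", length(files))
for (i in seq_along(files)) {
  dt <- read.table(files[i], header = FALSE,
                   col.names = c("year", "month", "value"),
                   colClasses = c("character", "character", "numeric"))
  yearly <- aggregate(value ~ year, dt, sum)
  yearly$value <- yearly$value * scale_factor
  results[[i]] <- yearly
}

# Bind once, after the loop, so the row count never has to be known up front.
df <- do.call(rbind, results)
```

Because each file contributes a complete data frame, files with different numbers of rows need no special handling.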

3 Comments

no! actually I need to read data from a lot of files and then operate on them.
@anup no what? This approach is easily adjusted to solve your specific problem
I really appreciate your effort, but the rbind approach would work if I were building rows inside the loop, just like you did. I have to read data from a table into a df, perform some calculations, and then store the result in another df. Moody's answer is more suitable for my case. Thanks :)
2

If you need to run the same set of calculations on a number of input files, you can do it with one of the apply() family of functions, such as lapply(), avoiding the need for a for() loop.

To illustrate, we'll use the data from Alberto Barradas' Pokémon with stats database that he posted to Kaggle. The actual CSV files I used are accessible in my PokémonData GitHub repository.

I split the data into 6 separate CSV files, one per generation of Pokémon. To make the example completely reproducible, the files are downloaded then stored in a subdirectory of the R Working Directory.

We'll read the file names with list.files() so we can process a variable number of files without having to hand edit the file names, and use the result as input into lapply(). We'll also use an anonymous function to read the data and perform additional calculations.

The output from lapply() is a list of dataframes that can be subsequently processed individually, or combined into a single data frame with do.call() as illustrated in one of the other answers.

download.file("https://raw.githubusercontent.com/lgreski/pokemonData/master/pokemonData.zip",
              "pokemonData.zip",
              method="curl",mode="wb")
unzip("pokemonData.zip")

thePokemonFiles <- list.files("./pokemonData",
                              full.names=TRUE)    
pokemonDataFiles <- lapply(thePokemonFiles,function(x) {
     y <- read.csv(x,stringsAsFactors=FALSE)
     y$speedSquared <- y$Speed^2
     y # return data frame to result object
     })
head(pokemonDataFiles[[1]])

...and the output:

> head(pokemonDataFiles[[1]])
  Number                  Name Type1  Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed Generation Legendary
1      1             Bulbasaur Grass Poison   318 45     49      49         65         65    45          1     False
2      2               Ivysaur Grass Poison   405 60     62      63         80         80    60          1     False
3      3              Venusaur Grass Poison   525 80     82      83        100        100    80          1     False
4      3 VenusaurMega Venusaur Grass Poison   625 80    100     123        122        120    80          1     False
5      4            Charmander  Fire          309 39     52      43         60         50    65          1     False
6      5            Charmeleon  Fire          405 58     64      58         80         65    80          1     False
  speedSquared
1         2025
2         3600
3         6400
4         6400
5         4225
6         6400
> 
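The combining step mentioned above is not shown in the listing, so here is a minimal sketch of it. The `perFile` list is a toy stand-in for `pokemonDataFiles` (the rows are illustrative values, not read from the CSV files), which keeps the example runnable without the download.

```r
# Toy stand-ins for the per-file data frames that lapply() returns.
perFile <- list(
  data.frame(Name = c("Bulbasaur", "Ivysaur"), Speed = c(45, 60),
             stringsAsFactors = FALSE),
  data.frame(Name = "ExampleMon", Speed = 30, stringsAsFactors = FALSE)
)

# Same per-file calculation as in the answer.
perFile <- lapply(perFile, function(y) {
  y$speedSquared <- y$Speed^2
  y
})

# Stack the list of data frames into one; all must share the same columns.
combined <- do.call(rbind, perFile)
```

This relies on every data frame in the list having the same column names, which holds here because each one goes through the same anonymous function.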

DISCLOSURE: this code is based on code I published in a blog article during 2017, Forms of the Extract Operator.


1

Here are 4 ways to grow your data.frame:

col1 <- letters[1:3] # [1] "a" "b" "c"
col2 <- letters[4:6] # [1] "d" "e" "f"

1- Start by assigning the first column

df1 <- data.frame(col1,stringsAsFactors = FALSE)
df1$col2 <- col2

2- Grow a list first, convert afterwards

l2 <- list()
l2$col1 <- col1
l2$col2 <- col2
df2 <- data.frame(l2,stringsAsFactors = FALSE)

3- Define the data.frame with columns initiated with the right length:

df3 <- data.frame(col1 = character(3), col2 = character(3))
df3$col1 <- col1
df3$col2 <- col2

4- Set the row names when you define it, so it has 0 columns and n rows

df4 <- data.frame(row.names = 1:3)
df4$col1 <- col1
df4$col2 <- col2

Check that it's all equivalent:

identical(df1,df2) # [1] TRUE
identical(df1,df3) # [1] TRUE
identical(df1,df4) # [1] TRUE
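For context on why the question's version fails while these four work, here is a small sketch: `$<-` on a data frame does not add rows, so a data frame created with zero rows cannot accept a 12-element column. The variable names `empty`, `sized` and `msg` are only illustrative.

```r
empty <- data.frame()
n_empty <- nrow(empty)  # 0 rows, 0 columns

# Assigning 12 values to a 0-row data frame fails, as in the question.
msg <- tryCatch({
  empty$Month <- month.abb
  "no error"
}, error = function(e) conditionMessage(e))

# Declaring the rows up front (approach 4) makes the same assignment valid.
sized <- data.frame(row.names = seq_along(month.abb))
sized$Month <- month.abb
```

Here `msg` captures the error text instead of stopping the script, and `sized` ends up with one row per month.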

4 Comments

It looks like this solves it. BTW, 3 and 4 set the number of rows at the beginning, which will actually be uncertain since I have to read from files, and different files have different numbers of rows. Say I set the number of rows to 100 and the file has only 20 rows: will the df have 5 copies of the data or only 20 rows?
I believe it might be an XY problem. Maybe you should subset t (which you shouldn't name t, as that's also the transpose function) or a copy of it, rather than copying columns from t to df.
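Assigning a vector that is longer than the number of rows, or whose length doesn't divide the number of rows evenly, will trigger an error. Note, though, that a shorter vector whose length does divide the row count evenly (such as 20 values into 100 rows) is recycled, if that's your question.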
yeah, I meant if it would repeat itself. If it does not repeat then all of your 4 options would be great.
0

Does this help?

months <- c("Jan", "Feb", "Mar")

df <- data.frame(Month = character(),
                 Value = character(),
                 stringsAsFactors = FALSE)

# indexed assignment appends row i when it does not exist yet
for (i in seq_along(months)) {
  df[i, 1] <- months[i]
}
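A quick check of what this loop produces, as a sketch: unlike `$<-`, indexed assignment with `df[i, 1]` can append rows to a zero-row data frame, and the column that is never assigned is filled with NA. The helper names `n_rows` and `value_missing` are only for illustration.

```r
months <- c("Jan", "Feb", "Mar")

df <- data.frame(Month = character(),
                 Value = character(),
                 stringsAsFactors = FALSE)

# Each iteration adds one row; only the first column is filled.
for (i in seq_along(months)) {
  df[i, 1] <- months[i]
}

n_rows <- nrow(df)                 # rows added by the loop
value_missing <- is.na(df$Value)   # the untouched column is padded with NA
```

Growing a data frame one row at a time copies data on each step, so for many rows it is usually worth collecting the pieces in a list and binding them once.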
