Working with empty data frames in R

Question

I am trying to define an empty df outside a for loop and then fill the rows/columns from inside the loop, something like this:

df <- data.frame()
for (fl in files) {
  dt <- read.table(fl, header = FALSE, col.names = c("year", "month", "value"),
                   colClasses = c("character", "character", "numeric"))
  t <- aggregate(value ~ year, dt, sum)
  df$year <- t$year
  df$value <- t$value * someFunction()
}

Now, there are various ways to create an empty df in R:

df <- data.frame()

# or another method
df <- data.frame(Month=character(), 
                 Value=character(), 
                 stringsAsFactors=FALSE) 

# or another method
df <- data.frame(matrix(nrow = 0, ncol = 2))

But when I assign values to the data frame, the following error is produced:

df$Month <- month.abb

Error in `$<-.data.frame`(`*tmp*`, File, value = c("Jan", "Feb", "Mar",  : 
  replacement has 12 rows, data has 0

I don't know what I am doing wrong or what misconception I might have, but I couldn't find my way around this. Can anyone explain it to me?

P.S.: df <- data.frame(matrix(nrow = 100, ncol = 2)) works, but I don't know if it's a good idea because my df will have a different number of rows each time.

Can you show the broader aspect of what you're trying to achieve? There is bound to be a more appropriate way of doing things.
– Roman Luštrik, Commented Jun 23, 2018 at 9:21
Yes, I will deal with that later on; for now I just need to figure out a way to fill in the data without getting the error I mentioned. @Moody_Mudskipper
– anup, Commented Jun 23, 2018 at 9:58

fugu · Accepted Answer · 2018-06-23 09:44:06Z

2

You need to add the values to a list in the for loop, and then you can bind the rows together as a data frame. Something like this:

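# Collect each piece in a list inside the loop
myList <- list()

for (m in seq_along(month.abb)) {
  myList[[m]] <- month.abb[m]
}

# Bind the collected pieces together, one row per list element
df <- as.data.frame(do.call(rbind, myList))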
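
The same list-then-bind pattern can be adapted to the file-reading loop in the question. The following is only a sketch under assumptions: the two temporary input files and the body of someFunction() are hypothetical stand-ins (the question does not show either), and the name results is introduced here for illustration.

```r
# Hypothetical stand-ins for the asker's input: two small whitespace-separated
# files written to a temporary directory.
tmp <- tempfile("yearly_")
dir.create(tmp)
writeLines(c("2001 01 1.5", "2001 02 2.5", "2002 01 4"), file.path(tmp, "a.txt"))
writeLines(c("2003 01 10", "2003 02 20"), file.path(tmp, "b.txt"))
files <- list.files(tmp, full.names = TRUE)

# Placeholder for the unspecified someFunction() in the question.
someFunction <- function() 2

# One list slot per file; each slot holds that file's aggregated data frame.
results <- vector("list", length(files))
for (i in seq_along(files)) {
  dt <- read.table(files[i], header = FALSE,
                   col.names = c("year", "month", "value"),
                   colClasses = c("character", "character", "numeric"))
  totals <- aggregate(value ~ year, dt, sum)
  totals$value <- totals$value * someFunction()
  results[[i]] <- totals
}

# Stack the per-file results; files may contribute different numbers of rows.
df <- do.call(rbind, results)
```

Because each file's result is stored whole, the number of rows never has to be known in advance.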

3 Comments

anup Over a year ago

No! Actually, I need to read data from a lot of files and then operate on them.

fugu Over a year ago

@anup no what? This approach is easily adjusted to solve your specific problem

anup Over a year ago

I really appreciate your effort, but the rbind approach would work if I were building rows inside the loop, just like you did. I have to read data from a table into a df, perform some calculations, and then store the result in another df. Moody's answer is more suitable for my case. Thanks :)

Len Greski · Accepted Answer · 2018-06-23 21:31:37Z

If one needs to execute the same set of calculations on a number of input files, one can accomplish this with one of the apply() family of functions, such as lapply(), avoiding the need for a for() loop.

To illustrate, we'll use the data from Alberto Barradas' Pokémon with stats database that he posted to Kaggle. The actual CSV files I used are accessible in my PokémonData GitHub repository.

I split the data into 6 separate CSV files, one per generation of Pokémon. To make the example completely reproducible, the files are downloaded then stored in a subdirectory of the R Working Directory.

We'll read the file names with list.files() so we can process a variable number of files without having to hand edit the file names, and use the result as input into lapply(). We'll also use an anonymous function to read the data and perform additional calculations.

The output from lapply() is a list of data frames that can subsequently be processed individually, or combined into a single data frame with do.call() as illustrated in one of the other answers.

download.file("https://raw.githubusercontent.com/lgreski/pokemonData/master/pokemonData.zip",
              "pokemonData.zip",
              method="curl",mode="wb")
unzip("pokemonData.zip")

thePokemonFiles <- list.files("./pokemonData",
                              full.names=TRUE)    
pokemonDataFiles <- lapply(thePokemonFiles,function(x) {
     y <- read.csv(x,stringsAsFactors=FALSE)
     y$speedSquared <- y$Speed^2
     y # return data frame to result object
     })
head(pokemonDataFiles[[1]])

...and the output:

> head(pokemonDataFiles[[1]])
  Number                  Name Type1  Type2 Total HP Attack Defense SpecialAtk SpecialDef Speed Generation Legendary
1      1             Bulbasaur Grass Poison   318 45     49      49         65         65    45          1     False
2      2               Ivysaur Grass Poison   405 60     62      63         80         80    60          1     False
3      3              Venusaur Grass Poison   525 80     82      83        100        100    80          1     False
4      3 VenusaurMega Venusaur Grass Poison   625 80    100     123        122        120    80          1     False
5      4            Charmander  Fire          309 39     52      43         60         50    65          1     False
6      5            Charmeleon  Fire          405 58     64      58         80         65    80          1     False
  speedSquared
1         2025
2         3600
3         6400
4         6400
5         4225
6         6400
>
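
For completeness, here is a rough sketch of the combining step mentioned above. It does not download anything: perFile is a hypothetical stand-in for pokemonDataFiles, built from three rows of the output shown and split arbitrarily into two data frames, and combined is a name introduced for illustration.

```r
# Hypothetical stand-in for the list returned by lapply(): two data frames
# with the same columns but different numbers of rows.
perFile <- list(
  data.frame(Name = c("Bulbasaur", "Ivysaur"), Speed = c(45, 60),
             stringsAsFactors = FALSE),
  data.frame(Name = "Venusaur", Speed = 80, stringsAsFactors = FALSE)
)

# Same per-file calculation as in the answer's anonymous function.
perFile <- lapply(perFile, function(y) {
  y$speedSquared <- y$Speed^2
  y  # return data frame to result object
})

# rbind() is applied across all list elements at once.
combined <- do.call(rbind, perFile)
nrow(combined)  # equals the sum of the per-file row counts
```

This only works as written when every element has the same column names, which is the case when the same function is applied to similarly structured files.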

DISCLOSURE: this code is based on code I published in a blog article during 2017, Forms of the Extract Operator.

moodymudskipper · Accepted Answer · 2018-06-23 10:17:22Z

1

Here are 4 ways to grow your data.frame:

col1 <- letters[1:3] # [1] "a" "b" "c"
col2 <- letters[4:6] # [1] "d" "e" "f"

1- Start by assigning the first column

df1 <- data.frame(col1,stringsAsFactors = FALSE)
df1$col2 <- col2

2- Grow a list first, convert afterwards

l2 <- list()
l2$col1 <- col1
l2$col2 <- col2
df2 <- data.frame(l2,stringsAsFactors = FALSE)

3- Define the data.frame with columns initialized with the right length:

df3 <- data.frame(col1 = character(3), col2 = character(3))
df3$col1 <- col1
df3$col2 <- col2

4- Set row names when you define it so it has 0 columns and n rows

df4 <- data.frame(row.names = 1:3)
df4$col1 <- col1
df4$col2 <- col2

Check that it's all equivalent:

identical(df1,df2) # [1] TRUE
identical(df1,df3) # [1] TRUE
identical(df1,df4) # [1] TRUE
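
To connect this back to the error in the question, a small sketch (the names empty, sized and msg are mine, not from the question): a data frame created with data.frame() has 0 rows, so a 12-element column cannot be assigned with $, whereas a data frame that already has 12 rows accepts it.

```r
empty <- data.frame()          # 0 rows, 0 columns

# Capture the error message instead of stopping the script.
msg <- tryCatch({
  empty$Month <- month.abb     # 12 values into 0 rows
  "no error"
}, error = function(e) conditionMessage(e))
msg

# Option 4 above: fix the number of rows first, then add columns.
sized <- data.frame(row.names = seq_along(month.abb))  # 12 rows, 0 columns
sized$Month <- month.abb
dim(sized)
```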

edited Jun 23, 2018 at 10:17

answered Jun 23, 2018 at 10:10

4 Comments

anup Over a year ago

It looks like this solves it. BTW, 3 and 4 set the number of rows at the beginning, which will actually be uncertain since I have to read from files, and different files have different numbers of rows. Say I set the number of rows to 100 and the file has only 20 rows: will the df have 5 copies of the data or only 20 rows?

moodymudskipper Over a year ago

I believe it might be an XY problem. Maybe you should subset t (which you shouldn't name t, as that's also the transpose function), or a copy of it, rather than copying columns from t to df.

moodymudskipper Over a year ago

Assigning a vector whose length neither matches nor evenly divides the number of rows will trigger an error, if that's your question. A shorter vector whose length does divide the row count evenly is recycled.

anup Over a year ago

yeah, I meant if it would repeat itself. If it does not repeat then all of your 4 options would be great.
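
The recycling question is easy to check directly. A minimal sketch, assuming the behaviour of `$<-` on data frames in the R versions I have used (the column names even and odd are made up for the example): a shorter vector is repeated when its length divides the number of rows exactly, and rejected otherwise.

```r
df <- data.frame(id = 1:4)

# Length 2 divides 4 rows evenly, so the values are repeated.
df$even <- c("a", "b")
df$even  # "a" "b" "a" "b"

# Length 3 does not divide 4 rows, so the assignment fails.
outcome <- tryCatch({
  df$odd <- c("x", "y", "z")
  "assigned"
}, error = function(e) "error")
outcome  # "error"
```

So a data frame preallocated with 100 rows would silently end up with five copies of a 20-row column, which is one more reason to avoid guessing the row count up front.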

Moh · Accepted Answer · 2018-06-23 10:01:30Z

0

Does this help?

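months <- c("Jan", "Feb", "Mar")

df <- data.frame(Month = character(),
                 Value = character(),
                 stringsAsFactors = FALSE)

# Assigning to row i of a zero-row data frame adds that row;
# the untouched Value column is filled with NA
for (i in seq_along(months)) {
  df[i, 1] <- months[i]
}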
