I have 3 data frames that I'd like to run the same data.table function on. I could do this manually for each data.frame but I'd like to learn how to do it more efficiently.

Using the data.table package, I want to replace the contents of col1 with the contents of col2 only if col1 contains "a". And I want to run this code over three different dataframes. On a single data.frame, this works fine:

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"))
library(data.table)
dt = data.table(df1)
dt[grepl(pattern = "a", x = df1$col1),  col1 :=col2]

but I am lost trying to get this to run over multiple dataframes:

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"))
df2 <- data.frame(col1 = c("b", "b", "a"), col2 = c("AA", "BB", "BB"))
df3 <- data.frame(col1 = c("b", "b", "b"), col2 = c("AA", "AA", "BB"))

library(data.table)
listdfs = list(df1, df2, df3)
for (i in dt[[]]) {
dt[[i]][grepl(pattern = "a", x = df[[i]]$col1), col1 := col2] }

But this obviously doesn't work because I have no clue what I'm doing with the for loop. Any guidance/teaching would be appreciated. Thanks!

1 Answer

To loop over a list, iterate over its indices with seq_along() and do the assignment on each element. Inside the data.table call, refer to col1 directly rather than through df[[i]]$col1:

listdfs = list(df1, df2, df3)
lapply(listdfs, setDT) # convert each `data.frame` to a `data.table` by reference
for (i in seq_along(listdfs)) { # loop over the indices of the list
  listdfs[[i]][grepl(pattern = "a", x = col1), col1 := col2]
}

This changes the data.tables inside listdfs as well as the original objects 'df1', 'df2' and 'df3' themselves, because setDT and := work by reference and we didn't create any copy:

df1
#   col1 col2
#1:   AA   AA  # change
#2:   AA   AA  # change
#3:    b   AA

df2
#   col1 col2
#1:    b   AA
#2:    b   BB
#3:   BB   BB   # change

df3
#   col1 col2
#1:    b   AA
#2:    b   AA
#3:    b   BB
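
If you would rather leave the original data frames untouched, here is a minimal sketch of one alternative, assuming character columns (hence stringsAsFactors = FALSE). The helper name replace_if_a is my own and not part of data.table. It builds new tables with as.data.table() instead of converting in place with setDT, and uses lapply instead of the for loop:

```r
library(data.table)

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"), stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("b", "b", "a"), col2 = c("AA", "BB", "BB"), stringsAsFactors = FALSE)
df3 <- data.frame(col1 = c("b", "b", "b"), col2 = c("AA", "AA", "BB"), stringsAsFactors = FALSE)

# as.data.table() returns new tables, so df1, df2 and df3 are not converted in place
listdts <- lapply(list(df1, df2, df3), as.data.table)

# replace_if_a is a hypothetical helper name, not a data.table function
replace_if_a <- function(dt) {
  dt[grepl("a", col1, fixed = TRUE), col1 := col2]  # update matching rows only
  dt                                                # return the updated table
}

listdts <- lapply(listdts, replace_if_a)
listdts[[2]]$col1
# "b"  "b"  "BB"
```

Here the results live only in listdts; df1, df2 and df3 keep their original values.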

3 Comments

This is great. Thank you. Is "seq_along()" standard for running through lists in a loop?
@moxed seq_along can be used on a vector, a data.frame/data.table column, list elements, etc.
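
A small base R illustration of why seq_along() is usually preferred over 1:length(x) in loops (the variable names are made up for the example). With an empty list, seq_along() gives an empty sequence, so the loop body never runs, while 1:length(x) counts down from 1 to 0:

```r
x <- c("a", "b", "c")
seq_along(x)            # 1 2 3

empty <- list()
seq_along(empty)        # integer(0)
1:length(empty)         # 1 0, so a loop over this would run twice

n_iter <- 0L
for (i in seq_along(empty)) n_iter <- n_iter + 1L
n_iter                  # 0
```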
@moxed The standard for running data.tables/data.frames through a loop, however, is to not run them through a loop. You can use rbind or rbindlist to get a single table, explained under the "combining" section of Gregor's answer here stackoverflow.com/a/24376207
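
A rough sketch of the single-table approach from the comment above, assuming the three data frames share the same columns. The list names and the id column name "source" are my own choices for illustration. rbindlist() stacks the tables, idcol records which table each row came from, and the update then runs once:

```r
library(data.table)

df1 <- data.frame(col1 = c("a", "a", "b"), col2 = c("AA", "AA", "AA"), stringsAsFactors = FALSE)
df2 <- data.frame(col1 = c("b", "b", "a"), col2 = c("AA", "BB", "BB"), stringsAsFactors = FALSE)
df3 <- data.frame(col1 = c("b", "b", "b"), col2 = c("AA", "AA", "BB"), stringsAsFactors = FALSE)

# Stack the tables; "source" holds the list name each row came from
combined <- rbindlist(list(df1 = df1, df2 = df2, df3 = df3), idcol = "source")

# One update instead of one per table
combined[grepl("a", col1, fixed = TRUE), col1 := col2]

# Optional: split back into a named list of data.tables
pieces <- split(combined, by = "source")
```

Whether a single table or a list is more convenient depends on what you do with the data afterwards.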
