Looping over specific columns of dataframes in a list in R

Question

I want to loop over the data frames in a list using lapply and, within each data frame, over only those columns whose names are stored in a vector called vector_test. I would like to center these variables, i.e. subtract the weighted mean of each selected variable from its values, in every dataset.

Let's assume I have the following 3 datasets saved in a list:

df1<-data.frame(v1=c(1,2,3,4,5,6,7),
                v2=c(9,8,7,6,5,4,3),
                v3=c(4,5,6,7,4,4,3),
                v4=c(5,6,4,5,6,5,6))

df2<-data.frame(v1=c(1,5,3,4,9,6,7),
                diff_var=c(1,3,4,6,2,3,4),
                v2=c(9,8,2,6,3,4,3),
                v3=c(4,5,6,7,3,4,3),
                v4=c(5,2,4,4,6,1,6))

df3<-data.frame(v1=c(1,5,8,4,2,6,1),
                v2=c(1,8,1,6,2,4,7),
                v3=c(1,5,2,5,3,4,3),
                v4=c(5,9,4,5,6,2,6))

test_liste<-list(df1,df2,df3)

Further, I have names of variables saved in a vector:

vector_test<-c("v3","v4")

I tried a for loop and sapply nested inside lapply, but I cannot figure out how to pick only the columns whose names match those in the vector.

If any clarification or additional code is needed, please let me know!

Thanks in advance!

stefan · Accepted Answer · 2023-01-25 10:03:16Z

2

Using lapply you could do:

lapply(test_liste, function(x) {
  x[vector_test] <- lapply(x[vector_test], function(x) x - mean(x))
  x
})
#> [[1]]
#>   v1 v2         v3         v4
#> 1  1  9 -0.7142857 -0.2857143
#> 2  2  8  0.2857143  0.7142857
#> 3  3  7  1.2857143 -1.2857143
#> 4  4  6  2.2857143 -0.2857143
#> 5  5  5 -0.7142857  0.7142857
#> 6  6  4 -0.7142857 -0.2857143
#> 7  7  3 -1.7142857  0.7142857
#> 
#> [[2]]
#>   v1 diff_var v2         v3 v4
#> 1  1        1  9 -0.5714286  1
#> 2  5        3  8  0.4285714 -2
#> 3  3        4  2  1.4285714  0
#> 4  4        6  6  2.4285714  0
#> 5  9        2  3 -1.5714286  2
#> 6  6        3  4 -0.5714286 -3
#> 7  7        4  3 -1.5714286  2
#> 
#> [[3]]
#>   v1 v2         v3         v4
#> 1  1  1 -2.2857143 -0.2857143
#> 2  5  8  1.7142857  3.7142857
#> 3  8  1 -1.2857143 -1.2857143
#> 4  4  6  1.7142857 -0.2857143
#> 5  2  2 -0.2857143  0.7142857
#> 6  6  4  0.7142857 -3.2857143
#> 7  1  7 -0.2857143  0.7142857
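
The question mentions picking only columns whose names match the vector. If some data frames might lack one of the listed columns, a guard with intersect() could avoid an "undefined columns selected" error. The following is a minimal sketch under that assumption; center_cols and demo_list are hypothetical names used for illustration, not part of the original code.

```r
# Hypothetical helper: center only those columns from `cols`
# that actually exist in each data frame of the list `dfs`.
center_cols <- function(dfs, cols) {
  lapply(dfs, function(d) {
    present <- intersect(cols, names(d))  # drop names this data frame lacks
    for (nm in present) {
      d[[nm]] <- d[[nm]] - mean(d[[nm]])  # subtract the (unweighted) column mean
    }
    d
  })
}

# Small illustrative data (assumed, not the data from the question)
demo_list <- list(
  data.frame(v1 = c(1, 2, 3), v3 = c(4, 5, 6), v4 = c(5, 6, 4)),
  data.frame(v1 = c(1, 5, 3), v3 = c(4, 5, 9))  # no v4 in this one
)

centered <- center_cols(demo_list, c("v3", "v4"))
```

In this sketch the second data frame keeps its columns as they are apart from v3, since v4 is skipped rather than raising an error.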


1 Comment

Christioh Over a year ago

Perfect, thank you! One addition: I adapted the code slightly to use the weighted mean (weighted.mean from the stats package). The code is as follows:

test_liste_neu<-lapply(test_liste, function(y) {
  y[vector_test] <- lapply(y[vector_test], function(x) x - weighted.mean(x,y$v1,na.rm=TRUE))
  y
})
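
One way to sanity-check a weighted variant like the one in this comment is to confirm that the weighted mean of each centered column comes out as zero. The sketch below assumes the weights live in a column of each data frame; center_weighted, demo_list, and the column name w are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: subtract the weighted mean from each column in `cols`,
# taking the weights from the column named `weight_col` of the same data frame.
center_weighted <- function(dfs, cols, weight_col) {
  lapply(dfs, function(d) {
    w <- d[[weight_col]]
    d[cols] <- lapply(d[cols], function(col) {
      col - weighted.mean(col, w, na.rm = TRUE)
    })
    d
  })
}

# Small illustrative data (assumed, not the data from the question)
demo_list <- list(
  data.frame(w = c(1, 1, 2), v3 = c(2, 4, 6), v4 = c(0, 4, 2))
)

centered_w <- center_weighted(demo_list, c("v3", "v4"), "w")

# Weighted means of the centered columns; these should be numerically zero.
check <- sapply(centered_w[[1]][c("v3", "v4")], weighted.mean,
                w = centered_w[[1]]$w)
```

Note that na.rm = TRUE in weighted.mean drops missing values in the data column only; missing weights would still need to be handled separately.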
