Use a function on multiple columns (variables) in R

Question

I am trying to run tests of homogeneity of variance using the leveneTest function from the car package. I can run the test on a single variable like so (using the iris dataset as an example):

library(car)
library(datasets)

data(iris)

leveneTest(iris$Sepal.Length, iris$Species)

However, I would like to run the test on all the dependent variables in the dataset simultaneously (so Sepal.Length, Sepal.Width, Petal.Length, Petal.Width). I am guessing it has something to do with the apply family of functions (sapply, lapply, tapply) but I just can't figure out how. The closest I came is something like this:

lapply(iris, leveneTest(group = iris$Species))

However, I get the error:

Error in leveneTest.default(group = iris$Species) : 
  argument "y" is missing, with no default

I understand this is probably because the call doesn't specify the outcome variables. I am certain I must be missing some obvious use of the apply functions, but I just don't understand what it is. Apologies for the basic question, but I am relatively new to R and often apply the same function to multiple variables (usually by copying the code several times), so it would be great to understand how to use these functions properly :)

Roland · Accepted Answer · 2020-05-27 13:07:37Z

6

Arguments that are the same for every call of the function need to be passed through the ... argument of lapply. Like this:

lapply(subset(iris, select = -Species), leveneTest, group = iris$Species)

help("lapply") explains that ... is for "optional arguments to FUN" (meaning optional for lapply not for FUN) and provides lapply(x, quantile, probs = 1:3/4) as an example.
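To make the role of ... concrete, here is a minimal sketch (assuming the car package is installed). It shows that passing group = after the function name should give the same tests as writing out an anonymous function, and then pulls one p-value per column. The names responses, via_dots, via_wrapper and p_values are made up for illustration.

```r
library(car)   # provides leveneTest()

data(iris)

# Response columns only: everything except the grouping factor
responses <- iris[, names(iris) != "Species"]

# Form 1: arguments after FUN are forwarded unchanged to every call
via_dots <- lapply(responses, leveneTest, group = iris$Species)

# Form 2: the same calls spelled out with an anonymous function
via_wrapper <- lapply(responses, function(y) leveneTest(y, group = iris$Species))

# leveneTest() returns a small table; "Pr(>F)" is its p-value column,
# and the first row belongs to the grouping term
p_values <- sapply(via_dots, function(res) res[["Pr(>F)"]][1])
print(p_values)
```

The anonymous-function form is handy when the column is not the first argument of the function you are applying.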

edited May 27, 2020 at 13:07

answered May 27, 2020 at 13:01


Chris R · Answer · 2020-05-27 14:16:26Z

2

Piggybacking on @Roland's answer, you can do the following in base R as well:

lapply(iris[,-5], leveneTest, group = iris$Species)

The -5 is specific to the iris dataset, where Species is the fifth column. You could replace it with an expression like

lapply(iris[,-length(iris)]....

which drops the last column of the data frame, assuming your grouping variable is last.
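If you would rather not depend on column position at all, the columns can also be chosen by name or by type. This is only a sketch in base R; group_col, by_name and by_type are illustrative names, not anything from the answers above.

```r
data(iris)

group_col <- "Species"   # name of the grouping column

# Drop the grouping column by name, wherever it sits in the data frame
by_name <- iris[, setdiff(names(iris), group_col)]

# Or keep only the numeric columns
by_type <- iris[, sapply(iris, is.numeric)]

# For iris, both give the same four columns as iris[, -5]
print(names(by_name))
print(identical(by_name, by_type))
```

Either result can then be used as the first argument to lapply in place of iris[,-5].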

Additionally, as a data.table fanboy, I'll add a data.table option as well, in case you're interested.

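dt.iris <- data.table::as.data.table(iris)  # dt.iris is assumed to be iris converted to a data.table
dt.iris[, lapply(.SD, leveneTest, group = Species), .SDcols = !'Species']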

This code excludes the Species column from the lapply call, much like the base R examples above, but by naming it explicitly: .SD holds the subset of the data that lapply iterates over, and .SDcols controls which columns go into it. Then you run your analysis in a fairly straightforward manner. Hope this helps!
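As a rough sketch of the same idea that returns just one p-value per column (assuming both car and data.table are installed), something like the following should work. dt_iris and p_values are illustrative names, and .SDcols is written with setdiff() here rather than the negated form.

```r
library(car)          # provides leveneTest()
library(data.table)

# Assumed construction of the data.table version of iris
dt_iris <- as.data.table(iris)

# .SDcols limits .SD to the response columns; Species is still available
# inside j, so it can be passed as the grouping factor
p_values <- dt_iris[,
  lapply(.SD, function(y) leveneTest(y, group = Species)[["Pr(>F)"]][1]),
  .SDcols = setdiff(names(dt_iris), "Species")
]
print(p_values)
```

The result is a one-row data.table with one column per tested variable.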

edited May 27, 2020 at 14:16

answered May 27, 2020 at 13:35


2 Comments

Roland Over a year ago

I just wish to say that my answer does not use dplyr. I'm not sure if you are implying that.

Chris R Over a year ago

My mistake, I thought you had included a dplyr function in your answer, but clearly I need my prescription checked. Updated my post accordingly, sorry about that
