How to apply a function to each column of a data frame in R

Question

I want to apply a function to each column of a data frame in R. Suppose the data frame is 3 x n, as follows:

df <- data.frame(
  h1 = c(1,2,3),
  h2 = c(2,3,1),
  h3 = c(3,2,1),
  h4 = c(1,2,3),
  h5 = c(1,2,3)
)
rownames(df) <- c("e1", "e2", "e3")
df
#    h1 h2 h3 h4 h5
# e1  1  2  3  1  1
# e2  2  3  2  2  2
# e3  3  1  1  3  3

Suppose we want to check, for each column (h1, h2, ...), whether its first two elements match given values (e1 == 1 and e2 == 2). How can we apply such a checking function to each column of the data frame?

Please do not post an image of code/data/errors: it cannot be copied or searched (SEO), it breaks screen-readers, and it may not fit well on some mobile devices. Ref: meta.stackoverflow.com/a/285557/3358272 (and xkcd.com/2116). Please just include the code or data (e.g., dput(head(x)) or data.frame(...)) directly.
– r2evans, Commented Oct 31, 2019 at 17:10
@r2evans When it was posted first, it was not an image though. I think it got edited
– akrun, Commented Oct 31, 2019 at 17:11
You don't have permissions yet to show an image. But if you put it in with that intent, typically somebody edits your question to actually show the image. But my point is that an image of data does me (and others) no good, and I categorically won't spend time transcribing data from an image into usable code or data. It is just as easy (perhaps easier) for you to copy text from your R console and paste into a code-block than to get a screenshot and post it in as an image.
– r2evans, Commented Oct 31, 2019 at 17:12
In general, "apply function to each column" is literally lapply(dataframe, myfunc). akrun's suggestion to use colSums is one of the special cases, and is much more efficient in this situation.
– r2evans, Commented Oct 31, 2019 at 17:12
For the record, after taking the sample data in alex_jwb90's answer (and changing to data.frame), this question is a bit more easily reproducible. I kept the row names solely because you referenced them as e1==1, etc; note that many operations on frames will not preserve row names, including just about everything within the tidyverse meta-package; so while I can see some utility in row names in general (and it can be a polarizing opinion for some), I normally don't use or rely on them.
– r2evans, Commented Oct 31, 2019 at 17:47

akrun · Accepted Answer · 2019-10-31 17:17:06Z

3

Subset the rows of the data, either by row names or with head, and compare the result (==) with a vector of the expected values. This gives a logical matrix. Take its colSums and check whether each sum equals 2, i.e. whether both elements are TRUE for that column:

colSums(df[c("e1", "e2"), ] == c(1, 2)) == 2
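To make the intermediate steps visible, here is a minimal sketch that rebuilds the df from the question so it runs on its own. The names first_two, hits and both_match are illustrative only and do not appear in the original answer.

```r
# Rebuild the question's data frame so the example is self-contained.
df <- data.frame(
  h1 = c(1, 2, 3),
  h2 = c(2, 3, 1),
  h3 = c(3, 2, 1),
  h4 = c(1, 2, 3),
  h5 = c(1, 2, 3)
)
rownames(df) <- c("e1", "e2", "e3")

# Keep only rows e1 and e2.
first_two <- df[c("e1", "e2"), ]

# Comparing with c(1, 2) recycles the vector down each column,
# giving a 2 x 5 logical matrix.
hits <- first_two == c(1, 2)

# A column matches only if both comparisons are TRUE,
# i.e. the column sum of the logical matrix equals 2.
both_match <- colSums(hits) == 2
both_match
#    h1    h2    h3    h4    h5
#  TRUE FALSE FALSE  TRUE  TRUE
```

This relies on recycling, so it only lines up as intended when the comparison vector has exactly one value per selected row.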

Or, with apply, loop over the columns (MARGIN = 2) and call an anonymous function that checks whether all of the comparisons are TRUE:

apply(head(df, 2), 2, function(x) all(x == c(1, 2)))
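Because apply first coerces a data frame with as.matrix, a variant that iterates over the columns directly may be preferable when the columns have mixed types. The sketch below compares the two under the assumption that the data is the df from the question; target, via_apply and via_vapply are illustrative names.

```r
# Rebuild the question's data frame so the example is self-contained.
df <- data.frame(
  h1 = c(1, 2, 3),
  h2 = c(2, 3, 1),
  h3 = c(3, 2, 1),
  h4 = c(1, 2, 3),
  h5 = c(1, 2, 3)
)
rownames(df) <- c("e1", "e2", "e3")

# Expected values for the first two rows (illustrative name).
target <- c(1, 2)

# apply(): converts the data frame to a matrix, then visits each column
# (MARGIN = 2) and reduces the comparison with all().
via_apply <- apply(head(df, 2), 2, function(x) all(x == target))

# vapply(): visits each column as a list element; FUN.VALUE = logical(1)
# makes R raise an error if the function ever returns anything else.
via_vapply <- vapply(head(df, 2), function(x) all(x == target), logical(1))

via_apply
via_vapply
```

Both calls flag h1, h4 and h5 as matching and h2 and h3 as not matching.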

edited Oct 31, 2019 at 17:17

answered Oct 31, 2019 at 17:07

akrun


3 Comments

learner01 Over a year ago

Thank you for your answer. However, I have to do something like the following, and hence need to use an apply/lapply/sapply function in R: rank.shape = function(x) { # i here ranges from 1 to 5 which is the number of columns df = NA if (x[1,][i]==2 && x[2,][i]==1){ //.... } else if(x[1,][i]==2 && x[2,][i]==1.5){ //....) } list.shape = lapply(matrixRanks[1,], rank.shape)

akrun Over a year ago

@ChimiWangmo I answered for the question posted

r2evans Over a year ago

Chimi, that much code in a comment doesn't always format well. Further, you explicitly say you need to use one of the apply family of functions, so please be clear in your question. Using "apply" as a verb does not clearly indicate using apply as a function.

r2evans · Accepted Answer · 2019-10-31 17:19:57Z

3

Using @alex_jwb90's data,

lapply(df, function(a) a[1:2] == 1:2)
# $h1
# [1] TRUE TRUE
# $h2
# [1] FALSE FALSE
# $h3
# [1] FALSE  TRUE
# $h4
# [1] TRUE TRUE
# $h5
# [1] TRUE TRUE

lapply(df, function(a) all(a[1:2] == 1:2))
# $h1
# [1] TRUE
# $h2
# [1] FALSE
# $h3
# [1] FALSE
# $h4
# [1] TRUE
# $h5
# [1] TRUE

sapply(df, function(a) all(a[1:2] == 1:2))
#    h1    h2    h3    h4    h5 
#  TRUE FALSE FALSE  TRUE  TRUE
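When the per-column check needs branching logic rather than a single comparison, one option is to put it in a named function that receives one column at a time and pass that function to lapply, sapply or vapply. The sketch below is only an illustration: classify_col and its three labels are hypothetical and are not taken from the question or the comments.

```r
# Rebuild the question's data frame so the example is self-contained.
df <- data.frame(
  h1 = c(1, 2, 3),
  h2 = c(2, 3, 1),
  h3 = c(3, 2, 1),
  h4 = c(1, 2, 3),
  h5 = c(1, 2, 3)
)

# Hypothetical helper: receives one whole column (a numeric vector)
# and returns a single label based on its first two elements.
classify_col <- function(col) {
  if (col[1] == 1 && col[2] == 2) {
    "ascending"
  } else if (col[1] == 2 && col[2] == 3) {
    "shifted"
  } else {
    "other"
  }
}

# vapply() calls classify_col once per column and checks that every
# result is a single character string.
labels <- vapply(df, classify_col, character(1))
labels
```

Because the function is handed the column itself, it does not need a column index such as i; indexing with col[1] and col[2] is enough.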

answered Oct 31, 2019 at 17:19

r2evans


alex_jwb90 · Accepted Answer · 2019-10-31 17:25:26Z

0

You can try this (it extends to checking more than two rows if you remove the & row_number() <= 2 condition):

library(dplyr)

df = tibble(
  h1 = c(1,2,3),
  h2 = c(2,3,1),
  h3 = c(3,2,1),
  h4 = c(1,2,3),
  h5 = c(1,2,3)
)

df %>%
  mutate_all(
    list(equals_rownum = ~.==row_number() & row_number() <= 2)
  )

If you would rather overwrite the h1, h2, ... columns than create new <col>_equals_rownum columns, just remove the name in the list() call.
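In dplyr 1.0.0 and later, mutate_all() is superseded by across(). A sketch of the same idea under that assumption follows; the .names pattern is chosen here to reproduce the <col>_equals_rownum column names, and result is an illustrative name.

```r
library(dplyr)

# Same sample data as above, rebuilt so the example is self-contained.
df <- tibble(
  h1 = c(1, 2, 3),
  h2 = c(2, 3, 1),
  h3 = c(3, 2, 1),
  h4 = c(1, 2, 3),
  h5 = c(1, 2, 3)
)

# For every column, test whether the value equals its row number,
# restricted to the first two rows. .names keeps the original columns
# and appends one new logical column per input column.
result <- df %>%
  mutate(
    across(
      everything(),
      ~ .x == row_number() & row_number() <= 2,
      .names = "{.col}_equals_rownum"
    )
  )

result
```

Dropping the .names argument makes across() overwrite h1 to h5 in place instead of adding new columns.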

edited Oct 31, 2019 at 17:25

answered Oct 31, 2019 at 17:17

alex_jwb90
