I have a data.frame like

  a b c d
1 1 0 0 1
2 1 1 0 0
3 0 1 0 0
4 1 0 1 0
5 1 0 0 0

which I generated using:

df <- data.frame(a = sample(0:1, 5, replace = TRUE), b = sample(0:1, 5, replace = TRUE), c = sample(0:1, 5, replace = TRUE), d = sample(0:1, 5, replace = TRUE))

How can I write a function that, given a value such as 1, returns the last column index at which that value occurs in each row? For the data above and the value 1, the result should be 4, 2, 2, 3, 1.

  • Yes, @akrun. I was using codebunk, which would not let me copy. Commented Jul 30, 2015 at 12:18
  • Suppose you have a row with only 0s. What should the index be for that row? Commented Jul 30, 2015 at 12:20
  • @akrun -1 should be fine, or maybe 0. Commented Jul 30, 2015 at 12:22

4 Answers

One approach would be:

apply(df, 1, function(x) max(which(x == 1)))

If you wanted to be flexible about which element you're checking for and handle cases where the value is missing from a row:

# Results below are for the 5-row data frame shown in the question
max.row <- function(df, val) unname(apply(df, 1, function(x) tail(c(NA, which(x == val)), 1)))
max.row(df, 0)
# [1] 3 4 4 4 4
max.row(df, 1)
# [1] 4 2 2 3 1
max.row(df, 2)
# [1] NA NA NA NA NA
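
To make the missing-value case concrete, here is a minimal sketch that types out the data frame from the question by hand (so it does not depend on a random seed), appends an all-zero row, and maps "not found" to 0 as suggested in the comments under the question. The sixth row, the helper name `last_index`, and its `missing` argument are assumptions added for illustration; they are not part of the answer above.

```r
# Data frame from the question, typed out so the results are reproducible,
# plus a sixth all-zero row (added here purely for illustration).
df6 <- data.frame(a = c(1, 1, 0, 1, 1, 0),
                  b = c(0, 1, 1, 0, 0, 0),
                  c = c(0, 0, 0, 1, 0, 0),
                  d = c(1, 0, 0, 0, 0, 0))

# Hypothetical helper: last column index equal to `val` in each row,
# returning `missing` when `val` does not occur in that row.
last_index <- function(df, val, missing = NA_integer_) {
  m <- as.matrix(df)
  idx <- apply(m, 1, function(x) {
    hits <- which(x == val)
    if (length(hits) == 0L) missing else hits[length(hits)]
  })
  unname(idx)
}

last_index(df6, 1)                # 4 2 2 3 1 NA
last_index(df6, 1, missing = 0L)  # 4 2 2 3 1 0
```

Passing `missing = -1L` instead would give the other convention mentioned in the comments.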

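You can try max.col, which is a little faster than apply. Note that it returns the position of the row maximum, so it finds the last 1 only because 1 is the largest value in this 0/1 data; for a row containing only 0s it returns the last column rather than a missing value.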

max.col(df, "last")
# [1] 2 4 4 2 4
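
Because `max.col` works on row maxima, one way to reuse it for an arbitrary value is to convert the data to a 0/1 indicator matrix first. The sketch below assumes that approach and uses the data frame from the question, typed out by hand; `last_col_of` is a hypothetical helper name, and rows that do not contain the value are set to NA explicitly.

```r
# Data frame from the question, typed out so the results are reproducible.
df <- data.frame(a = c(1, 1, 0, 1, 1),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0),
                 d = c(1, 0, 0, 0, 0))

# Hypothetical helper: last column holding `val` in each row, NA if absent.
last_col_of <- function(df, val) {
  hit <- as.matrix(df) == val                    # TRUE where the value occurs
  idx <- max.col(hit + 0, ties.method = "last")  # + 0 turns TRUE/FALSE into 1/0
  idx[rowSums(hit) == 0] <- NA                   # value absent from this row
  idx
}

last_col_of(df, 1)  # 4 2 2 3 1
last_col_of(df, 0)  # 3 4 4 4 4
last_col_of(df, 2)  # NA NA NA NA NA
```

The `rowSums` check is what separates "the value is in the last column" from "the value is not in the row at all", which `max.col` alone cannot distinguish.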

Data

set.seed(1)
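# Note: the default sampling algorithm changed in R 3.6.0, so newer versions
# may generate different data (and a different result) for this seed.
df <- data.frame(a = sample(0:1, 5, replace = TRUE), b = sample(0:1, 5, replace = TRUE), c = sample(0:1, 5, replace = TRUE), d = sample(0:1, 5, replace = TRUE))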

1 Comment

Set a seed and regenerate the df; otherwise the results you show won't match anyone else's. +1, however.

Another option is to use pmax: multiply df by col(df) and take the maximum of each row.

  do.call(pmax,col(df)*df)
  #[1] 4 2 2 3 1

col(df) returns a matrix with the same dimensions as the dataset, in which every element is its own column index.

  col(df)
  #     [,1] [,2] [,3] [,4]
  #[1,]    1    2    3    4
  #[2,]    1    2    3    4
  #[3,]    1    2    3    4
  #[4,]    1    2    3    4
  #[5,]    1    2    3    4

Multiplying df by col(df), which has the same dimensions, leaves the 0 values at 0 and replaces each 1 with its column index, i.e.

 col(df)*df
 #  a b c d
 #1 1 0 0 4
 #2 1 2 0 0
 #3 0 2 0 0
 #4 1 0 3 0
 #5 1 0 0 0

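Finally, do.call(pmax, ...) passes the columns of this result to pmax as separate arguments. pmax returns their element-wise maximum, which is the largest column index holding a 1 in each row. A row with no 1s gives 0.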
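
As a self-contained check of the steps above, the sketch below types out the question's data frame by hand and runs the `pmax` approach on it. The second half, which builds an indicator with `df == val` to look for a value other than 1, is an assumption added here and not part of the answer.

```r
# Data frame from the question, typed out so the results are reproducible.
df <- data.frame(a = c(1, 1, 0, 1, 1),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0),
                 d = c(1, 0, 0, 0, 0))

idx <- col(df) * df        # column index where df is 1, otherwise 0
res <- do.call(pmax, idx)  # element-wise maximum across the four columns
res                        # 4 2 2 3 1

# Same idea for an arbitrary value: build a 0/1 indicator first.
val <- 0
ind <- (df == val) * col(df)               # matrix: column index where df == val
res0 <- do.call(pmax, as.data.frame(ind))  # pmax needs the columns as arguments
res0                                       # 3 4 4 4 4
```

With this variant a row that does not contain `val` comes out as 0, matching one of the conventions the asker accepted in the comments.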


Here are timings (user, system, and elapsed, in seconds) for the proposed solutions plus one of my own, each replicated 10,000 times:

apply(df,1,function(x){tail(which(x==1),1)})
   user  system elapsed
  2.978   0.010   2.988

apply(df*col(df),1,function(x){max(x)})
   user  system elapsed
  8.217   0.026   8.245

apply(df, 1, function(x) max(which(x == 1)))
   user  system elapsed
  1.621   0.005   1.627

max.col(df, "last")
   user  system elapsed
  1.348   0.004   1.352

Though @Mamoun Benghezal's answer is the fastest, it doesn't give me the flexibility to choose which value to search for. The accepted answer does.
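
For anyone who wants to repeat the comparison, here is a rough sketch of how such timings could be collected with `system.time`. It is not the exact script used for the numbers above: the hand-typed data frame, the repetition count `n`, and the names in `candidates` are assumptions, and the absolute times will differ by machine and R version.

```r
# Data frame from the question, typed out so every run uses the same data.
df <- data.frame(a = c(1, 1, 0, 1, 1),
                 b = c(0, 1, 1, 0, 0),
                 c = c(0, 0, 0, 1, 0),
                 d = c(1, 0, 0, 0, 0))

n <- 1000  # repetitions per candidate (the timings above used 10,000)

# The four expressions that were timed, wrapped as functions.
candidates <- list(
  apply_tail = function() apply(df, 1, function(x) tail(which(x == 1), 1)),
  apply_col  = function() apply(df * col(df), 1, max),
  apply_max  = function() apply(df, 1, function(x) max(which(x == 1))),
  max_col    = function() max.col(df, "last")
)

# Elapsed seconds for n repetitions of each candidate.
elapsed <- sapply(candidates, function(f) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
})
elapsed

# Sanity check: all candidates agree on this data.
results <- lapply(candidates, function(f) as.integer(f()))
```

On data this small the measurements are dominated by function-call overhead, so a larger data frame would give a more meaningful ranking.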
