Compare string values with multiple logical columns of dataframe

Question

I have a dataframe with the following fields

class         bed_1  chair_1  bottle_1  table_1      ...

bed          TRUE    FALSE   FALSE      FALSE        ...
chair        FALSE   TRUE    FALSE      FALSE        ...
sofa         FALSE   FALSE   TRUE       FALSE        ...
table        FALSE   FALSE   FALSE      FALSE        ...

I want to compare class column with all other columns. The following is the expected output

class     new_col
bed       bed_1
chair     chair_1
bottle    bottle_1
table

So, essentially, I need to pickup column name with TRUE value for specific class.

The solution i tried takes long time due to large number of records, I am looking for an efficient way of doing this. Here is my solution.

new_df <- data.frame(class = df$class, new_col = NA)
for (row_n in 1:length(df$class) ){
indx <- which (df[row_n, ] == 'TRUE')
new_df$new_col[row_n] <- ifelse (length(indx) > 0, colnames(df)[idx], '') 
}

Rui Barradas · Accepted Answer · 2021-10-07 07:30:09Z

1

Here is a base R way that returns the empty string "" in case all values in a row are FALSE.

y <- apply(df1[-1], 1, \(i, x = names(df1[-1])) {
  y <- x[i]
  if(length(y)) y[1] else ""
})
df2 <- data.frame(class = df1$class, new_col = y)

df2
#  class  new_col
#1   bed    bed_1
#2 chair  chair_1
#3  sofa bottle_1
#4 table

Data

df1 <-
structure(list(class = c("bed", "chair", "sofa", "table"), 
bed_1 = c(TRUE, FALSE, FALSE, FALSE), 
chair_1 = c(FALSE, TRUE, FALSE, FALSE), 
bottle_1 = c(FALSE, FALSE, TRUE, FALSE), 
table_1 = c(FALSE, FALSE, FALSE, FALSE)), 
class = "data.frame", row.names = c(NA, -4L))

answered Oct 7, 2021 at 7:30

Rui Barradas

78k8 gold badges41 silver badges75 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Vendetta Over a year ago

Hi Rui, is '\' a typo in your code?

Rui Barradas Over a year ago

@Vendetta No, it's not. It's the new lambda functions introduced in R 4.1.0. If your version of R is older, use the older lambda, function(i, x = names(df1[-1])).

Ronak Shah · Accepted Answer · 2021-10-07 10:18:50Z

1

Get the data in long format and filter the TRUE values.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -class) %>%
  filter(value) %>%
  select(-value)

#  class name    
#  <chr> <chr>   
#1 bed   bed_1   
#2 chair chair_1 
#3 sofa  bottle_1

If you have one TRUE value in each row you can use max.col.

new_df <- cbind(df[1], new_col = names(df[-1])[max.col(df[-1])])

To extract only one TRUE value you may do -

df %>%
  pivot_longer(cols = -class) %>%
  group_by(class) %>%
  slice(match(TRUE, value)) %>%
  select(-value)

edited Oct 7, 2021 at 10:18

answered Oct 7, 2021 at 7:15

Ronak Shah

391k20 gold badges173 silver badges237 bronze badges

2 Comments

Vendetta Over a year ago

Hi Ronak, thank you for solution, there could be more than TRUE, how do I pick the first true only and in case no TRUE found leave the cell empty.

Ronak Shah Over a year ago

The updated answer should help you with that @Vendetta

Collectives™ on Stack Overflow

Compare string values with multiple logical columns of dataframe

2 Answers 2

Data

2 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Data

2 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related