Subsetting a dataframe in R based on the elements in the columns

Question

I have a dataframe df which has 5 rows and 6 columns.

df <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)

From each of the columns category1 to category5, I need to retain only the element that appears at the topmost position (that is, the first non-empty value in each column), and finally drop the rows that have no elements left in these five columns.

How do I achieve this in the simplest possible way in R?

Ronak Shah · Accepted Answer · 2021-08-09 11:50:21Z

2

You can use -

library(dplyr)

df %>%
  #Retain only the values that appear in topmost position
  mutate(across(starts_with('category'), ~replace(., -match(TRUE, . != ''), ''))) %>%
  #Drop the rows that have no element
  filter(if_any(starts_with('category'), ~. != ''))

#  Hits category1 category2 category3 category4 category5
#1 Hit1        a1                  a3                    
#2 Hit3                                                a5
#3 Hit5                  a2
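To see why the replace/match idiom above keeps only the topmost value, it may help to run it on a single vector. This is a minimal base R sketch; the vectors x and empty are hypothetical stand-ins for one column and are not part of the original data.

```r
# Hypothetical column whose first non-empty value is "b1"
x <- c("", "b1", "a1", "c1")

# Position of the first non-empty element
first_pos <- match(TRUE, x != "")
first_pos
# 2

# A negative index selects every position except first_pos,
# so replace() blanks out everything but the first non-empty value
kept <- replace(x, -first_pos, "")
kept
# ""   "b1" ""   ""

# All-empty column: match() returns NA, and assigning a length-one value
# through an NA index replaces nothing, so the vector comes back unchanged
empty <- c("", "", "")
kept_empty <- replace(empty, -match(TRUE, empty != ""), "")
kept_empty
# "" "" ""
```

The last case is what lets a column like category4 pass through as empty strings rather than NA.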

If you want to do this via position you can do -

df %>%
  mutate(across(2:6, ~replace(., -match(TRUE, . != ''), ''))) %>%
  filter(if_any(2:6, ~. != ''))
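If the column names are not known in advance, another option is to keep the selection in a variable and pass it through all_of(). The sketch below assumes that every column except Hits should be processed; target_cols and res are hypothetical names introduced only for this illustration.

```r
library(dplyr)

df <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)

# Assumption: every column other than the identifier column is a target
target_cols <- setdiff(names(df), "Hits")

res <- df %>%
  # Retain only the topmost non-empty value in each target column
  mutate(across(all_of(target_cols), ~ replace(., -match(TRUE, . != ""), ""))) %>%
  # Drop the rows where every target column is empty
  filter(if_any(all_of(target_cols), ~ . != ""))

res$Hits
# "Hit1" "Hit3" "Hit5"
```

Because target_cols is built from names(df), the same code should work whatever the category columns are called, as long as the identifier column is still named Hits.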

edited Aug 9, 2021 at 11:50

answered Aug 9, 2021 at 10:47

Ronak Shah



8 Comments

accibio Over a year ago

Thanks! This works perfectly. I'm wondering if there is a way to tweak your method for cases where the column names could vary.

Ronak Shah Over a year ago

What would the column names be? There are a lot of functions in dplyr that you can use to select columns: starts_with, ends_with, matches, contains, etc. You can also select columns by position, e.g. 2:5.

accibio Over a year ago

Okay. I guess selecting the columns based on positions will solve the issue of varying column names. Thanks for the clarification.

accibio Over a year ago

Could you please show how to select columns based on their positions? I'm not aware of the syntax.

Ronak Shah Over a year ago

What are the positions of the columns that you want to select?


iago · Accepted Answer · 2021-08-09 11:30:11Z

1

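library(dplyr)
df %>%
  # Keep the first non-empty value per column; the all() check leaves all-empty columns as "" instead of NA
  mutate(across(.cols = -Hits, .fns = ~ ifelse(row_number() == first(which(. != "")) | all(. == ""), ., ""))) %>%
  filter(if_any(-Hits, ~ . != ""))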

  Hits category1 category2 category3 category4 category5
1 Hit1        a1                  a3                    
2 Hit3                                                a5
3 Hit5                  a2
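The all(. == "") clause in the code above matters for columns that contain no values at all, such as category4. The following base R sketch shows the difference on a single vector; seq_along() and which(...)[1] are used here as assumed stand-ins for dplyr's row_number() and first(), and x is a hypothetical all-empty column.

```r
# Hypothetical all-empty column, like category4
x <- c("", "", "")

# No non-empty element exists, so the "first position" is NA
first_pos <- which(x != "")[1]

# Without the all() check, the comparison is NA everywhere and ifelse() returns NA
without_check <- ifelse(seq_along(x) == first_pos, x, "")
without_check
# NA NA NA

# With the all() check, NA | TRUE evaluates to TRUE, so the original "" values are kept
with_check <- ifelse(seq_along(x) == first_pos | all(x == ""), x, "")
with_check
# "" "" ""
```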

edited Aug 9, 2021 at 11:30

answered Aug 9, 2021 at 10:54

iago


3 Comments

accibio Over a year ago

Thanks for answering. This works but I want empty spaces instead of <NA>. Could you edit your answer accordingly?

accibio Over a year ago

There's still <NA> under category4.

iago Over a year ago

@AbhishekChowdhury Solved!
