
I have a data frame with 5 rows and 6 columns:

df <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)


From each of the columns category1 to category5, I need to retain only the element that appears at the topmost position (the first non-empty value in that column) and blank out everything below it. Finally, I need to drop the rows that have no elements left in these five columns.

How do I achieve this in the simplest possible way in R?

2 Answers

You can use across() and if_any() from dplyr:

library(dplyr)

df %>%
  #Retain only the values that appear in topmost position
  mutate(across(starts_with('category'), ~replace(., -match(TRUE, . != ''), ''))) %>%
  #Drop the rows that have no element
  filter(if_any(starts_with('category'), ~. != ''))

#  Hits category1 category2 category3 category4 category5
#1 Hit1        a1                  a3                    
#2 Hit3                                                a5
#3 Hit5                  a2                              
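To see why the replace()/match() expression works, it may help to apply it to a single column outside of dplyr. The sketch below uses base R only; the vector names (x, first_pos, kept, empty, kept_empty) are illustrative and not part of the original answer.

```r
# Values copied from the category5 column in the question
x <- c("", "", "a5", "b5", "")

# match(TRUE, ...) gives the position of the first non-empty element
first_pos <- match(TRUE, x != "")

# A negative index selects every position except first_pos, so all other
# elements are overwritten with ""
kept <- replace(x, -first_pos, "")

# For an all-empty column, match() returns NA. Assigning a length-one value
# through an NA index replaces nothing, so the column is left unchanged.
empty <- rep("", 5)
kept_empty <- replace(empty, -match(TRUE, empty != ""), "")

print(kept)
print(kept_empty)
```

This is why category4, which has no values at all, passes through unchanged rather than turning into NA.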

If you want to select the columns by position instead of by name, you can do:

df %>%
  mutate(across(2:6, ~replace(., -match(TRUE, . != ''), ''))) %>%
  filter(if_any(2:6, ~. != ''))
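If dplyr is not available, the same two steps can be sketched in base R. This is an alternative under the same assumptions as above (category columns at positions 2 to 6, empty strings marking missing values), not the answer's own method; cols and result are illustrative names.

```r
df <- data.frame(
  Hits = c("Hit1", "Hit2", "Hit3", "Hit4", "Hit5"),
  category1 = c("a1", "", "b1", "a1", "c1"),
  category2 = c("", "", "", "", "a2"),
  category3 = c("a3", "", "b3", "", "a3"),
  category4 = c("", "", "", "", ""),
  category5 = c("", "", "a5", "b5", ""),
  stringsAsFactors = FALSE)

# Positions of the category columns (assumed from the question's data)
cols <- 2:6

# Step 1: in each selected column, keep only the first non-empty value
df[cols] <- lapply(df[cols], function(x) replace(x, -match(TRUE, x != ""), ""))

# Step 2: keep rows that still have at least one non-empty value
result <- df[rowSums(df[cols] != "") > 0, ]

print(result)
```

Note that base R subsetting keeps the original row names (1, 3, 5), whereas dplyr's filter() renumbers them.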

8 Comments

Thanks! This works perfectly. I'm wondering if there is a way to tweak your method for cases where the column names could vary.
What would the column names be? There are a lot of functions in dplyr that you can use to select columns: starts_with, ends_with, matches, contains, etc. You can also select columns by position (for example 2:6 here).
Okay. I guess selecting the columns based on positions will solve the issue of varying column names. Thanks for the clarification.
Could you please show how to select columns based on their positions? I'm not aware of the syntax.
What are the position of the columns that you want to select?
df %>% 
  mutate(across(.cols = -Hits, .fns = ~ifelse(row_number() == first(which(.!="")) | all(. == ""), ., ""))) %>% 
  filter(if_any(-Hits, ~.!=""))

  Hits category1 category2 category3 category4 category5
1 Hit1        a1                  a3                    
2 Hit3                                                a5
3 Hit5                  a2                              
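The `| all(. == "")` part of the condition appears to be what keeps an all-empty column such as category4 from turning into NA, which is the issue raised in the comments. A minimal sketch in base R, using hypothetical helpers (keep_first, keep_first_unguarded) and seq_along() in place of row_number():

```r
# Hypothetical helper mirroring the answer's condition without dplyr
keep_first <- function(x) {
  first_pos <- which(x != "")[1]  # NA when x has no non-empty value
  # NA | TRUE evaluates to TRUE, so an all-empty column is returned as is
  ifelse(seq_along(x) == first_pos | all(x == ""), x, "")
}

# The same test without the all() guard, for comparison
keep_first_unguarded <- function(x) {
  ifelse(seq_along(x) == which(x != "")[1], x, "")
}

with_value <- keep_first(c("", "", "a5", "b5", ""))
all_empty  <- keep_first(rep("", 5))
unguarded  <- keep_first_unguarded(rep("", 5))

print(with_value)
print(all_empty)
print(unguarded)
```

Without the guard, the comparison against NA yields NA for every element, and ifelse() passes those NA values through.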

3 Comments

Thanks for answering. This works but I want empty spaces instead of <NA>. Could you edit your answer accordingly?
There's still <NA> under category4.
@AbhishekChowdhury Solved!
