3

I have a dataset that looks something like this.

df <- tibble::tribble(
  ~name,           ~x,  ~y,              ~z,  
  "N/A",            1,   "testSmith",    -100, 
  "N A",            3,   "NOt available", -99,
  "test Smith",     NA,  "test Smith",    -98,
  "Not Available", -99, "25",             -101,
  "test Smith",    -98, "28",             -1)

I would like to create a new data.table that keeps all the rows the string "test".

The final dataset should look something like this

  name           x y              z
  <chr>      <dbl> <chr>      <dbl>
1 N/A            1 testSmith   -100
2 test Smith    NA test Smith   -98
3 test Smith   -98 28            -1

I could do this column by column like this

setDT(df)[name%like%"test"|y%like%"test"]

The problem with this approach is that I have hundreds of string variables and I would like to find a more compact approach. I tried the followings but they do not work

chvar <- keep(trai,is.character)%>%names()
setDT(df)[chvar%like%"test"]#error
setDT(df)[(chvar)%like%"test"]#error
setDT(df)[.(chvar)%like%"test"]#empty dt

Does someone know how I could do it in a quick and efficient way?

Thanks a lot for your help

2 Answers 2

5

In data.table you can do :

library(data.table)

cols <- c('name', 'y')
setDT(df)

df[df[, Reduce(`|`, lapply(.SD, `%like%`, "test")), .SDcols = cols]]

#         name   x          y    z
#1:        N/A   1  testSmith -100
#2: test Smith  NA test Smith  -98
#3: test Smith -98         28   -1

In base R :

subset(df, Reduce(`|`, lapply(df[cols], function(x) grepl('test', x))))

dplyr :

library(dplyr)
df %>% filter(Reduce(`|`, across(all_of(cols), ~grepl('test', .x))))

lapply/across returns a list of TRUE/FALSE values for all columns. It will return TRUE if 'test' is present and FALSE if it is not present. When we use it in combination with Reduce and | it will give TRUE only of there is atleast one TRUE value in the row. If all the values in the row are FALSE it will returns FALSE. We select only those rows which has at least one TRUE value in it.

Sign up to request clarification or add additional context in comments.

Comments

1

Another data.table option is using grepl

dt[rowSums(dt[, lapply(.SD, grepl, pattern = "test")]) > 0, .SDcols = .(name, y)]

such that

         name   x          y    z
1:        N/A   1  testSmith -100
2: test Smith  NA test Smith  -98
3: test Smith -98         28   -1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.