
I have the following tibble:


library(tidyverse)

df <- tibble::tribble(
  ~sample, ~colB, ~colC,
  "foo",   1,  2,
  "bar_x",   2,  3,
  "qux.6hr.ID",   3,  4,
  "dog",   1,  1
)


df
#> # A tibble: 4 x 3
#>       sample  colB  colC
#>        <chr> <dbl> <dbl>
#> 1        foo     1     2
#> 2      bar_x     2     3
#> 3 qux.6hr.ID     3     4
#> 4        dog     1     1

df$sample <- factor(df$sample, levels=c("bar_x","foo","qux.6hr.ID","dog"))

df$sample
#> [1] foo        bar_x      qux.6hr.ID dog       
#> Levels: bar_x foo qux.6hr.ID dog

What I want to do is, for every row in the sample column, remove these substrings if they exist: _x and .6hr. The final table should look like this:

     sample  colB  colC
        foo     1     2
        bar     2     3
     qux.ID     3     4
        dog     1     1

How can I achieve that?

  • df %>% mutate(sample = gsub('_x|\\.6hr', '', sample)), or equivalently with stringr: df %>% mutate(sample = str_replace_all(sample, '_x|\\.6hr', '')) Commented Jun 3, 2017 at 5:08
  • @alistaire Actually, my df contains a factor. See my update. Sorry. How can I modify your code? Commented Jun 3, 2017 at 5:17
  • gsub still works, though it coerces to character. You could make a call to levels<-, but it's a little awkward in dplyr syntax. The forcats package supplies an alternative: df %>% mutate(sample = factor(sample), sample = forcats::fct_relabel(sample, function(x){str_replace_all(x, '_x|\\.6hr', '')})), though you have to structure the second parameter as a function à la lapply. Commented Jun 3, 2017 at 5:33

2 Answers


We can use

df %>% 
     mutate(sample = gsub("_x|\\.\\d+[A-Za-z]+", "", sample))
# A tibble: 4 x 3 
#   sample  colB  colC
#    <chr> <dbl> <dbl>
#1    foo     1     2
#2    bar     2     3
#3 qux.ID     3     4
#4    dog     1     1
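
Note that this pattern is broader than a literal '_x|\\.6hr'. As a rough base R sketch of the difference (the value "abc.12hr.ID" is made up for illustration and is not in the question's data):

```r
# The first four values come from the question; "abc.12hr.ID" is a
# hypothetical extra value added only to show where the patterns differ.
x <- c("foo", "bar_x", "qux.6hr.ID", "dog", "abc.12hr.ID")

# Literal pattern: removes only "_x" and ".6hr".
literal <- gsub("_x|\\.6hr", "", x)

# Generalised pattern: removes "_x" and any ".<digits><letters>" run.
general <- gsub("_x|\\.\\d+[A-Za-z]+", "", x)

literal
general
```

The generalised form would also strip something like ".2nd", so the literal pattern may be the safer choice if only ".6hr" should ever be removed.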

If the 'sample' column is of class factor, we can either wrap the output of gsub with factor or apply the substitution to the levels of sample:

levels(df$sample) <- gsub("_x|\\.\\d+[A-Za-z]+", "", levels(df$sample))
df$sample
#[1] foo    bar    qux.ID dog   
#Levels: bar foo qux.ID dog
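
The two options differ in the level order they produce. A minimal base R sketch, assuming the factor from the question (the names f, pat, by_levels and rewrapped are only illustrative):

```r
f <- factor(c("foo", "bar_x", "qux.6hr.ID", "dog"),
            levels = c("bar_x", "foo", "qux.6hr.ID", "dog"))
pat <- "_x|\\.\\d+[A-Za-z]+"

# Option 1: edit the levels in place; the custom level order is kept.
by_levels <- f
levels(by_levels) <- gsub(pat, "", levels(by_levels))

# Option 2: gsub coerces to character, so factor() rebuilds the levels
# in sorted order unless a levels argument is supplied.
rewrapped <- factor(gsub(pat, "", f))

levels(by_levels)
levels(rewrapped)
```

If the original ordering matters with the second option, passing levels = gsub(pat, "", levels(f)) to factor() should keep it, provided the cleaned levels remain unique.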

2 Comments

Actually my df contains factor. See my update. Sorry. How can I modify your code?
@pdubois gsub accepts a factor as well. To keep the column as a factor, wrap the output with factor, i.e. mutate(sample = factor(gsub(..

And here's a solution using the purrr::map_chr function, which has the added benefit of returning the same result whether "sample" is chr or factor.

df %>%
   # str_replace_all removes every match in a value, not just the first
   mutate(sample = map_chr(sample, ~str_replace_all(.x, 
                                         pattern = "_x|\\.\\d+[A-Za-z]+", 
                                         replacement = "")))
# A tibble: 4 x 3
#  sample  colB  colC
#  <chr>  <dbl> <dbl>
#1 foo        1     2
#2 bar        2     3
#3 qux.ID     3     4
#4 dog        1     1
