I'm having trouble solving this problem. Assume a data frame like this:

COL_1 COL_2           COL_3  COL_4
1     UP_RED_LIGHT    23.43  UP_R
2     UP_YELLOW_LIGHT 23.33  UP_Y
3     DP_GREEN_DARK   43.76  DP_G
4     DP_BROWN_LIGHT  45.65  DP_B
5     R_BLACK_DARK    12.32  R_B

I want to find every string in this data frame that starts with "DP_" and remove that prefix from the string.

The result I want to have:

COL_1 COL_2           COL_3  COL_4
1     UP_RED_LIGHT    23.43  UP_R
2     UP_YELLOW_LIGHT 23.33  UP_Y
3     GREEN_DARK      43.76  G
4     BROWN_LIGHT     45.65  B
5     R_BLACK_DARK    12.32  R_B

So basically, wherever a string in my data frame starts with DP_, I want to replace that prefix with '', in every column. It matters that the match is at the start: if DP_ occurs in the middle of a string, it should be left alone. This is why a solution like this:

df<- gsub('DP_', '', df)

doesn't work for me.

Is there a nice and clean solution to this?

Thank you in advance for the help.

2 Answers

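Your use of gsub is almost correct, with two caveats. First, you want to remove DP_ only at the beginning of the string, so the pattern needs the ^ anchor. Second, gsub and sub work on character vectors, so they should be applied to individual columns rather than to the data frame as a whole. For a single column such as COL_2: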

df$COL_2 <- sub("^DP_", "", df$COL_2)
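
To see what the ^ anchor changes, here is a small check on a made-up vector (the value "UP_DP_RED" is hypothetical and only there to put DP_ in the middle of a string):

```r
# Hypothetical test vector: one string starts with "DP_", one has it in the middle
x <- c("DP_GREEN_DARK", "UP_DP_RED")

# Anchored pattern: only a leading "DP_" is removed
anchored <- sub("^DP_", "", x)

# Unanchored pattern: "DP_" is removed wherever it occurs
unanchored <- gsub("DP_", "", x)

print(anchored)    # "GREEN_DARK" "UP_DP_RED"
print(unanchored)  # "GREEN_DARK" "UP_RED"
```

Because a string has only one beginning, sub and gsub give the same result once the pattern is anchored.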

To do this replacement on one or more columns, e.g. on COL_2 and COL_4, we can try:

cols <- c("COL_2", "COL_4")
df[cols] <- lapply(df[cols], function(x) sub("^DP_", "", x))
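
If the column names are not known in advance, one possible variation is to select every character column by type instead of by name. This is only a sketch: it assumes the text columns are stored as character (not factor), and the helper name is_chr is made up for the example.

```r
# Example data from the question
df <- data.frame(
  COL_1 = 1:5,
  COL_2 = c("UP_RED_LIGHT", "UP_YELLOW_LIGHT", "DP_GREEN_DARK",
            "DP_BROWN_LIGHT", "R_BLACK_DARK"),
  COL_3 = c(23.43, 23.33, 43.76, 45.65, 12.32),
  COL_4 = c("UP_R", "UP_Y", "DP_G", "DP_B", "R_B"),
  stringsAsFactors = FALSE  # keep text as character, also on R < 4.0
)

# Logical vector marking the character columns, whatever their names are
is_chr <- vapply(df, is.character, logical(1))

# Strip a leading "DP_" from those columns only; numeric columns are untouched
df[is_chr] <- lapply(df[is_chr], function(x) sub("^DP_", "", x))

print(df)
```

Restricting the replacement to character columns keeps COL_1 and COL_3 numeric, which would not be the case if sub were applied to every column.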

2 Comments

Since my columns are created dynamically, I can't tell in advance which ones will contain the prefix. And, as in the example, COL_4 is changed too. Would 'df <- gsub('^DP_', '', df)' work for all columns?
@Luigi I have also given you an option which would work for multiple columns.

You can also use mutate_at and str_replace to get the desired output.

library(dplyr)
library(stringr)
df %>% 
    mutate_at(vars("COL_2", "COL_4"), ~ str_replace(., "^DP_", ""))  # ^ matches only at the start of the string
 

#  COL_1           COL_2 COL_3 COL_4
#1     1    UP_RED_LIGHT 23.43  UP_R
#2     2 UP_YELLOW_LIGHT 23.33  UP_Y
#3     3      GREEN_DARK 43.76     G
#4     4     BROWN_LIGHT 45.65     B
#5     5    R_BLACK_DARK 12.32   R_B 
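
When the columns are not known up front, a similar sketch with across() and where() may be more convenient. It assumes dplyr 1.0.0 or later and that the text columns are character rather than factor; the name out is made up for the example.

```r
library(dplyr)
library(stringr)

# Example data from the question
df <- data.frame(
  COL_1 = 1:5,
  COL_2 = c("UP_RED_LIGHT", "UP_YELLOW_LIGHT", "DP_GREEN_DARK",
            "DP_BROWN_LIGHT", "R_BLACK_DARK"),
  COL_3 = c(23.43, 23.33, 43.76, 45.65, 12.32),
  COL_4 = c("UP_R", "UP_Y", "DP_G", "DP_B", "R_B"),
  stringsAsFactors = FALSE
)

# Apply the anchored replacement to every character column
out <- df %>%
  mutate(across(where(is.character), ~ str_replace(.x, "^DP_", "")))

print(out)
```

Selecting by type with where(is.character) avoids listing column names, which fits the case where the columns are generated dynamically.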

Data

df <- data.frame(COL_1 = c(1L:5L), COL_2 = c("UP_RED_LIGHT","UP_YELLOW_LIGHT", "DP_GREEN_DARK",
                "DP_BROWN_LIGHT","R_BLACK_DARK"), COL_3 = c(23.43,23.33,43.76,45.65,12.32),
                COL_4 = c("UP_R", "UP_Y", "DP_G", "DP_B", "R_B"))
