2

I have the vector below and am trying to use stringr to find a pattern and replace it.

library(stringr)

string_vector <- c("1_lasso", "_lasso", "1_lasso_olsps", "_lasso_olsps")

string_vector_new <- string_vector %>%
  str_replace("^[1_]_lasso", "Lasso")
string_vector_new

[1] "Lasso"        "_lasso"       "Lasso_olsps"  "_lasso_olsps"
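The reason "_lasso" is left unchanged is that [1_] must consume exactly one character before _lasso, so a string with only a single leading underscore cannot match. A quick check with stringr's str_detect() on the same vector shows this, and also shows that making the leading 1 optional lets both forms match. This is a sketch for illustration, not part of the original question:

```r
library(stringr)

string_vector <- c("1_lasso", "_lasso", "1_lasso_olsps", "_lasso_olsps")

# "[1_]" must consume exactly one character before "_lasso",
# so "_lasso" (only one underscore) does not match
detect_original <- str_detect(string_vector, "^[1_]_lasso")
print(detect_original)

# Making the leading "1" optional lets both "1_lasso" and "_lasso" match
detect_optional <- str_detect(string_vector, "^1?_lasso")
print(detect_optional)
```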

I would like to update my code so that it detects both 1_lasso and _lasso and replaces each of them with Lasso in a single step. Is this possible with stringr? I'm not sure which regular expression I need, and I have many more variables like these.
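When there are many similar variables, one option is to pass str_replace_all() a named vector, where each name is a regex pattern and each value is its replacement; the patterns are applied in turn. The sketch below assumes a hypothetical second family of names (1_ridge, _ridge_olsps) purely to illustrate "more variables like these":

```r
library(stringr)

# "1_ridge" and "_ridge_olsps" are hypothetical examples, not from the question
vars <- c("1_lasso", "_lasso", "1_ridge", "_ridge_olsps")

# Named vector: names are regex patterns, values are their replacements
replacements <- c("1?_lasso" = "Lasso", "1?_ridge" = "Ridge")

vars_new <- str_replace_all(vars, replacements)
print(vars_new)
```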

Thanks in advance.

  • str_replace_all(string_vector, "1?_lasso", "Lasso"). Do you want to replace a substring, or a whole word? Commented Apr 20, 2021 at 9:29
  • I was hoping to replace the whole word. So if it contained "_lasso", such as "1_lasso" or "_lasso", replace it with "Lasso". Commented Apr 20, 2021 at 9:33
  • Then, just FYI, all the answers below will also replace lasso in 1_lassosomewordstartingwithlasso. Commented Apr 20, 2021 at 9:34
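The commenter's caveat can be checked directly. In this sketch the suggested 1?_lasso pattern is applied to a plain "1_lasso" and to the longer word from the last comment; in the longer word the leading "1_lasso" is rewritten even though it is only the start of a bigger token:

```r
library(stringr)

x <- c("1_lasso", "1_lassosomewordstartingwithlasso")

# "1?_lasso" matches a substring, so it does not care what follows "_lasso"
substring_replaced <- str_replace_all(x, "1?_lasso", "Lasso")
print(substring_replaced)
```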

4 Answers

3

Simply use the | character to express "or" in the pattern:

library(stringr)
str_replace_all(string_vector, "1_lasso|_lasso", "Lasso")
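Because the two alternatives differ only by the leading 1, the same alternation can be written with an optional character, 1?_lasso. A quick sketch comparing the two on the example vector:

```r
library(stringr)

string_vector <- c("1_lasso", "_lasso", "1_lasso_olsps", "_lasso_olsps")

# Alternation of two literal patterns
with_alternation <- str_replace_all(string_vector, "1_lasso|_lasso", "Lasso")

# Equivalent shorter form: an optional leading "1"
with_optional <- str_replace_all(string_vector, "1?_lasso", "Lasso")

print(with_alternation)
print(identical(with_alternation, with_optional))
```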

2

I would use sub here with the regex pattern [^_]*_lasso:

string_vector <- c("1_lasso", "_lasso", "1_lasso_olsps", "_lasso_olsps")
output <- sub("[^_]*_lasso", "Lasso", string_vector)
output

[1] "Lasso"       "Lasso"       "Lasso_olsps" "Lasso_olsps"

The pattern matches _lasso, optionally preceded by any run of characters other than an underscore. Note that sub() replaces only the first match in each string; gsub() replaces every match.
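Two details of this pattern are worth checking: [^_]* accepts any prefix without an underscore (not just 1), and sub() stops after the first match while gsub() continues. stringr's str_replace() and str_replace_all() behave the same way, respectively. The sketch uses two made-up strings, ridge_lasso and x_lasso_y_lasso, chosen only to show the behavior:

```r
library(stringr)

# Hypothetical inputs for illustration
x <- c("ridge_lasso", "x_lasso_y_lasso")

# sub() / str_replace(): first match only
first_only <- sub("[^_]*_lasso", "Lasso", x)
print(first_only)
stopifnot(identical(first_only, str_replace(x, "[^_]*_lasso", "Lasso")))

# gsub() / str_replace_all(): every match
every_match <- gsub("[^_]*_lasso", "Lasso", x)
print(every_match)
stopifnot(identical(every_match, str_replace_all(x, "[^_]*_lasso", "Lasso")))
```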


1

Use

str_replace_all(string_vector, "\\b1?_lasso(?![^\\W_])", "Lasso")

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  1?                       '1' (optional, matched greedily)
--------------------------------------------------------------------------------
  _lasso                   '_lasso'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    [^\W_]                   any word character except '_'
                             (i.e. a letter or a digit)
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------

R code snippet:

library(stringr)
string_vector <- c("1_lasso", "_lasso", "1_lasso_olsps", "_lasso_olsps")
str_replace_all(string_vector, "\\b1?_lasso(?![^\\W_])", "Lasso")

Results:

[1] "Lasso"       "Lasso"       "Lasso_olsps" "Lasso_olsps"
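What makes this pattern stricter than the others is that \b requires the match to start at a word boundary and the negative look-ahead forbids a letter or digit right after _lasso. A sketch with a few extra strings (hypothetical, for illustration) shows which ones are left untouched:

```r
library(stringr)

pattern <- "\\b1?_lasso(?![^\\W_])"

# Hypothetical inputs: followed by a letter, preceded by a word character,
# and followed by a non-word character ("-")
x <- c("1_lassosomewordstartingwithlasso", "x_lasso", "1_lasso-olsps")

# Only the last string is rewritten
whole_word <- str_replace_all(x, pattern, "Lasso")
print(whole_word)
```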


1

If you'd like to use a regular expression, you can tackle it directly with base R's gsub():

string_vector <- c("1_lasso", "_lasso", "1_lasso_olsps", "_lasso_olsps")
gsub("1_lasso|_lasso", "Lasso", string_vector)

[1] "Lasso"       "Lasso"       "Lasso_olsps" "Lasso_olsps"

The code searches for either of the two patterns and replaces each match with "Lasso".

To make it more general, the code below matches any "something_lasso" pattern:

gsub("\\S*_lasso","Lasso",string_vector)
[1] "Lasso"       "Lasso"       "Lasso_olsps" "Lasso_olsps"

The code looks for \S*_lasso, where \S* matches zero or more non-whitespace characters.
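The two "general" patterns in this thread stop at different characters: \S* stops only at whitespace (so it can swallow underscores), while the [^_]* pattern from the earlier answer stops only at underscores (so it can swallow spaces). A sketch comparing them on two made-up strings:

```r
# Hypothetical inputs for illustration
x <- c("x_lasso_y_lasso", "a b_lasso")

# \S* may include underscores, so the whole token becomes one match
greedy_nonspace <- gsub("\\S*_lasso", "Lasso", x)
print(greedy_nonspace)

# [^_]* may include spaces, but each "_lasso" is matched separately
no_underscore <- gsub("[^_]*_lasso", "Lasso", x)
print(no_underscore)
```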

I hope it helps.
