
I have columns in a data frame in which I want to replace integers with their corresponding string values. A cell often contains several integers (separated by spaces, commas, /, - and so on). For example, one column of my data frame is:

> df = data.frame(c1=c(1,2,3,23,c('11,21'),c('13-23')))
> df

     c1
1     1
2     2
3     3
4    23
5 11,21
6 13-23

I tried both str_replace_all() and str_replace(), but neither gave the desired result.

> df[,1] %>% str_replace_all(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))

[1] "a"     "b"     "c"     "bc"    "aa,ba" "ac-bc"
> df[,1] %>% str_replace(c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g"))

Error in fix_replacement(replacement) : argument "replacement" is missing, with no default

The desired result would be:

[1] "a"     "b"     "c"     "g"    "d,f" "e-g"

Because there are many values to replace, my first choice was str_replace_all(), since it accepts a named vector that maps the original values to their replacements. The result is wrong, however, because the single-digit patterns also match inside the multi-digit values (23 becomes "bc" instead of "g"). Am I doing it wrong, or is there a better alternative?

2 Answers


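str_replace_all() applies the replacements of a named vector one after another, in the order given. Place the multi-character patterns first, so that they are replaced before the single digits they contain: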

library(stringr)

str_replace_all(df[,1], 
 c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c"))
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"

For more complex cases, sort the patterns by decreasing length instead of ordering them by hand:

x <- c("1"="a","2"="b","3"="c","11"="d","13"="e","21"="f","23"="g")
x <- x[order(nchar(names(x)), decreasing = TRUE)]
str_replace_all(df[,1], x)
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"
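
The ordering trick still replaces substrings one pattern at a time. As a minimal sketch of an alternative that is not part of the original answer, each maximal run of digits can be looked up as a whole token with base R's gregexpr() and regmatches(). The names lookup and unmapped are hypothetical, and the rule that runs without an entry are left unchanged is an assumption, not something the question specifies.

```r
# Hypothetical lookup table, same mapping as in the question
lookup <- c("1" = "a", "2" = "b", "3" = "c",
            "11" = "d", "13" = "e", "21" = "f", "23" = "g")

x <- c("1", "2", "3", "23", "11,21", "13-23")

# Locate every maximal run of digits in each element
m <- gregexpr("[0-9]+", x)

# Replace each run by its lookup value; runs without an entry stay unchanged
regmatches(x, m) <- lapply(regmatches(x, m), function(tokens) {
  out <- unname(lookup[tokens])
  unmapped <- is.na(out)
  out[unmapped] <- tokens[unmapped]
  out
})

x
#[1] "a"   "b"   "c"   "g"   "d,f" "e-g"
```

Because whole numbers are matched, the order of the entries in lookup does not matter here, and a value such as "111" that has no entry is not partially rewritten.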

2 Comments

This works on the example data frame. In my original data frame, the multi-character values are repeated within cells, and there are hundreds of such cases.
I have added a solution for more complex cases.

Using the ordering method in @GKi's answer, here's a base R version using Reduce/gsub instead of stringr::str_replace_all

Starting vector

x <- as.character(df$c1)

Ordering as in @GKi's answer

repl_dict <- c("11"="d","13"="e","21"="f","23"="g","1"="a","2"="b","3"="c")
repl_dict <- repl_dict[order(nchar(names(repl_dict)), decreasing = TRUE)]

Replacement

Reduce(
  function(x, n) gsub(n, repl_dict[n], x, fixed = TRUE),
  names(repl_dict),
  init = x)

#  [1] "a"   "b"   "c"   "g"   "d,f" "e-g"
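
Since the question mentions several columns, the same Reduce/gsub idea can be wrapped in a small function and applied column by column. This is only a sketch: the helper name replace_codes, the second column c2 and its values are made up for illustration.

```r
# Hypothetical helper wrapping the Reduce/gsub approach shown above
replace_codes <- function(x, dict) {
  # Longest patterns first, so "23" is replaced before "2" and "3"
  dict <- dict[order(nchar(names(dict)), decreasing = TRUE)]
  Reduce(
    function(acc, key) gsub(key, dict[[key]], acc, fixed = TRUE),
    names(dict),
    as.character(x)  # initial value: the column as character
  )
}

repl_dict <- c("1" = "a", "2" = "b", "3" = "c",
               "11" = "d", "13" = "e", "21" = "f", "23" = "g")

# Example data; column c2 is invented for this illustration
df <- data.frame(
  c1 = c("1", "2", "3", "23", "11,21", "13-23"),
  c2 = c("3/1", "21 13", "2", "11", "23-2", "1"),
  stringsAsFactors = FALSE
)

# Apply the helper to every column that holds codes
cols <- c("c1", "c2")
df[cols] <- lapply(df[cols], replace_codes, dict = repl_dict)
df
```

Note that this still replaces substrings sequentially, so it relies on the replacement strings not containing any of the remaining patterns.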
