73

Of course, I could replace specific characters like this:

    mydata=c("á","é","ó")
    mydata=gsub("á","a",mydata)
    mydata=gsub("é","e",mydata)
    mydata=gsub("ó","o",mydata)
    mydata

But surely there is an easier way to do this all in one line, right? I don't find the gsub help to be very comprehensive on this.

6
  • 1
    If you wanted to replace different patterns with the same thing, it should be possible with lapply, but as you want to replace different patterns with different strings, I think you will still have to specify these one way or another... Commented Mar 6, 2013 at 17:33
  • 2
    You might be able to use chartr to do this. Commented Mar 6, 2013 at 17:41
  • 31
    The gsubfn function in the gsubfn package is a generalization of gsub that can do that in one call: gsubfn(".", list("á"="a", "é"="e", "ó"="o"), c("á","é","ó")) Commented Mar 6, 2013 at 20:39
  • @G.Grothendieck. That's great, and it also works for all types of characters. Very valuable comment. Thank you! Commented Mar 7, 2013 at 10:16
  • 1
    For people searching for a more general solution to this question, here is a more helpful answer: stackoverflow.com/a/7664655/1036500 Commented Jun 26, 2014 at 13:33

11 Answers

84

Use the character translation function

chartr("áéó", "aeo", mydata)

2 Comments

That's cool for characters... But does this also work with special characters, e.g. underscores, points, etc.? It's not part of the question, but it would still be interesting to know for this case too...
@Joschi, your question doesn't talk about it. I think you'll have to escape them because they are special characters...
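
On the follow-up about underscores and points: as far as I know, chartr() does not treat its arguments as regular expressions, so "." and "_" need no escaping there (a "-" between two characters denotes a range, though). gsub() does use regular expressions unless fixed = TRUE is passed. A minimal sketch, with made-up example strings:

```r
# Hypothetical example data containing an underscore and a dot
x <- c("first_name", "file.txt")

# chartr() maps characters one-to-one and does not use regular expressions
translated <- chartr("_.", "  ", x)

# gsub() treats "." as "any character" unless fixed = TRUE is given
dot_removed <- gsub(".", "", "file.txt", fixed = TRUE)

print(translated)
print(dot_removed)
```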
33

An interesting question! I think the simplest option is to devise a special function, something like a "multi" gsub():

mgsub <- function(pattern, replacement, x, ...) {
  if (length(pattern) != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result <- x
  # Apply each substitution in turn; seq_along() is safe for empty input
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

Which gives me:

> mydata <- c("á","é","ó")
> mgsub(c("á","é","ó"), c("a","e","o"), mydata)
[1] "a" "e" "o"
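
Because the extra arguments in ... are forwarded to gsub(), a function of this shape can also handle regex metacharacters by passing fixed = TRUE. The sketch below redefines the function so it runs on its own; the input string "a.b_c" is an invented example:

```r
# Sequential multi-pattern gsub, as in the answer above
mgsub <- function(pattern, replacement, x, ...) {
  stopifnot(length(pattern) == length(replacement))
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# fixed = TRUE is passed through to gsub(), so "." is matched literally
res_fixed <- mgsub(c(".", "_"), c(" ", "-"), "a.b_c", fixed = TRUE)

# Without fixed = TRUE, "." is a regex that matches every character
res_regex <- mgsub(c(".", "_"), c(" ", "-"), "a.b_c")

print(res_fixed)
print(res_regex)
```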

29

Maybe this can be useful:

iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT")
[1] "aeoAEOca"

2 Comments

On the most current version of R that I'm using the call iconv('áéóÁÉÓçã', to="ASCII//TRANSLIT") returns "'a'e'o'A'E'Oc~a". Did the behavior change across R versions, or does this have to do with my default encoding?
@Aaron: I don't know if it is an encoding problem. I tried it here on R 3.3.1 and it worked as expected.
20

You can use the stringi package to replace these characters.

> library(stringi)
> stri_trans_general(c("á","é","ó"), "latin-ascii")

[1] "a" "e" "o"

11

This is very similar to @kith's answer, but in function form, and with the most common diacritic cases:

removeDiscritics <- function(string) {
  chartr(
     "ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"
    ,"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"
    , string
  )
}


removeDiscritics("test áéíóú")

"test aeiou"

7

Another mgsub implementation using Reduce

mystring = 'This is good'
myrepl = list(c('o', 'a'), c('i', 'n'))

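mgsub2 <- function(myrepl, mystring){
  # Apply one (pattern, replacement) pair to x
  gsub2 <- function(l, x){
    do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  # With right = TRUE, Reduce calls gsub2(pair, accumulated_string)
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}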
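
The answer defines the function but does not show a call. A short usage sketch with the same inputs follows; the function is repeated so the block runs on its own, and the variable name res is only illustrative:

```r
mystring <- 'This is good'
myrepl <- list(c('o', 'a'), c('i', 'n'))

mgsub2 <- function(myrepl, mystring) {
  gsub2 <- function(l, x) {
    do.call('gsub', list(x = x, pattern = l[1], replacement = l[2]))
  }
  # right = TRUE means the last pair is applied first
  Reduce(gsub2, myrepl, init = mystring, right = TRUE)
}

res <- mgsub2(myrepl, mystring)
print(res)
```

Here 'i' is replaced by 'n' first and 'o' by 'a' second; since the two pairs do not interact, the order does not change the result in this case.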

7

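A problem with some of the implementations above (e.g., Theodore Lytras's) is that if the patterns are multiple characters, they may conflict in the case that one pattern is a substring of another. A way to solve this is to create a copy of the object, match each pattern against the original, and write the replacements into the copy, so that earlier replacements are never re-matched by later patterns. Note that the version below replaces the whole matching element rather than only the matched substring. This is implemented in my package bayesbio, available on CRAN.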

mgsub <- function(pattern, replacement, x, ...) {
  n = length(pattern)
  if (n != length(replacement)) {
    stop("pattern and replacement do not have the same length.")
  }
  result = x
  # Match against the original x, write into the copy
  for (i in seq_len(n)) {
    result[grep(pattern[i], x, ...)] = replacement[i]
  }
  return(result)
}

Here is a test case:

  asdf = c(4, 0, 1, 1, 3, 0, 2, 0, 1, 1)

  res = mgsub(c("0", "1", "2"), c("10", "11", "12"), asdf)
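
To make the conflict concrete, the sketch below runs a sequential version and a match-against-the-original version on the same patterns. The names mgsub_sequential and mgsub_whole are hypothetical labels for the two approaches discussed in this thread, not functions from any package:

```r
# Sequential approach: each gsub() sees the output of the previous one
mgsub_sequential <- function(pattern, replacement, x, ...) {
  result <- x
  for (i in seq_along(pattern)) {
    result <- gsub(pattern[i], replacement[i], result, ...)
  }
  result
}

# Copy approach: match on the original x, write whole elements into a copy
mgsub_whole <- function(pattern, replacement, x, ...) {
  result <- x
  for (i in seq_along(pattern)) {
    result[grep(pattern[i], x, ...)] <- replacement[i]
  }
  result
}

x <- c("0", "1", "2")
pat <- c("0", "1", "2")
rep_with <- c("10", "11", "12")

seq_res <- mgsub_sequential(pat, rep_with, x)
whole_res <- mgsub_whole(pat, rep_with, x)

print(seq_res)    # the "10" produced in step 1 is hit again by the "1" pattern
print(whole_res)
```

The copy approach avoids the re-matching, at the cost of replacing entire elements: an element such as "a0" would become "10", not "a10".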

3

Not so elegant, but it works and does what you want

> diag(sapply(1:length(mydata), function(i, x, y) {
+   gsub(x[i],y[i], x=x)
+ }, x=mydata, y=c('a', 'b', 'c')))
[1] "a" "b" "c"

3

Related to Justin's answer:

> m <- c("á"="a", "é"="e", "ó"="o")
> m[mydata]
  á   é   ó 
"a" "e" "o" 

And you can get rid of the names with names(*) <- NULL if you want.
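
As a small sketch of that last step, base R's unname() returns the looked-up values without the names in a single expression. The variable name plain is only illustrative:

```r
mydata <- c("á", "é", "ó")

# Named character vector used as a lookup table
m <- c("á" = "a", "é" = "e", "ó" = "o")

# Index by name, then drop the names
plain <- unname(m[mydata])
print(plain)
```

Note that this looks up whole elements, so it suits vectors of single characters like mydata rather than longer strings containing accented letters.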

1

You can use the match function. Here match(x, y) returns the index of y where the element of x is matched. Then you can use the returned indices, to subset another vector (say z) that contains the replacements for the values of x, appropriately matched with y. In your case:

mydata <- c("á","é","ó")
desired <- c('a', 'e', 'o')

desired[match(mydata, mydata)]

In a simpler example, consider the situation below, where I was trying to replace 'a' with 'alpha', 'b' with 'beta' and so forth.

x <- c('a', 'a', 'b', 'c', 'b', 'c', 'e', 'e', 'd')

y <- c('a', 'b', 'c', 'd', 'e')
z <- c('alpha', 'beta', 'gamma', 'delta', 'epsilon')

z[match(x, y)]
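
One thing to watch with this approach: match() returns NA for values of x that are not in y, so those elements come back as NA. A minimal sketch of keeping the original value in that case, using invented inputs where "z" has no replacement:

```r
x <- c("a", "b", "z")              # "z" is not in the lookup keys
y <- c("a", "b", "c")              # lookup keys
z <- c("alpha", "beta", "gamma")   # replacements, aligned with y

idx <- match(x, y)                 # NA where x has no match in y

# Keep the original element when there is no match
out <- ifelse(is.na(idx), x, z[idx])
print(out)
```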

0

You can also combine them with gsub:

mydata <- gsub("á", "a", gsub("é", "e", gsub("í", "i", gsub("ó", "o", gsub("ú", "u", mydata)))))
