Replace a given character in a string variable with a character from another string variable of equal length

Question

I have a data frame with two string variables with an equal number of characters. The strings represent student responses on an exam. The first string contains a + sign for each question answered correctly and the incorrect response for each question answered incorrectly. The second string contains all the correct answers. I want to replace every + sign in the first string with the correct answer from the second string. A simplified example data set can be created with this code:

df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), 
                 v2 = c("DBBAD", "BDCAD","CDCCA"), stringsAsFactors = FALSE)

So the + signs in df$v1 need to be replaced with the letters in df$v2 that are at the same position in the string. Any ideas?

Julius Vainora · Accepted Answer · 2019-02-05 19:13:51Z

10

When df$v1 and df$v2 are character vectors (not factors), we may use

regmatches(df$v1, gregexpr("\\+", df$v1)) <- regmatches(df$v2, gregexpr("\\+", df$v1))

That is,

df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), 
                 v2 = c("DBBAD", "BDCAD", "CDCCA"), 
                 stringsAsFactors = FALSE)
rg <- gregexpr("\\+", df$v1)
regmatches(df$v1, rg) <- regmatches(df$v2, rg)
df
#      v1    v2
# 1 DAAAB DBBAD
# 2 DDCCC BDCAD
# 3 ADBAD CDCCA

rg contains the positions of "+" in df$v1, and we conveniently exploit regmatches to replace those matches in df$v1 with whatever is in df$v2 at the same positions.
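To make the two helper calls more concrete, here is a small inspection sketch on the first row only. The names `x`, `key`, and `pos` are illustrative and not part of the answer above.

```r
x   <- "+AA+B"   # responses; "+" marks a correctly answered question
key <- "DBBAD"   # answer key of the same length

pos <- gregexpr("\\+", x)   # list holding the start positions of every "+"
as.integer(pos[[1]])
# [1] 1 4

regmatches(key, pos)        # characters of key at those same positions
# [[1]]
# [1] "D" "A"

# Assign the extracted key characters into the matched positions of x
regmatches(x, pos) <- regmatches(key, pos)
x
# [1] "DAAAB"
```

The match data only records positions and lengths, which is why it can be computed on one string and then applied to another of equal length.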

edited Feb 5, 2019 at 19:13

answered Dec 17, 2013 at 22:23

Julius Vainora


1 Comment

Braden Over a year ago

Awesome. This works great. I will edit my code to prevent it from creating factors.

alexis_laz · Accepted Answer · 2013-12-17 21:59:32Z

3

This one seems valid, too:

mapply(function(x, y) paste0(ifelse(x == "+", y, x), collapse = ""),
       strsplit(as.character(df$v1), ""), strsplit(as.character(df$v2), ""))
# [1] "DAAAB" "DDCCC" "ADBAD"
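If this is needed in more than one place, the same idea could be wrapped in a function that also checks the equal-length assumption from the question. This is only a sketch; `fill_correct` is a hypothetical name, not something from the answer.

```r
# Hypothetical helper: fill every "+" in `resp` with the character of `key`
# found at the same position. Factors are converted to character first.
fill_correct <- function(resp, key) {
  resp <- as.character(resp)
  key  <- as.character(key)
  # Same number of strings, and each pair has the same number of characters
  stopifnot(length(resp) == length(key),
            all(nchar(resp) == nchar(key)))
  out <- mapply(function(x, y) paste0(ifelse(x == "+", y, x), collapse = ""),
                strsplit(resp, ""), strsplit(key, ""))
  unname(out)
}

fill_correct(c("+AA+B", "D++CC", "A+BAD"), c("DBBAD", "BDCAD", "CDCCA"))
# [1] "DAAAB" "DDCCC" "ADBAD"
```

The `stopifnot` call makes a mismatched pair fail loudly rather than letting `ifelse` silently truncate or recycle the key.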

answered Dec 17, 2013 at 21:59

alexis_laz


1 Comment

Ricardo Saporta Over a year ago

great use of ifelse

Jilber Urbina · Accepted Answer · 2013-12-17 22:22:27Z

2

This is based on Tyler Rinker's answer and is conceptually the same, but uses a single lapply and ifelse.

dats <- lapply(df, function(x) do.call(rbind, strsplit(as.character(x), "")))
apply(with(dats, ifelse(v1 == "+", v2, v1)), 1, paste0, collapse = "")
# [1] "DAAAB" "DDCCC" "ADBAD"

answered Dec 17, 2013 at 22:22

Jilber Urbina


Tyler Rinker · Accepted Answer · 2013-12-17 22:45:43Z

2

Most likely there's a better approach, but here's one where I make the two columns into matrices and then use the second as a lookup key:

## df<-data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), v2 = c("DBBAD", "BDCAD","CDCCA"))
dats <- lapply(df, function(x) do.call(rbind, strsplit(as.character(x), "")))

dats[[1]][dats[[1]] == "+"] <- dats[[2]][dats[[1]] == "+"]

apply(dats[[1]], 1, paste, collapse = "")
## [1] "DAAAB" "DDCCC" "ADBAD"
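The matrix step can be spelled out with a named logical mask, which may make the indexing easier to follow. The names `m1`, `m2`, and `mask` are illustrative. Note that `rbind` only yields a clean matrix when every row has the same number of characters, which is a stronger assumption than the question's requirement that v1 and v2 match within each row.

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# One row per string, one column per character position
m1 <- do.call(rbind, strsplit(df$v1, ""))
m2 <- do.call(rbind, strsplit(df$v2, ""))

mask <- m1 == "+"      # TRUE wherever a correct answer must be filled in
m1[mask] <- m2[mask]   # copy the key characters at exactly those cells

apply(m1, 1, paste, collapse = "")
# [1] "DAAAB" "DDCCC" "ADBAD"
```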

I thought this one might be interesting to benchmark:

Unit: microseconds
     expr     min      lq  median       uq      max neval
 Andrea() 296.693 313.953 321.884 328.4155 2443.051  1000
   Josh() 300.891 314.420 319.551 326.5500 3748.779  1000
  Tyler() 144.148 155.344 159.543 164.2080 2233.593  1000
 Jibler() 174.937 188.932 193.597 198.7290 2269.514  1000
 Alexis() 154.877 167.007 171.672 175.4040 2342.753  1000
 Julius() 394.658 413.317 420.315 429.4120 2549.412  1000
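The code that produced these timings is not shown. A rough way to set up a comparable run, assuming the microbenchmark package is installed, is sketched below for three of the approaches. The wrapper names `julius`, `alexis`, and `tyler` are hypothetical, and timings will differ by machine and R version.

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Each wrapper returns the filled-in strings without modifying df
julius <- function() {
  v1 <- df$v1
  rg <- gregexpr("\\+", v1)
  regmatches(v1, rg) <- regmatches(df$v2, rg)
  v1
}
alexis <- function() {
  mapply(function(x, y) paste0(ifelse(x == "+", y, x), collapse = ""),
         strsplit(df$v1, ""), strsplit(df$v2, ""))
}
tyler <- function() {
  dats <- lapply(df, function(x) do.call(rbind, strsplit(x, "")))
  dats[[1]][dats[[1]] == "+"] <- dats[[2]][dats[[1]] == "+"]
  apply(dats[[1]], 1, paste, collapse = "")
}

# Check that the approaches agree before timing them
stopifnot(identical(julius(), alexis()), identical(julius(), tyler()))

# Timing is skipped when the microbenchmark package is not available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(julius(), alexis(), tyler(), times = 1000))
}
```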


edited Dec 17, 2013 at 22:45

answered Dec 17, 2013 at 21:40

Tyler Rinker


2 Comments

Jilber Urbina Over a year ago

Why three lapplys? You can get dats using just one: lapply(df, function(x) do.call(rbind, strsplit(as.character(x), ""))).

Tyler Rinker Over a year ago

@Jilber, I was tinkering and put a tinker answer up. I'll clean it up.

Andrea · Accepted Answer · 2013-12-17 22:02:51Z

1

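df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Replace "+" in one response string with the matching character of the key
f <- function(x, y) {
  xs <- unlist(strsplit(x, split = ""))
  ys <- unlist(strsplit(y, split = ""))
  paste(ifelse(xs == "+", ys, xs), collapse = "")
}

# mapply pairs each element of v1 with the corresponding element of v2;
# vapply(df$v1, f, df$v2, ...) would pass the whole of v2 to every call
mapply(f, df$v1, df$v2, USE.NAMES = FALSE)
# [1] "DAAAB" "DDCCC" "ADBAD"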

answered Dec 17, 2013 at 22:02

Andrea
