4

I have a data frame with two string variables, each with the same number of characters. The strings represent student responses to an exam. The first string contains a + sign for each question answered correctly and the incorrect response for each question answered incorrectly. The second string contains all the correct answers. I want to replace every + sign in the first string with the correct answer from the second string. A simplified example data set can be created with this code:

df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), 
                 v2 = c("DBBAD", "BDCAD","CDCCA"), stringsAsFactors = FALSE)

So the + signs in df$v1 need to be replaced with the letters in df$v2 that sit at the same position in the string. Any ideas?

5 Answers

10

When df$v1 and df$v2 are character vectors (not factors), we may use

regmatches(df$v1, gregexpr("\\+", df$v1)) <- regmatches(df$v2, gregexpr("\\+", df$v1))

That is,

df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), 
                 v2 = c("DBBAD", "BDCAD", "CDCCA"), 
                 stringsAsFactors = FALSE)
rg <- gregexpr("\\+", df$v1)
regmatches(df$v1, rg) <- regmatches(df$v2, rg)
df
#      v1    v2
# 1 DAAAB DBBAD
# 2 DDCCC BDCAD
# 3 ADBAD CDCCA

rg contains the positions of "+" in df$v1, and we conveniently exploit regmatches to replace those matches in df$v1 with whatever is in df$v2 at the same positions.
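To see what the replacement is working with, it can help to print the intermediate values. The following is only an illustrative sketch on the question's sample data; the names plus_pos and replacements are introduced here and are not part of the answer.

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Match data for every "+" in v1 (start positions plus match lengths)
rg <- gregexpr("\\+", df$v1)

# Start positions only, one integer vector per row: 1 and 4; 2 and 3; 2
plus_pos <- lapply(rg, as.integer)

# Characters of v2 at those same positions: "D","A"; "D","C"; "D"
replacements <- regmatches(df$v2, rg)

print(plus_pos)
print(replacements)
```

Because the match data comes from v1 but is applied to v2, this relies on both strings in a row having the same length, as the question states.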

1 Comment

Awesome. This works great. I will edit my code to prevent it from creating factors.
3

This one seems valid, too:

mapply(function(x, y) paste0(ifelse(x == "+", y, x), collapse = ""), 
                 strsplit(as.character(df$v1), ""), strsplit(as.character(df$v2), ""))
#[1] "DAAAB" "DDCCC" "ADBAD"

1 Comment

great use of ifelse
2

Based on Tyler Rinker's answer, conceptually it's the same, but using just one lapply and ifelse.

> dats <- lapply(df, function(x) do.call(rbind, strsplit(as.character(x), "")))
> apply(with(dats, ifelse(v1=="+", v2, v1)), 1, paste0, collapse="")
[1] "DAAAB" "DDCCC" "ADBAD"

2

Most likely there's a better approach, but here's one where I make the two columns into matrices and then a lookup key:

## df<-data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"), v2 = c("DBBAD", "BDCAD","CDCCA"))
dats <- lapply(df, function(x) do.call(rbind, strsplit(as.character(x), "")))

dats[[1]][dats[[1]] == "+"] <- dats[[2]][dats[[1]] == "+"]

apply(dats[[1]], 1, paste, collapse = "")
## [1] "DAAAB" "DDCCC" "ADBAD"

I thought this would be an interesting one to benchmark:

Unit: microseconds
     expr     min      lq  median       uq      max neval
 Andrea() 296.693 313.953 321.884 328.4155 2443.051  1000
   Josh() 300.891 314.420 319.551 326.5500 3748.779  1000
  Tyler() 144.148 155.344 159.543 164.2080 2233.593  1000
 Jibler() 174.937 188.932 193.597 198.7290 2269.514  1000
 Alexis() 154.877 167.007 171.672 175.4040 2342.753  1000
 Julius() 394.658 413.317 420.315 429.4120 2549.412  1000
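The code behind these timings is not shown. As a rough sketch of how such a comparison might be set up, the block below wraps two of the approaches from this page in functions, checks that they agree, and times them only if the microbenchmark package happens to be installed. The wrapper names regmatches_fill and matrix_fill are made up for this illustration and do not correspond to the labels in the table above.

```r
df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# regmatches approach: overwrite each "+" match in v1 with v2's character
regmatches_fill <- function(d) {
  v1 <- d$v1
  rg <- gregexpr("\\+", v1)
  regmatches(v1, rg) <- regmatches(d$v2, rg)
  v1
}

# Matrix approach: one character per cell, then logical indexing
matrix_fill <- function(d) {
  m <- lapply(d, function(x) do.call(rbind, strsplit(x, "")))
  hit <- m$v1 == "+"
  m$v1[hit] <- m$v2[hit]
  apply(m$v1, 1, paste, collapse = "")
}

# Both approaches should give the same strings before timing them
stopifnot(all(regmatches_fill(df) == matrix_fill(df)))

# Timing is optional: skipped when microbenchmark is not available
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(regmatches_fill(df),
                                       matrix_fill(df),
                                       times = 1000))
}
```

With only three short rows the timings mostly reflect function-call overhead, so a larger data frame would probably give a more meaningful ranking.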

2 Comments

Why three lapplys? you can get dats using just one: lapply(df, function(x) do.call(rbind, strsplit(as.character(x), ""))).
@Jiber, I was tinkering and put a tinker answer up. I'll clean it up.
1
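df <- data.frame(v1 = c("+AA+B", "D++CC", "A+BAD"),
                 v2 = c("DBBAD", "BDCAD", "CDCCA"),
                 stringsAsFactors = FALSE)

# Replace each "+" in x with the character at the same position in y
f <- function(x, y) {
  xs <- unlist(strsplit(x, split = ""))
  ys <- unlist(strsplit(y, split = ""))
  paste(ifelse(xs == "+", ys, xs), collapse = "")
}

# mapply pairs the i-th element of v1 with the i-th element of v2;
# vapply(df$v1, f, df$v2, ...) would pass the whole of v2 on every call
mapply(f, df$v1, df$v2, USE.NAMES = FALSE)
#[1] "DAAAB" "DDCCC" "ADBAD"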
