I am trying to extract part of each value in a data frame column using regular expressions. I am running into two problems: grep returns the whole value rather than just the matched part, and str_extract doesn't seem to work in a vectorized way.
Here is what I'm trying. I would like df$match to show alpha.alpha. where the pattern exists and NA otherwise. How can I show only the matched part?
Also, how can I replace [a-zA-Z] in an R regex? Can I use a character class or a POSIX code like [:alpha:]?
v1 <- 1:4
v2 <- c("_a.b._", NA, "_C.D._", "_ef_")
df <- data.frame(v1, v2, stringsAsFactors = FALSE)
df$match <- grepl("[a-zA-Z]\\.[a-zA-Z]\\.", df$v2)
df$match
#TRUE FALSE TRUE FALSE
v2grep <- grep("[a-zA-Z]\\.[a-zA-Z]\\.", df$v2, value = TRUE)
df$match[df$match == TRUE] <- v2grep
df$match[df$match == FALSE] <- NA
df
# v1  v2      match
# 1   _a.b._  _a.b._
# 2   <NA>    <NA>
# 3   _C.D._  _C.D._
# 4   _ef_    <NA>
What I want:
# v1  v2      match
# 1   _a.b._  a.b.
# 2   <NA>    <NA>
# 3   _C.D._  C.D.
# 4   _ef_    <NA>
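For reference, here is a minimal sketch of one way the desired column might be produced in base R. It is not necessarily the best approach. It assumes that pairing regexpr() with regmatches() is acceptable, and that the POSIX class is written inside a bracket expression as [[:alpha:]] rather than as a bare [:alpha:]. The names pattern, m, and hit are made up for illustration.

```r
# Rebuild the example data so the sketch is self-contained
v1 <- 1:4
v2 <- c("_a.b._", NA, "_C.D._", "_ef_")
df <- data.frame(v1, v2, stringsAsFactors = FALSE)

# POSIX class inside a bracket expression; used here in place of [a-zA-Z]
pattern <- "[[:alpha:]]\\.[[:alpha:]]\\."

# regexpr() gives the start of the first match per element:
# -1 where there is no match, NA where the input is NA
m <- regexpr(pattern, df$v2)
hit <- !is.na(m) & m > 0

# Start with all NA, then fill in only the matched substrings
df$match <- NA_character_
df$match[hit] <- regmatches(df$v2, m)

df$match
```

Note that [[:alpha:]] depends on the locale and can match letters outside a-z and A-Z, so it is not always an exact replacement for [a-zA-Z]. If the stringr package is installed, str_extract(df$v2, pattern) should give the same column in a single call, since it returns NA for elements that do not match.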