18

If I have these strings:

mystrings <- c("X2/D2/F4",
               "X10/D9/F4",
               "X3/D22/F4",
               "X9/D22/F9")

How can I extract 2, 9, 22, 22? These are the digits in the middle field (between the two /), after that field's leading letter.

I would like to do this in a vectorized fashion and, if possible, add the result as a new column with transform, which I am familiar with.

I think this regex gets me somewhere near all the characters within \:

^.*\\'(.*)'\\.*$

+1 for all. @Arun gave me the first workable answer; I just don't work with strings enough. Commented Jan 3, 2013 at 20:24

7 Answers

29
> gsub("(^.+/[A-Z]+)(\\d+)(/.+$)", "\\2", mystrings)
[1] "2"  "9"  "22" "22"

You would "read" (or "parse") that regex pattern as splitting any matched string into three parts:

1) anything up to and including the first forward slash, followed by a sequence of capital letters,

2) any digits (= "\d") in a sequence before the next slash, and

3) everything from the next slash to the end.

The replacement "\\2" then returns only the second part.

Non-matched character strings would be returned unaltered.
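
To make the "returned unaltered" behaviour concrete, here is a small sketch. The helper name `extract_or_na` is hypothetical and not part of the original answer; it simply swaps the untouched input for NA when the pattern does not match. Because the pattern is anchored at both ends, `sub` and `gsub` behave the same here.

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")
pattern <- "(^.+/[A-Z]+)(\\d+)(/.+$)"

# Every element matches, so each one is replaced by its second group.
extracted <- sub(pattern, "\\2", mystrings)

# A string without the expected shape does not match and comes back as-is.
unmatched <- sub(pattern, "\\2", "no-slashes-here")

# Hypothetical guard: return NA instead of the untouched input.
extract_or_na <- function(x) {
  ifelse(grepl(pattern, x), sub(pattern, "\\2", x), NA_character_)
}
guarded <- extract_or_na(c("X2/D2/F4", "no-slashes-here"))
```

Checking with `grepl` first is one way to avoid silently carrying malformed strings into a later `as.numeric` call.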

1 Comment

+1 I did not know you could grab the second set of matches with \\2 without a second group! Slick.
20

as.numeric(gsub("^.*D([0-9]+).*$", "\\1", mystrings))
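
Since the question asks for a new column via transform, the same call can be used inside it. This is a minimal sketch that assumes the strings live in a data frame column; the names `df`, `id`, and `d_num` are made up for illustration.

```r
# Hypothetical data frame holding the strings in a column called `id`.
df <- data.frame(
  id = c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9"),
  stringsAsFactors = FALSE
)

# gsub() is vectorized over the column, so transform() can add the result directly.
df <- transform(df, d_num = as.numeric(gsub("^.*D([0-9]+).*$", "\\1", id)))
```

Note that this pattern relies on the middle field starting with the letter D, as it does in the sample data.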

8

@Arun stole my thunder, so I'm giving my initial long-winded example.

cut.to.pieces <- strsplit(mystrings, split = "/")
got.second <- lapply(cut.to.pieces, "[", 2)
get.numbers <- unlist(got.second)
as.numeric(gsub(pattern = "[[:alpha:]]", replacement = "", x = get.numbers, perl = TRUE))
[1]  2  9 22 22

8

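Using str_extract from the stringr package (the perl() wrapper used when this was written was deprecated in stringr 1.0.0; a plain pattern string now handles the look-arounds):

library(stringr)
as.numeric(str_extract(mystrings, "(?<=/[A-Z])[0-9]+(?=/)"))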
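
The same look-around pattern can also be used without stringr. The sketch below is a base R variant, not part of the original answer: `(?<=/[A-Z])` requires a slash plus one capital letter immediately before the digits, and `(?=/)` requires a slash immediately after, and neither becomes part of the match.

```r
mystrings <- c("X2/D2/F4", "X10/D9/F4", "X3/D22/F4", "X9/D22/F9")

# perl = TRUE is needed because look-arounds are a PCRE feature.
m <- regexpr("(?<=/[A-Z])[0-9]+(?=/)", mystrings, perl = TRUE)

# regmatches() returns only the matched substrings.
middle <- as.numeric(regmatches(mystrings, m))
```

One difference to keep in mind: `regmatches` drops elements that do not match, whereas `str_extract` returns NA for them, so the base R result can be shorter than the input.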

1 Comment

@rrs It's part of a look-behind assertion. Type ?regex at the R prompt and read the last few paragraphs of the "Perl-like Regular Expressions" section.
4

This ended up being a compacted version of @RomanLuštrik's answer:

gsub("[^0-9]","",sapply(strsplit(mystrings,"/"),"[",2))
[1] "2"  "9"  "22" "22"

1

Using rex may make this type of task a little simpler.

library(rex)

matches <- re_matches(mystrings,
  rex(
    "/",
    any,
    capture(name = "numbers", digits)
  )
)

as.numeric(matches$numbers)
#>[1]  2  9 22 22

0

Using the unglue package you could do:

# install.packages("unglue")
library(unglue)

unglue_vec(mystrings, "{x}/{y}/{z}", var = "y")
#> [1] "D2"  "D9"  "D22" "D22"

With a data frame you could use unglue_unnest(), so there is no need for transform():

df <- data.frame(col = mystrings)
unglue_unnest(df, col, "{x}/{y}/{z}", remove = FALSE)
#>         col   x   y  z
#> 1  X2/D2/F4  X2  D2 F4
#> 2 X10/D9/F4 X10  D9 F4
#> 3 X3/D22/F4  X3 D22 F4
#> 4 X9/D22/F9  X9 D22 F9

# or use unnamed subpatterns to keep only the middle value
unglue_unnest(df, col, "{=.*?}/{y}/{=.*?}", remove = FALSE)
#>         col   y
#> 1  X2/D2/F4  D2
#> 2 X10/D9/F4  D9
#> 3 X3/D22/F4 D22
#> 4 X9/D22/F9 D22

Created on 2019-11-06 by the reprex package (v0.3.0)

More info: https://github.com/moodymudskipper/unglue/blob/master/README.md
