Regex in R, matching strings

Question

I have strings like this: "X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2", and I would like to match only the numbers 1, 2 and 3 between the underscores, but without the underscores themselves. The best solution I could come up with is str_match(sample_names, "_+[1-3]?"). I would really appreciate the help.

@Onyambu Thank you very much! :) Just a question: do you know of a good tutorial or some exercises on the web for practicing regex in R?
– Milan91, Commented Jun 14, 2020 at 19:22
Do you wish to match only the digits '1', '2' or '3', and only when they are surrounded by underscores, or match any single digit surrounded by underscores, or match any string of digits surrounded by underscores? Please edit to clarify.
– Cary Swoveland, Commented Jun 15, 2020 at 3:35
Apologies @Cary Swoveland, I have added "only" to my question. I was interested only in the digits '1', '2', '3' between two underscores. I already got my answer, though; sorry if I caused you any inconvenience.
– Milan91, Commented Jun 18, 2020 at 12:44

Chris Ruehlemann · Accepted Answer · 2020-06-14 20:21:29Z

2

The simplest method is to use sub with a backreference:

Data:

d <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

Solution:

sub(".*_(\\d)_.*", "\\1", d)

Here, (\\d) defines the capturing group for a single digit (if the number in question can have more than one digit, use \\d+). The captured digit is 'recalled' by the backreference \\1 in sub's replacement argument.
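As a small sketch of the \\d+ variant mentioned above (the vector d2 and its contents are made up for illustration), the snippet below also shows a caveat worth knowing: when the pattern does not match, sub returns the input string unchanged, so unmatched inputs may need separate handling.

```r
# Hypothetical inputs: one multi-digit field, one string without "_<digits>_"
d2 <- c("X96HE6.10nMBI_12_2", "no_match_here")
pattern <- ".*_(\\d+)_.*"

# \\d+ captures one or more digits between two underscores
res <- sub(pattern, "\\1", d2)
res
# [1] "12"            "no_match_here"

# One possible way to flag non-matching strings as NA instead
res_na <- ifelse(grepl(pattern, d2), res, NA_character_)
res_na
# [1] "12" NA
```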

Alternatively, use str_extract with positive lookarounds:

library(stringr)
str_extract(d, "(?<=_)\\d(?=_)")

(?<=_) is positive lookbehind which can be glossed as "If you see _ on the left..."

\\d is the number to be matched

(?=_) is positive lookahead, which can be glossed as "If you see _ on the right..."

Result:

[1] "1" "2" "3"
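The question's clarification asks for only the digits 1, 2 and 3. A minimal sketch of that restriction follows, written in base R (regmatches with regexpr) rather than stringr so it runs without extra packages; this is a swapped-in technique, not the code from the answer. It also shows why the attempt from the question returns the underscore: "_+" is part of the match itself, whereas a lookbehind only asserts the underscore.

```r
d <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

# Pattern from the question: the underscore is consumed, so it is returned too
attempt <- regmatches(d, regexpr("_+[1-3]?", d))
attempt
# [1] "_1" "_2" "_3"

# Lookarounds assert the underscores without consuming them;
# [1-3] restricts the match to the digits 1, 2 and 3
only_123 <- regmatches(d, regexpr("(?<=_)[1-3](?=_)", d, perl = TRUE))
only_123
# [1] "1" "2" "3"
```

Note that perl = TRUE is needed here because base R's default regex engine does not support lookarounds.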

edited Jun 14, 2020 at 20:21

answered Jun 14, 2020 at 19:42

Chris Ruehlemann


Bruno · Accepted Answer · 2020-06-14 19:20:22Z

1

You can use lookarounds. I personally rely heavily on the stringr cheatsheet for this kind of regex, since the syntax is a bit hard to remember. On the RStudio cheatsheets page, look for stringr -> LOOK AROUNDS.

library(tidyverse)

codes <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

codes %>%
  str_extract("(?<=_)[:digit:]+(?=_)")
#> [1] "1" "2" "3"

Created on 2020-06-14 by the reprex package (v0.3.0)

answered Jun 14, 2020 at 19:20

Bruno


1 Comment

Milan91 Over a year ago

Thank you very much for the advice! :)

G. Grothendieck · Accepted Answer · 2020-06-14 19:42:43Z

1

Using x in the Note at the end, read it in using read.table and pick off the second field. No packages or regular expressions are used.

read.table(text = x, sep = "_")[[2]]
## [1] 1 2 3

Note

x <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")
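To make the read.table approach a little more concrete, here is a short sketch of what it produces. The variable names fields, second and second_chr are chosen for illustration only. One difference from the regex answers is that the second field comes back as an integer vector rather than as character strings.

```r
x <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

# Each string is read as one row; "_" splits it into columns V1, V2, V3
fields <- read.table(text = x, sep = "_")
names(fields)
# [1] "V1" "V2" "V3"

# The second column is parsed as integer, not character
second <- fields[[2]]
is.integer(second)
# [1] TRUE

# Convert if character output (as in the regex answers) is wanted
second_chr <- as.character(second)
second_chr
# [1] "1" "2" "3"
```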

answered Jun 14, 2020 at 19:42

G. Grothendieck


Jan · Accepted Answer · 2020-06-14 20:11:40Z

1

No need for any third-party package:

strings <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")
pattern <- "(?<=_)(\\d+)(?=_)"

unlist(regmatches(strings, gregexpr(pattern, strings, perl = TRUE)))

Which yields:

[1] "1" "2" "3"
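A brief sketch of how gregexpr and regexpr differ with this kind of pattern may help when the inputs are less regular than in the question. The vector s below is hypothetical: gregexpr collects every match per string (so unlist can return more or fewer elements than there are inputs), while regexpr keeps only the first match and regmatches silently drops strings with no match.

```r
# Hypothetical inputs: two "_<digits>_" fields, no match, and a single match
s <- c("a_1_2_3", "plain", "b_7_x")
pattern <- "(?<=_)\\d+(?=_)"

# All matches per string, returned as a list
all_matches <- regmatches(s, gregexpr(pattern, s, perl = TRUE))
lengths(all_matches)
# [1] 2 0 1
unlist(all_matches)
# [1] "1" "2" "7"

# First match only; non-matching strings are dropped from the result
first_only <- regmatches(s, regexpr(pattern, s, perl = TRUE))
first_only
# [1] "1" "7"
```

In "a_1_2_3" the trailing 3 is not matched because no underscore follows it.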

edited Jun 14, 2020 at 20:11

answered Jun 14, 2020 at 19:30

Jan


2 Comments

Chris Ruehlemann Over a year ago

That's not the expected output of the OP: "I would like to match numbers 1, 2 and 3 in between underscores"

Wiktor Stribiżew Over a year ago

(?!$)(?=_) = (?=_) because _ is not the end of the string.
