0

I have strings like this: "X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2". I would like to match only the numbers 1, 2 and 3 that sit between the underscores, without matching the underscores themselves. The best solution I could come up with is str_match(sample_names, "_+[1-3]?"), but it also returns the underscore. I would really appreciate the help.

4
  • str_match(sample_names,"(?<=_)\\d+(?=_)") Commented Jun 14, 2020 at 19:18
  • @Onyambu Thank you very much! :) Just a question, do you maybe know some good tutorial or some exercises on web, for practicing regex in R? Commented Jun 14, 2020 at 19:22
  • Do you wish to match only the digits '1', '2' or '3', and only when they are surrounded by underscores, or match any single digit surrounded by underscores or match any string of digits surrounded by underscores? Please edit to clarify. Commented Jun 15, 2020 at 3:35
  • Apologies @Cary Swoveland, I have added "only" to my question, I was interested only in digits '1', '2', '3' between two underscores. But I already got my answer, sorry if I caused you any inconvenience. Commented Jun 18, 2020 at 12:44

4 Answers

2

The simplest method is to use sub and a backreference:

Data:

d <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

Solution:

sub(".*_(\\d)_.*", "\\1", d)

Here, (\\d) defines the capturing group for a single digit (if the number in question can have more than one digit, use \\d+). The group is 'recalled' by the backreference \\1 in sub's replacement argument, so the whole string is replaced by just the captured digit.
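One thing to keep in mind with the sub approach: when the pattern does not match, sub returns the input unchanged rather than NA. A minimal sketch of a guarded version follows, restricting the digit to [1-3] as the question asks. The extra test strings and the names pattern, raw and guarded are made up for illustration.

```r
# Hypothetical inputs: one matching string and two that should not match
d <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_7_2", "no_digit_here")
pattern <- ".*_([1-3])_.*"

# sub() leaves non-matching strings untouched
raw <- sub(pattern, "\\1", d)

# Guard with grepl() so that non-matches become NA instead
guarded <- ifelse(grepl(pattern, d), raw, NA_character_)
print(guarded)
```

If every input is known to contain the pattern, the plain sub call is enough and the guard can be skipped.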

Alternatively use str_extract and positive lookaround:

library(stringr)
str_extract(d, "(?<=_)\\d(?=_)")

(?<=_) is positive lookbehind which can be glossed as "If you see _ on the left..."

\\d is the digit to be matched

(?=_) is positive lookahead, which can be glossed as "If you see _ on the right..."

Result:

[1] "1" "2" "3"
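To see why the lookarounds matter, here is a small base R sketch comparing a pattern that consumes the underscores with one that only looks at them. The string "a_1_2_3" and the variable names are invented for illustration. Because lookarounds do not consume the underscore, two neighbouring digits can share it.

```r
# Hypothetical string where two digits share an underscore
s <- "a_1_2_3"

# Consuming pattern: the underscores are part of the match,
# so after "_1_" is taken the "2" has no leading underscore left
consuming <- regmatches(s, gregexpr("_[0-9]_", s))[[1]]

# Lookaround pattern: underscores are checked but not consumed
# (perl = TRUE is needed for lookbehind/lookahead in base R)
lookaround <- regmatches(s, gregexpr("(?<=_)[0-9](?=_)", s, perl = TRUE))[[1]]

print(consuming)
print(lookaround)
```

The trailing "3" is not returned by either pattern, since no underscore follows it.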


1

You can use lookarounds. I personally rely heavily on the stringr cheatsheet for this kind of regex, because the syntax is a bit hard to remember. On the RStudio cheatsheets page, look for stringr -> LOOK AROUNDS.

library(tidyverse)

codes <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

codes %>%
  str_extract("(?<=_)[:digit:]+(?=_)")
#> [1] "1" "2" "3"

Created on 2020-06-14 by the reprex package (v0.3.0)

1 Comment

Thank you very much for the advice! :)
1

Using x in the Note at the end, read it in using read.table and pick off the second field. No packages or regular expressions are used.

read.table(text = x, sep = "_")[[2]]
## [1] 1 2 3

Note

x <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")
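A related sketch, in case character output is preferred: strsplit with fixed = TRUE also avoids regular expressions and keeps the second field as a string, whereas read.table converts it to integer. The names fields and second are just illustrative.

```r
x <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")

# fixed = TRUE treats "_" as a literal separator, not a regex
fields <- strsplit(x, "_", fixed = TRUE)

# Take the second piece of each split string
second <- vapply(fields, `[`, character(1), 2)
print(second)

# Convert explicitly if numbers are wanted
print(as.integer(second))
```

Both approaches assume the wanted value is always the second underscore-separated field.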


1

No need for any third-party package; base R is enough:

strings <- c("X96HE6.10nMBI_1_2", "X96HE6.10nMBI_2_2", "X96HE6.10nMBI_3_2")
pattern <- "(?<=_)(\\d+)(?=_)"

unlist(regmatches(strings, gregexpr(pattern, strings, perl = TRUE)))

Which yields:

[1] "1" "2" "3"
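Note that regmatches with gregexpr returns a list with one element per input string, and unlist flattens it, so the link between inputs and matches is lost when a string has zero or several matches. A rough sketch of keeping one value per input follows. The extra strings and the names hits and first_hit are assumptions for illustration.

```r
# Hypothetical inputs: one match, no match, two matches
strings <- c("X96HE6.10nMBI_1_2", "no-match", "a_1_2_3")
pattern <- "(?<=_)\\d+(?=_)"

# One list element per input string
hits <- regmatches(strings, gregexpr(pattern, strings, perl = TRUE))
print(lengths(hits))

# Keep the first match per string, or NA when there is none
first_hit <- vapply(
  hits,
  function(h) if (length(h) > 0) h[1] else NA_character_,
  character(1)
)
print(first_hit)
```

For the strings in the question, each input has exactly one match, so unlist gives the same result.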

2 Comments

That's not the expected output of the OP: "I would like to match numbers 1, 2 and 3 in between underscores"
(?!$)(?=_) = (?=_) because _ is not the end of the string.
