10

I would like to use str_extract from the stringr package to extract the numbers from strings of the form "XX nights", etc.

I'm currently doing this:

library(stringr)

str_extract("17 nights$5 Days", "(\\d)+ nights")

but that returns

"17 nights"

instead of 17.

How can I extract just the number? I thought specifying the extract group with parentheses would work, but it doesn't.

5 Answers

14

You can use a lookahead regular expression, (?=...):

library(stringr)

str_extract("17 nights$5 Days", "(\\d)+(?= nights)")

(\d) - a digit
(\d)+ - one or more digits
(?= nights) - that comes in front of " nights"

The look behind (?<=) can also come in handy.
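As a minimal sketch of how the two lookarounds combine on the string from the question (the variable names x, nights and days are only illustrative), the lookbehind can pick out the number after the "$" while the lookahead picks out the number before " nights":

```r
library(stringr)

x <- "17 nights$5 Days"

# Lookahead: digits that are immediately followed by " nights"
nights <- str_extract(x, "\\d+(?= nights)")

# Lookbehind: digits that are immediately preceded by a literal "$"
days <- str_extract(x, "(?<=\\$)\\d+")

# str_extract() returns character vectors, so convert if numbers are needed
as.integer(c(nights, days))
```

Neither lookaround consumes the surrounding text, which is why only the digits end up in the result.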

A good reference is the regex cheatsheet on RStudio's website: https://raw.githubusercontent.com/rstudio/cheatsheets/main/regex.pdf

2 Comments

better if it's (\\d+) or \\d+
Remove space after "followed by" look around operator "?="; regex won't work otherwise.
5

In base R, we can use sub to extract the number that comes before "nights":

as.integer(sub("(\\d+)\\s+nights.*", "\\1","17 nights$5 Days"))
#[1] 17
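A possible base R alternative, sketched here on the assumption that a Perl-style lookahead is acceptable, is regexpr() with perl = TRUE together with regmatches(). The second input string is made up for illustration:

```r
# Two example strings; the second one is invented for illustration
x <- c("17 nights$5 Days", "3 nights$2 Days")

# Locate the digits that are followed by " nights" (lookahead needs perl = TRUE)
m <- regexpr("\\d+(?= nights)", x, perl = TRUE)

# Pull out the matched substrings and convert them to integers
nights <- as.integer(regmatches(x, m))
nights
```

Note that regmatches() silently drops elements with no match, so the result can be shorter than the input.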

Or if the number is always the first number in the string we can use readr::parse_number

readr::parse_number("17 nights$5 Days")
#[1] 17

2 Comments

I just ran a check and using base R as.integer(sub("(\\d+)\\s+nights.*", "\\1","17 nights$5 Days")) (even with the as.integer) is about 3X faster than the equivalent stringr
Thanks for readr::parse_number -- this is quite foolproof
4

If you want to return a specific group, use str_replace(). Wrap the pattern you want to capture in (), then refer to that group in the replacement argument as "\\1", since it is capture group number one.

I added the ^ to indicate you want numbers only at the beginning of the string.


library(stringr)

str_replace(string = "17 nights$5 Days",
            pattern = "(^\\d+).*",
            replacement = "\\1")

giving:

[1] "17"
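One thing to keep in mind with the replacement approach: when the pattern does not match, str_replace() returns the input unchanged, whereas str_extract() returns NA. A small sketch of the difference (the second string is made up for illustration):

```r
library(stringr)

# The second element has no leading number; it is invented for illustration
x <- c("17 nights$5 Days", "no number here")

# Replacement approach: unmatched strings come back unchanged
replaced <- str_replace(x, "(^\\d+).*", "\\1")

# Extraction approach: unmatched strings become NA
extracted <- str_extract(x, "^\\d+")

replaced
extracted
```

If the input may contain strings without a leading number, the NA from str_extract() is usually easier to detect downstream.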

2 Comments

This is much slower (X2+) than just using extract.
@Moohan No doubt. I put it here because the original poster wanted to use numbers to refer to groups, which I don't think you can do in extract.
2

You can use stringr::str_match, which returns the full match and each captured group as columns of a matrix, and then select the column you need.

library(stringr)

# Column 1 is the full match; column 2 is the first capture group
str_match("17 nights$5 Days", "(\\d+) nights")[, 2]
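To illustrate the matrix layout, here is a sketch with two capture groups and a vector of inputs (the second string and the variable names are made up for illustration). Column 1 holds the full match, and each later column holds one group:

```r
library(stringr)

# The second element is invented for illustration
x <- c("17 nights$5 Days", "3 nights$2 Days")

# One row per input string; columns: full match, group 1, group 2
m <- str_match(x, "(\\d+) nights\\$(\\d+) Days")

nights <- as.integer(m[, 2])  # first capture group
days   <- as.integer(m[, 3])  # second capture group

nights
days
```

Indexing by column with m[, 2] keeps working when several strings are passed in, which single-element indexing would not.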

0

Using rebus, if the string always starts with a number:

library(stringr)
library(rebus)

pattern = START %R% one_or_more(DGT)
str_extract("17 nights$5 Days", pattern)
#> [1] "17"

Created on 2021-05-30 by the reprex package (v2.0.0)
