
I have a model call stored as a string and want to extract the value of one specific argument, id in my case. So far I have only found a pattern that returns the string without the value I need. I want exactly the opposite: only the value that is missing from my result:

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
sub("(?=(id=|id =))([a-zA-Z].*)(?=,)", "\\1", xx, perl = TRUE)
#> [1] "gee(formula = breaks ~ tension, id =, data = warpbreaks)"

wool is missing from the return value, but wool is the only thing I want as the result. Can anyone help me find the correct regex pattern?

  • This will do it: sub(".*id ?= ?(.*?),.*", "\\1", xx). You need to match the whole string. Commented Feb 10, 2019 at 18:11
  • Works like a charm, thanks a lot! Commented Feb 10, 2019 at 18:17

2 Answers


Instead of a regex, you could parse() the string and grab the id argument by name.

as.character(parse(text = xx)[[1]]$id)
# [1] "wool"
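
To illustrate what parse() hands back here, a small sketch follows. The second string, xx2, and the weights argument are made up for the example and are not part of the question. It suggests that deparse() may be the safer conversion when id could be an expression rather than a bare name, since as.character() splits a call into its pieces.

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"

# parse() only builds an unevaluated call; gee() is never run,
# so the package that provides it does not have to be installed.
cl <- parse(text = xx)[[1]]

class(cl$id)          # a symbol ("name"), not yet a string
deparse(cl$id)        # "wool"
is.null(cl$weights)   # TRUE: an argument that is absent gives NULL

# Hypothetical variant where id is an expression, not a bare name.
xx2 <- "gee(formula = breaks ~ tension, id = interaction(wool, tension), data = warpbreaks)"
cl2 <- parse(text = xx2)[[1]]

as.character(cl2$id)  # splits the call into "interaction" "wool" "tension"
deparse(cl2$id)       # "interaction(wool, tension)"
```

The NULL result for an absent argument makes it easy to test whether id was supplied at all before converting it.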

1 Comment

Thanks, this is also an elegant solution!

You may use

xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
sub(".*\\bid\\s*=\\s*(\\w+).*", "\\1", xx)
## or, if the value extracted may contain any chars but commas
sub(".*\\bid\\s*=\\s*([^,]+).*", "\\1", xx)


Details

  • .* - any zero or more characters, as many as possible (greedy)
  • \\bid - id preceded by a word boundary (\b), so an argument such as valid = is not matched
  • \\s*=\\s* - an = sign surrounded by zero or more whitespace characters
  • (\\w+) - Capturing group 1 (\\1 in the replacement pattern refers to this value): one or more letters, digits or underscores (the alternative, [^,]+, matches one or more characters other than a comma)
  • .* - the rest of the string.
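
To make the difference between the two capture groups concrete, here is a small sketch. The strings yy, zz and ww are invented for illustration and do not come from the question. It also shows two things worth knowing about the sub() approach: a string without a match comes back unchanged, and [^,]+ picks up the closing parenthesis when id is the last argument.

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
yy <- "gee(formula = breaks ~ tension, id = my.group, data = warpbreaks)"
zz <- "lm(formula = breaks ~ tension, data = warpbreaks)"
ww <- "gee(formula = breaks ~ tension, id = wool)"

word_pat <- ".*\\bid\\s*=\\s*(\\w+).*"     # value made of word characters
any_pat  <- ".*\\bid\\s*=\\s*([^,]+).*"    # value made of anything but commas

sub(word_pat, "\\1", xx)   # "wool"
sub(word_pat, "\\1", yy)   # "my"       (\w stops at the dot)
sub(any_pat,  "\\1", yy)   # "my.group"
sub(any_pat,  "\\1", ww)   # "wool)"    (no comma follows, so ")" is captured)
sub(any_pat,  "\\1", zz)   # no id argument: the input is returned unchanged
```

If either edge case matters, comparing the result with the input, or excluding ")" as well via [^,)]+, would be one way to guard against it.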

Other alternative solutions:

> xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"
> regmatches(xx, regexpr("\\bid\\s*=\\s*\\K[^,]+", xx, perl=TRUE))
[1] "wool"

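The pattern matches id followed by = with optional whitespace on either side; \K then discards everything matched so far, so only the one or more non-comma characters after it end up in the returned match. \K is a PCRE feature, hence perl=TRUE.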
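
One thing to keep in mind is that regmatches() silently drops elements without a match, so the result can be shorter than the input. A minimal sketch of a wrapper that keeps the positions aligned follows. The name extract_id and the second test string are assumptions made for this example only.

```r
# Hypothetical helper: returns one value per input string,
# with NA where no id argument is found.
extract_id <- function(x) {
  m   <- regexpr("\\bid\\s*=\\s*\\K[^,]+", x, perl = TRUE)
  hit <- as.integer(m) != -1L          # regexpr() reports -1 for "no match"
  out <- rep(NA_character_, length(x))
  out[hit] <- regmatches(x, m)         # regmatches() returns the hits only
  out
}

calls <- c(
  "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)",
  "lm(formula = breaks ~ tension, data = warpbreaks)"
)

extract_id(calls)   # "wool" NA
```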

Or, a capturing approach with stringr::str_match is also valid here:

> library(stringr)
> str_match(xx, "\\bid\\s*=\\s*([^,]+)")[,2]
[1] "wool"
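
If loading stringr just for this is not wanted, base R's regexec() paired with regmatches() gives a comparable result. This is only a sketch of the same capturing idea, not a drop-in replacement for str_match: it returns a list with one character vector per input string, holding the full match first and the capture groups after it.

```r
xx <- "gee(formula = breaks ~ tension, id = wool, data = warpbreaks)"

# regexec() records the positions of the full match and of each group.
m <- regmatches(xx, regexec("\\bid\\s*=\\s*([^,]+)", xx))

m[[1]]      # "id = wool" "wool"
m[[1]][2]   # "wool" (the first capture group)
```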
