I need help solving what seems like a very easy problem. I have a string, 70 - 3/31/2014 - [email protected]. I would like to extract only the text after the second "-" and before the "@", i.e. "60". Is there a function or regular expression in R that can extract the part of a string between two specified characters?

Thanks!

2 Answers

1) sub This matches the entire string and then replaces it with the capture group, i.e. the portion matched to the part of the regular expression in parentheses:

x <- "70 - 3/31/2014 - [email protected]"
sub(".*- (.*)@.*", "\\1", x)
## [1] "60"

and here is the regular expression used (the original answer linked a Debuggex visualization of it):

.*- (.*)@.*

2) gsub This replaces the portion before the wanted substring and the portion after the wanted substring with empty strings:

gsub(".*- |@.*", "", x)
# [1] "60"

whose regular expression is (again with a Debuggex visualization linked in the original answer):

.*- |@.*
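One thing worth knowing about both approaches is what happens when an input does not contain the delimiters at all. The sketch below is only an illustration: the sample strings are made up for this example (they are not from the question), and the grepl() check is just one possible way to flag non-matches.

```r
# Hypothetical sample data, invented for this illustration.
x <- c("12 - 1/15/2020 - 45@rest", "no delimiters here")

# sub() with a capture group: an element that does not match
# is returned unchanged.
res_sub <- sub(".*- (.*)@.*", "\\1", x)

# gsub() stripping both sides: nothing to strip in the second
# element, so it also comes back unchanged.
res_gsub <- gsub(".*- |@.*", "", x)

# One way to make non-matches explicit: test with grepl() first
# and return NA where the pattern is absent.
res_na <- ifelse(grepl(".*- (.*)@.*", x), res_sub, NA_character_)
print(res_na)
```

Without a check like this, a string that lacks the delimiters passes through silently, which can be easy to miss in a long vector.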

1 Comment

Thanks for your help on this! You and Avinash helped a bunch!

Using sub:

> x <- "70 - 3/31/2014 - [email protected]"
> sub("^[^-]*-[^-]*-\\s*([^@]*)@.*", "\\1", x)
[1] "60"
> sub("^[^-]*-[^-]*-([^@]*)@.*", "\\1", x)
[1] " 60"
> sub("^(?:[^-]*-){2}\\s*([^@]*)@.*", "\\1", x)
[1] "60"
  • ^ - Asserts that we are at the start of the string.

  • [^-]*- - Matches zero or more characters other than -, followed by a hyphen.

  • (?:[^-]*-){2} - Repeats the above pattern exactly two times, so the match ends just after the second hyphen.

  • \\s* - Matches zero or more whitespace characters (written \\s inside an R string).

  • ([^@]*) - Captures zero or more characters other than @.

  • @ - Matches the literal @.

  • .* - Matches all the remaining characters.

So replacing all the matched characters with the contents of capture group 1 gives you the desired output.
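To make the breakdown above easier to follow, here is a small sketch that assembles the same pattern piece by piece with paste0(). The sample string is hypothetical (made up for this example), and perl = TRUE is passed as an assumption so that the (?:...) group is definitely handled by PCRE.

```r
# Hypothetical sample string, invented for this illustration.
x <- "12 - 1/15/2020 - 45@rest"

pattern <- paste0(
  "^",              # start of the string
  "(?:[^-]*-){2}",  # two runs of non-hyphens, each followed by a hyphen
  "\\s*",           # optional whitespace after the second hyphen
  "([^@]*)",        # capture group 1: everything up to the @
  "@",              # the literal @
  ".*"              # the rest of the string
)

# Replace the whole match with the contents of group 1.
result <- sub(pattern, "\\1", x, perl = TRUE)
print(result)
```

Note that this relies on the date using slashes. If the date were written with hyphens, the second hyphen would fall inside the date and the capture would start in the wrong place.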

OR

> x <- "70 - 3/31/2014 - [email protected]"
> m <- regexpr("^(?:[^-]*-){2}\\s*\\K[^@]*(?=@)", x, perl=TRUE)
> regmatches(x, m)
[1] "60"

\K discards the text matched so far from the overall regex match, and the lookahead (?=@) requires an @ to follow without including it in the match.
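When this is applied to a vector, keep in mind that regmatches() returns only the elements that matched, so the result can be shorter than the input. The sketch below shows one possible way to keep the output aligned with the input; the sample strings are hypothetical and not from the question.

```r
# Hypothetical sample data, invented for this illustration.
x <- c("12 - 1/15/2020 - 45@rest", "no delimiters here")

m <- regexpr("^(?:[^-]*-){2}\\s*\\K[^@]*(?=@)", x, perl = TRUE)

# regexpr() reports -1 for elements with no match.
hit <- as.integer(m) != -1L

# regmatches() drops the non-matching elements.
found <- regmatches(x, m)

# Keep the result the same length as x, with NA for non-matches.
out <- rep(NA_character_, length(x))
out[hit] <- found
print(out)
```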

1 Comment

This works well. I find Grothendieck's answer easier to understand. But your bullet points were helpful in understanding and breaking apart the function. Thanks!
