
I want to extract numbers and only numbers from a string.
Say I have a string like this: "VW Golf 2009". I can use the regex [0-9]+ to extract the 2009 part.

The problem arises when I have a string like this: "BMW 2013 i8". I want to extract the 2013 part, but not the 8 part.

Basically, I want to extract the "year" part of any string similar to the following:

BMW 2013 i8
VW Golf 2009
1938 CarCompany, inc. <insert car name here>
My 128th birthday is in the year 2014.
aui895h 2013 5qnui 89hth658h uab2 52h5h528h
etc.

3 Answers

What about using the \b (word boundary) metacharacter (depending on your regex implementation), like so?

\b\d+\b

Or if you want a specific number of digits:

\b\d{4}\b
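
A minimal sketch of both boundary patterns, assuming Java's java.util.regex (the flavor used later in this thread) and a hypothetical helper named findAll. It runs each pattern over the example strings from the question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryYears {
    // \b\d+\b: any run of digits with a word boundary on both sides
    static final Pattern ANY_NUMBER = Pattern.compile("\\b\\d+\\b");
    // \b\d{4}\b: exactly four digits with a word boundary on both sides
    static final Pattern FOUR_DIGITS = Pattern.compile("\\b\\d{4}\\b");

    // Hypothetical helper: every match of pattern in text, in order
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String[] samples = {
            "BMW 2013 i8",
            "VW Golf 2009",
            "1938 CarCompany, inc. <insert car name here>",
            "My 128th birthday is in the year 2014.",
            "aui895h 2013 5qnui 89hth658h uab2 52h5h528h"
        };
        for (String s : samples) {
            // Print the matches of both patterns side by side
            System.out.println(findAll(ANY_NUMBER, s) + " " + findAll(FOUR_DIGITS, s));
        }
    }
}
```

For these samples both patterns find the same single number per line (for example [2013] [2013] for "BMW 2013 i8"). The 8 in "i8" and the 128 in "128th" are skipped because a letter sits directly next to them, so there is no word boundary at that position.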

I believe \d{4} will solve this nicely.

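If you want to ensure that only a standalone 4-digit year is matched, \W\d{4}\W will also work, with two caveats: \W consumes the neighboring characters, so they become part of the match, and it requires a non-word character on both sides, so it misses a year at the very start or end of the string (e.g. "VW Golf 2009"). \b\d{4}\b avoids both problems.

If you further want to accept only "sensible" years (4 digits beginning with 19 or 20), you can use (19|20)\d{2}, or \b(19|20)\d{2}\b to keep the standalone requirement.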
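
A small Java comparison of these patterns, assuming java.util.regex and a hypothetical helper named firstMatch that returns the first match or null. The sample strings "Part 12345" and "Model 3000 from 2013" are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearPatterns {
    // Hypothetical helper: the first match of regex in text, or null if there is none
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // \d{4} finds the year here...
        System.out.println(firstMatch("\\d{4}", "VW Golf 2009"));
        // ...but also matches the first four digits of a longer number ("1234")
        System.out.println(firstMatch("\\d{4}", "Part 12345"));
        // \W\d{4}\W needs a non-word character after the year, so it fails at the end of the string (null)
        System.out.println(firstMatch("\\W\\d{4}\\W", "VW Golf 2009"));
        // When it does match, the surrounding spaces are part of the match (" 2013 ")
        System.out.println(firstMatch("\\W\\d{4}\\W", "BMW 2013 i8"));
        // \b is zero-width, so string start/end are fine and nothing extra is captured
        System.out.println(firstMatch("\\b\\d{4}\\b", "VW Golf 2009"));
        // (19|20)\d{2} wrapped in \b only accepts 1900-2099, so 3000 is skipped
        System.out.println(firstMatch("\\b(19|20)\\d{2}\\b", "Model 3000 from 2013"));
    }
}
```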

(?<=^|\s)[0-9]+?(?=\s|$|\.(?=\s|$)|[;,\"'!?])

will work.
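One advantage of this regex is that it is easy to modify: to accept other characters before or after the number, add them to the lookbehind or lookahead alternatives.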

Explanation:

  • (?<=^|\s) is a Positive Lookbehind.
    • (?<= begins the positive lookbehind.
    • ^|\s matches either of the following:
      • ^ the start of the string.
      • \s any whitespace character.
    • ) ends the positive lookbehind.
  • [0-9]+? is the heart of this regex.
    • [0-9] matches a single character that is any digit (0123456789).
    • +? is a lazy (reluctant) quantifier that repeats [0-9] one or more times, as few times as possible; the lookahead that follows still forces it to consume the whole number. (The possessive form would be ++.)
  • (?=\s|$|\.(?=\s|$)|[;,\"'!?]) is a Positive Lookahead.
    • (?= begins the positive lookahead.
    • \s|$|\.(?=\s|$)|[;,\"'!?] matches any of the following:
      • \s any whitespace character (including newlines).
      • $ the end of the string.
      • \.(?=\s|$) the character ., if that character is immediately followed by
        • \s any whitespace character, or
        • $ the end of the string.
      • [;,\"'!?] any one of the characters ; , " ' ! ?
    • ) ends the positive lookahead.
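
To illustrate how easily the pattern can be modified, here is a hedged Java sketch (java.util.regex) that assembles the regex from the three parts described above. The constant names LEFT, DIGITS and RIGHT, and the WITH_PARENS variant that additionally accepts numbers wrapped in parentheses, are illustrative assumptions, not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundYears {
    // Each piece mirrors one part of the explanation (names are illustrative only)
    static final String LEFT = "(?<=^|\\s)";   // start of string or whitespace before the number
    static final String DIGITS = "[0-9]+?";    // one or more digits, lazily
    // After the number: whitespace, end of string, a '.' followed by whitespace/end, or ; , " ' ! ?
    static final String RIGHT = "(?=\\s|$|\\.(?=\\s|$)|[;,\"'!?])";

    static final Pattern STANDALONE = Pattern.compile(LEFT + DIGITS + RIGHT);

    // Variant: also allow "(" before and ")" after the number, e.g. "(2013)"
    static final Pattern WITH_PARENS = Pattern.compile(
            "(?<=^|\\s|\\()" + DIGITS + "(?=\\s|$|\\.(?=\\s|$)|[;,\"'!?)])");

    // Every match of pattern in text, in order
    static List<String> findAll(Pattern pattern, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(STANDALONE, "My 128th birthday is in the year 2014."));
        System.out.println(findAll(STANDALONE, "Built (2013) in Germany"));
        System.out.println(findAll(WITH_PARENS, "Built (2013) in Germany"));
    }
}
```

The first line prints [2014] and the second [], because "(" is not in the original lookbehind; the variant prints [2013]. Only the two lookaround pieces change, while the digit part stays the same.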

You can also find another good explanation here: http://regex101.com/r/pC6yA9

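To implement this in Java, you can use this code (every backslash must be doubled inside a Java string literal):

Matcher yearMatcher = Pattern.compile("(?<=^|\\s)[0-9]+?(?=\\s|$|\\.(?=\\s|$)|[;,\"'!?])")
        .matcher("BMW 2013 i8");
// find() returns false if there is no standalone number, so check it before calling group()
String year = yearMatcher.find() ? yearMatcher.group() : null;

making sure to import java.util.regex.*.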
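
A fuller, self-contained version of the Java usage above, offered as a sketch: it compiles the pattern once and wraps the lookup in a hypothetical extractYear method that returns an Optional instead of null. The extra sample "Tesla Model S" is made up to show the no-match case:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YearExtractor {
    // Compile once and reuse; backslashes are doubled because this is a Java string literal
    private static final Pattern YEAR = Pattern.compile(
            "(?<=^|\\s)[0-9]+?(?=\\s|$|\\.(?=\\s|$)|[;,\"'!?])");

    // Hypothetical helper: the first standalone number in text, or Optional.empty() if there is none
    static Optional<String> extractYear(String text) {
        Matcher m = YEAR.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] samples = {
            "BMW 2013 i8",
            "VW Golf 2009",
            "1938 CarCompany, inc. <insert car name here>",
            "My 128th birthday is in the year 2014.",
            "aui895h 2013 5qnui 89hth658h uab2 52h5h528h",
            "Tesla Model S"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + extractYear(s).orElse("(no year)"));
        }
    }
}
```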

4 Comments

Try regex101.com for those explanations. See: regex101.com/r/wN0tG8 (PCRE and Java are fairly similar)
I did, but the regex didn't work there: regex101.com/r/nH2eI7 EDIT: lol, didn't see your edit.
This is an extraordinarily complex solution to a very simple problem. Regexes are already very developer unfriendly (pretend like the code you're writing will be maintained by an axe murderer who knows where you live). I'd recommend using the simplest possible regex that gets the job done.
@RyanCarlson: That's because you selected JavaScript as your flavor.
