I am trying to extract matching substrings from a string like this in Postgres 9.4: "some text 123_good_345 and other text 123_some_invalid and 222_work ok_333 stop."

Using the pattern (\d+\_.*\_\d+\D)+? the result is:

"123_good_345"
"123_some_invalid and 222_work ok_333"

But I need

"123_good_345"
"222_work ok_333"

Note that "123_some_invalid" must be ignored.

Please help!

  • On what basis do you decide that 222_work and ok_333 are part of the same match? Commented Dec 12, 2018 at 16:51
  • The expression must start with "number + underscore" and end with "underscore + number", but "number + underscore some text number + underscore ... underscore + number" is wrong. Commented Dec 12, 2018 at 17:08
  • You need \d+_(?:(?!\d_).)*_\d+. See regex101.com/r/iq45qc/1 and rextester.com/VJJXF1223. See full answer below. Commented Dec 12, 2018 at 17:09

1 Answer

You may use

\d+_(?:(?!\d_).)*_\d+

See the regex demo. Or, if there can be no digits between \d+_ and _\d+, use

\d+_\D+_\d+

See this regex demo.

Details

  • \d+ - 1 or more digits
  • _ - an underscore
  • (?:(?!\d_).)* - any char, 0 or more repetitions, as many as possible, where the char is not the start of a digit + _ sequence
  • \D+ - (second pattern only, in place of the previous construct) 1 or more chars other than digits
  • _ - an underscore
  • \d+ - 1+ digits.
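
To check what the two patterns are meant to match outside the database, here is a minimal sketch that uses Python's re module as a stand-in. That is an assumption made purely for illustration: Python's engine is not PostgreSQL's, so this shows the intended matches, not what any particular PostgreSQL version returns. The helper name extract_tokens is hypothetical.

```python
import re

# Sample input from the question.
TEXT = ("some text 123_good_345 and other text 123_some_invalid "
        "and 222_work ok_333 stop.")

# Pattern 1: the middle part may contain digits, but never a digit
# that is immediately followed by an underscore.
TEMPERED = r"\d+_(?:(?!\d_).)*_\d+"

# Pattern 2: the middle part may not contain any digit at all.
NO_DIGITS = r"\d+_\D+_\d+"


def extract_tokens(pattern, text):
    """Return all non-overlapping matches of pattern in text (hypothetical helper)."""
    return re.findall(pattern, text)


if __name__ == "__main__":
    for name, pattern in (("tempered", TEMPERED), ("no digits", NO_DIGITS)):
        print(name, extract_tokens(pattern, TEXT))
```

For the sample input both patterns should give the two wanted substrings. They only differ when the middle part contains a digit: a string such as 1_a2b_3 is matched by the first pattern but not by the second.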

See the PostgreSQL demo:

SELECT unnest(regexp_matches('some text 123_good_345 and other text 123_some_invalid and 222_work ok_333 stop.', '\d+_(?:(?!\d_).)*_\d+', 'g'));

or

SELECT unnest(regexp_matches('some text 123_good_345 and other text 123_some_invalid and 222_work ok_333 stop.', '\d+_\D+_\d+', 'g'));

16 Comments

Almost: it works in Postgres 9.6, but in Postgres 9.4 it returns "123_good_345 and other text 123" for the first row, while the second row is correct :-/
@man2002ua Strange, try \d+_(?:(?!\d_).)*?_\d+. Or \d+_[^_]*_\d+.
First suggestion: no change. Second suggestion: only the last match is shown (but it is correct).
@man2002ua What is the code behind that? It is quite clear to me there is something that modifies your input, or you just have an input different from what you posted.
Does '\d+_[^\d]+_\d+' work for you?