Postgres asterisc regex quantifier not working

Question

In Postgres 9.5.1 the following command works:

select regexp_replace('JamesBond007','\d+','');

Output:

JamesBond

However the asterisc does not seem to work:

select regexp_replace('JamesBond007','\d*','');

it produces:

JamesBond007

Even more weird things happen when I put something in as replacement string:

select regexp_replace('JamesBond007','\d+','008');

results in:

JamesBond008

while

select regexp_replace('JamesBond007','\d*','008');

gives me back:

008JamesBond007

The Postgres documentation says * = a sequence of 0 or more matches of the atom. So what is happening here? (N.B. in Oracle all the above works as expected)

Wiktor Stribiżew · Accepted Answer · 2016-04-08 10:26:28Z

5

The thing is that \d* can match an empty string and you are not passing the flag g.

See regexp_replace:

The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one.

The \d* matches the empty location at the beginning of the JamesBond007 string, and since g is not passed, that empty string is replaced with 008 when you use select regexp_replace('JamesBond007','\d*','008'); and the result is expected - 008JamesBond007.

With select regexp_replace('JamesBond007','\d*','');, again, \d* matches the empty location at the beginning of the string, and replaces it with an empty string (no visible changes).

Note that Oracle's REGEXP_REPLACE replaces all occurrences by default:

By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string.

In general, you should be cautious when using patterns matching empty strings inside regex-based replace functions/methods. Do it only when you understand what you are doing. If you want to replace digit(s) you usually want to find at least 1 digit. Else, why remove something that is not present in the string in the first place?

edited Apr 8, 2016 at 10:26

answered Apr 8, 2016 at 10:19

Wiktor Stribiżew

631k41 gold badges502 silver badges633 bronze badges

Sign up to request clarification or add additional context in comments.

4 Comments

absurd Over a year ago

Thanks for the quick answer, Wiktor. I see how it works now. However, I still have problems to understand why Postgres needs to use these flags at all. You could specify case insensitivity within the regex itself. Also, the g flag feature could be expressed by greediness vs non-greediness of the quantifier. And why \d finds an empty location at the beginning of a string, if it should match only digits?

Wiktor Stribiżew Over a year ago

@absurd: Why the default behavior is different between Oracle and Postgres, I have no idea, no one asked me when implementing regexp_replace :). As for the \d* matching empty location at the beginning, it is totally valid since * means match zero or more occurrences. There are 0 digits at the beginning as the string starts with J. That is why it is matched.

Wiktor Stribiżew Over a year ago

More, flags you need to pass to the regex replace (and other regex methods) differ across languages. In .NET, Regex.Replace will replace all occurrences by default, in Ruby/R you need a specific function for each type of replacement (sub - once, gsub - many), in JS, you need to pass g flag with the regex. I think Postgres was decided to be closer to JS in some way.

absurd Over a year ago

Ah, now I see. Thanks again.

Collectives™ on Stack Overflow

Postgres asterisc regex quantifier not working

1 Answer 1

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related