1

In Postgres 9.5.1 the following command works:

select regexp_replace('JamesBond007','\d+','');

Output:

JamesBond

However the asterisc does not seem to work:

select regexp_replace('JamesBond007','\d*','');

it produces:

JamesBond007

Even more weird things happen when I put something in as replacement string:

select regexp_replace('JamesBond007','\d+','008');

results in:

JamesBond008

while

select regexp_replace('JamesBond007','\d*','008');

gives me back:

008JamesBond007

The Postgres documentation says * = a sequence of 0 or more matches of the atom. So what is happening here? (N.B. in Oracle all the above works as expected)

1 Answer 1

5

The thing is that \d* can match an empty string and you are not passing the flag g.

See regexp_replace:

The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one.

The \d* matches the empty location at the beginning of the JamesBond007 string, and since g is not passed, that empty string is replaced with 008 when you use select regexp_replace('JamesBond007','\d*','008'); and the result is expected - 008JamesBond007.

With select regexp_replace('JamesBond007','\d*','');, again, \d* matches the empty location at the beginning of the string, and replaces it with an empty string (no visible changes).

Note that Oracle's REGEXP_REPLACE replaces all occurrences by default:

By default, the function returns source_char with every occurrence of the regular expression pattern replaced with replace_string.

In general, you should be cautious when using patterns matching empty strings inside regex-based replace functions/methods. Do it only when you understand what you are doing. If you want to replace digit(s) you usually want to find at least 1 digit. Else, why remove something that is not present in the string in the first place?

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for the quick answer, Wiktor. I see how it works now. However, I still have problems to understand why Postgres needs to use these flags at all. You could specify case insensitivity within the regex itself. Also, the g flag feature could be expressed by greediness vs non-greediness of the quantifier. And why \d finds an empty location at the beginning of a string, if it should match only digits?
@absurd: Why the default behavior is different between Oracle and Postgres, I have no idea, no one asked me when implementing regexp_replace :). As for the \d* matching empty location at the beginning, it is totally valid since * means match zero or more occurrences. There are 0 digits at the beginning as the string starts with J. That is why it is matched.
More, flags you need to pass to the regex replace (and other regex methods) differ across languages. In .NET, Regex.Replace will replace all occurrences by default, in Ruby/R you need a specific function for each type of replacement (sub - once, gsub - many), in JS, you need to pass g flag with the regex. I think Postgres was decided to be closer to JS in some way.
Ah, now I see. Thanks again.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.