I am working in Oracle to fix some text. The issue is that sentences in my data have words where sentences aren't separated by spaces. For example:
Sentence without space.Between sentences
Sentence with question mark?Second sentence
I've tested the following replace statement in regex101 and it seems to work out there, but I can't pinpoint why it's not working in Oracle:
regexp_replace(review_text, '([^\s\.])([\.!\?]+)([^\s\.\d])', '\1\2 \3')
This should allow me to look for sentence-separating periods/exclamation points/question marks (single or grouped) and add the necessary space between sentences. I realize that there are other ways that sentences can be separated, but what I have above should cover a large majority of the use cases. The \d in the third capture group is to make sure that I'm not accidentally changing numeric values like "4.5" to "4. 5".
Before test group:
Sentence without space.Between sentences
Sentence with space. Between sentences
Sentence with multiple periods...Between sentences
False positive sentence with 4.5 Liters
Sentence with!Exclamation point
Sentence with!Question mark
After changes should look like this:
Sentence without space. Between sentences
Sentence with space. Between sentences
Sentence with multiple periods... Between sentences
False positive sentence with 4.5 Liters
Sentence with! Exclamation point
Sentence with! Question mark
Regex101 link: https://regex101.com/r/dC9zT8/1
While all changes work as expected from regex101, my issue is that I'm getting in Oracle is that my third and fourth test cases aren't working as intended. Oracle isn't adding a space after the multiple period (ellipses) group, and regexp_replace is adding a space for "4.5". I'm not sure why this is the case, but perhaps there's some peculiarity about Oracle regexp_replace that I'm missing.
Any and all insight is appreciated. Thanks!