Rather than trying to write one giant regex, it may be easier to split the string into rows for each token. You can do this by adapting any of the CSV-to-rows methods around. e.g.:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
select * from vals;
STR L
This, 1
is 2
a 3
sample_text, 4
which_needs_. 5
to_be_replaced 6
as 7
per, 8
the 9
matching. 10
criteria. 11
defined 12
above, 13
Now replace each of these according to your rules. You're only dealing with one token at a time, so it's easy to see which you're replacing correctly. This makes the regex easier to write and debug:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
select case
when l = 1 then str
when substr ( str, -1, 1 ) = '.' then
str
else
regexp_replace (
str,
'^[A-z_]+',
'replaced'
)
end replaced, l
from vals;
REPLACED L
This, 1
replaced 2
replaced 3
replaced, 4
which_needs_. 5
replaced 6
replaced 7
replaced, 8
replaced 9
matching. 10
criteria. 11
replaced 12
replaced, 13
Then you listagg the values back together to get the final string:
with rws as (
select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
from rws
connect by regexp_substr(str, '[^, .]+', 1, level) is not null
), replaces as (
select case
when l = 1 then str
when substr ( str, -1, 1 ) = '.' then
str
else
regexp_replace (
str,
'[A-z_]+',
'replaced'
)
end replaced, l
from vals
)
select listagg ( replaced )
within group ( order by l ) s
from replaces;
S
This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced,
Ensure you test thoroughly! In my experience, you find more exceptions/refinements when you have complex rules like this. So it's likely you'll have to tinker with the replacement rules in the case expression.
regexp_replaceand would hence probably just loop through the characters instead. I suppose this is nothing you'd have problems with, correct? Your question is merely: "If I do this withregexp_replace, what must the regular expression look like?". Yes?