0

I have a requirement to identify a particular string in a text using below conditions:

*1. Any string containing whitespace before and after it OR

  1. Any string containing a . (dot) as prefix and whitespace as suffix OR

  2. Any string having whitespace as prefix and , (comma) as suffix*

Once found, i need to replace it with another string without replacing the prefix and suffix as mentioned above. And this needs to be done in a pl/sql code in Oracle (preferably using regexp_replace function).

Example:

Text : 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,'

Replace string: 'replaced'

Output: 'This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced'

I know this is a weird example, but the actual requirement is even more weird than this. Please guide me how to achieve this.

Thank you in advance.

2
  • I find this difficult to handle with regexp_replace and would hence probably just loop through the characters instead. I suppose this is nothing you'd have problems with, correct? Your question is merely: "If I do this with regexp_replace, what must the regular expression look like?". Yes? Commented May 29, 2020 at 8:59
  • @ThorstenKettner Yes that's right. Commented Jun 3, 2020 at 10:55

1 Answer 1

2

Rather than trying to write one giant regex, it may be easier to split the string into rows for each token. You can do this by adapting any of the CSV-to-rows methods around. e.g.:

with rws as (
  select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
  select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
  from   rws
  connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
  select * from vals;

STR               L    
This,                  1 
is                     2 
a                      3 
sample_text,           4 
which_needs_.          5 
to_be_replaced         6 
as                     7 
per,                   8 
the                    9 
matching.             10 
criteria.             11 
defined               12 
above,                13

Now replace each of these according to your rules. You're only dealing with one token at a time, so it's easy to see which you're replacing correctly. This makes the regex easier to write and debug:

with rws as (
  select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
  select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
  from   rws
  connect by regexp_substr(str, '[^, .]+', 1, level) is not null
)
  select case 
           when l = 1 then str
           when substr ( str, -1, 1 ) = '.' then
             str
           else 
           regexp_replace (
             str,
             '^[A-z_]+',
             'replaced'
           )
       end replaced, l
  from   vals;

REPLACED        L    
This,                1 
replaced             2 
replaced             3 
replaced,            4 
which_needs_.        5 
replaced             6 
replaced             7 
replaced,            8 
replaced             9 
matching.           10 
criteria.           11 
replaced            12 
replaced,           13

Then you listagg the values back together to get the final string:

with rws as (
  select 'This, is a sample_text, which_needs_.to_be_replaced as per, the matching.criteria.defined above,' str from dual
), vals as (
  select regexp_substr(str,'[A-z_,]+(\.|\s)?', 1, level) str, level l
  from   rws
  connect by regexp_substr(str, '[^, .]+', 1, level) is not null
), replaces as (
  select case 
           when l = 1 then str
           when substr ( str, -1, 1 ) = '.' then
             str
           else 
           regexp_replace (
             str,
             '[A-z_]+',
             'replaced'
           )
       end replaced, l
  from   vals
)
  select listagg ( replaced ) 
           within group ( order by l ) s
  from   replaces;

S                                                                                                                          
This, replaced replaced replaced, which_needs_.replaced replaced replaced, replaced matching.criteria.replaced replaced,

Ensure you test thoroughly! In my experience, you find more exceptions/refinements when you have complex rules like this. So it's likely you'll have to tinker with the replacement rules in the case expression.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.