
I am using REGEXP_SUBSTR to select part of a string. The full string has one of two forms: text (more_text (other_text)) or text (more_text  (other_text)). The only difference is the extra space before the second opening bracket. In both cases, the part of the string I want to extract is 'more_text'.

I currently have the command

REGEXP_SUBSTR(string, '\((.*)\(', 1, 1, NULL, 1)

which works for the first format of string but doesn't return anything for the second. I'm unsure why the double space means it doesn't match. How can I change this to make it work in both cases?

EDIT: I realised it isn't actually a double space. It's a fullwidth left parenthesis (U+FF08), which is URL-encoded as %EF%BC%88 (its UTF-8 bytes EF BC 88). Is there a way of matching that character?
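One possible approach, sketched here and not tested against the real data: put both brackets in a bracket expression, so either an ASCII ( or the fullwidth （ ends the match. UNISTR('\FF08') builds the fullwidth character without typing it into the script, and TO_CHAR converts it to the database character set. This assumes a Unicode database character set (e.g. AL32UTF8) and that only the second opening bracket can be fullwidth; the CTE rows stand in for the real table.

```sql
-- Sketch: accept either an ASCII '(' or a fullwidth '（' (U+FF08) as the
-- second opening bracket. Assumes a Unicode (e.g. AL32UTF8) database
-- character set; the CTE rows are sample data, not the real table.
with test (id, str) as (
  select 1, 'text (more_text (other_text))' from dual union all
  select 2, 'text (more_text  (other_text))' from dual union all
  select 3, 'text (more_text ' || to_char(unistr('\FF08')) || 'other_text))' from dual
)
select id,
       -- the lazy (.*?) stops at the first run of optional whitespace that is
       -- followed by either kind of opening bracket; group 1 is returned
       regexp_substr(str,
                     '\((.*?)\s*[(' || to_char(unistr('\FF08')) || ']',
                     1, 1, null, 1) result
from test
order by id;
```

Using \s* rather than \s+ means the match still works if there is no space at all before the bracket.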

  • It returns more_text (with trailing spaces) in the second case: dbfiddle.uk/lESCgWKH Commented Sep 8, 2023 at 10:45
  • You need to provide the rule to identify the part you want to extract. Otherwise it's hard to make any formalisation. Commented Sep 8, 2023 at 10:50

2 Answers


Does it have to be a regular expression? If not, how about SUBSTR + INSTR to take the text between the first and second opening brackets?

with test (col) as
  (select 'text (more_text (other_text))' from dual union all
   select 'text (more_text  (other_text))' from dual
  )
select col,
  -- take everything between the 1st and 2nd '(' and trim surrounding spaces
  trim(substr(col, instr(col, '(') + 1,
                   instr(col, '(', 1, 2) - instr(col, '(') - 1
             )
      ) result
from test;

COL                            RESULT
------------------------------ ------------------------------
text (more_text (other_text))  more_text
text (more_text  (other_text)) more_text
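If the second bracket can be the fullwidth （ (U+FF08) from the question's edit, one option is to normalise it to an ASCII ( with REPLACE first and then apply the same SUBSTR/INSTR logic. This is a sketch, assuming a Unicode (e.g. AL32UTF8) database character set so that TO_CHAR(UNISTR('\FF08')) can hold the character; the second sample row simulates the problem data.

```sql
-- Sketch: normalise a fullwidth '（' (U+FF08) to an ASCII '(' before the
-- SUBSTR/INSTR extraction. Assumes a Unicode (e.g. AL32UTF8) database
-- character set; the CTE rows are sample data, not the real table.
with test (col) as
  (select 'text (more_text (other_text))' from dual union all
   select 'text (more_text ' || to_char(unistr('\FF08')) || 'other_text))' from dual
  ),
norm (col, ncol) as
  (select col, replace(col, to_char(unistr('\FF08')), '(') from test)
select col,
  -- same logic as above, but on the normalised string
  trim(substr(ncol, instr(ncol, '(') + 1,
                    instr(ncol, '(', 1, 2) - instr(ncol, '(') - 1
             )
      ) result
from norm;
```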

5 Comments

I'm still getting the same problem in every case where I have a 'double space', so it must be something else, and I'm not sure how to work out what that is.
Sorry, I don't understand what you are saying. Could you edit the original question and post some more examples (i.e. source data) and the result you expect from them? It would probably be better if there were some more meaningful "words" than "more_text" and "other_text". You have got 3 answers so far and, apparently, none of us managed to guess what you actually want. Therefore, perhaps it is more about you than us.
It turned out to be a character that wasn't a space, which was why the other commands weren't working. It seems to be sorted now.
Ah! Someone used invisible ink! OK, I'm glad you found the culprit. Thank you for the feedback.
I did, although I haven't really found a good solution yet! It's a character which is encoded as %EF%BC%88 (fullwidth left parenthesis). The only way I've managed to solve it is to match up to the last character before it, which could be pretty much anything, so it's not a nice solution, but it's a solution at least!
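For working out what a "double space" really is, Oracle's ASCIISTR and DUMP functions can make hidden characters visible. ASCIISTR rewrites every non-ASCII character as \xxxx (its UTF-16 code unit), and DUMP(expr, 1016) lists the stored bytes in hex along with the character set name. This is a sketch: the CTE simulates the problem data, and in practice you would run the SELECT against your own table and column.

```sql
-- Sketch: reveal "invisible" characters in a string.
-- ASCIISTR: non-ASCII characters become \xxxx (UTF-16 code unit).
-- DUMP(..., 1016): stored bytes in hex (16) plus character set name (+1000).
-- Assumes a Unicode (e.g. AL32UTF8) database character set.
with test (str) as
  (select 'text (more_text  (other_text))' from dual union all
   select 'text (more_text ' || to_char(unistr('\FF08')) || 'other_text))' from dual
  )
select asciistr(str) visible,
       dump(str, 1016) bytes
from test;
```

In an AL32UTF8 database the fullwidth bracket should appear as \FF08 in the first column and as the bytes ef,bc,88 in the second, whereas an ordinary ( is 28 and a space is 20.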

I believe you just need to make your group non-greedy and match one or more spaces before the next paren, like this:

with tbl(id, str) as (
  select 1, 'text (more_text (other_text))' from dual union all
  select 2, 'text (more_text  (other_text))' from dual
)
select id, 
       regexp_substr(str, '\((.*?)\s+\(', 1, 1, null, 1) extract,
       length(regexp_substr(str, '\((.*?)\s+\(', 1, 1, null, 1)) length
from tbl
order by id;


        ID EXTRACT                            LENGTH
---------- ------------------------------ ----------
         1 more_text                               9
         2 more_text                               9

2 rows selected.
