1

I want to extract 2 parts of certain string. For this I want to use regular expressions. For Example the String is: Big_Dog_0044_0080 and my Regex is: Big_Dog_([0-9]+)_?([0-9A-Z]*) Group 1 would be then 0044 and the second would be 0080.

Now my problem is to bring this into an select statement: I tried it with regex_substr.

This was my result:

 select 
'Big_Dog_0044_0080' as TestString,
regexp_substr('Big_Dog_0044_0080', '([0-9]+)') Group1 , 
regexp_substr('Big_Dog_0044_0080', '([0-9A-Z]*)') Group2 from dual;

My Output:

TESTSTRING         Group1 Group2
Big_Dog_0044_0080   0044    B

First of all Group 2 is wrong i would expect 0080. What I am doing wrong here?

And second question, any other way to solve this? I have like 5-6 different regexp I want to use. For example next one would be ([A-C])_CatWeezle

I mean like select the input string column and go through the possible regexp. If one regexp matches, then stop and extract values.

Thank you!

2 Answers 2

4

What I am doing wrong here?

The regular expression '([0-9A-Z]*)' will match the first occurrence of zero-or-more digit or upper-case alphabetic characters in the string. For your string 'Big_Dog_0044_0080' the first character is the upper-case alphabetic character B so that will get matched and the second character i does not match your regular expression so it is not included in the group.

You state that your string matches the pattern:

Big_Dog_([0-9]+)_?([0-9A-Z]*)

Then you can anchor the pattern to the end of the string using $ and can use:

WITH test_data (value) AS (
  SELECT 'Big_Dog_0044_0080' FROM DUAL UNION ALL
  SELECT 'Big_Dog_00440080' FROM DUAL UNION ALL
  SELECT 'Big_Dog_00440A80' FROM DUAL
)
SELECT value,
       REGEXP_SUBSTR(value, '([0-9]+)_?([0-9A-Z]*)$', 1, 1, NULL, 1) AS Group1 , 
       REGEXP_SUBSTR(value, '([0-9]+)_?([0-9A-Z]*)$', 1, 1, NULL, 2) AS Group2
FROM   test_data;

Which outputs:

VALUE GROUP1 GROUP2
Big_Dog_0044_0080 0044 0080
Big_Dog_00440080 00440080
Big_Dog_00440A80 00440 A80

(Note: the second row is matched entirely by the first group and the second group has zero-width and the third row will match the first group until it finds a non-digit character and then start the second group.)

If the delimiting underscore is optional then you may want to use fixed-width matches (assuming the substrings are each 4-characters):

WITH test_data (value) AS (
  SELECT 'Big_Dog_0044_0080' FROM DUAL UNION ALL
  SELECT 'Big_Dog_00440080' FROM DUAL UNION ALL
  SELECT 'Big_Dog_00440A80' FROM DUAL
)
SELECT value,
       REGEXP_SUBSTR(value, '([0-9]{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 1) AS Group1 , 
       REGEXP_SUBSTR(value, '([0-9]{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 2) AS Group2
FROM   test_data;

Which outputs:

VALUE GROUP1 GROUP2
Big_Dog_0044_0080 0044 0080
Big_Dog_00440080 0044 0080
Big_Dog_00440A80 0044 0A80

I mean like select the input string column and go through the possible regexp. If one regexp matches, then stop and extract values.

Use a CASE expression:

WITH test_data (value) AS (
  SELECT 'Big_Dog_0044_0080' FROM DUAL UNION ALL
  SELECT 'Big_Dog_00440080' FROM DUAL UNION ALL
  SELECT 'Big_Dog_00440A80' FROM DUAL UNION ALL
  SELECT 'A_CatWeezle' FROM DUAL
)
SELECT value,
       CASE
       WHEN REGEXP_LIKE(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$')
       THEN REGEXP_SUBSTR(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 1)
       WHEN REGEXP_LIKE(value, '^([A-C])_CatWeezle$')
       THEN REGEXP_SUBSTR(value, '^([A-C])_CatWeezle$', 1, 1, NULL, 1)
       END AS group1,
       CASE
       WHEN REGEXP_LIKE(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$')
       THEN REGEXP_SUBSTR(value, '^Big_Dog_(\d{4})_?([0-9A-Z]{4})$', 1, 1, NULL, 2)
       END AS group2
FROM   test_data;

Outputs:

VALUE GROUP1 GROUP2
Big_Dog_0044_0080 0044 0080
Big_Dog_00440080 0044 0080
Big_Dog_00440A80 0044 0A80
A_CatWeezle A

db<>fiddle here

Sign up to request clarification or add additional context in comments.

1 Comment

thank you so much! That was exactly what I am looking for.
0

The first part of the question is easy:

SQL> SELECT 'Big_Dog_0044_0080' AS TestString,
  2         REGEXP_SUBSTR ('Big_Dog_0044_0080', '\d+', 1, 1) Group1,
  3         REGEXP_SUBSTR ('Big_Dog_0044_0080', '\d+', 1, 2) Group2
  4    FROM DUAL;

TESTSTRING        GROU GROU
----------------- ---- ----
Big_Dog_0044_0080 0044 0080

SQL>

I don't quite understand what you meant to say by the CatWeezle thing.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.