Manipulating with regexp_substr

Question

I have an ETL task for data warehousing purposes. I need to extract the second part of a string, i.e. the text that follows the first occurrence of a delimiter such as '#', 'ý' or '-'. Example test strings:

From 'Tori 1#MHK-MahallaKingaveKD' I should retrieve only 'MHK'.

From 'HPHelm2ýFFS-Tredddline' I should retrieve only 'FFS'.

I already tried the nested CASE expression below:

TRIM(CASE
  WHEN INSTR('HPHelm2ýFFS-Tredddline', '#', 1, 1) > 0
    THEN (REPLACE(
           REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1, 2),
           '#'
         ))
  ELSE (CASE
    WHEN INSTR('HPHelm2ýFFS-Tredddline', '-', 1, 1) > 0
      THEN (REPLACE(
             REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1, 2),
             '-'
           ))
    ELSE (CASE
      WHEN INSTR('HPHelm2ýFFS-Tredddline', '-') = 0
       AND INSTR('HPHelm2ýFFS-Tredddline', 'ý') = 0
       AND INSTR('HPHelm2ýFFS-Tredddline', '#') = 0
        THEN 'HPHelm2ýFFS-Tredddline'
      ELSE (CASE
        WHEN INSTR('HPHelm2ýFFS-Tredddline', 'ý', 1, 1) > 0
          THEN (REPLACE(
                 REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1, 2),
                 'ý'
               ))
      END)
    END)
  END)
END)

Using the code above I can retrieve:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'

Expected output:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'

So I need to drop the '-' and everything after it.

I guess I should modify the regexp_substr pattern, but I can't find a clean solution, since '-' is itself treated as a delimiter in the CASE WHEN branches.

Wiktor Stribiżew · Accepted Answer · 2019-08-07 11:30:03Z

1

I suggest retrieving the second occurrence of one or more characters other than your delimiter characters:

regexp_substr(col, '[^#ý-]+', 1, 2)

Here, the search starts with the first char in the record (1), and the second occurrence is returned (2).

The [^#ý-]+ pattern matches one or more (+) chars other than #, ý and -.
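
As a rough sketch of how this pattern could replace the nested CASE from the question, the query below runs it against the two sample strings plus one string without any delimiter. The names sample_data, src, second_token and second_token_or_src are illustrative, and the NVL fallback is an assumption about the desired behaviour: it mimics the question's "return the whole string when no delimiter is present" branch.

```sql
-- Sample rows: the first two come from the question, the third is a made-up
-- string with no delimiter at all.
WITH sample_data AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS src FROM DUAL UNION ALL
  SELECT 'HPHelm2ýFFS-Tredddline'             FROM DUAL UNION ALL
  SELECT 'NoDelimiterHere'                    FROM DUAL
)
SELECT src,
       -- Second run of characters that are not '#', 'ý' or '-';
       -- NULL when the string has no second run.
       REGEXP_SUBSTR(src, '[^#ý-]+', 1, 2) AS second_token,
       -- Assumed fallback: return the trimmed source string when no
       -- second token exists.
       TRIM(NVL(REGEXP_SUBSTR(src, '[^#ý-]+', 1, 2), src)) AS second_token_or_src
  FROM sample_data;
```

For the first two rows second_token should be 'MHK' and 'FFS', matching the expected output in the question. Note that the fallback also fires when a delimiter is present but nothing follows it (for example 'abc#'), which differs from the original CASE expression, so it may need adjusting for real data.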

answered Aug 7, 2019 at 11:30


Bob Jarvis - Слава Україні · Accepted Answer · 2019-08-07 11:37:55Z

1

The following will give you what you're looking for:

WITH cteData AS (SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
                 SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL)
SELECT STRING, REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS SUB_STRING
  FROM cteData;

The parentheses around the .* between the two delimiter groups make the .* a sub-expression, and the final 1 in the parameter list tells REGEXP_SUBSTR to return the value of sub-expression #1. Since there is only one sub-expression in the regular expression, it returns the text matched by the .*, which is what you're looking for.
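
One thing worth checking with this approach is that .* is greedy, so it extends to the last delimiter in the string rather than the next one. The sketch below compares it with a non-greedy variant, (.*?), which Oracle supports as a Perl-influenced extension. The string 'A#B-C-D' and the names sample_data, src, greedy_match and lazy_match are made up for illustration.

```sql
-- 'A#B-C-D' is a hypothetical string with two delimiters after the first one.
WITH sample_data AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS src FROM DUAL UNION ALL
  SELECT 'A#B-C-D'                            FROM DUAL
)
SELECT src,
       -- Greedy .* runs up to the LAST delimiter: 'MHK' and 'B-C'
       REGEXP_SUBSTR(src, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1)  AS greedy_match,
       -- Non-greedy .*? stops at the NEXT delimiter: 'MHK' and 'B'
       REGEXP_SUBSTR(src, '[#ý-](.*?)[#ý-]', 1, 1, NULL, 1) AS lazy_match
  FROM sample_data;
```

Both variants need at least two delimiters in the string; with only one delimiter the pattern cannot match and REGEXP_SUBSTR returns NULL.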


edited Aug 7, 2019 at 11:37

answered Aug 7, 2019 at 11:31

