1

I have an ETL task for datawarehouse-ing purposes, I need to extract the second part of a String after a delimiter occurence such as: '#', 'ý', '-'. For example test case string:

'Tori 1#MHK-MahallaKingaveKD' I should retrieve only 'MHK'

'HPHelm2ýFFS-Tredddline' I should retrieve only 'FFS'

I already tried using the cases above:

TRIM(CASE
WHEN INSTR('HPHelm2ýFFS-Tredddline', '#',1,1) > 0
    THEN (REPLACE(
          REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1,2), 
          '#'
       ))
    ELSE (CASE 
            WHEN INSTR('HPHelm2ýFFS-Tredddline', '-',1,1) > 0
    THEN (REPLACE(
          REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1,2), 
          '-'
       ))
       ELSE (CASE 
            WHEN INSTR('HPHelm2ýFFS-Tredddline','-') = 0 AND INSTR('HPHelm2ýFFS-Tredddline','ý') = 0 AND INSTR('HPHelm2ýFFS-Tredddline','#') = 0
    THEN 'HPHelm2ýFFS-Tredddline'
        ELSE (CASE
            WHEN INSTR('HPHelm2ýFFS-Tredddline','ý',1,1) > 0
    THEN (REPLACE(
          REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1,2), 
          'ý'
       ))
            END)
          END)   
            END)
END)

Using the code above I can retrieve:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'

Expected output:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'

So I have to exclude '-' and the string after.

I guess I should modify the regexp_substr pattern but can't seem to find a clear solution since '-' is specified in the case when statements as a delimiter.

0

2 Answers 2

1

I suggest retrieving the second occurrence of 1+ chars other than your delimiter chars:

regexp_substr(col, '[^#ý-]+', 1, 2)

Here, the search starts with the first char in the record (1), and the second occurrence is returned (2).

The [^#ý-]+ pattern matches one or more (+) chars other than #, ý and -.

Sign up to request clarification or add additional context in comments.

Comments

1

The following will give you what you're looking for:

WITH cteData AS (SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
                 SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL)
SELECT STRING, REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS SUB_STRING
  FROM cteData;

The parentheses around the .* between the delimiter groups makes the .* a sub-expression, and the final ,1 in the parameter list tells REGEXP_SUBSTR to give you back the value of sub-expression #1. Since there's only one sub-expression in the regular expression it gives you back the value of the .*, which is what you're looking for.

sqlfiddle here

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.