0

I have incoming data which is delimiter by a string token:

Data 1__##__Data Two__##__Third Data__##__4th Data__##__

I would like to split this by the delimiter __##__ to give:

Data 1
Data Two
Third Data
4th Data

I figured on looping and using this to get each data item in turn, but as a first stage to everything from the beginning to the nth item inclusive:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 1 ) from dual;

The above gives the desired Data 1, however:

SELECT REGEXP_REPLACE( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', '__##__.+', 1, 2 ) from dual;

expected to give Data1__##__Data Two instead gives the whole input string back unchanged.

Earlier I tried the following but this just returns blank:

SELECT REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL )
FROM dual
CONNECT BY REGEXP_SUBSTR( 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
                      '(?!__##__)',
                       1, LEVEL ) IS NOT NULL

This returns null.

1
  • You might think of using SQL Loader , in which delimiter options for both columns and lines already exist , presumably the data taken from a file. Commented Nov 30, 2023 at 14:24

2 Answers 2

1

You do not need (slow) regular expressions and can use a recursive query and simple (faster) string functions:

WITH bounds (value, spos, epos) AS (
  SELECT value,
         1,
         INSTR(value, '__##__', 1)
  FROM   table_name
UNION ALL
  SELECT value,
         epos + 6,
         INSTR(value, '__##__', epos + 6)
  FROM   bounds
  WHERE  epos > 0
) SEARCH DEPTH FIRST BY value SET order_id
SELECT SUBSTR(value, spos, epos - spos) AS value 
FROM   bounds
WHERE  epos > 0

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' FROM DUAL;

Outputs:

VALUE
Data 1
Data Two
Third Data
4th Data

Your query does not work because Oracle does not support zero-width look-arounds (neither ahead nor behind, positive nor negative) in regular expressions. Although, even if Oracle did support it I'm not sure that it would give the answer you expect.

If you really want to use regular expressions, you can fix your query to:

SELECT REGEXP_SUBSTR(
         'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
         '(.*?)__##__',
         1,
         LEVEL,
         NULL,
         1
       ) AS value
FROM   dual
CONNECT BY REGEXP_COUNT(
         'Data 1__##__Data Two__##__Third Data__##__4th Data__##__', 
         '(.*?)__##__'
       ) >= LEVEL;

Which outputs the same as the previous query (although slower) for a single row of input (but would start to generate exponentially increasing numbers of duplicate rows if you provide multiple rows of input).

fiddle

Sign up to request clarification or add additional context in comments.

Comments

1

For the sake of argument, this method uses regexp_substr with connect by and it handles multiple rows of input data:

with tbl(id, str) as (
  select 1, 'Data 1__##__Data Two__##__Third Data__##__4th Data__##__' 
  from dual union all
  select 2, 'Data 5__##__Data six__##__Seventh Data__##__8th Data__##__' 
  from dual union all
  select 3, 'Data 9__##__Data Ten__##__Eleventh Data__##__12th Data__##__' 
  from dual
)
select id, level as element,
  regexp_substr(str, '(.*?)(__##__)', 1, level, null, 1) data
from tbl
connect by level <= regexp_count(str, '__##__')
and prior id = id
and prior sys_guid() is not null;

        ID    ELEMENT DATA                
---------- ---------- --------------------
         1          1 Data 1              
         1          2 Data Two            
         1          3 Third Data          
         1          4 4th Data            
         2          1 Data 5              
         2          2 Data six            
         2          3 Seventh Data        
         2          4 8th Data            
         3          1 Data 9              
         3          2 Data Ten            
         3          3 Eleventh Data       
         3          4 12th Data           

12 rows selected.

1 Comment

For a discussion on the relative performance of different string splitting methods, see this answer. The summary is that using SYS_GUID() measures about 4 times slower than the other methods tested.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.