Oracle REGEXP_SUBSTR Not Honoring Null Values

Question

I have an issue with REGEXP_SUBSTR not honoring a null (empty) field in a comma-separated string.

select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1)    AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2)    AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3)    AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4)    AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6)    AS pub_indctr
from dual;

If phn_end_dt is null and pub_indctr is not null, the value of pub_indctr is shifted into phn_end_dt.

Result:

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P

whereas it should be:

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14               P

Any suggestions?

Community · Accepted Answer · 2017-05-23 11:58:00Z

4

I'm afraid your accepted answer does not handle the case where you need the value after the null position (try to get the 6th field):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end_dt
  2  from dual;

P
-

You need to do this instead I believe (works on 11g):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6, NULL, 1) phn_end_dt
  2  from dual;

P
-
P

I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls
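As a minimal sketch, the same pattern can be applied to all six fields of the sample string. The last argument (1) asks REGEXP_SUBSTR to return only the first subexpression, ([^,]*), so the delimiter matched by (,|$) is dropped. The column aliases are taken from the question, and the sketch assumes Oracle 11g or later, where the subexpression argument is available.

```sql
-- Each occurrence consumes one field plus its trailing comma (or the end of
-- the string), so an empty field still counts as one occurrence.
SELECT REGEXP_SUBSTR(val, '([^,]*)(,|$)', 1, 1, NULL, 1) AS phn_nbr,
       REGEXP_SUBSTR(val, '([^,]*)(,|$)', 1, 2, NULL, 1) AS phn_pos,
       REGEXP_SUBSTR(val, '([^,]*)(,|$)', 1, 3, NULL, 1) AS phn_typ,
       REGEXP_SUBSTR(val, '([^,]*)(,|$)', 1, 4, NULL, 1) AS phn_strt_dt,
       REGEXP_SUBSTR(val, '([^,]*)(,|$)', 1, 5, NULL, 1) AS phn_end_dt,
       REGEXP_SUBSTR(val, '([^,]*)(,|$)', 1, 6, NULL, 1) AS pub_indctr
FROM (SELECT '2035197553,2,S,14-JUN-14,,P' AS val FROM dual);
```

With this input, phn_end_dt should come back NULL and pub_indctr should be P, because the fifth occurrence matches only the comma and its captured group is empty.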

edited May 23, 2017 at 11:58

answered Sep 3, 2014 at 19:28

Gary_W

neshkeev · Accepted Answer · 2014-08-27 14:29:32Z

2

You can solve your task like this:

with t(val) as (
  select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
  select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
     , substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
     , substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
     , substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
     , substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
     , substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
  from t1

     A      B   C       D       E   F
-------------------------------------
2035197553  2   S   14-JUN-14   -   P

answered Aug 27, 2014 at 14:29

neshkeev

2 Comments

Patrick Bacon Over a year ago

Your solution works well whether you use regexp or not. Of course without the regexp, the performance will be better.

neshkeev Over a year ago

Thank you, I didn't test it without the REGEXP
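A sketch of the non-regexp variant mentioned in the comments. It assumes the same t1 wrapper as the answer (the value surrounded by commas) and swaps REGEXP_INSTR for INSTR, which here takes the same (string, substring, position, occurrence) arguments. The performance claim is the commenter's and is not measured here.

```sql
WITH t(val) AS (
  SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
), t1(val) AS (
  -- Surround the value with commas so field n sits between comma n and comma n + 1
  SELECT ',' || val || ',' FROM t
)
SELECT SUBSTR(val, INSTR(val, ',', 1, 1) + 1, INSTR(val, ',', 1, 2) - INSTR(val, ',', 1, 1) - 1) AS a
     , SUBSTR(val, INSTR(val, ',', 1, 2) + 1, INSTR(val, ',', 1, 3) - INSTR(val, ',', 1, 2) - 1) AS b
     , SUBSTR(val, INSTR(val, ',', 1, 3) + 1, INSTR(val, ',', 1, 4) - INSTR(val, ',', 1, 3) - 1) AS c
     , SUBSTR(val, INSTR(val, ',', 1, 4) + 1, INSTR(val, ',', 1, 5) - INSTR(val, ',', 1, 4) - 1) AS d
     , SUBSTR(val, INSTR(val, ',', 1, 5) + 1, INSTR(val, ',', 1, 6) - INSTR(val, ',', 1, 5) - 1) AS e
     , SUBSTR(val, INSTR(val, ',', 1, 6) + 1, INSTR(val, ',', 1, 7) - INSTR(val, ',', 1, 6) - 1) AS f
  FROM t1;
```

For the empty fifth field the two commas are adjacent, so the computed length is 0 and SUBSTR returns NULL, which is the behavior the question asks for.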

Patrick Bacon · Accepted Answer · 2014-09-03 17:11:13Z

The typical csv parsing approach is as follows:

WITH t(csv_str) AS
  ( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
  UNION ALL
  SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
  )
SELECT LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t

I like to place a comma in front of the CSV string and then count the occurrences of a comma followed by the non-comma pattern.

Explanation of the search pattern

The search pattern looks for the nth matching substring (n corresponds to the nth element in the CSV), which has the following structure:

-The pattern begins with a ','.

-Next comes the pattern '[^,]'. This is a non-matching list expression: the caret, ^, means that the characters listed after it must not be matched.

-This non-matching list has the quantifier *, which means it can occur 0 or more times.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a match is found, I use the LTRIM function to remove the leading comma that the regular expression matched.

What is nice about this approach is that the occurrence of the search pattern always corresponds to the occurrence of the comma, so empty fields are counted too.

Avinash Raj · Accepted Answer · 2014-08-27 14:22:29Z

1

You need to change this line,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,

to,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5)    AS phn_end_dt,
                                                   ^

[^,]+ matches one or more characters other than a comma. [^,]* matches zero or more characters other than a comma. So [^,]+ requires at least one non-comma character to be present, but in the fifth field there isn't one. Changing + to * lets the regex engine match an empty string.
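One way to check how a pattern counts occurrences is to list them. The sketch below uses CONNECT BY LEVEL as a row generator (an illustrative choice, not part of this answer) to print the first ten occurrences of '[^,]*' in the sample string. A comment under this answer reports that on Oracle 12c an extra empty match is returned at each comma, so the occurrence number no longer lines up with the field number. Results may differ by version, so treat this as a diagnostic rather than a statement of expected output.

```sql
-- List occurrences 1..10 of the pattern to see which ones are empty.
-- The aliases occ and matched are illustrative names.
SELECT LEVEL AS occ,
       REGEXP_SUBSTR('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, LEVEL) AS matched
FROM dual
CONNECT BY LEVEL <= 10;
```

If the empty matches do appear between the fields, the pattern cannot be used to address a field by position, which is why the other answers anchor each match to a delimiter instead.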

edited Aug 27, 2014 at 14:22

answered Aug 27, 2014 at 14:15

Avinash Raj

3 Comments

Ankit Over a year ago

Thanks for pointing me in the right direction, I have used this to solve the issue. SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt ,REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual);

Wouter Over a year ago

@Ankit: Could you unmark this as the answer and post your solution as an answer, since Avinash Raj's answer is wrong?

Wouter Over a year ago

@Avinash Raj: On what Oracle version did you test this? I'm getting one NULL value for each comma in the string, so the P-values end up in the 10th capture group. I get the S-value in the 5th capture group. Using Oracle 12c

Ankit · Accepted Answer · 2015-08-14 13:14:45Z

1

Thanks for pointing me in the right direction, I have used this to solve the issue.

SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt,
       REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr
FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual);

Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

answered Aug 14, 2015 at 13:14

Ankit

Rajinder Nagpal · Accepted Answer · 2023-01-05 15:08:39Z

I have a generic use case where I don't know in advance which columns the string will contain, so I used the code below, which solved the problem.

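function substring_specific_occurence(p_string varchar2
                                    ,p_delimiter varchar2
                                    ,p_occurence number) return varchar2
is 
    l_output varchar2(2000);
    -- placeholder for empty fields; assumed never to occur in the real data
    g_miss_char     varchar2(20) := 'fdkjkjhkuhhf7';
    -- first pass: put the placeholder between each pair of adjacent delimiters
    l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );

begin 

    -- repeat for runs of three or more delimiters, where one pass leaves a pair behind
    while  (l_string like '%'||p_delimiter||p_delimiter||'%' )
    loop 
        l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
    end loop;

    -- no field between two delimiters is empty any more, so '+' counts them correctly
    select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence) 
    into l_output
    from dual;

    -- strip the placeholder again; an empty field comes back as NULL
    return replace(l_output,g_miss_char);

end substring_specific_occurence;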
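
A usage sketch, assuming the function above has been compiled so that it is callable from SQL (for example as a standalone function via CREATE OR REPLACE; if it lives in a package, the call needs the package prefix). The column aliases are borrowed from the question.

```sql
-- Fetch the empty fifth field and the sixth field of the sample string.
SELECT substring_specific_occurence('2035197553,2,S,14-JUN-14,,P', ',', 5) AS phn_end_dt,
       substring_specific_occurence('2035197553,2,S,14-JUN-14,,P', ',', 6) AS pub_indctr
FROM dual;
```

With the sample string, the fifth field should come back NULL (the placeholder is stripped again before returning) and the sixth should be P. Two caveats: the placeholder only covers empty fields between two delimiters, so an empty field at the very start of the string is still skipped, and the delimiter is spliced into a bracket expression, so delimiters such as ^ or ] would need extra care.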
