3

I have an issue of regex_substr not honoring the null value.

select
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1)    AS phn_nbr,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2)    AS phn_pos,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3)    AS phn_typ,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4)    AS phn_strt_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,
REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6)    AS pub_indctr
from dual;

If the phn_end_dt is null and pub_indctr is not null, the values of pub_indctr are shifted to phn_end_dt.

Result:-

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P 

While it should be

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR  
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14               P 

Any suggestions ?

6 Answers 6

4

I'm afraid your accepted answer does not handle the case where you need the value after the null position (try to get the 6th field):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end
_dt
  2  from dual;

P
-

You need to do this instead I believe (works on 11g):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6,
NULL, 1) phn_end_dt
  2  from dual;

P
-
P

I just discovered this after posting my own question: REGEX to select nth value from a list, allowing for nulls

Sign up to request clarification or add additional context in comments.

Comments

2

You can solve your task like this:

with t(val) as (
  select '2035197553,2,S,14-JUN-14,,P' from dual
), t1 (val) as (
  select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a
     , substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b
     , substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c
     , substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d
     , substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e
     , substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
  from t1

     A      B   C       D       E   F
-------------------------------------
2035197553  2   S   14-JUN-14   -   P

2 Comments

Your solution works well whether you use regexp or not. Of course without the regexp, the performance will be better.
Thank you, I didn't test it without the REGEXP
2

The typical csv parsing approach is as follows:

WITH t(csv_str) AS
  ( SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
  UNION ALL
  SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
  )
SELECT LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
  LTRIM(REGEXP_SUBSTR (','
  || csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t

I like to place a comma preceeding my csv and then I would count the commas with the non-comma pattern.

Explanation of the search pattern

The search pattern looks for the nth substring (nth corresponds with the nth element in the csv) which has the following:

-The pattern begins with a ','

-Next, it is followed by the pattern, '[^,]'. This is just a non-matching list expression. The caret, ^, conveys that the characters following in the list should not be matched.

-This non-matching list of characters has the quantifier, *, which means this can occur 0 or more times.

~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a match is found, I would also use the LTRIM function to remove the comma after I used the reg expression.

What is nice about this approach is the occurrence of the search pattern will always correspond with the occurences of the comma.

Comments

1

You need to change this line,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5)    AS phn_end_dt,

to,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5)    AS phn_end_dt,
                                                   ^

[^,]+ means it matches any character not of , one or more times. [^,]* means it matches any character not of , zero or more times. So [^,]+ assumes that there must be a single character not of , would present. But really there isn't , by changing + to * makes the regex engine to match a empty character.

3 Comments

Thanks for pointing me in the right direction, I have used this to solve the issue. SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt ,REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt ,REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual);
@Ankit: Could you unmark this as an answer, and post your solution as an answer. Since Avinash Raj's answer is wrong.
@Avinash Raj: On what Oracle version did you test this? I'm getting one NULL value for each comma in the string, so the P-values end up in the 10th capture group. I get the S-value in the 5th capture group. Using Oracle 12c
1

Thanks for pointing me in the right direction, I have used this to solve the issue.

SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt , REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt , REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual );

Oracle Version:- Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

Comments

1

I have a generic use case where I don't know the exact columns coming in the string. I thus used below code which solved the purpose.

function substring_specific_occurence(p_string varchar2
                                    ,p_delimiter varchar2
                                    ,p_occurence number) return varchar2
is 
    l_output varchar2(2000);
    g_miss_char     varchar2(20) := 'fdkjkjhkuhhf7';
    l_string varchar2(10000) := replace(p_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'' );

begin 

    while  (l_string like '%'||p_delimiter||p_delimiter||'%' )
    loop 
        l_string := replace(l_string,p_delimiter||p_delimiter,''||p_delimiter||g_miss_char||p_delimiter||'');
    end loop;

    select regexp_substr(l_string,'[^'||p_delimiter||']+',1,p_occurence) 
    into l_output
    from dual;

    return replace(l_output,g_miss_char);

end substring_specific_occurence;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.