How to remove duplicates from comma separated list by regexp_replace in Oracle?

Question

I have

 POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE

I want

POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE

I tried

select regexp_replace('POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE','([^,]+)(,\1)+','\1') from dual

And I get the output

 POWPROUTL,TNEUTL,UTLTNE,UTLTNE

But I want the output to be

POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE

Please help.

Possible duplicate of the OP's previous question: Distinct of CSV values using REGEXP_REPLACE in oracle
– MT0, Commented Jun 23, 2016 at 21:40
This would match all duplicates: (?<=,|^)([^,]+),(?=(?:[^,]+,)*\1(?:,|$)) but Oracle does not support lookahead/lookbehind in regular expressions.
– MT0, Commented Jun 23, 2016 at 22:17
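
A minimal illustration of why the pattern in the question over-matches: the backreference \1 is not anchored to a token boundary, so ,\1 also matches the beginning of a longer token that merely starts with the same text. The two-token input here is made up for the demonstration; the pattern and replacement are the ones from the question.

```sql
-- 'POW' is captured, then ',\1' matches the ',POW' at the start of ',POWPRO'.
-- The match 'POW,POW' is replaced by 'POW', and the leftover 'PRO' is glued on.
select regexp_replace('POW,POWPRO', '([^,]+)(,\1)+', '\1') as result
from   dual;
-- result: POWPRO
```

This is the same effect that turns POW,POW,POWPRO,PRO,PRO,PROUTL into POWPROUTL in the output shown in the question.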

MT0 · Accepted Answer · 2016-06-24 01:11:52Z

Here are two solutions that use only SQL, and a third that uses a small PL/SQL helper function, which makes the final SQL query very short.

Oracle Setup:

CREATE TABLE data ( value ) AS
SELECT 'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE' FROM DUAL;

CREATE TYPE stringlist AS TABLE OF VARCHAR2(4000);
/

Query 1:

SELECT LISTAGG( t.COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY t.COLUMN_VALUE ) AS list
FROM   data d,
       TABLE(
         SET(
           CAST(
             MULTISET(
              SELECT REGEXP_SUBSTR( d.value, '[^,]+', 1, LEVEL )
              FROM   DUAL
              CONNECT BY LEVEL <= REGEXP_COUNT( d.value, '[^,]+' )
             ) AS stringlist
           )
         )
       ) t
GROUP BY d.value;

Output:

LIST
---------------------------------------
POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE

Query 2:

SELECT ( SELECT LISTAGG(  COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM TABLE( d.uniques ) ) AS list
FROM   (
  SELECT ( SELECT CAST(
                    COLLECT(
                      DISTINCT
                      REGEXP_SUBSTR( d.value, '[^,]+', 1, LEVEL )
                    )
                    AS stringlist
                  )
            FROM  DUAL
            CONNECT BY LEVEL <= REGEXP_COUNT( d.value, '[^,]+' )
         ) uniques
  FROM   data d
) d;

Output:

LIST
---------------------------------------
POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE

Oracle Setup:

A small helper function:

CREATE FUNCTION split_String(
  i_str    IN  VARCHAR2,
  i_delim  IN  VARCHAR2 DEFAULT ','
) RETURN stringlist DETERMINISTIC
AS
  p_result       stringlist := stringlist();
  p_start        NUMBER(5) := 1;
  p_end          NUMBER(5);
  c_len CONSTANT NUMBER(5) := LENGTH( i_str );
  c_ld  CONSTANT NUMBER(5) := LENGTH( i_delim );
BEGIN
  IF c_len > 0 THEN
    p_end := INSTR( i_str, i_delim, p_start );
    WHILE p_end > 0 LOOP
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, p_end - p_start );
      p_start := p_end + c_ld;
      p_end := INSTR( i_str, i_delim, p_start );
    END LOOP;
    IF p_start <= c_len + 1 THEN
      p_result.EXTEND;
      p_result( p_result.COUNT ) := SUBSTR( i_str, p_start, c_len - p_start + 1 );
    END IF;
  END IF;
  RETURN p_result;
END;
/

Query 3:

SELECT ( SELECT LISTAGG(  COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY ROWNUM )
         FROM TABLE( SET( split_String( d.value ) ) ) ) AS list
FROM   data d;

or (if you only want to pass a single value):

SELECT LISTAGG(  COLUMN_VALUE, ',' ) WITHIN GROUP ( ORDER BY ROWNUM ) AS list
FROM   TABLE( SET( split_String(
          'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE'
       ) ) );

Output:

LIST
---------------------------------------
POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE

Anyone can meet the requirement if they are allowed to create a table and insert the values; however, I think you were supposed to answer without creating a table.
@prashantthakre See edit #4 if you want a version that does not create a table. Whether the table exists or not makes practically no difference to the solution.

user5683823 · Accepted Answer · 2016-06-23 22:43:59Z

The solution below uses plain SQL (no PL/SQL). It works with any input string and removes duplicates in place: it keeps the input tokens in their original order, whatever that order is. It also collapses consecutive commas (it "deletes nulls" from the input string) and treats null inputs correctly. Note the output for an input string consisting only of commas, and the correct treatment of the "tokens" consisting of two spaces and one space respectively.

The query runs relatively slowly; if performance is an issue, it can be re-written as a recursive query, using "traditional" substr and instr which are quite a bit faster than regular expressions.

with inputs (input_string) as (
       select 'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE' from dual
       union all
       select null from dual
       union all
       select 'ab,ab,st,ab,st,  , ,  ,x,,,r' from dual
       union all
       select ',,,' from dual
     ),
     tokens (input_string, rk, token) as (
       select     input_string, level, 
                  regexp_substr(input_string, '([^,]+)', 1, level, null, 1)
       from       inputs 
       connect by level <= 1 + regexp_count(input_string, ',')
     ),
     distinct_tokens (input_string, rk, token) as (
       select     input_string, min(rk) as rk, token
       from       tokens
       group by   input_string, token
     )
select   input_string, listagg(token, ',') within group (order by rk) output_string
from     distinct_tokens
group by input_string
;

Results for the inputs I created:

INPUT_STRING                                                       OUTPUT_STRING
------------------------------------------------------------------ ----------------------------------------
,,,                                                                (null)
POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE  POW,POWPRO,PRO,PROUTL,TNEUTL,UTL,UTLTNE
ab,ab,st,ab,st,  , ,  ,x,,,r                                       ab,st,  , ,x,r
(null)                                                             (null)


4 rows selected.
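
The recursive rewrite mentioned above, using substr and instr instead of regular expressions, might look roughly like the following. This is an untested sketch, not the answer author's code: the end_pos column is a name introduced here for illustration, and it assumes Oracle 11g Release 2 or later, which is needed for recursive subquery factoring and for listagg.

```sql
with inputs (input_string) as (
       select 'POW,POW,POWPRO,PRO,PRO,PROUTL,TNEUTL,TNEUTL,UTL,UTLTNE,UTL,UTLTNE' from dual
       union all
       select 'ab,ab,st,ab,st,  , ,  ,x,,,r' from dual
     ),
     -- end_pos (hypothetical name) is the position of the comma that ends the
     -- current token; a trailing comma is appended so the last token has one too
     tokens (input_string, rk, end_pos, token) as (
       -- anchor member: the first token runs up to the first comma
       select input_string, 1,
              instr(input_string || ',', ','),
              substr(input_string, 1, instr(input_string || ',', ',') - 1)
       from   inputs
       union all
       -- recursive member: the next token starts right after the previous comma
       select input_string, rk + 1,
              instr(input_string || ',', ',', end_pos + 1),
              substr(input_string, end_pos + 1,
                     instr(input_string || ',', ',', end_pos + 1) - end_pos - 1)
       from   tokens
       where  end_pos <= length(input_string)
     ),
     distinct_tokens (input_string, rk, token) as (
       -- keep the first position at which each token appears
       select   input_string, min(rk), token
       from     tokens
       group by input_string, token
     )
select   input_string, listagg(token, ',') within group (order by rk) as output_string
from     distinct_tokens
group by input_string;
```

Empty tokens come out of substr as null and are skipped by listagg, which is intended to mirror how the regular-expression version drops consecutive commas.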

Kul Bhushan Prasad · Accepted Answer · 2023-08-26 01:31:04Z

This anonymous PL/SQL block works fine for me.

DECLARE
  input_string VARCHAR2(255);
  merged_users VARCHAR2(4000);
BEGIN
  input_string := 'abc3,abc1,abc2,abc3,abc2,abc4';

  -- Remove leading and trailing commas from input_string
  input_string := TRIM(',' FROM input_string);

  -- Split input_string into tokens, keep the distinct ones, and
  -- concatenate them in alphabetical order
  WITH data AS (
    SELECT TRIM(REGEXP_SUBSTR(input_string, '[^,]+', 1, LEVEL)) AS token
    FROM   dual
    CONNECT BY LEVEL <= REGEXP_COUNT(input_string, '[^,]+')
  ),
  distinct_data AS (
    SELECT DISTINCT token
    FROM   data
  )
  -- ORDER BY token rather than ORDER BY 1: inside WITHIN GROUP, 1 is a
  -- constant, not a column position, so it would not fix the order
  SELECT LISTAGG(token, ',') WITHIN GROUP (ORDER BY token)
  INTO   merged_users
  FROM   distinct_data;

  -- Needs SET SERVEROUTPUT ON in SQL*Plus or SQL Developer to be displayed
  DBMS_OUTPUT.PUT_LINE(merged_users);
END;
/
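
The block above could also be wrapped in a stored function so that it can be called from a query. The following is an untested sketch under stated assumptions: the function name dedupe_csv, its parameter i_list and its local variables are hypothetical names chosen for illustration, and the tokens are returned in alphabetical order, not in input order.

```sql
-- dedupe_csv, i_list, v_list and v_result are illustrative names only
CREATE OR REPLACE FUNCTION dedupe_csv( i_list IN VARCHAR2 ) RETURN VARCHAR2
AS
  -- strip leading and trailing commas before splitting
  v_list   VARCHAR2(4000) := TRIM( ',' FROM i_list );
  v_result VARCHAR2(4000);
BEGIN
  WITH data AS (
    -- one row per comma-separated token
    SELECT TRIM( REGEXP_SUBSTR( v_list, '[^,]+', 1, LEVEL ) ) AS token
    FROM   dual
    CONNECT BY LEVEL <= REGEXP_COUNT( v_list, '[^,]+' )
  )
  SELECT LISTAGG( token, ',' ) WITHIN GROUP ( ORDER BY token )
  INTO   v_result
  FROM   ( SELECT DISTINCT token FROM data );

  RETURN v_result;
END;
/

SELECT dedupe_csv( 'abc3,abc1,abc2,abc3,abc2,abc4' ) AS list FROM dual;
```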
