Remove duplicate values from comma separated string in Oracle

Question

I need your help with the regexp_replace function. I have a table which has a column for concatenated string values which contain duplicates. How do I eliminate them?

Example:

Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha

I need the output to be

Ian,Beatty,Larry,Neesha

The duplicates are random and not in any particular order.

Update--

Here's how my table looks

ID   Name1   Name2   Name3
1    a       b       c
1    c       d       a
2    d       e       a
2    c       d       b

I need one row per ID having distinct name1,name2,name3 in one row as a comma separated string.

ID    Name
1     a,c,b,d,c
2     d,c,e,a,b

I have tried using listagg with distinct but I'm not able to remove the duplicates.

What a good reason to use a proper junction table -- or even a nested table -- rather than a comma delimited list. Good luck.
– Gordon Linoff, Commented Feb 10, 2016 at 23:48
The pattern is different and doesn't work with my data. The dups still exist.
– Cindy, Commented Feb 11, 2016 at 0:01

Ankit Bajpai · Accepted Answer · 2023-01-22 21:29:59Z

The easiest option is to let UNION remove the duplicates before aggregating:

SELECT ID, LISTAGG(NAME_LIST, ',') WITHIN GROUP (ORDER BY NAME_LIST) AS NAME
  FROM (SELECT ID, NAME1 NAME_LIST FROM DATA UNION
        SELECT ID, NAME2 FROM DATA UNION
        SELECT ID, NAME3 FROM DATA
      )
GROUP BY ID;

Dave · Accepted Answer · 2016-02-11 00:24:21Z

So, try this out...

([^,]+),(?=.*[A-Za-z],[] ]*\1)

Alex Poole · Accepted Answer · 2016-02-11 10:54:08Z

I don't think you can do it just with regexp_replace if the repeated values are not next to each other. One approach is to split the values up, eliminate the duplicates, and then put them back together.

The common method to tokenize a delimited string is with regexp_substr and a connect by clause. Using a bind variable with your string to make the code a bit clearer:

var value varchar2(100);
exec :value := 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha';

select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null;

VALUE                        
------------------------------
Ian                           
Beatty                        
Larry                         
Neesha                        
Beatty                        
Neesha                        
Ian                           
Neesha

You can use that as a subquery (or CTE), get the distinct values from it, then reassemble it with listagg:

select listagg(value, ',') within group (order by value) as value
from (
  select distinct value from (
    select regexp_substr(:value, '[^,]+', 1, level) as value
    from dual
    connect by regexp_substr(:value, '[^,]+', 1, level) is not null
  )
);

VALUE                        
------------------------------
Beatty,Ian,Larry,Neesha
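
The result above comes back in alphabetical order, while the expected output in the question keeps the order of first appearance. A minimal sketch of one way to keep that order, assuming it matters: record each token's position and order by the smallest position per value. The names tokens, s, pos and first_pos are made up for illustration, and a literal replaces the :value bind variable so the query runs on its own.

```sql
-- Sketch: keep each value's first position, then order by that position.
with tokens as (
  select regexp_substr(s, '[^,]+', 1, level) as value,
         level as pos
  from (
    select 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' as s from dual
  )
  connect by regexp_substr(s, '[^,]+', 1, level) is not null
)
select listagg(value, ',') within group (order by first_pos) as value
from (
  -- One row per distinct value, tagged with where it first appeared
  select value, min(pos) as first_pos
  from tokens
  group by value
);
```

For the sample string this should return Ian,Beatty,Larry,Neesha.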

It's a bit more complicated if you're looking at multiple rows in a table, as that confuses the connect-by syntax, but you can use a non-deterministic reference to avoid loops:

with t42 (id, value) as (
  select 1, 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' from dual
  union all select 2, 'Mary,Joe,Mary,Frank,Joe' from dual
)
select id, listagg(value, ',') within group (order by value) as value
from (
  select distinct id, value from (
    select id, regexp_substr(value, '[^,]+', 1, level) as value
    from t42
    connect by regexp_substr(value, '[^,]+', 1, level) is not null
    and id = prior id
    and prior dbms_random.value is not null
  )
)
group by id;

        ID VALUE                        
---------- ------------------------------
         1 Beatty,Ian,Larry,Neesha       
         2 Frank,Joe,Mary

Of course this wouldn't be necessary if you were storing relational data properly; having a delimited string in a column is not a good idea.

I will try this out and let you know... The data actually doesn't exist as a delimited string. It's from multiple rows per id, and I have used listagg to concatenate them into one row per id.
@Cindy - so why aren't you just getting the distinct values before calling listagg?
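
As a sketch of what that comment suggests: if the names still sit in the Name1/Name2/Name3 columns from the question, the duplicates can be dropped inside the aggregation itself. This assumes Oracle 19c or later, where LISTAGG accepts DISTINCT. The CTE called data and the aliases name_value, src and names are made up for illustration.

```sql
-- Assumes Oracle 19c+ (LISTAGG with DISTINCT); sample rows copied from the question.
with data (id, name1, name2, name3) as (
  select 1, 'a', 'b', 'c' from dual union all
  select 1, 'c', 'd', 'a' from dual union all
  select 2, 'd', 'e', 'a' from dual union all
  select 2, 'c', 'd', 'b' from dual
)
select id,
       listagg(distinct name_value, ',')
         within group (order by name_value) as names
from data
-- Turn the three name columns into one row per (id, name)
unpivot (name_value for src in (name1, name2, name3))
group by id
order by id;
```

With the sample rows this should give a,b,c,d for id 1 and a,b,c,d,e for id 2. On older versions the same effect needs a select distinct subquery before listagg.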

d r · Accepted Answer · 2023-01-22 19:58:23Z

There is a way to find the duplicates in this case, but removing them is a problem if more than one name is duplicated within a string per id. Here is code that can deal with one duplicate per id.
Sample data:

WITH
    tbl AS
        (
            Select 1 "ID", 'a' "NAME_1", 'b' "NAME_2", 'c' "NAME_3" From Dual Union All
            Select 1 "ID", 'c' "NAME_1", 'd' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'd' "NAME_1", 'e' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'c' "NAME_1", 'd' "NAME_2", 'b' "NAME_3" From Dual 
        ),
    lists AS
        (
            Select 1 "ID", 'a,c,b,d,c' "NAME" From Dual Union All
            Select 2 "ID", 'd,c,e,a,b' "NAME" From Dual  
        ),

Create a CTE that compares your LISTAGG string with the original data to find the duplicate values:

  grid AS
    (
        Select DISTINCT l.ID, l.NAME,
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_1 || ',', '')) ) / Length(t.NAME_1 || ',') > 1 THEN NAME_1 END  "NAME_1",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_2 || ',', '')) ) / Length(t.NAME_2 || ',') > 1 THEN NAME_2 END  "NAME_2",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_3 || ',', '')) ) / Length(t.NAME_3 || ',') > 1 THEN NAME_3 END  "NAME_3"
        From
            lists l
        Inner Join
            tbl t ON(t.ID = l.ID) 
    )

        ID NAME      NAME_1 NAME_2 NAME_3
---------- --------- ------ ------ ------
         2 d,c,e,a,b                      
         1 a,c,b,d,c c                    
         1 a,c,b,d,c               c

The main SQL uses UNION ALL to build a new string wherever a duplicate was found (removing its second appearance), and then keeps that new string in place of the old one when it is shorter:

SELECT DISTINCT l.ID, Nvl(g.NAME, l.NAME) NAME
FROM
    lists l
LEFT JOIN
    (
        SELECT ID,  CASE  WHEN NAME_1 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_1, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_1, 1, 2) + Length(NAME_1)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_2 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_2, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_2, 1, 2) + Length(NAME_2)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_3 Is Not Null 
                          THEN  REPLACE(NAME, NAME, COALESCE( REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_3, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_3, 1, 2) + Length(NAME_3)), ',,', ','), NULL ) ) 
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    ) g ON(g.ID = l.ID And Length(g.NAME) < Length(l.NAME))

Result:
        ID NAME         
---------- -------------
         2 d,c,e,a,b    
         1 a,c,b,d

For multiple occurrences within a string, or for several different duplicated names, some recursion or further nesting would be needed to get it done.

Kul Bhushan Prasad · Accepted Answer · 2023-08-26 01:26:21Z

Use this PL/SQL block; it worked for me.

DECLARE
  input_string VARCHAR2(255);
  merged_users VARCHAR2(4000);
BEGIN
  input_string := 'abc3,abc1,abc2,abc3,abc2,abc4';

  -- Remove leading and trailing commas from input_string
  input_string := TRIM(',' FROM input_string);

  -- Split the input_string into individual elements
  WITH data AS (
    SELECT TRIM(REGEXP_SUBSTR(input_string, '[^,]+', 1, LEVEL)) AS token
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(input_string, '[^,]+')
  ),
  -- Keep one copy of each token
  distinct_data AS (
    SELECT DISTINCT token
    FROM data
  )
  -- Concatenate the distinct tokens in alphabetical order
  -- (ORDER BY token; ORDER BY 1 would sort by the constant 1, not by a column)
  SELECT LISTAGG(token, ',') WITHIN GROUP (ORDER BY token) INTO merged_users
  FROM distinct_data;

  -- Run SET SERVEROUTPUT ON first to see the output in SQL*Plus
  DBMS_OUTPUT.PUT_LINE(merged_users);
END;
/
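
Since the block above is an anonymous block and not a stored function, here is a rough sketch of the same logic wrapped so it can be called from SQL. The function name dedupe_csv and the parameter p_list are hypothetical, and the result is limited to 4000 characters as in the block above.

```sql
-- Hypothetical helper: returns the distinct tokens of a comma separated list,
-- sorted alphabetically.
create or replace function dedupe_csv (p_list in varchar2)
  return varchar2
is
  l_result varchar2(4000);
begin
  select listagg(token, ',') within group (order by token)
    into l_result
    from (
      -- Split on commas, trim spaces, and keep one copy of each token
      select distinct trim(regexp_substr(p_list, '[^,]+', 1, level)) as token
        from dual
     connect by level <= regexp_count(p_list, '[^,]+')
    );
  return l_result;
end dedupe_csv;
/

-- Example call with the sample string from the block above
select dedupe_csv('abc3,abc1,abc2,abc3,abc2,abc4') as merged from dual;
```

For that input the call should return abc1,abc2,abc3,abc4.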
