
I need your help with the regexp_replace function. I have a table with a column of concatenated string values that contain duplicates. How do I eliminate them?

Example:

Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha

I need the output to be

Ian,Beatty,Larry,Neesha

The duplicates are random and not in any particular order.
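To pin down the requested behaviour independently of any SQL, here is a minimal Python sketch (the function name dedupe_keep_first is my own, not anything from Oracle). It keeps the first occurrence of each name in its original position, which is what the expected output above shows; several of the SQL answers below return the names sorted instead.

```python
def dedupe_keep_first(csv: str, sep: str = ",") -> str:
    """Drop repeated items from a delimited string, keeping first occurrences in order."""
    # dict preserves insertion order (Python 3.7+), so fromkeys() acts as an ordered set.
    # Empty items (from ",,") are skipped, as a '[^,]+' tokenizer would skip them.
    unique = dict.fromkeys(item for item in csv.split(sep) if item)
    return sep.join(unique)


print(dedupe_keep_first("Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha"))
# Ian,Beatty,Larry,Neesha
```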

Update:

Here's how my table looks:

ID   Name1   Name2   Name3
1    a       b       c
1    c       d       a
2    d       e       a
2    c       d       b

I need one row per ID, with the distinct Name1, Name2 and Name3 values combined into a single comma-separated string.

ID    Name
1     a,c,b,d,c
2     d,c,e,a,b

I have tried using listagg with distinct but I'm not able to remove the duplicates.

  • What a good reason to use a proper junction table (or even a nested table) rather than a comma-delimited list. Good luck. Commented Feb 10, 2016 at 23:48
  • This looks to be a dupe of this Commented Feb 11, 2016 at 0:01
  • The pattern is different and doesn't work with my data. The dups still exist. Commented Feb 11, 2016 at 0:01

5 Answers


The easiest option I would go with:

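SELECT ID,
       -- WITHIN GROUP sets the order of the names; older Oracle releases such as 11g require it
       LISTAGG(NAME_LIST, ',') WITHIN GROUP (ORDER BY NAME_LIST) AS NAMES
  FROM (
        -- UNION (not UNION ALL) drops duplicate (ID, name) pairs before aggregation
        SELECT ID, NAME1 AS NAME_LIST FROM DATA UNION
        SELECT ID, NAME2 FROM DATA UNION
        SELECT ID, NAME3 FROM DATA
       )
 GROUP BY ID;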

Demo.
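The same idea (collect every name column per ID, let a set discard duplicates the way UNION does, then join) can be sketched in Python with the sample rows from the question. The rows list and the names_per_id helper are illustrative assumptions; sorting each group alphabetically is a choice made here, since LISTAGG's order is whatever its WITHIN GROUP clause specifies.

```python
from collections import defaultdict

# Sample rows from the question: (ID, NAME1, NAME2, NAME3)
rows = [
    (1, "a", "b", "c"),
    (1, "c", "d", "a"),
    (2, "d", "e", "a"),
    (2, "c", "d", "b"),
]


def names_per_id(rows):
    """Gather all name columns per ID, drop duplicates, and join each group."""
    seen = defaultdict(set)
    for row_id, *names in rows:
        seen[row_id].update(names)  # a set discards repeats, much like UNION
    return {row_id: ",".join(sorted(vals)) for row_id, vals in sorted(seen.items())}


print(names_per_id(rows))
# {1: 'a,b,c,d', 2: 'a,b,c,d,e'}
```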


So, try this out...

([^,]+),(?=.*[A-Za-z],[] ]*\1)
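Be aware that Oracle's regular expressions (used by REGEXP_REPLACE) do not support lookahead assertions such as (?=...), so this pattern cannot run there. In an engine that supports lookahead and backreferences the idea does work; below is a Python sketch using my own variant of the pattern (not the one above). It deletes every token that appears again later in the string, so it keeps the last occurrence of each name rather than the first.

```python
import re

# (?:^|(?<=,))            the match must start at a token boundary
# ([^,]+),                capture one whole token plus its trailing comma
# (?=(?:.*,)?\1(?:,|$))   lookahead: the same whole token occurs again later
DUP_EARLIER = re.compile(r"(?:^|(?<=,))([^,]+),(?=(?:.*,)?\1(?:,|$))")


def drop_earlier_duplicates(csv: str) -> str:
    """Remove each token that is repeated later, keeping its last occurrence."""
    return DUP_EARLIER.sub("", csv)


print(drop_earlier_duplicates("Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha"))
# Larry,Beatty,Ian,Neesha
```

Anchoring the token on commas on both sides is what keeps a short name such as "an" from matching inside "Ian".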


I don't think you can do it just with regexp_replace if the repeated values are not next to each other. One approach is to split the values up, eliminate the duplicates, and then put them back together.

The common method to tokenize a delimited string is with regexp_substr and a connect by clause. Using a bind variable with your string to make the code a bit clearer:

var value varchar2(100);
exec :value := 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha';

select regexp_substr(:value, '[^,]+', 1, level) as value
from dual
connect by regexp_substr(:value, '[^,]+', 1, level) is not null;

VALUE                        
------------------------------
Ian                           
Beatty                        
Larry                         
Neesha                        
Beatty                        
Neesha                        
Ian                           
Neesha                        
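For comparison outside the database, Python's re.findall with the same [^,]+ pattern returns all the tokens in one call, in the order that successive level values of regexp_substr return them. Both approaches skip empty items between consecutive commas. A minimal sketch:

```python
import re

value = "Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha"

# [^,]+ matches one run of non-comma characters, i.e. one token.
# findall returns every match in order, like regexp_substr(value, '[^,]+', 1, level)
# for level = 1, 2, 3, ... until it returns null.
tokens = re.findall(r"[^,]+", value)

for token in tokens:
    print(token)
```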

You can use that as a subquery (or CTE), get the distinct values from it, then reassemble it with listagg:

select listagg(value, ',') within group (order by value) as value
from (
  select distinct value from (
    select regexp_substr(:value, '[^,]+', 1, level) as value
    from dual
    connect by regexp_substr(:value, '[^,]+', 1, level) is not null
  )
);

VALUE                        
------------------------------
Beatty,Ian,Larry,Neesha       
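The split, DISTINCT and LISTAGG ... ORDER BY steps collapse into a one-line Python sketch (distinct_sorted is a hypothetical helper name). Python sorts by code point, whereas Oracle's ORDER BY follows the session's NLS_SORT setting, so the two orders could differ for mixed-case or accented data; for these names they agree.

```python
import re


def distinct_sorted(csv: str) -> str:
    """Split on commas, keep distinct tokens, and re-join them in sorted order.

    Mirrors: SELECT DISTINCT over the regexp_substr tokens, then
    LISTAGG(value, ',') WITHIN GROUP (ORDER BY value).
    """
    return ",".join(sorted(set(re.findall(r"[^,]+", csv))))


print(distinct_sorted("Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha"))
# Beatty,Ian,Larry,Neesha
```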

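It's a bit more complicated if you're looking at multiple rows in a table, as that confuses the connect-by syntax. You can add id = prior id to keep each row's tokens together, plus a non-deterministic reference (prior dbms_random.value) so that Oracle does not treat a row connecting to itself as a loop: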

with t42 (id, value) as (
  select 1, 'Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha' from dual
  union all select 2, 'Mary,Joe,Mary,Frank,Joe' from dual
)
select id, listagg(value, ',') within group (order by value) as value
from (
  select distinct id, value from (
    select id, regexp_substr(value, '[^,]+', 1, level) as value
    from t42
    connect by regexp_substr(value, '[^,]+', 1, level) is not null
    and id = prior id
    and prior dbms_random.value is not null
  )
)
group by id;

        ID VALUE                        
---------- ------------------------------
         1 Beatty,Ian,Larry,Neesha       
         2 Frank,Joe,Mary                
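The per-ID version follows the same pattern: group the tokens by id so that one row's values never mix with another's (the job id = prior id does in the CONNECT BY), then dedupe and join each group. A Python sketch using the two sample rows from the t42 CTE; the distinct_per_id helper is hypothetical.

```python
import re
from collections import defaultdict

# Stand-in for the t42 CTE: (id, delimited string) pairs
t42 = [
    (1, "Ian,Beatty,Larry,Neesha,Beatty,Neesha,Ian,Neesha"),
    (2, "Mary,Joe,Mary,Frank,Joe"),
]


def distinct_per_id(rows):
    """Tokenize each row, dedupe within each id, and join each group in sorted order."""
    groups = defaultdict(set)
    for row_id, value in rows:
        # tokens are only ever added to their own id's group
        groups[row_id].update(re.findall(r"[^,]+", value))
    return {row_id: ",".join(sorted(toks)) for row_id, toks in sorted(groups.items())}


for row_id, names in distinct_per_id(t42).items():
    print(row_id, names)
# 1 Beatty,Ian,Larry,Neesha
# 2 Frank,Joe,Mary
```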

Of course this wouldn't be necessary if you were storing relational data properly; having a delimited string in a column is not a good idea.

2 Comments

I will try this out and let you know. The data actually doesn't exist as a delimited string. It's from multiple rows per ID, and I have used listagg to concatenate them into one row per ID.
@Cindy - so why aren't you just getting the distinct values before calling listagg?

There is a way to find duplicates in this case, but removing them is a problem if more than one name is duplicated within a string for the same ID. Here is code that can deal with one duplicate per ID.
Sample data:

WITH
    tbl AS
        (
            Select 1 "ID", 'a' "NAME_1", 'b' "NAME_2", 'c' "NAME_3" From Dual Union All
            Select 1 "ID", 'c' "NAME_1", 'd' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'd' "NAME_1", 'e' "NAME_2", 'a' "NAME_3" From Dual Union All
            Select 2 "ID", 'c' "NAME_1", 'd' "NAME_2", 'b' "NAME_3" From Dual 
        ),
    lists AS
        (
            Select 1 "ID", 'a,c,b,d,c' "NAME" From Dual Union All
            Select 2 "ID", 'd,c,e,a,b' "NAME" From Dual  
        ),

Create a CTE that compares your LISTAGG string with the original data to find duplicate values:

  grid AS
    (
        Select DISTINCT l.ID, l.NAME,
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_1 || ',', '')) ) / Length(t.NAME_1 || ',') > 1 THEN NAME_1 END  "NAME_1",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_2 || ',', '')) ) / Length(t.NAME_2 || ',') > 1 THEN NAME_2 END  "NAME_2",
            CASE WHEN ( Length(l.NAME || ',') - Length(Replace(l.NAME || ',', t.NAME_3 || ',', '')) ) / Length(t.NAME_3 || ',') > 1 THEN NAME_3 END  "NAME_3"
        From
            lists l
        Inner Join
            tbl t ON(t.ID = l.ID) 
    )

        ID NAME      NAME_1 NAME_2 NAME_3
---------- --------- ------ ------ ------
         2 d,c,e,a,b                      
         1 a,c,b,d,c c                    
         1 a,c,b,d,c               c     
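Each CASE expression in the grid CTE counts how often NAME_x || ',' occurs in NAME || ',': the length removed by deleting every occurrence, divided by the length of one occurrence. A small Python sketch of the same arithmetic (count_occurrences is a hypothetical helper) shows the idea and one caveat: it counts substrings, not whole tokens, so a name that ends another name (for example a inside ba) is over-counted.

```python
def count_occurrences(name_list: str, name: str) -> int:
    """Same formula as the grid CTE: (len(s) - len(s with pattern removed)) / len(pattern)."""
    s = name_list + ","    # NAME || ','
    pattern = name + ","   # NAME_x || ','
    return (len(s) - len(s.replace(pattern, ""))) // len(pattern)


print(count_occurrences("a,c,b,d,c", "c"))  # 2, so c is flagged as a duplicate
print(count_occurrences("a,c,b,d,c", "a"))  # 1
print(count_occurrences("ba,a", "a"))       # 2, although a appears once as a token
```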

The main SQL uses UNION ALL to build a new string (with the second occurrence of the duplicated name removed) wherever a duplicate was found, and then returns that string instead of the original when it is shorter.

SELECT DISTINCT l.ID, Nvl(g.NAME, l.NAME) NAME
FROM
    lists l
LEFT JOIN
    (
        SELECT ID,  CASE  WHEN NAME_1 Is Not Null
                          -- cut out the 2nd occurrence, collapse ',,' and trim the ',' left behind when the duplicate was last
                          THEN  TRIM(',' FROM REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_1, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_1, 1, 2) + Length(NAME_1)), ',,', ',') )
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_2 Is Not Null
                          THEN  TRIM(',' FROM REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_2, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_2, 1, 2) + Length(NAME_2)), ',,', ',') )
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    UNION ALL
        SELECT ID,  CASE  WHEN NAME_3 Is Not Null
                          THEN  TRIM(',' FROM REPLACE( SubStr(NAME, 1, InStr(NAME, NAME_3, 1, 2) - 1) || SubStr(NAME, InStr(NAME, NAME_3, 1, 2) + Length(NAME_3)), ',,', ',') )
                    END "NAME"
        FROM grid
        WHERE COALESCE(NAME_1, NAME_2, NAME_3) IS NOT NULL
    ) g ON(g.ID = l.ID And Length(g.NAME) < Length(l.NAME))

Result:
        ID NAME         
---------- -------------
         2 d,c,e,a,b    
         1 a,c,b,d     
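The SubStr/InStr expression in the main query cuts the second occurrence of the duplicated name out of the string and collapses the doubled comma it leaves behind. A Python sketch of that step (drop_second_occurrence is a hypothetical helper) also strips a leading or trailing comma, which is needed when the duplicate is the last item. Like InStr, str.find matches substrings, so a name that occurs inside a longer name can be matched by mistake.

```python
def drop_second_occurrence(name_list: str, name: str) -> str:
    """Cut out the 2nd occurrence of name, collapse ',,' and strip stray edge commas."""
    first = name_list.find(name)
    second = name_list.find(name, first + 1) if first >= 0 else -1
    if second < 0:
        return name_list  # no second occurrence: leave the string unchanged
    cut = name_list[:second] + name_list[second + len(name):]
    return cut.replace(",,", ",").strip(",")


print(drop_second_occurrence("a,c,b,d,c", "c"))  # a,c,b,d
print(drop_second_occurrence("d,c,e,a,b", "a"))  # d,c,e,a,b
```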

Handling multiple occurrences within a string, or several different duplicated names, would need some recursion or repeated nesting.


Use this PL/SQL block; it worked for me.

-- SQL*Plus / SQL Developer: make DBMS_OUTPUT visible
SET SERVEROUTPUT ON

DECLARE
  input_string VARCHAR2(255);
  merged_users VARCHAR2(4000);
BEGIN
  input_string := 'abc3,abc1,abc2,abc3,abc2,abc4';

  -- Remove leading and trailing commas from input_string
  input_string := TRIM(',' FROM input_string);

  -- Split input_string into individual (whitespace-trimmed) tokens
  WITH data AS (
    SELECT TRIM(REGEXP_SUBSTR(input_string, '[^,]+', 1, LEVEL)) AS token
    FROM dual
    CONNECT BY LEVEL <= REGEXP_COUNT(input_string, '[^,]+')
  ),
  -- Keep only the distinct tokens
  distinct_data AS (
    SELECT DISTINCT token
    FROM data
  )
  -- Concatenate them; ORDER BY token rather than ORDER BY 1, because a literal
  -- here is treated as a constant, not a column position, and gives no order
  SELECT LISTAGG(token, ',') WITHIN GROUP (ORDER BY token)
  INTO merged_users
  FROM distinct_data;

  DBMS_OUTPUT.PUT_LINE(merged_users);  -- abc1,abc2,abc3,abc4
END;
/
