Remove Duplicate words/strings from Oracle table using Regular Expressions

Question

I want to remove duplicate strings from the Col B. For example :"New Cap Grp" is repeated five times in second record.

Col A   Col B
-----   -----
WDSA    ALT COMPANY, III & New Group
1101    New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp 
2255    Tata Associates Inc. & Tata Associates Inc.& Towers Watson 
3355    Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc. 
8877    Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd

I am new to regular expressions so I am not able to figure out how exactly this can be achieved. If anyone knows how to handle such scenarios then please help me.

This goes without saying that you know the duplicated value in each record right? — Amen Jlili
– Amen Jlili, Commented Nov 27, 2015 at 15:13

APC · Accepted Answer · 2017-12-21 18:17:44Z

1

I think it's impossible do using only regexp expresion because you must do update for Col B* value.

It's easier do on PL/SQL, I try do it:

Create table for test data

create table test
    (
        id   number,
        text varchar2(100)
    );

Insert test data

insert into test values (1, 'ALT COMPANY, III & New Group');
insert into test values (2, 'New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp');
insert into test values (3, 'Tata Associates Inc. & Tata Associates Inc.& Towers Watson');
insert into test values (4, 'Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.');
insert into test values (5, 'Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd');

PL/SQL block:

declare
    l_new_column_value varchar2(1024) := '';
begin
    -- go on all row
    for x in (select id, text from test)
    loop
        -- work with each row, do from one row several by separation symbol '&' and take distinct value
        for concat_text in (
            select distinct trim(regexp_substr(text, '[^&]+', 1, level)) as part_value
            from
                (
                    select text
                    from test
                    where id = x.id
                )
            connect by instr(text, '&', 1, level - 1) > 0)
        loop
            -- formiration new uniq value 
            l_new_column_value := l_new_column_value || concat_text.part_value || ' & ';
        end loop;
        -- undate raw data
        update test
            set text = substr(l_new_column_value, 0, length(l_new_column_value)-3)
        where id = x.id;
        l_new_column_value := '';
    end loop;
end;

edited Dec 21, 2017 at 18:17

APC

147k20 gold badges180 silver badges289 bronze badges

answered Nov 27, 2015 at 16:05

HAYMbl4

1,4802 gold badges15 silver badges29 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

user3395359 Over a year ago

Thanks for the reply but I was looking for a solution using only sql with regular expressions, but now it seems that it is not possible.

Collectives™ on Stack Overflow

Remove Duplicate words/strings from Oracle table using Regular Expressions

1 Answer 1

Create table for test data

Insert test data

PL/SQL block:

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Create table for test data

Insert test data

PL/SQL block:

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related