
I want to remove duplicate strings from Col B. For example, "New Cap Grp" is repeated five times in the second record.

Col A   Col B
-----   -----
WDSA    ALT COMPANY, III & New Group
1101    New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp 
2255    Tata Associates Inc. & Tata Associates Inc.& Towers Watson 
3355    Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc. 
8877    Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd

I am new to regular expressions, so I can't figure out how to achieve this. If anyone knows how to handle this kind of scenario, please help.

  • It goes without saying that you know the duplicated value in each record, right? Commented Nov 27, 2015 at 15:13
  • Yes, I know the duplicate values in each record. Commented Nov 29, 2015 at 9:34

1 Answer


I think it's impossible to do this using only a regular expression, because each Col B value has to be split into its parts, de-duplicated, and then written back to the column with an UPDATE.

It's easier to do in PL/SQL. Here is my attempt:

Create a table for the test data:

create table test
    (
        id   number,
        text varchar2(100)
    );

Insert the test data:

insert into test values (1, 'ALT COMPANY, III & New Group');
insert into test values (2, 'New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp');
insert into test values (3, 'Tata Associates Inc. & Tata Associates Inc.& Towers Watson');
insert into test values (4, 'Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.');
insert into test values (5, 'Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd');

PL/SQL block:

declare
    l_new_column_value varchar2(1024);
begin
    -- loop over all rows
    for x in (select id, text from test)
    loop
        -- split the row's text into parts on the separator '&', trim each part,
        -- and keep each distinct part once, in the order of its first occurrence
        -- (a plain DISTINCT would not guarantee any particular order)
        for concat_text in (
            select part_value
            from (
                select trim(regexp_substr(text, '[^&]+', 1, level)) as part_value,
                       level as part_pos
                from
                    (
                        select text
                        from test
                        where id = x.id
                    )
                connect by instr(text, '&', 1, level - 1) > 0
            )
            group by part_value
            order by min(part_pos))
        loop
            -- build the new de-duplicated value
            l_new_column_value := l_new_column_value || concat_text.part_value || ' & ';
        end loop;
        -- update the row, dropping the trailing ' & '
        update test
            set text = substr(l_new_column_value, 1, length(l_new_column_value) - 3)
        where id = x.id;
        -- reset the buffer for the next row
        l_new_column_value := null;
    end loop;
end;
/
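To check the logic outside Oracle, here is the same split / trim / de-duplicate / re-join sequence sketched in Python. It is only an illustration of the steps the PL/SQL block performs, not an Oracle solution. `dedupe_parts` is a hypothetical helper name, and the sketch assumes that `&` is the only separator and that the first occurrence of each part should be kept in its original position.

```python
def dedupe_parts(text: str, sep: str = "&") -> str:
    """Split text on sep, trim each part, keep only the first occurrence
    of each part (preserving order), and re-join with ' & '."""
    seen = set()
    kept = []
    for part in text.split(sep):
        part = part.strip()          # same role as TRIM() in the SQL
        if part and part not in seen:  # skip empty parts and repeats
            seen.add(part)
            kept.append(part)
    return f" {sep} ".join(kept)


# Sample rows from the question (Col A -> Col B)
rows = {
    "WDSA": "ALT COMPANY, III & New Group",
    "1101": "New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp & New Cap Grp ",
    "2255": "Tata Associates Inc. & Tata Associates Inc.& Towers Watson",
    "3355": "Picard Lorens, Inc. & Tata Associates Inc. & Tata Associates Inc.",
    "8877": "Morphy Companies, Inc. & Morphy Companies, Inc. & Tele Pvt.Ltd",
}

for col_a, col_b in rows.items():
    print(col_a, dedupe_parts(col_b))
# WDSA ALT COMPANY, III & New Group
# 1101 New Cap Grp
# 2255 Tata Associates Inc. & Towers Watson
# 3355 Picard Lorens, Inc. & Tata Associates Inc.
# 8877 Morphy Companies, Inc. & Tele Pvt.Ltd
```

Using a set for membership plus a list for order keeps the first occurrence of each part, and it also handles non-adjacent repeats such as "B & A & B".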

1 Comment

Thanks for the reply. I was looking for a solution using only SQL with regular expressions, but now it seems that is not possible.
