
I have a SQL column of type nvarchar(200) that contains the following values:

    xxx {"Name":"Stack"} yyy
    aaa {"Name":"Overflow"} bbb
    ccc {"Name":Stack"} ddd
    eee {"Name":"Overflow"} fff

I want to remove the duplicate rows where the Name is the same, keeping only the first occurrence of each name. In the example above, I would want to remove the third and fourth rows because they contain duplicate names, but keep the first and second rows.

How can I achieve this?

    Is this data exactly as shown? Do you expect "Name":"Stack" and "Name":Stack" (with the typo) to match as the same name? Commented Jun 16, 2013 at 21:55

1 Answer

Assuming the part you are interested in is delimited by "{" and "}", and that you have an id column to establish the ordering (that is, to define which row comes first), you can do this with a relatively direct query.

The innermost subquery extracts the "name" definition. The next level assigns a sequential number to each row within a name using row_number(), and the outermost query selects the first one:

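    -- Outermost query: keep only the first row for each extracted name portion.
    select t.*
    from (select t.*,
                 -- Number the rows within each name portion, lowest id first.
                 row_number() over (partition by NamePortion order by id) as seqnum
          from (select t.*,
                       -- Text from the first '{' up to (not including) the first '}'.
                       substring(t.col,
                                 charindex('{', t.col),
                                 charindex('}', t.col) - charindex('{', t.col)
                                ) as NamePortion
                from t
               ) t
         ) t
    where seqnum = 1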
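
To see how this behaves on the sample rows from the question, here is a small runnable sketch. It uses Python's built-in sqlite3 module purely as a test harness, so the SQL is in SQLite's dialect: instr and substr stand in for charindex and substring. The table name t and the id column are the same assumptions made above, and the explicit column list and subquery aliases are illustrative choices, not part of the original query. It also assumes a Python build whose bundled SQLite supports window functions.

```python
import sqlite3

# Sample rows from the question; the id values are an assumption.
ROWS = [
    (1, 'xxx {"Name":"Stack"} yyy'),
    (2, 'aaa {"Name":"Overflow"} bbb'),
    (3, 'ccc {"Name":Stack"} ddd'),  # typo from the question kept on purpose
    (4, 'eee {"Name":"Overflow"} fff'),
]

# SQLite-dialect version of the query: instr/substr replace charindex/substring.
QUERY = """
select id, col
from (select id, col,
             row_number() over (partition by NamePortion order by id) as seqnum
      from (select id, col,
                   substr(col,
                          instr(col, '{'),
                          instr(col, '}') - instr(col, '{')
                         ) as NamePortion
            from t
           ) as extracted
     ) as numbered
where seqnum = 1
order by id
"""


def first_rows(rows):
    """Load rows into an in-memory table t and return the rows the query keeps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer primary key, col text)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_rows(ROWS):
        print(row)
```

With the data exactly as posted, the third row's text between the braces differs from the first row's because of the missing quote, so it forms its own group and only the fourth row is dropped. If the typo is corrected, the third row joins the first row's group and is dropped as well.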

1 Comment

Thanks Gordon. I forgot to mention the tricky part: the braces are not guaranteed to contain only "Name":"x". The value can be {xxx,yyy}{"Name":"x", "age":10}, and the second row can be like {x}{"Name":"x"}.
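
One possible way to handle values like these is to search for the "Name":" marker itself rather than the first pair of braces, and take everything up to the next double quote. The sketch below is a hedged illustration, not the answerer's method. It again uses Python's sqlite3 module as a harness with SQLite's instr and substr. It assumes the key is written exactly as "Name":" with no extra whitespace, that the value contains no escaped quotes, and that the table t, its id column, and the third sample row are hypothetical.

```python
import sqlite3

# First two values come from the comment; the third row and all ids are hypothetical.
ROWS = [
    (1, '{xxx,yyy}{"Name":"x", "age":10}'),
    (2, '{x}{"Name":"x"}'),
    (3, '{a}{"Name":"y"}'),
]

# The marker '"Name":"' is 8 characters long, so the value starts 8 characters
# after the marker's position and runs up to the next double quote.
QUERY = """
select id, col
from (select id, col,
             row_number() over (partition by NameValue order by id) as seqnum
      from (select id, col,
                   substr(rest, 1, instr(rest, '"') - 1) as NameValue
            from (select id, col,
                         substr(col, instr(col, '"Name":"') + 8) as rest
                  from t
                  where instr(col, '"Name":"') > 0
                 ) as tail
           ) as named
     ) as numbered
where seqnum = 1
order by id
"""


def first_rows_by_name(rows):
    """Return the first row (lowest id) for each extracted Name value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (id integer primary key, col text)")
    conn.executemany("insert into t values (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_rows_by_name(ROWS):
        print(row)
```

Rows without the marker are filtered out by the where clause in this sketch, so they would need separate handling if they should be kept. In SQL Server the same idea could be expressed with charindex, which accepts an optional start position as its third argument, but that translation is not tested here.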
