I have a DolphinDB table with an array vector column. I need to remove duplicate rows based on subset relationships within that column.

Sample Input:

sym prices
a [3,4,5,6]
a [3,4,5]
a [2,4,5,6]
a [5,6]
a [7,9]
a [7,9]

Expected Output:

sym prices
a [3,4,5,6]
a [2,4,5,6]
a [7,9]

Deduplication Logic:

  1. Subset Removal: If a row's prices array is a subset (i.e., fully contained) of another row's prices array, remove the subset row. In the example, [3,4,5] is a subset of [3,4,5,6], so it is removed; similarly, [5,6] is also a subset of [3,4,5,6] and is removed.

  2. Full Duplicate Removal: If multiple rows have identical prices arrays, keep only one.

What I've Tried:

I considered using group by to remove exact duplicates, but this approach cannot handle subset relationships.

Core Question:
How can I perform this subset-based deduplication?

1 Answer

Disclaimer: I don't know DolphinDB.

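You want to remove proper subsets from the table. According to the docs (https://docs.dolphindb.com/en/Programming/Operators/OperatorReferences/lt.html), the less-than operator tests whether one set is a proper subset of another, so something like this should work, assuming the prices values can be compared as sets: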

delete from mytable subset
where exists
(
  select *
  from mytable superset
  where subset.prices < superset.prices
);

(If you only want to compare price vectors for the same sym, you must of course add the condition "and subset.sym = superset.sym" to the subquery.)

You also want to remove duplicate sets and keep only one of them. For this you need an additional condition for equal sets (=), plus some ID to tell one row from another. Some DBMSs have a built-in unique row ID. I don't know whether DolphinDB does, so you may need a custom ID column in your table. Then you can extend the above statement as follows:

delete from mytable subset
where exists
(
  select *
  from mytable superset
  where subset.prices < superset.prices
  or (subset.prices = superset.prices and subset.id < superset.id)
);
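
Since I can't run this in DolphinDB, here is a small sanity check of the combined condition above, written in plain Python rather than DolphinDB script. It is only a sketch of the logic: the id values are hypothetical (the sample table has no ID column), the function name dedupe is made up for illustration, and each prices array is treated as a set, so element order and repeats inside one array are ignored.

```python
# Reference sketch of the deduplication rule (plain Python, not DolphinDB).
# The id column is an assumption: the sample table has no such column.
rows = [
    (1, "a", [3, 4, 5, 6]),
    (2, "a", [3, 4, 5]),
    (3, "a", [2, 4, 5, 6]),
    (4, "a", [5, 6]),
    (5, "a", [7, 9]),
    (6, "a", [7, 9]),
]


def dedupe(rows):
    """Drop rows whose prices are a proper subset of another row's prices
    (same sym), and for identical price sets keep only the highest id."""
    kept = []
    for row_id, sym, prices in rows:
        current = set(prices)
        # Mirrors the "where exists" subquery: is there a row that
        # strictly contains this one, or an equal one with a higher id?
        removable = any(
            sym == other_sym
            and (
                current < set(other_prices)  # proper subset
                or (current == set(other_prices) and row_id < other_id)
            )
            for other_id, other_sym, other_prices in rows
        )
        if not removable:
            kept.append((row_id, sym, prices))
    return kept


for row_id, sym, prices in dedupe(rows):
    print(row_id, sym, prices)
```

With the condition subset.id < superset.id, the row with the lower id is the one deleted, so of two identical rows the one with the higher id survives. Flip the comparison if you would rather keep the first occurrence.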