My source and target DataFrames look like this:
Source DataFrame
key  col1  col2  col3  col4  col5  col6
1    AA    BB    CC    null  null  null
2    SS    null  null  null  null  null
3    AA    CC    RR    SS    DD    null
Target DataFrame
Key  Column
1    AA
1    BB
1    CC
2    SS
3    AA
....
I want to compare these two DataFrames to check that the target is populated correctly from the source and contains no duplicates. I have tried several approaches, but all of them are very slow.
One way I tried is:
- Read the "key" column into a list.
- Iterate over the source and, for each key, collect all the col values into an array.
- Remove the nulls from the array, then sort it.
- Do the same for the target: collect all the values for the key into an array, sort it, and compare the two arrays with:
sourceArray.sameElements(targetArray)
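The steps above can be sketched roughly as follows. This is an illustrative reconstruction rather than the exact code: it assumes Spark with Scala, an Int key and String value columns, and only the sample rows for keys 1 and 2. The object name `CompareByKey` and the helper `normalize` are hypothetical names used only for this sketch.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CompareByKey {
  // Hypothetical helper: drop nulls and sort so two value lists can be compared.
  def normalize(values: Seq[String]): Array[String] =
    values.filter(_ != null).sorted.toArray

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compare-by-key")
      .getOrCreate()
    import spark.implicits._

    // Sample rows for keys 1 and 2 only (assumed types: Int key, String values).
    val source: DataFrame = Seq[(Int, String, String, String, String, String, String)](
      (1, "AA", "BB", "CC", null, null, null),
      (2, "SS", null, null, null, null, null)
    ).toDF("key", "col1", "col2", "col3", "col4", "col5", "col6")

    val target: DataFrame = Seq((1, "AA"), (1, "BB"), (1, "CC"), (2, "SS"))
      .toDF("Key", "Column")

    // Step 1: read the "key" column into a list on the driver.
    val keys = source.select("key").as[Int].collect().toList
    val valueCols = source.columns.filter(_ != "key").toSeq

    for (k <- keys) {
      // Steps 2-3: collect the source values for this key, drop nulls, sort.
      val sourceArray = normalize(
        source.filter($"key" === k).collect().toSeq
          .flatMap(row => valueCols.map(c => row.getAs[String](c))))

      // Step 4: do the same for the target, then compare the two arrays.
      val targetArray = normalize(
        target.filter($"Key" === k).select("Column").as[String].collect().toSeq)

      println(s"key $k matches: ${sourceArray.sameElements(targetArray)}")
    }

    spark.stop()
  }
}
```

Written this way, every key triggers its own filter-and-collect on both DataFrames, which is probably why it gets so slow as the number of keys grows.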
Is there an easier solution to this? I think I am over-complicating a simple problem.