columnA columnB columnC
a 0 a
c 1 c|f
b 2 a|b|c
Given a dataframe like this, I want to change columnC to:
columnA columnB columnC
a 0
c 1 f
b 2 a|c
For each element in columnC, I want to check whether it also appears in columnA of the same row; if it does, drop it from columnC.
I want to write a function like:
df['columnC'] = df[['columnA', 'columnC']].apply(remove_duplicate)
def remove_duplicate(columnA, columnC):
    c_values = set(columnC.split('|'))
    if columnA in c_values.copy:
        c_values.remove(columnA)
    new_C = '|'.join(c_values)
    return c_values
But this complains:
TypeError: remove_duplicate() missing 1 required positional argument: 'columnC'
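For what it's worth, the error seems to come from how `apply` calls the function: with the default `axis=0` it passes each column as a single Series, so `remove_duplicate` only ever receives one argument. A minimal sketch of one possible fix, assuming columnC always holds '|'-separated strings with no missing values, is to take the whole row as one argument and pass `axis=1`. The list comprehension is a substitution of mine for the set-based version above, chosen so the remaining values keep their original order.

```python
import pandas as pd

def remove_duplicate(row):
    # Split columnC on '|' and keep every value that differs from
    # the columnA entry of the same row, preserving the original order.
    kept = [v for v in row['columnC'].split('|') if v != row['columnA']]
    return '|'.join(kept)

# Sample data copied from the question.
df = pd.DataFrame({
    'columnA': ['a', 'c', 'b'],
    'columnB': [0, 1, 2],
    'columnC': ['a', 'c|f', 'a|b|c'],
})

# axis=1 hands each row (as a Series) to the function; the default
# axis=0 hands over each column instead, which triggers the TypeError.
df['columnC'] = df.apply(remove_duplicate, axis=1)
print(df)
```

If columnC can contain NaN, the `split` call would fail on those rows, so they would need to be filled or skipped first.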