3

For each row, I'd like to remove the value in column A from the list in column B. How can I do this?

Given:

import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a2', 'a3', 'a4'],
    'B': [['a1', 'a2'], ['a1', 'a2', 'a3'], ['a1', 'a3'], []]
})

I want:

result = pd.DataFrame({
    'A': ['a1', 'a2', 'a3', 'a4'],
    'B': [['a1', 'a2'], ['a1', 'a2', 'a3'], ['a1', 'a3'], []],
    'Output': [['a2'], ['a1', 'a3'], ['a1'], []]
})
3
  • How so? What is the logic of "based on column A" and what have you tried so far? Commented Jan 28, 2020 at 18:39
  • At index 0, the Output column of result shows that a1 (the value in column A) has been removed from the list in column B. The same applies to every row: the value in column A is removed from that row's list in column B. If the list in column B is empty, the output is empty too. Commented Jan 28, 2020 at 18:47
  • df.B.apply(lambda x: [i for i in x if i != list(df.A)]) Commented Jan 28, 2020 at 18:47

2 Answers

6

One way of doing that is applying a filtering function to each row via DataFrame.apply:

df['Output'] = df.apply(lambda x: [i for i in x.B if i != x.A], axis=1)
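For reference, here is a self-contained sketch of this approach run on the sample data from the question. The DataFrame and column names come from the question; the final print is only there for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a2', 'a3', 'a4'],
    'B': [['a1', 'a2'], ['a1', 'a2', 'a3'], ['a1', 'a3'], []]
})

# Row-wise filter: keep every element of B that differs from that row's A.
# Column B itself is left unchanged; the result goes into a new column.
df['Output'] = df.apply(lambda x: [i for i in x.B if i != x.A], axis=1)

print(df['Output'].tolist())
```

Note that the comprehension drops every occurrence of the A value from the list, not just the first one.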

1 Comment

With a comprehension instead of apply: df.assign(Output=[[z for z in x if z != y] for x, y in zip(df.B, df.A)])
2

Another solution using iterrows():

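for i, value in df.iterrows():
    try:
        # remove this row's A value from its B list (modifies B in place)
        value['B'].remove(value['A'])
    except ValueError:
        # the A value is not in this row's list, so there is nothing to remove
        pass
print(df)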

Output:

    A         B
0  a1      [a2]
1  a2  [a1, a3]
2  a3      [a1]
3  a4        []
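As the output shows, list.remove changes the lists stored in column B directly, and it removes only the first matching element. If the original column B should stay intact, as in the desired result from the question, one option is to work on copies. The following is a minimal sketch under that assumption; without_first is a hypothetical helper name introduced here, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['a1', 'a2', 'a3', 'a4'],
    'B': [['a1', 'a2'], ['a1', 'a2', 'a3'], ['a1', 'a3'], []]
})

def without_first(values, target):
    """Return a copy of values with the first occurrence of target removed.

    Hypothetical helper for illustration only.
    """
    out = list(values)      # copy, so the list stored in the DataFrame is untouched
    if target in out:
        out.remove(target)  # removes only the first match, like the loop above
    return out

# Build the new column from copies; column B keeps its original lists.
df['Output'] = pd.Series(
    [without_first(b, a) for a, b in zip(df['A'], df['B'])],
    index=df.index,
)

print(df['Output'].tolist())
```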
