In reality my DF is huge with a lot more columns & more complex masks, but here's the principle I'm after:
DF A: (all birds)
   name      size    location
1  bluebird  small   usa
2  cukoo     medium  germany
3  parrot    large   brazil
DF B: (new world birds)
   name      size    location
1  bluebird  small   usa
2  parrot    large   brazil
I would like to split like this:
  A
 / \
B   C
df C should be A - B. Look in A, remove everything that's in B, and the result is C.
I wish this worked: C = A[~B], but it doesn't.
df C should be the old world birds:
   name      size    location
1  cukoo     medium  germany
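To make the "A minus B" idea concrete, here is a minimal sketch of one possible approach, assuming pandas and assuming that every row of B matches a row of A exactly across all shared columns. It uses a left merge with indicator=True and keeps only the rows found in A alone. The name merged is just an illustrative helper, not something from my real code.

```python
import pandas as pd

# Toy versions of the two frames from the question
A = pd.DataFrame({
    "name": ["bluebird", "cukoo", "parrot"],
    "size": ["small", "medium", "large"],
    "location": ["usa", "germany", "brazil"],
})
B = pd.DataFrame({
    "name": ["bluebird", "parrot"],
    "size": ["small", "large"],
    "location": ["usa", "brazil"],
})

# Left-join A against B on all shared columns; indicator=True adds a
# "_merge" column saying whether each row was found in both frames
# or only in the left one (A).
merged = A.merge(B, how="left", indicator=True)

# Keep the rows that exist only in A, then drop the helper column.
C = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(C)
```

This relies on there being no duplicate rows, since duplicates in B would multiply the matching rows in the merge.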
There will be no duplicate rows.
And my data is really complex (for a Sankey diagram!)
So it's not practical to create df C by writing a filter like:
A.location != germany, belgium, egypt ... etc
Do I need to add an id column first? Something like A['id'] = range(len(A)).
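A rough sketch of how the id idea might look, assuming B is built by masking A after the id column has been added, so that B carries the same ids. The new_world mask here is a hypothetical stand-in for the real, more complex masks.

```python
import pandas as pd

A = pd.DataFrame({
    "name": ["bluebird", "cukoo", "parrot"],
    "size": ["small", "medium", "large"],
    "location": ["usa", "germany", "brazil"],
})

# Give every row of A a unique id before any splitting happens
A["id"] = range(len(A))

# Hypothetical mask standing in for the real, more complex one
new_world = A["location"].isin(["usa", "brazil"])
B = A[new_world]

# C is every row of A whose id does not appear in B
C = A[~A["id"].isin(B["id"])]
print(C)
```

If B is always produced by filtering A like this, the existing index could probably play the same role without an extra column: A[~A.index.isin(B.index)].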