I have a data frame grouped by multiple columns, but in this example it is grouped only by Year:
   Year  Animal1     Animal2
0  2002      Dog  Mouse,Lion
1  2002    Mouse
2  2002     Lion
3  2002     Duck
4  2010      Dog         Cat
5  2010      Cat
6  2010     Lion
7  2010    Mouse
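Within each group, I want to keep every row where Animal2 is not empty. Of the rows where Animal2 is empty, I want to keep only those whose Animal1 value appears in that group's Animal2 column (an Animal2 cell can hold several comma-separated names) and filter out the rest.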
The expected output would be:
   Year  Animal1     Animal2
0  2002      Dog  Mouse,Lion
1  2002    Mouse
2  2002     Lion
3  2010      Dog         Cat
4  2010      Cat
Rows 0 and 3 of the output are kept because their Animal2 is not empty.
Rows 1 and 2 are kept because Mouse and Lion appear in Animal2 for the first group (2002).
Row 4 is kept because Cat appears in Animal2 for the second group (2010).
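One way to express this rule in pandas, sketched under a few assumptions: an empty Animal2 cell is either a blank string or NaN, names inside Animal2 are separated by commas, and the grouping columns are passed as a list so the same code covers grouping by several columns. The function name filter_by_listed_animals is hypothetical. It explodes the non-empty Animal2 cells into one (group keys, name) row each, then left-merges every original row against that table and keeps the rows that either have an Animal2 value or found a match.

```python
import pandas as pd


def filter_by_listed_animals(df, group_cols):
    """Keep rows with a non-empty Animal2, plus rows whose Animal1 is listed
    (comma-separated) in some Animal2 cell of the same group."""
    # Assumption: an "empty" Animal2 cell is NaN/None or a blank string.
    animal2 = df["Animal2"].fillna("").astype(str).str.strip()
    has_animal2 = animal2 != ""

    # One row per (group keys, animal named in an Animal2 cell of that group).
    listed = (
        df.loc[has_animal2, group_cols]
        .assign(Animal1=animal2[has_animal2].str.split(","))
        .explode("Animal1", ignore_index=True)
    )
    listed["Animal1"] = listed["Animal1"].str.strip()
    listed = listed.drop_duplicates()

    # A left merge against unique right-hand keys keeps every row of df, in
    # order; _merge == "both" marks rows whose Animal1 is listed in its group.
    keys = list(group_cols) + ["Animal1"]
    matched = df[keys].merge(listed, on=keys, how="left", indicator=True)
    is_listed = matched["_merge"].eq("both").to_numpy()

    return df[has_animal2.to_numpy() | is_listed].reset_index(drop=True)


df = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2002, 2010, 2010, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Duck", "Dog", "Cat", "Lion", "Mouse"],
    "Animal2": ["Mouse,Lion", "", "", "", "Cat", "", "", ""],
})

print(filter_by_listed_animals(df, ["Year"]))
```

Because membership is tested with a merge rather than a per-group dictionary lookup, a group whose Animal2 column is entirely empty simply produces no matches. For the real data, pass all grouping columns, e.g. ["Year", "OtherCol"] (a placeholder name).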
EDIT: I get an error with a similar input data frame:
   Year  Animal1  Animal2
0  2002      Dog    Mouse
1  2002    Mouse
2  2002     Lion
3  2010      Dog
4  2010      Cat
The expected output would be:
   Year  Animal1  Animal2
0  2002      Dog    Mouse
1  2002    Mouse
The error is raised in the .apply(lambda g: g.isin(sets[g.name])) part of the code. The end of the traceback is:
if not any(isinstance(k, slice) for k in key):
    if len(key) == self.nlevels and self.is_unique:
        # Complete key in unique index -> standard get_loc
        try:
            return (self._engine.get_loc(key), None)
        except KeyError as err:
            raise KeyError(key) from err
KeyError: (2010, 'Dog')
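The code that builds sets is not shown here, so the following is only an assumption about the cause. If sets maps each group key to the names found in that group's non-empty Animal2 cells, a group with no Animal2 values at all (2010 in the edited data) never gets an entry, and a plain sets[key] lookup for it raises KeyError. Below is a minimal sketch of the same set-lookup idea with a default: sets.get(key, set()) treats such a group as listing no animals. The helper group_key and the tuple keys are hypothetical, not taken from the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "Year": [2002, 2002, 2002, 2010, 2010],
    "Animal1": ["Dog", "Mouse", "Lion", "Dog", "Cat"],
    "Animal2": ["Mouse", "", "", "", ""],
})
group_cols = ["Year"]  # the same code works with several grouping columns

# Assumption: an "empty" Animal2 cell is NaN/None or a blank string.
animal2 = df["Animal2"].fillna("").astype(str).str.strip()
has_animal2 = animal2 != ""


def group_key(row):
    # Hypothetical helper: tuple of the row's grouping values, e.g. (2002,).
    return tuple(row[c] for c in group_cols)


# Names listed in Animal2 per group, built only from non-empty cells.
# Group 2010 has no such cell, so it never gets a key in this dict.
sets = {}
for idx, row in df[has_animal2].iterrows():
    names = {n.strip() for n in animal2[idx].split(",")}
    sets.setdefault(group_key(row), set()).update(names)

# sets.get(..., set()) treats a group without Animal2 values as listing
# nothing, instead of raising KeyError like a plain sets[key] lookup.
keep = [
    bool(has) or row["Animal1"] in sets.get(group_key(row), set())
    for has, (_, row) in zip(has_animal2, df.iterrows())
]
result = df[keep].reset_index(drop=True)
print(result)
```

This row-by-row loop is chosen for readability; on large frames a vectorized approach (for example exploding Animal2 and merging) avoids iterrows.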