Task
I would like to apply a custom aggregation to my DataFrame
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,2,2], 'b': [[(1,2,3),(4,5),(6,)],[(7,8),(9,10)],np.nan,[(11,12),(13,)],np.nan], 'c': [1,2,3,4,5]})
   a                          b  c
0  1  [(1, 2, 3), (4, 5), (6,)]  1
1  1          [(7, 8), (9, 10)]  2
2  1                        NaN  3
3  2          [(11, 12), (13,)]  4
4  2                        NaN  5
such that the lists in column b are concatenated within each group, while column c is summed. The result should be
pd.DataFrame({'a': [1,2], 'b': [[(1,2,3),(4,5),(6,),(7,8),(9,10)],[(11,12),(13,)]], 'c': [6,9]})
   a                                           b  c
0  1  [(1, 2, 3), (4, 5), (6,), (7, 8), (9, 10)]  6
1  2                           [(11, 12), (13,)]  9
Attempted Solution
I was going with
def mylistaggregator(l):
    return [item for sublist in l.tolist() for item in sublist]

df. \
    groupby('a', sort=False). \
    agg({'b': mylistaggregator,
         'c': 'sum'})
but get
TypeError: 'float' object is not iterable
and am not sure what the solution would be. I also tinkered with a lambda, but did not get anywhere.
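The error appears to come from the NaN entries: they are floats, so the inner loop over each "sublist" fails on them. A minimal sketch of one way around this, assuming every non-missing value in b really is a list (the name extend_lists is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2],
    'b': [[(1, 2, 3), (4, 5), (6,)], [(7, 8), (9, 10)], np.nan,
          [(11, 12), (13,)], np.nan],
    'c': [1, 2, 3, 4, 5],
})

def extend_lists(values):
    # Hypothetical helper: flatten one level, keeping only real lists.
    # NaN placeholders are floats, so the isinstance check skips them.
    return [item for sub in values if isinstance(sub, list) for item in sub]

result = (
    df.groupby('a', sort=False)
      .agg({'b': extend_lists, 'c': 'sum'})
      .reset_index()
)
print(result)
```

A group whose b values are all NaN would end up with an empty list rather than NaN under this approach.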
Additional information
Running
types = []
for i in df.b:
    types.append(str(type(i)))
np.unique(types)
for my actual dataset returns
array(["<class 'float'>", "<class 'list'>"],
      dtype='<U15')
df = df.fillna([]) so the null values can be processed the same as the non-null values.
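As far as I can tell, fillna does not accept a list as the fill value (pandas raises a TypeError asking for a scalar or dict). A rough sketch of an alternative that gets the same effect, assuming anything in b that is not a list should become an empty list:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2],
    'b': [[(1, 2, 3), (4, 5), (6,)], [(7, 8), (9, 10)], np.nan,
          [(11, 12), (13,)], np.nan],
    'c': [1, 2, 3, 4, 5],
})

# Replace every non-list entry (here: float NaN) with a fresh empty list.
df['b'] = df['b'].apply(lambda v: v if isinstance(v, list) else [])

def mylistaggregator(l):
    # Unchanged from the attempt above; every element is now iterable.
    return [item for sublist in l.tolist() for item in sublist]

out = (
    df.groupby('a', sort=False)
      .agg({'b': mylistaggregator, 'c': 'sum'})
      .reset_index()
)
print(out)
```

Using apply creates a separate empty list per row, so the rows do not share one mutable object.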