
Task

I would like to apply a custom aggregation to my DataFrame

import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,2,2], 'b': [[(1,2,3),(4,5),(6,)],[(7,8),(9,10)],np.nan,[(11,12),(13,)],np.nan], 'c': [1,2,3,4,5]})

   a                          b  c
0  1  [(1, 2, 3), (4, 5), (6,)]  1
1  1          [(7, 8), (9, 10)]  2
2  1                        NaN  3
3  2          [(11, 12), (13,)]  4
4  2                        NaN  5

such that the lists in column b are concatenated within each group. The result should be

pd.DataFrame({'a': [1,2], 'b': [[(1,2,3),(4,5),(6,),(7,8),(9,10)],[(11,12),(13,)]], 'c': [6,9]})

   a                                           b  c
0  1  [(1, 2, 3), (4, 5), (6,), (7, 8), (9, 10)]  6
1  2                           [(11, 12), (13,)]  9

Attempted Solution

I was going with

def mylistaggregator(l):
    return [item for sublist in l.tolist() for item in sublist]

df. \
    groupby('a', sort=False). \
    agg({'b': mylistaggregator,
         'c': 'sum'})

but get

TypeError: 'float' object is not iterable

and am not sure what the solution would be. I also tinkered with a lambda, but did not get anywhere.

Additional information

Running

types = []
for i in df.b:
    types.append(str(type(i)))
np.unique(types)

for my actual dataset returns

array(["<class 'float'>", "<class 'list'>"], 
      dtype='<U15')
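The same type check can be sketched more compactly by mapping each cell to its type name and counting the results. This is only an illustration on a small stand-in Series; the names `b` and `type_counts` are hypothetical and not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Small stand-in for column b of the example frame (lists mixed with NaN).
b = pd.Series([[(1, 2, 3), (4, 5), (6,)], [(7, 8), (9, 10)], np.nan,
               [(11, 12), (13,)], np.nan])

# Count how many cells hold each Python type, keyed by the type's name.
type_counts = b.map(lambda value: type(value).__name__).value_counts()
print(type_counts)
```

Any name other than list in the counts points at cells that a plain flattening comprehension cannot iterate over.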
  • How is that a bad question? It has a MWE and everything and I could not find the solution on the web. Commented Jun 16, 2017 at 12:33
  • Usually, that error implies that there are null values in the column. Null values in pandas are represented as floats. Try df = df.fillna([]) so the null values can be processed the same way as the non-null values. Commented Jun 16, 2017 at 12:40
  • @user2583933: TypeError: "value" parameter must be a scalar or dict, but you passed a "list" Commented Jun 16, 2017 at 12:45

1 Answer


You need to filter out the NaNs (they are floats, which is why the comprehension fails with 'float' object is not iterable):

def mylistaggregator(l):
    return ([item for sublist in l.tolist() if isinstance(sublist,list) for item in sublist])

Or:

def mylistaggregator(l):
    return([item for subl in l.tolist() if not isinstance(subl, float) for item in subl])



df1 = df. \
    groupby('a', sort=False). \
    agg({'b': mylistaggregator,
         'c': 'sum'})

print (df1)
                                            b  c
a                                               
1  [(1, 2, 3), (4, 5), (6,), (7, 8), (9, 10)]  6
2                           [(11, 12), (13,)]  9
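For reference, the filtering approach above can be put together as one self-contained script. This is a minimal sketch on the example data from the question; the helper name `flatten_lists` is hypothetical, and it assumes every non-null cell in column b is a plain Python list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2],
    'b': [[(1, 2, 3), (4, 5), (6,)], [(7, 8), (9, 10)], np.nan,
          [(11, 12), (13,)], np.nan],
    'c': [1, 2, 3, 4, 5],
})

def flatten_lists(values):
    # Concatenate the list cells of one group; anything that is not a
    # list (such as NaN, which is a float) is skipped.
    return [item for sub in values if isinstance(sub, list) for item in sub]

result = df.groupby('a', sort=False).agg({'b': flatten_lists, 'c': 'sum'})
print(result)
```

Checking for list rather than excluding float is the stricter of the two variants: it also skips other unexpected cell types, such as strings.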

Another solution is to replace the NaNs with []:

def mylistaggregator(l):
    return ([item for sublist in l.tolist() for item in sublist])

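# one separate empty list per row; recent pandas versions do not
# broadcast a length-1 list to the full index
s = pd.Series([[] for _ in range(len(df))], index=df.index)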
df['b'] = df['b'].combine_first(s)
#or
#df['b'] = df['b'].fillna(s)

df1 = df. \
    groupby('a', sort=False). \
    agg({'b': mylistaggregator,
         'c': 'sum'})

print (df1)
                                            b  c
a                                               
1  [(1, 2, 3), (4, 5), (6,), (7, 8), (9, 10)]  6
2                           [(11, 12), (13,)]  9
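The replacement idea can also be sketched without building a second Series, by mapping every cell to itself or to a fresh empty list. Note that this swaps in `Series.map` for the `combine_first` call used above; it is an alternative under the assumption that valid cells are always lists, and the names `filled` and `flatten` are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': [1, 1, 1, 2, 2],
    'b': [[(1, 2, 3), (4, 5), (6,)], [(7, 8), (9, 10)], np.nan,
          [(11, 12), (13,)], np.nan],
    'c': [1, 2, 3, 4, 5],
})

# Replace every non-list cell (here: NaN) with its own new empty list.
filled = df.assign(b=df['b'].map(lambda v: v if isinstance(v, list) else []))

def flatten(values):
    # After the replacement every cell is a list, so no type check is needed.
    return [item for sub in values for item in sub]

result = filled.groupby('a', sort=False).agg({'b': flatten, 'c': 'sum'})
print(result)
```

Because `assign` returns a new frame, the original `df` keeps its NaN cells, which may be useful if the missing values matter elsewhere.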

9 Comments

The solutions work on the example dataset, but on my real dataset I get TypeError: '<' not supported between instances of 'str' and 'tuple'. I added some additional information about the columns datatypes.
Hmm, hard to answer, because I have no data that reproduces the error. Does the last solution work?
No, none of the solutions work. Any idea how to debug? np.unique(df.b) is array([[], [(8338, 8339)], [(8338, 8339, 8340)], [(8338, 8339, 8340, 8341)], [(8338, 8339, 8340, 8341, 8343)], [(8339, 8340)], [(8339, 8340, 8341)], [(8339, 8340, 8341, 8343)], [(8340, 8341)], [(8340, 8341, 8343)], [(8341, 8343)]], dtype=object) after fillna.
One idea: instead of groupby('a', sort=False), use groupby(df['a'].astype(str), sort=False); then a helper column is not necessary.
No, because it is a list and pandas automatically tries to convert it to a one-item Series. Only fillna or combine_first with another Series full of [] works. Pandas works, but things get complicated with lists or other nested structures.