3

I have a DataFrame like this:

| cat0  | cat1 | cat2 | col_list  | Quantity |
| ----- | ---- | ---- | ----------| -------- |
| alpha | x    | a    | [a,b,c,d] | 4        |
| alpha | y    | a    | [e]       | 1        |
| beta  | z    | a    | [f,g]     | 2        |
| gamma | p    | b    | [h]       | 1        |
| gamma | q    | b    | [I,j,k,l] | 4        |
| phi   | r    | c    | [r,s]     | 2        |
| eita  | s    | c    | [m,n]     | 2        |

I want it to transform it based on cat2 column. If they have same value then combine those rows into 1 and merge the rows like this:

| cat0       | cat1  | cat2 | col_list        | Quantity |
| ---------- | ----- | ---- | ----------------| -------- |
| alpha,beta | x,y,z | a    | [a,b,c,d,e,f,g] | 7        |
| gamma      | p,q   | b    | [h,I,j,k,l]     | 5        |
| phi,eita   | r,s   | c    | [r,s,m,n]       | 4        |

If column have strings values (column: cat0,cat1) join and separate them by a comma. If it has list values (column" col_list) then add those elements into a single list. If it is a int value (column: Quantity), then just add them.

2 Answers 2

2

You can use groupby.agg, with list/','.join, sum, or itertools.chain:

from itertools import chain

(df
 .groupby('cat2', as_index=False)
 .agg({'cat0': ','.join, 'cat1': ','.join,
       'col_list': lambda x: list(chain.from_iterable(x)),
       'Quantity': 'sum'
       })
 [df.columns] # reindex like original
)

NB. in place or chain, you can also use: lambda x: [v for l in x for x in l]

output:

               cat0   cat1 cat2               col_list  Quantity
0  alpha,alpha,beta  x,y,z    a  [a, b, c, d, e, f, g]         7
1       gamma,gamma    p,q    b        [h, I, j, k, l]         5
2          phi,eita    r,s    c           [r, s, m, n]         4
Sign up to request clarification or add additional context in comments.

3 Comments

Thanks @mozway. I was using groupby.agg only but wasnt able to get the right set of rules. I need to learn to use this better. Very handy. Only question is - can I not avoid cat0 and cat1 from being list? and just have them as string separated by comma
I tried putting str instead of list but then I get an extra \n with some int values which are indices for those rows
Of course, you can use ','.join in place of list
2

You can use agg after pandas.groupby and use functools.reduce for merging mutiple list and ','.join for merging string with comma.

>>> from functools import reduce
>>> df.groupby('cat2').agg({'cat0': ','.join, 
                        'cat1': ','.join,
                        'col_list' : lambda x: reduce(lambda y,z : y+z, x),
                        'Quantity' : 'sum'})


cat0          cat1    cat2    col_list         Quantity 
alpha,beta    x,y,z   a       [a,b,c,d,e,f,g]  7
gamma         p,q     b       [h,I,j,k,l]      5
phi,eita      r,s     c       [r,s,m,n]        4

Explanation: (how reduce merge multiple list?)

>>> reduce(lambda y,z : y+z, [['a','b','c','d'], ['e'], ['f','g']])
# y <- (['a','b','c','d'] + ['e'])
# y + ['f','g']
['a', 'b', 'c', 'd', 'e', 'f', 'g']

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.