
I have a DataFrame as below:

+---+---+---+
| A | B | C |
+---+---+---+
| 1 | 0 | 0 |
+---+---+---+
| 0 | 0 | 1 |
+---+---+---+
| 2 | 1 | 1 |
+---+---+---+
| 3 | 1 | 2 |
+---+---+---+
| 4 | 2 | 3 |
+---+---+---+

import pandas as pd

df = pd.DataFrame({
    'A':[1,0,2,3,4],
    'B':[0,0,1,1,2],
    'C':[0,1,1,2,3]
})

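My objective is to combine each element with its corresponding column name and produce a Series: a count greater than 1 is prefixed to the column name, a count of 1 gives just the column name, and a count of 0 is omitted.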
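Written out for a single cell, the rule implied by the desired output is: a count of 0 drops the column, a count of 1 gives the bare column name, and anything larger is prefixed to the name. A minimal plain-Python sketch (the helper name cell_label is hypothetical, not part of pandas):

```python
def cell_label(count: int, name: str) -> str:
    """Format one cell; cell_label is a hypothetical helper, not part of pandas."""
    if count == 0:
        return ''               # zero counts are dropped entirely
    if count == 1:
        return name             # a count of 1 is left implicit
    return f'{count}{name}'     # e.g. (2, 'A') -> '2A'


# Row 3 of the sample frame: A=3, B=1, C=2
row3 = {'A': 3, 'B': 1, 'C': 2}
parts = [cell_label(v, k) for k, v in row3.items()]
result = ', '.join(p for p in parts if p)   # skip the empty strings produced by zeros
print(result)  # 3A, B, 2C
```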

I tried the following:

df.dot(df.columns +', ').str[:-2]

What I get is:

+---------------------------+
| A                         |
+---------------------------+
| C                         |
+---------------------------+
| A, A, B, C                |
+---------------------------+
| A, A, A, B, C, C          |
+---------------------------+
| A, A, A, A, B, B, C, C, C |
+---------------------------+
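The repetition comes from how the dot product treats strings: for each row it sums value times label, and in Python an int times a string repeats the string while + concatenates. A minimal plain-Python sketch of that arithmetic for row 2 (A=2, B=1, C=1), conceptually mirroring what df.dot(df.columns + ', ') does for that row:

```python
# Row 2 of the sample frame: A=2, B=1, C=1
counts = {'A': 2, 'B': 1, 'C': 1}

# For each column, int * str repeats the label; joining the pieces reproduces
# the sum of products that the dot product computes for this row
row = ''.join(v * (k + ', ') for k, v in counts.items())
print(row[:-2])  # A, A, B, C
```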

But what I want is:

+------------+
| A          |
+------------+
| C          |
+------------+
| 2A, B, C   |
+------------+
| 3A, B, 2C  |
+------------+
| 4A, 2B, 3C |
+------------+

How should I change my code to achieve this?

  • Please have a look at my answer as well. Commented Feb 23, 2021 at 7:37

2 Answers


One idea, using a lambda function:

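# For each row, keep the non-zero counts and prefix each column name with its count (a count of 1 is omitted)
f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())
df = df.apply(f, axis=1)
print(df)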
0             A
1             C
2      2A, B, C
3     3A, B, 2C
4    4A, 2B, 3C
dtype: object
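To see what the lambda receives: apply(f, axis=1) calls it once per row with a Series indexed by column name, x[x > 0] drops the zero counts, and the v != 1 check leaves a count of 1 implicit. A small sketch that calls f directly on two hand-built rows mimicking rows 0 and 4 of the sample frame:

```python
import pandas as pd

f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())

# apply(f, axis=1) passes each row as a Series indexed by column name;
# these two hand-built Series mimic rows 0 and 4 of the sample frame
row0 = pd.Series({'A': 1, 'B': 0, 'C': 0})
row4 = pd.Series({'A': 4, 'B': 2, 'C': 3})

print(f(row0))  # A
print(f(row4))  # 4A, 2B, 3C
```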

Another idea: melt to long format, remove the rows with 0, prepend the numbers to the column names, and finally join per original row with groupby:

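# Reshape to long format, keeping the original row labels as the index
df = df.melt(ignore_index=False)
# Drop zero counts
df = df[df['value'].ne(0)]
# Prefix each column name with its count, leaving a count of 1 implicit
df['variable'] = df['value'].mask(df['value'].eq(1), '').astype(str) + df['variable']

# Join the labels belonging to each original row
df = df.groupby(level=0)['variable'].agg(', '.join)
print(df)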
0             A
1             C
2      2A, B, C
3     3A, B, 2C
4    4A, 2B, 3C
Name: variable, dtype: object
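The melt step is what makes the groupby work: every original row label appears once per column in the long frame, so groupby(level=0) can reassemble the rows. A sketch that inspects the long frame for row 2 (the ignore_index parameter of melt needs pandas 1.1 or newer). One consequence worth knowing: a row whose counts are all 0 has nothing left after the ne(0) filter, so it is missing from the groupby result; the second part uses a hypothetical two-row frame to show this.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 2, 3, 4], 'B': [0, 0, 1, 1, 2], 'C': [0, 1, 1, 2, 3]})

# Long format: one row per (original row, column) pair, with the original
# row labels kept as the index (ignore_index=False needs pandas >= 1.1)
long = df.melt(ignore_index=False)

# Row label 2 now appears three times, once for each of A, B and C
print(long.loc[2])

# Hypothetical frame whose row 0 is all zeros: after the filter only label 1 survives
zero = pd.DataFrame({'A': [0, 2], 'B': [0, 1]})
m = zero.melt(ignore_index=False)
kept = m[m['value'].ne(0)].index.unique().tolist()
print(kept)  # [1]
```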

     

Another way to solve this, using collections.Counter and a list comprehension:

In [416]: from collections import Counter

In [403]: y = df.dot(df.columns).tolist()

In [420]: ans = [', '.join({k: (str(v)+k if v > 1 else k) for k,v in Counter(i).items()}.values()) if len(i) > 1 else i for i in y]

In [421]: pd.DataFrame(ans)
Out[421]: 
            0
0           A
1           C
2    2A, B, C
3   3A, B, 2C
4  4A, 2B, 3C
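Because Counter counts the individual characters of the string produced by df.dot(df.columns), this approach assumes every column name is a single character. A small sketch with a hypothetical frame that has a two-character column named AB shows what happens otherwise:

```python
from collections import Counter

import pandas as pd

# Hypothetical frame with a two-character column name
df2 = pd.DataFrame({'AB': [2], 'C': [1]})

y = df2.dot(df2.columns).tolist()
print(y)              # ['ABABC']
# Counter splits 'AB' into separate characters, so the label 'AB' is lost
print(Counter(y[0]))  # Counter({'A': 2, 'B': 2, 'C': 1})
```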

Performance of the solutions:

@jezrael's solutions:

In [427]: def j():
     ...:     f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())
     ...:     df.apply(f, axis=1)
     ...: 

In [428]: %timeit j()
1.22 ms ± 47.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [434]: def j1():
     ...:     x = df.melt(ignore_index=False)
     ...:     x = x[x['value'].ne(0)]
     ...:     x['variable'] = x['value'].mask(x['value'].eq(1), '').astype(str) + x['variable']
     ...:     x = x.groupby(level=0)['variable'].agg(', '.join)
     ...: 

In [435]: %timeit j1()
3.19 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

My solution:

In [429]: def m():
     ...:     y = df.dot(df.columns).tolist()
     ...:     ans = [', '.join({k: (str(v)+k if v > 1 else k) for k,v in Counter(i).items()}.values()) if len(i) > 1 else i for i in y]
     ...:     pd.DataFrame(ans)
     ...: 

In [430]: %timeit m()
213 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
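The timings above appear to have been taken on the 5-row sample df, where per-call overhead is likely to dominate, so the ranking may not carry over to larger inputs. A sketch for checking that the three approaches agree and timing them on a bigger, randomly generated frame. The frame size, the seed and the function names via_apply, via_melt and via_counter are assumptions for illustration; the Counter version is written with a generator instead of the dict comprehension and returns a Series, which yields the same strings.

```python
import timeit
from collections import Counter

import numpy as np
import pandas as pd

# Hypothetical larger input: 2,000 rows of counts 0-3 in single-letter columns
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.integers(0, 4, size=(2_000, 3)), columns=list('ABC'))


def via_apply(df):
    # The lambda approach, applied row by row
    f = lambda x: ', '.join(f'{v}{k}' if v != 1 else k for k, v in x[x > 0].items())
    return df.apply(f, axis=1)


def via_melt(df):
    # The melt/groupby approach; .copy() so pandas does not warn about
    # setting values on a filtered slice
    x = df.melt(ignore_index=False)
    x = x[x['value'].ne(0)].copy()
    x['variable'] = x['value'].mask(x['value'].eq(1), '').astype(str) + x['variable']
    return x.groupby(level=0)['variable'].agg(', '.join)


def via_counter(df):
    # The Counter approach (single-character column names only), as a Series
    y = df.dot(df.columns).tolist()
    labels = [', '.join(f'{v}{k}' if v > 1 else k for k, v in Counter(i).items()) for i in y]
    return pd.Series(labels, index=df.index)


expected = via_apply(big).tolist()
assert via_counter(big).tolist() == expected
# All-zero rows are absent from the melt result, so fill them back in as ''
assert via_melt(big).reindex(big.index, fill_value='').tolist() == expected

for fn in (via_apply, via_melt, via_counter):
    seconds = timeit.timeit(lambda: fn(big), number=3) / 3
    print(f'{fn.__name__}: {seconds * 1000:.1f} ms per call')
```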

3 Comments

Upvoted! This works. It's the first time I've encountered `from collections import Counter`. Thank you!
@Tommy I've also added performance timings. My solution is the fastest of the three in these timings.
Awesome! Cheers!
