How to parallel loop two sets of columns in pandas df?

Question

Is there a way to loop over two lists of columns in parallel in a pandas df? For example, I need to create C1 = A1 + B1, C2 = A2 + B2, C3 = A3 + B3, ...

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 6),
                  columns=['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])

    A1          A2          A3          B1          B2          B3
0   -0.045858   0.627827    -1.562130   2.094783    0.654119    -0.996711
1   1.003585    0.735500    0.795338    -0.803864   -0.071655   -0.514118
2   0.083501    0.774820    -1.477767   -1.260052   0.861952    -0.674270

Is the desired output a list or another pandas frame? Please mention...
– Hetal Thaker, Commented Oct 13, 2021 at 3:18
What's wrong with exactly what you typed? C1=df['A1']+df['B1'], C2=df['A2']+df['B2'], etc.? I imagine the overhead of creating threads/processes to run in parallel exceeds any benefit they provide for something like this. How big is your df in reality?
– Joran Beasley, Commented Oct 13, 2021 at 3:19
With 3 million entries on my not-that-fast PC, it takes 0.03 seconds to compute df['A1'] + df['B1'].
– Joran Beasley, Commented Oct 13, 2021 at 3:24
The example is simple. The real data involves many more sets of calculations, and more complicated ones. You don't want to type df['A1'] + df['B1'] 100 times.
– ponderwonder, Commented Oct 26, 2021 at 2:42

U13-Forward · Accepted Answer · 2021-10-13 03:26:15Z

3

You could try groupby with axis=1:

>>> df.join(df.groupby(df.columns.str[1:], axis=1).sum().add_prefix('C'))
         A1        A2        A3        B1        B2        B3        C1        C2        C3
0 -0.207101 -2.051288  1.080908  1.431754  2.585950 -0.840431  1.224653  0.534661  0.240476
1  0.800892 -0.180519  0.111748  2.057722  1.710686  0.617960  2.858614  1.530167  0.729708
2 -0.603030 -0.859876 -1.275922 -2.043422  1.162243  0.223720 -2.646451  0.302366 -1.052203
>>>
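As a version note, newer pandas releases (2.1 and later, as far as I know) deprecate `axis=1` in `groupby`. A minimal sketch of the same idea that avoids `axis=1` by grouping the transposed frame is shown here; the seeded random data and the names `suffix`, `sums` and `result` are only illustrative assumptions, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Illustrative data with the same column layout as the question (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 6)),
                  columns=['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])

# Everything after the first character of each column name: '1', '2', '3', '1', '2', '3'.
suffix = df.columns.str[1:].to_numpy()

# Transpose so columns become rows, group the rows by suffix, sum, transpose back.
sums = df.T.groupby(suffix).sum().T.add_prefix('C')

# Attach the new C1, C2, C3 columns to the original frame.
result = df.join(sums)
```

Like the one-liner above, this assumes single-character prefixes, and it sums every column that shares a suffix.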

Or, if the prefixes and numbers might have different lengths, try:

df.join(df.groupby(df.columns.str.extract(r'(\d+)$', expand=False), axis=1).sum().add_prefix('C'))
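If the real calculation is more complicated than a sum, an explicit parallel loop with `zip` may be easier to adapt. The following is only a sketch under some assumptions: the helper `suffix_number` and the names `a_cols`, `b_cols`, `new_cols` and `out` are hypothetical, and every `A` column is assumed to have a matching `B` column with the same numeric suffix.

```python
import re

import numpy as np
import pandas as pd

# Illustrative data with the same column layout as the question (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 6)),
                  columns=['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])


def suffix_number(name):
    """Return the trailing integer of a column name, e.g. 'A12' -> 12."""
    return int(re.search(r'(\d+)$', name).group(1))


# Build the two column lists, each sorted by numeric suffix so they line up.
a_cols = sorted((c for c in df.columns if c.startswith('A')), key=suffix_number)
b_cols = sorted((c for c in df.columns if c.startswith('B')), key=suffix_number)

# Walk both lists in parallel; swap the sum for any other pairwise operation.
new_cols = {
    f'C{suffix_number(a)}': df[a] + df[b]
    for a, b in zip(a_cols, b_cols)
}

# Add all new columns at once, leaving the original frame untouched.
out = df.assign(**new_cols)
```

Each operation inside the loop is still vectorized over the rows, so the Python-level loop only runs once per column pair.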

edited Oct 13, 2021 at 3:26

answered Oct 13, 2021 at 3:21


10 Comments

Henry Ecker Over a year ago

Good approach. Just to note: the assumption with this implementation is that all number suffixes are a single digit.

U13-Forward Over a year ago

@HenryEcker Edited my answer

Quang Hoang Over a year ago

I'd use df.columns.str.extract('(\d+)$')

EBDS Over a year ago

Smart! I didn't know you could group by columns! Learnt something.

U13-Forward Over a year ago

@EBDS Yeah! It's a very useful feature :) Using axis=1 solves everything :P
