Is there a way to loop over two lists of columns in parallel in a pandas DataFrame? For example, I need to create C1 = A1 + B1, C2 = A2 + B2, C3 = A3 + B3, and so on.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(3, 6),
                  columns=['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])

    A1          A2          A3          B1          B2          B3
0   -0.045858   0.627827    -1.562130   2.094783    0.654119    -0.996711
1   1.003585    0.735500    0.795338    -0.803864   -0.071655   -0.514118
2   0.083501    0.774820    -1.477767   -1.260052   0.861952    -0.674270
  • Desired output is a list or another pandas frame, please mention... Commented Oct 13, 2021 at 3:18
  • What's wrong with exactly what you typed: C1=df['A1']+df['B2'], C2=df['A2']+df['B2'], etc.? I imagine the overhead of creating threads/processes to run in parallel exceeds any benefit they provide for something like this. How big is your df in reality? Commented Oct 13, 2021 at 3:19
  • With 3 million entries on my not-that-fast PC, df['A1'] + df['B1'] takes 0.03 seconds. Commented Oct 13, 2021 at 3:24
  • The example is simple. The real data involves a lot more sets of calculation and more complicated calculation. You don't want to type df['A1'] + df['B1'] 100 times. Commented Oct 26, 2021 at 2:42

1 Answer

You could try groupby with axis=1:

>>> df.join(df.groupby(df.columns.str[1:], axis=1).sum().add_prefix('C'))
         A1        A2        A3        B1        B2        B3        C1        C2        C3
0 -0.207101 -2.051288  1.080908  1.431754  2.585950 -0.840431  1.224653  0.534661  0.240476
1  0.800892 -0.180519  0.111748  2.057722  1.710686  0.617960  2.858614  1.530167  0.729708
2 -0.603030 -0.859876 -1.275922 -2.043422  1.162243  0.223720 -2.646451  0.302366 -1.052203
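
Note that the axis argument of DataFrame.groupby is deprecated as of pandas 2.1, and the pandas documentation suggests grouping the transposed frame instead. The following is a minimal sketch of that equivalent, not a definitive implementation. It assumes single-letter prefixes as in the question; the seeded generator and the names suffix, sums and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same column layout as the question (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 6)),
                  columns=['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])

# Assumption: every column name is one letter followed by its number.
suffix = df.columns.str[1:].to_numpy()

# Transpose so the columns become rows, group the rows by suffix, sum, transpose back.
sums = df.T.groupby(suffix).sum().T.add_prefix('C')

result = df.join(sums)
print(result)
```

Here each C column is the sum of all columns that share the same numeric suffix, which matches the axis=1 version above.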

Or, if the prefixes and numbers might have different lengths, try:

df.join(df.groupby(df.columns.str.extract(r'(\d+)$', expand=False), axis=1).sum().add_prefix('C'))
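
If the real calculation is more involved than a sum, as the question's comments suggest, one option is to loop over the two column lists in parallel with zip. This is a sketch under assumptions: the lists a_cols and b_cols and the frame name out are hypothetical, and the addition stands in for whatever per-pair calculation is actually needed.

```python
import numpy as np
import pandas as pd

# Illustrative data with the same column layout as the question (seeded for repeatability).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 6)),
                  columns=['A1', 'A2', 'A3', 'B1', 'B2', 'B3'])

# Hypothetical column lists; they must be the same length and in matching order.
a_cols = ['A1', 'A2', 'A3']
b_cols = ['B1', 'B2', 'B3']

out = df.copy()
# zip pairs A1 with B1, A2 with B2, and so on; enumerate numbers the new columns from 1.
for i, (a, b) in enumerate(zip(a_cols, b_cols), start=1):
    out[f'C{i}'] = out[a] + out[b]  # replace with the real per-pair calculation

print(out)
```

Each step of the loop is still a vectorized column operation, so the Python-level loop runs once per column pair rather than once per row.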

10 Comments

Good approach. Just to note: the assumption with this implementation is that all number suffixes are a single digit.
@HenryEcker Edited my answer
I'd use df.columns.str.extract('(\d+)$')
Smart! I didn't know you could group by columns! Learnt something.
@EBDS Yeah! It's a very useful feature :) Using axis=1 solves everything :P