1

I have following dataframe in pandas

code    tank     nozzle_1   nozzle_2    nozzle_var
123     1        1          1           10   
123     1        2          2           12
123     2        1          1           10   
123     2        2          2           12

I want to calculate cumulative sum of columns nozzle_1 and nozzle_2 grouping over tank. Following is my desired dataframe.

code    tank     nozzle_1   nozzle_2    nozzle_var   nozzle_1_cumsum   nozzle_2_cumsum
123     1        1          1           10           1                 1 
123     1        2          2           12           3                 3
123     2        1          1           10           1                 1
123     2        2          2           12           3                 3

I am getting nozzle_1 and nozzle_2 from following code in pandas

cols= df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 

How can I calculate cumsum from above list of columns

1

2 Answers 2

2

How about this fancy solution:

cols= df.columns[df.columns.str.contains(pat='nozzle_\d+$', regex=True)] 
df.assign(**df.groupby('tank')[cols].agg(['cumsum'])\
              .pipe(lambda x: x.set_axis(x.columns.map('_'.join), axis=1, inplace=False)))

Output:

   tank  nozzle_1  nozzle_2  nozzle_var  nozzle_1_cumsum  nozzle_2_cumsum
0     1         1         1          10                1                1
1     1         2         2          12                3                3
2     2         1         1          10                1                1
3     2         2         2          12                3                3

In steps:

df_cumsum = df.groupby('tank')[cols].agg(['cumsum'])
df_cumsum.columns = df_cumsum.columns.map('_'.join)
pd.concat([df, df_cumsum], axis=1)

Output:

   tank  nozzle_1  nozzle_2  nozzle_var  nozzle_1_cumsum  nozzle_2_cumsum
0     1         1         1          10                1                1
1     1         2         2          12                3                3
2     2         1         1          10                1                1
3     2         2         2          12                3                3
Sign up to request clarification or add additional context in comments.

Comments

1
import pandas as pd

d = {
    "tank":[1,1,1,2],
    "nozzle_1":[1,2,1,2],
    "nozzle_2":[1,2,1,2]
}

df = pd.DataFrame(d)
columns = df.columns[df.columns.str.match("nozzle_\d+$")]

df[columns + "_cumsum"] = \
    df.groupby("tank")[columns].cumsum(axis=0)
df

Output

    tank    nozzle_1    nozzle_2    nozzle_1_cumsum nozzle_2_cumsum
0   1   1   1   1   1
1   1   2   2   3   3
2   1   1   1   4   4
3   2   2   2   2   2

1 Comment

@Neil sorry I over read the question now I corrected by answer

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.