2

How to create cumulative sum (new_supply)in dataframe python from demand column from table

item   Date       supply  demand
A     2018-01-01  -       10
A     2018-01-02  -       15
A     2018-01-03  100     30
A     2018-01-04  -       10
A     2018-01-05  -       40
A     2018-01-06  50      50
A     2018-01-07  -       10
B     2018-01-01  -       20
B     2018-01-02  -       30
B     2018-01-03  20      60
B     2018-01-04  -       20
B     2018-01-05  100     10
B     2018-01-06  -       20
B     2018-01-07  -       30

New Desired table from the above table

item   Date       supply  demand  new_supply
A     2018-01-01  -       10      -
A     2018-01-02  -       15      -
A     2018-01-03  100     30      55
A     2018-01-04  -       10      -
A     2018-01-05  -       40      -
A     2018-01-06  50      50      100
A     2018-01-07  -       10      -
B     2018-01-01  -       20      -
B     2018-01-02  -       30      -
B     2018-01-03  20      60      110
B     2018-01-04  -       20      -
B     2018-01-05  100     10      140
B     2018-01-06  -       20      -
B     2018-01-07  -       30      -
1
  • Is output correct? Commented Nov 4, 2018 at 9:42

2 Answers 2

1

I believe you need DataFrameGroupBy.cumsum with numpy.where with isnumeric for set new values only for numeric ones by column supply:

m = df['supply'].str.isnumeric()
g = m.iloc[::-1].cumsum()

df['new'] = np.where(m, df.groupby(['item',g])['demand'].cumsum(), '-')
print (df)
   item        Date supply  demand  new
0     A  2018-01-01      -      10    -
1     A  2018-01-02      -      15    -
2     A  2018-01-03    100      30   55
3     A  2018-01-04      -      10    -
4     A  2018-01-05      -      40    -
5     A  2018-01-06     50      50  100
6     A  2018-01-07      -      10    -
7     B  2018-01-01      -      20    -
8     B  2018-01-02      -      30    -
9     B  2018-01-03     20      60  110
10    B  2018-01-04      -      20    -
11    B  2018-01-05    100      10   30
12    B  2018-01-06      -      20    -
13    B  2018-01-07      -      30    -

Or:

m = df['supply'].str.isnumeric()

df['new'] = np.where(m, df.groupby('item')['demand'].cumsum(), '-')
print (df)
   item        Date supply  demand  new
0     A  2018-01-01      -      10    -
1     A  2018-01-02      -      15    -
2     A  2018-01-03    100      30   55
3     A  2018-01-04      -      10    -
4     A  2018-01-05      -      40    -
5     A  2018-01-06     50      50  155
6     A  2018-01-07      -      10    -
7     B  2018-01-01      -      20    -
8     B  2018-01-02      -      30    -
9     B  2018-01-03     20      60  110
10    B  2018-01-04      -      20    -
11    B  2018-01-05    100      10  140
12    B  2018-01-06      -      20    -
13    B  2018-01-07      -      30    -

If need all values:

df['new'] = df.groupby('item')['demand'].cumsum()
print (df)
   item        Date supply  demand  new
0     A  2018-01-01      -      10   10
1     A  2018-01-02      -      15   25
2     A  2018-01-03    100      30   55
3     A  2018-01-04      -      10   65
4     A  2018-01-05      -      40  105
5     A  2018-01-06     50      50  155
6     A  2018-01-07      -      10  165
7     B  2018-01-01      -      20   20
8     B  2018-01-02      -      30   50
9     B  2018-01-03     20      60  110
10    B  2018-01-04      -      20  130
11    B  2018-01-05    100      10  140
12    B  2018-01-06      -      20  160
13    B  2018-01-07      -      30  190
Sign up to request clarification or add additional context in comments.

3 Comments

As it appears, I think the OP wants the cumsum to be resetted after a new supply is issued, right? I was about to write an answer, but I searched first and found that you have already answered this :)) The only difference is that a groupby is needed here.
@today - yes, it should working nice at least 0 value in supply... But still not 100% sure what OP need :(
Thanks brother, can answer please : stackoverflow.com/questions/53140438/…
0

Simply as:

df['new_supply'] = df.demand.cumsum()

Precisely above should work, when we are looking for cumulative sum of a column.. Just simulating your DataFrame:

>>> df
   item        Date supply  demand
0     A  2018-01-01      -      10
1     A  2018-01-02      -      15
2     A  2018-01-03    100      30
3     A  2018-01-04      -      10
4     A  2018-01-05      -      40
5     A  2018-01-06     50      50
6     A  2018-01-07      -      10
7     B  2018-01-01      -      20
8     B  2018-01-02      -      30
9     B  2018-01-03     20      60
10    B  2018-01-04      -      20
11    B  2018-01-05    100      10
12    B  2018-01-06      -      23
13    B  2018-01-07      -      30

So, When we do df.<particulat_col>.cumsum() it will produce cumsun of that, in our case as follows with a new desired column along:

>>> df
   item        Date supply  demand  new_supply
0     A  2018-01-01      -      10          10
1     A  2018-01-02      -      15          25
2     A  2018-01-03    100      30          55
3     A  2018-01-04      -      10          65
4     A  2018-01-05      -      40         105
5     A  2018-01-06     50      50         155
6     A  2018-01-07      -      10         165
7     B  2018-01-01      -      20         185
8     B  2018-01-02      -      30         215
9     B  2018-01-03     20      60         275
10    B  2018-01-04      -      20         295
11    B  2018-01-05    100      10         305
12    B  2018-01-06      -      23         328
13    B  2018-01-07      -      30         358

If indeed we are looking for a aggregative or cumulative sum of two items , then it should be go along with groupby as follows:

>>> df['new_supply'] = df.groupby('item')['demand'].cumsum()

>>> df
   item        Date supply  demand  new_supply
0     A  2018-01-01      -      10          10
1     A  2018-01-02      -      15          25
2     A  2018-01-03    100      30          55
3     A  2018-01-04      -      10          65
4     A  2018-01-05      -      40         105
5     A  2018-01-06     50      50         155
6     A  2018-01-07      -      10         165
7     B  2018-01-01      -      20          20
8     B  2018-01-02      -      30          50
9     B  2018-01-03     20      60         110
10    B  2018-01-04      -      20         130
11    B  2018-01-05    100      10         140
12    B  2018-01-06      -      23         163
13    B  2018-01-07      -      30         193

Note: In case you need the output you desired then you have to modify your question with logic you are filtering the dataset indeed!

2 Comments

hmmm, can you add output? I think it is different like OP need.
@jezrael, i do have Dataframe to test , i need to reproduce one :-)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.