0

I have the following data frame

data = 
     a     b     c    val
0   'a1'  'b1'  'c1'   5
1   'a1'  'b1'  'c2'   10
2   'a1'  'b2'  'c3'   1 
3   'a2'  'b1'  'c4'   30
4   'a2'  'b1'  'c5'   20
5   'a2'  'b1'  'c6'   10

I want to create a new data frame with same columns, where the value of a given (a,b,c) is equal to sum of the values of all (aa,bb, cc)'s if a==aa, b==bb and value(aa,bb,cc) >= value(a,b,c). Here aa is an alias for a, and so on.

The desired outcome is

data_new = 
     a     b     c    val
0   'a1'  'b1'  'c1'   15
1   'a1'  'b1'  'c2'   10
2   'a1'  'b2'  'c3'   1 
3   'a2'  'b1'  'c4'   30
4   'a2'  'b1'  'c5'   50
5   'a2'  'b1'  'c6'   60

And I do the following:

data['key'] = data['a']+data['b']
data_new = data
data_new = data.apply(lambda row: data[ (data.key==row.key) 
                                      & (data.val>=row.val) ].val.sum(), axis=1)
del data_new['key']

Actually, in my real data, I have more columns like a and b, that is why boolean comparison such as

(data.a==row.a) & (data.a==row.b) & ...

is slower. The thing is that even the current implementation is not as fast as I want. I have huge tables. Is there any way to do the same thing faster & more efficient?

1 Answer 1

2

If I understand you, then you might be able to use cumsum with a bit of sorting:

>>> grouped = df.sort("val", ascending=False).groupby(["a", "b"])
>>> df["new_val"] = grouped["val"].cumsum()
>>> df
    a   b   c  val  new_val
0  a1  b1  c1    5       15
1  a1  b1  c2   10       10
2  a1  b2  c3    1        1
3  a2  b1  c4   30       30
4  a2  b1  c5   20       50
5  a2  b1  c6   10       60

Since you want to sum those values in a group which are >= the one of interest, you can put the values in descending order within an A, B group and then take the cumulative sum of those.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.