2

I'm trying to sum all the value where date1 matches date2 for each row in my dataframe, which is similar to this question.

id        value       date1       date2   sum
A         150      4/8/2014    3/8/2014   nan
B         100      5/8/2014    2/8/2014   nan
B         200      7/8/2014    5/8/2014   100
A         200      4/8/2014    3/8/2014   nan
A         300      6/8/2014    4/8/2014   350

I've tried the following but keep getting 'Lengths must match to compare' error.

grp = df.groupby('id')
df['sum'] = grp.apply(lambda x: x[x['date1'] == df['date2'].values]['value'].sum())

Would appreciate any advice!

2
  • Problem is with: x['date1'] == df['date2'], Series have different lengths. Commented Dec 16, 2015 at 8:29
  • IIUC doesn't the following give you what you want: df1 = df.set_index('id') df1.loc[df1['date1'].isin(df1['date2']),'value'].sum(level=0)? Commented Dec 16, 2015 at 9:04

2 Answers 2

1

You can apply function to groupby where use another apply with replace 0 to NaN:

print df

#  id  value      date1      date2  sum
#0  A    150 2014-04-08 2014-03-08  NaN
#1  B    100 2014-05-08 2014-02-08  NaN
#2  B    200 2014-07-08 2014-05-08  100
#3  A    200 2014-04-08 2014-03-08  NaN
#4  A    300 2014-06-08 2014-04-08  350


def f(x):
    x['sum1'] = x.date2.apply(lambda y:  x[x.date1 == y].value.sum()).replace(0, np.nan)
    return x

df = df.groupby('id').apply(f)

print df

#  id  value      date1      date2  sum  sum1
#0  A    150 2014-04-08 2014-03-08  NaN   NaN
#1  B    100 2014-05-08 2014-02-08  NaN   NaN
#2  B    200 2014-07-08 2014-05-08  100   100
#3  A    200 2014-04-08 2014-03-08  NaN   NaN
#4  A    300 2014-06-08 2014-04-08  350   350
Sign up to request clarification or add additional context in comments.

3 Comments

thanks, but actually I'm trying to achieve the sum column in my example. sorry if that isn't clear.
thanks, but this ignores the id, hence not exactly what I need. but thanks for the prompt responses!
it works! thanks jezrael, appreciate your helpfulness!
0

You can use this apply function:

def checkSum(record):
    date2 = record
    sum = df[df["date1"] == date2]["value"].sum()
    if sum == 0:
        return float('nan')
    else:
        return sum

df['sum2'] = df["date2"].apply(checkSum)

1 Comment

thanks, but this ignores the id field too. I tried df['sum2'] = grp["date2"].apply(checkSum) but got the Series lengths must match to compare error.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.