2

I have a df:

field           category            2022-01-10      2022-01-17      2022-01-24    2022-01-31 
A               Chair               50              100             200           150
B               Chair               75              15              57            75   
C               Chair               0.57            0.64            0.4           0.47
A               Table               50              100             200           150
B               Table               75              15              57            75   
C               Table               0.57            0.64            0.4           0.47
A               Mirror              50              100             200           150
B               Mirror              75              15              57            75   
C               Mirror              0.57            0.64            0.4           0.47

I am trying to group by field or simply drop the category column and aggregate the values for each week so that the result would look like this:

field       2022-01-10      2022-01-17      2022-01-24    2022-01-31 
A           50+50+50        100+100+100     200+200+200   150+150+150
B           75+75+75        15+15+15        57+57+57      75+75+75
C           np.mean([0.57, 0.57, 0.57])     # each weeks average of field C

For fields A & B each week I would sum up the values for that week in different categories, but for field C I want the average.

I tried: df.groupby(['field']).agg({'A':sum, 'B':sum, 'C'.np.mean}) but this of course does not work because the fields arethe thing that's being grouped by and not the column.

I believe this problem can be sorted by somehow dropping an category column and then transforming the values based on field column but I am not sure how that works.

3 Answers 3

1

Create a dict of functions to apply then flat your dataframe and group by field and variable (dates) columns. Use the first index of groupby to get the right function. Finally reshape your dataframe.

aggfuncs = {'A': lambda x: np.sum(x),
            'B': lambda x: np.sum(x),
            'C': lambda x: np.mean(x)}

out = df.melt(['field', 'category']).groupby(['field', 'variable'])['value'] \
        .apply(lambda x: aggfuncs[x.name[0]](x)).unstack() \
        .rename_axis(columns=None).reset_index()
print(out)

# Output
  field  2022-01-10  2022-01-17  2022-01-24  2022-01-31
0     A      150.00      300.00       600.0      450.00
1     B      225.00       45.00       171.0      225.00
2     C        0.57        0.64         0.4        0.47
Sign up to request clarification or add additional context in comments.

Comments

1

{A, B} and {C} require different handling, so we can handle them separately

pd.concat([
    df[df['field'].isin(['A', 'B'])).groupby('field').sum(), #for AB
    df[df['field']=='C'].groupby('field').mean(),  #for C
])

4 Comments

This creates rows for each date column for some reason.
I ended up using df[df['field']=='C'].groupby('field').mean() instead in the second part.
My bad. Please try again
Yes. Agree with you.
1

Here is one way to approach the problem:

s = df.set_index('field').select_dtypes(np.number)

out = s.drop('C').groupby(level=0).sum()
out.loc['C', :] = s.loc['C'].mean()

Result:

print(out)

       2022-01-10  2022-01-17  2022-01-24  2022-01-31
field                                                
A          150.00      300.00       600.0      450.00
B          225.00       45.00       171.0      225.00
C            0.57        0.64         0.4        0.47

2 Comments

Does the .select_dtypes(np.number) takes everything but the floats like 0.57? Asking because this solution is clear but might be limited to what the df is.
@JonasPalačionis select_dtypes(np.number) will select the numeric only columns thus it will automatically drop the category column.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.