Pandas DataFrame manipulation based on Column value

Question

Suppose I have a Pandas Dataframe in the format below based on a 'Dict of Dicts 'Group'list (also below)…

ITEMS={
    “Item_group1”:{‘Stuff’:’Some stuf’
            ‘More Stuff’:’Extra Stuff’
            Group:[[Iteration1, 18, 25,0], [Iteration1, 43, 67,1], [Iteration1, 87, 76,1],
                [Iteration2, 45, 29,0], [Iteration2, 44, 77,1], [Iteration2, 43, 74,0]],

            }
    “Item_group2”:{‘Stuff’:’Some stuf’
            ‘More Stuff’:’Extra Stuff’
            Group:[[Iteration1, 75, 564,0], [Iteration1, 21, 87,1], [Iteration1, 7, 5,1],
                [Iteration2, 54, 24,0], [Iteration2, 7, 45,1], [Iteration2, 45, 745,0]],
            }

DataFrame in the following format….

Iteration   Value1  Value2  Feature Active
Iteration1  18      25      0
Iteration1  3       67      1
Iteration1  87      76      1
Iteration2  45      29      0
Iteration2  44      7       1
Iteration2  43      74      0

How would I go about separating and calculating the averages for each iteration based on ‘Feature Active’ == 1, and ignore any ‘Feature Active’ ==0 entries?

I have the following code to calculate the stats per iteration for Value1 and Value2 after separating for ‘Iteration’ and ‘Feature Active’ as the keys, but it displays ‘Feature Active’ ==0 which I am not concerned with.

FeatureAvgs = Item_group1_DF.groupby(['Iteration’,’Feature Active'])
print np.round(FeatureAvgs[['Value1','Value2']].describe(), decimals=1)

Which produces the following output… (ignore the actual figures, this was taken from another dataframe)

Iteration   Feature Enabled
Iteration1  0               count   3672.0   3672.0
                            mean   -1352.5      0.0
                            std      220.5      0.0
                            min    -1920.0      0.0
                            25%    -1507.2      0.0
                            50%    -1267.0      0.0
                            75%    -1184.0      0.0
                            max     -785.0      0.0
            1               count    580.0    580.0
                            mean   -1368.6  -1394.5
                            std      151.5    157.7
                            min    -1788.0  -1805.0
                            25%    -1454.2  -1490.2
                            50%    -1335.5  -1361.0
                            75%    -1270.0  -1291.0
                            max    -1045.0  -1033.0
Iteration2  0               count  20612.0  20612.0
                            mean   -1073.5      0.0
                            std      142.3      0.0
                            min    -1730.0      0.0
                            25%    -1088.0      0.0
                            50%    -1036.0      0.0
                            75%    -1005.0      0.0
                            max     -805.0      0.0
            1               count  14718.0  14718.0
                            mean   -1113.6  -1161.1
                            std      129.3    134.9
                            min    -1773.0  -1818.0
                            25%    -1151.0  -1214.0
                            50%    -1095.0  -1122.0
                            75%    -1043.0  -1075.0
                            max     -832.0   -897.0

But I am just after the mean averages for when the feature is active (==1). Sorry for the long question, but I am new to Pandas, and still getting through the documentation

can't you just filter the df first and then group? FeatureAvgs = Item_group1_DF[item_group1_DF['Feature Enabled'] == 1].groupby(['Iteration’,’Feature Active'])[['Value1','Value2']].mean() — EdChum
– EdChum, Commented May 28, 2015 at 12:35
Thanks EdChum, I didn't know you could edit the DataFrame before running a query on it — MikG
– MikG, Commented May 28, 2015 at 14:28

EdChum · Accepted Answer · 2015-05-28 14:43:10Z

1

Rather than filtering on the groupby object you can just filter the initial df first:

FeatureAvgs = Item_group1_DF[item_group1_DF['Feature Enabled'] == 1].groupby(['Iteration’,’Feature Active'])[['Value1','Value2']].mean()

Also there is no need to use describe if you just want the mean just use mean, as an aside you can access the mean column from the result of describe by using:

print np.round(FeatureAvgs[['Value1','Value2']].describe()['mean'], decimals=1)

answered May 28, 2015 at 14:43

EdChum

397k204 gold badges837 silver badges583 bronze badges

Sign up to request clarification or add additional context in comments.

Comments

jrjc · Accepted Answer · 2015-05-28 16:33:16Z

If I understood well, you can do:

> df.groupby(["Feature Active", "Iteration"]).mean().loc[1]

            Value1  Value2
Iteration                 
Iteration1      45    71.5
Iteration2      44     7.0

You groupby first on the feature, and second on the Iteration variable. On each group you apply the mean() function, and you get the group whose index is 1, which correspond to the Feature Active == 1 group.

With:

> df

    Iteration  Value1  Value2  Feature Active
0  Iteration1      18      25               0
1  Iteration1       3      67               1
2  Iteration1      87      76               1
3  Iteration2      45      29               0
4  Iteration2      44       7               1
5  Iteration2      43      74               0


> df.groupby(["Feature Active", "Iteration"]).mean()

                           Value1  Value2
Feature Active Iteration                 
0              Iteration1      18    25.0
               Iteration2      44    51.5
1              Iteration1      45    71.5
               Iteration2      44     7.0

Tell me if it's not what you wanted.

HTH

Collectives™ on Stack Overflow

Pandas DataFrame manipulation based on Column value

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related