Compute average of the pandas df conditioned on a parameter

Question

I have the following df:

  import numpy as np
  import pandas as pd
  a = [] 
  for i in range(5):
      tmp_df = pd.DataFrame(np.random.random((10,4)))
      tmp_df['lvl'] = i
      a.append(tmp_df) 
  df = pd.concat(a, axis=0)

df =

          0         1         2         3  lvl
0  0.928623  0.868600  0.854186  0.129116    0
1  0.667870  0.901285  0.539412  0.883890    0
2  0.384494  0.697995  0.242959  0.725847    0
3  0.993400  0.695436  0.596957  0.142975    0
4  0.518237  0.550585  0.426362  0.766760    0
5  0.359842  0.417702  0.873988  0.217259    0
6  0.820216  0.823426  0.585223  0.553131    0
7  0.492683  0.401155  0.479228  0.506862    0
..............................................   
3  0.505096  0.426465  0.356006  0.584958    3
4  0.145472  0.558932  0.636995  0.318406    3
5  0.957969  0.068841  0.612658  0.184291    3
6  0.059908  0.298270  0.334564  0.738438    3
7  0.662056  0.074136  0.244039  0.848246    3
8  0.997610  0.043430  0.774946  0.097294    3
9  0.795873  0.977817  0.780772  0.849418    3
0  0.577173  0.430014  0.133300  0.760223    4
1  0.916126  0.623035  0.240492  0.638203    4
2  0.165028  0.626054  0.225580  0.356118    4
3  0.104375  0.137684  0.084631  0.987290    4
4  0.934663  0.835608  0.764334  0.651370    4
5  0.743265  0.072671  0.911947  0.925644    4
6  0.212196  0.587033  0.230939  0.994131    4
7  0.945275  0.238572  0.696123  0.536136    4
8  0.989021  0.073608  0.720132  0.254656    4
9  0.513966  0.666534  0.270577  0.055597    4

I am learning pandas and am wondering what the easiest way is to compute the average along the lvl column.

What I mean is:

(df[df.lvl ==0 ] + df[df.lvl ==1 ] + df[df.lvl ==2 ] + df[df.lvl ==3 ] + df[df.lvl ==4 ]) / 5

The desired output should be a table of shape (10,4), without the lvl column, where each element is the average of 5 elements (one for each lvl in [0,1,2,3,4]). I hope that helps.

Can you provide the desired output with maybe 3 or 4 lines of sample data?
– Haleemur Ali, Commented Mar 15, 2018 at 13:34

jezrael · Accepted Answer · 2018-03-15 13:47:46Z

I think you need:

import numpy as np
import pandas as pd

np.random.seed(456)
a = [] 
for i in range(5):
    tmp_df = pd.DataFrame(np.random.random((10,4)))
    tmp_df['lvl'] = i
    a.append(tmp_df) 
df = pd.concat(a, axis=0)
#print (df)

df1 = (df[df.lvl ==0 ] + df[df.lvl ==1 ] + 
       df[df.lvl ==2 ] + df[df.lvl ==3 ] + 
       df[df.lvl ==4 ]) / 5
print (df1)
          0         1         2         3  lvl
0  0.411557  0.520560  0.578900  0.541576    2
1  0.253469  0.655714  0.532784  0.620744    2
2  0.468099  0.576198  0.400485  0.333533    2
3  0.620207  0.367649  0.531639  0.475587    2
4  0.699554  0.548005  0.683745  0.457997    2
5  0.322487  0.316137  0.489660  0.362146    2
6  0.430058  0.159712  0.631610  0.641141    2
7  0.399944  0.511944  0.346402  0.754591    2
8  0.400190  0.373925  0.340727  0.407988    2
9  0.502879  0.399614  0.321710  0.715812    2

df = df.set_index('lvl')
df2 = df.groupby(df.groupby('lvl').cumcount()).mean()
print (df2)
          0         1         2         3
0  0.411557  0.520560  0.578900  0.541576
1  0.253469  0.655714  0.532784  0.620744
2  0.468099  0.576198  0.400485  0.333533
3  0.620207  0.367649  0.531639  0.475587
4  0.699554  0.548005  0.683745  0.457997
5  0.322487  0.316137  0.489660  0.362146
6  0.430058  0.159712  0.631610  0.641141
7  0.399944  0.511944  0.346402  0.754591
8  0.400190  0.373925  0.340727  0.407988
9  0.502879  0.399614  0.321710  0.715812
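
To see why the double groupby works, a small sketch may help. It uses a hypothetical toy frame (2 levels of 3 rows each, not the data above): groupby('lvl').cumcount() numbers the rows inside each level 0, 1, 2, ..., and grouping by that position averages the rows that share a position across levels.

```python
import pandas as pd

# Hypothetical toy data: 2 levels with 3 rows each (not the frame from the question).
toy = pd.concat(
    [
        pd.DataFrame({"x": [1.0, 2.0, 3.0], "lvl": 0}),
        pd.DataFrame({"x": [3.0, 6.0, 9.0], "lvl": 1}),
    ],
    axis=0,
)

# Position of each row inside its own level.
pos = toy.groupby("lvl").cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 2]

# Average the rows that share a position; drop lvl so it is not averaged too.
result = toy.drop(columns="lvl").groupby(pos).mean()
print(result["x"].tolist())  # [2.0, 4.0, 6.0]
```

Because the grouping key is the position within each level, this does not depend on what the index labels happen to be.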

EDIT:

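If each subset of the DataFrame has an index from 0 to len(subset) - 1, you can group on the index directly (using the original df, before set_index):

df2 = df.groupby(level=0).mean()  # was df.mean(level=0); the level argument was removed in pandas 2.0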
print (df2)
          0         1         2         3  lvl
0  0.411557  0.520560  0.578900  0.541576    2
1  0.253469  0.655714  0.532784  0.620744    2
2  0.468099  0.576198  0.400485  0.333533    2
3  0.620207  0.367649  0.531639  0.475587    2
4  0.699554  0.548005  0.683745  0.457997    2
5  0.322487  0.316137  0.489660  0.362146    2
6  0.430058  0.159712  0.631610  0.641141    2
7  0.399944  0.511944  0.346402  0.754591    2
8  0.400190  0.373925  0.340727  0.407988    2
9  0.502879  0.399614  0.321710  0.715812    2

Fab! I made a typo and it should be: df.groupby(df.groupby('lvl').cumcount()).mean()

ALollz · Accepted Answer · 2018-03-15 13:35:05Z

The groupby function is exactly what you want. It will group based on a condition, in this case where 'lvl' is the same, and then apply the mean function to the values for each column in that group.

df.groupby('lvl').mean()
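
As a rough illustration of what this call returns, the sketch below rebuilds the frame from the question (the seed 456 is an arbitrary choice for reproducibility) and checks the shape of the result: one row per lvl value, with each column averaged over the 10 rows of that level.

```python
import numpy as np
import pandas as pd

np.random.seed(456)  # arbitrary seed, only for reproducibility
frames = []
for i in range(5):
    tmp_df = pd.DataFrame(np.random.random((10, 4)))
    tmp_df["lvl"] = i
    frames.append(tmp_df)
df = pd.concat(frames, axis=0)

# One row per level: the mean is taken within each lvl group.
per_level = df.groupby("lvl").mean()
print(per_level.shape)  # (5, 4)
```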

2 Comments

Arnold Klein Over a year ago

Thanks, but I think I confused you. I need to compute the average 'along' lvl parameter, not within. So at the end I need to get a single matrix of size (10,4)

ALollz Over a year ago

Ah that makes sense. Whoops.

Haleemur Ali · Accepted Answer · 2018-03-15 13:56:14Z

It seems like you want to group by the index and take the average of all the columns except lvl, i.e.

df.groupby(df.index)[[0,1,2,3]].mean()

For a dataframe generated using

np.random.seed(456)
a = [] 
for i in range(5):
    tmp_df = pd.DataFrame(np.random.random((10,4)))
    tmp_df['lvl'] = i
    a.append(tmp_df) 
df = pd.concat(a, axis=0)

df.groupby(df.index)[[0,1,2,3]].mean()

outputs:

          0         1         2         3
0  0.411557  0.520560  0.578900  0.541576
1  0.253469  0.655714  0.532784  0.620744
2  0.468099  0.576198  0.400485  0.333533
3  0.620207  0.367649  0.531639  0.475587
4  0.699554  0.548005  0.683745  0.457997
5  0.322487  0.316137  0.489660  0.362146
6  0.430058  0.159712  0.631610  0.641141
7  0.399944  0.511944  0.346402  0.754591
8  0.400190  0.373925  0.340727  0.407988
9  0.502879  0.399614  0.321710  0.715812

which is identical to the output from

df.groupby(df.groupby('lvl').cumcount()).mean()

without resorting to a double groupby.

IMO this is cleaner to read and, for a large dataframe, will be much faster.
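
One caveat worth sketching: grouping by df.index relies on every subset carrying the same repeating index labels. The example below first confirms that both approaches agree on the frame from the question, then tries a hypothetical variation (named flat here) where the index was reset after concatenation. In that case the index no longer repeats, so each row becomes its own group, while the cumcount version still averages by position.

```python
import numpy as np
import pandas as pd

np.random.seed(456)
frames = []
for i in range(5):
    tmp_df = pd.DataFrame(np.random.random((10, 4)))
    tmp_df["lvl"] = i
    frames.append(tmp_df)
df = pd.concat(frames, axis=0)

cols = [0, 1, 2, 3]
by_index = df.groupby(df.index)[cols].mean()
by_position = df.groupby(df.groupby("lvl").cumcount())[cols].mean()
print(np.allclose(by_index, by_position))  # True

# Hypothetical variation: the index was reset, so labels run 0..49 without repeats.
flat = df.reset_index(drop=True)
flat_by_index = flat.groupby(flat.index)[cols].mean()
flat_by_position = flat.groupby(flat.groupby("lvl").cumcount())[cols].mean()
print(flat_by_index.shape)     # (50, 4)
print(flat_by_position.shape)  # (10, 4)
```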
