Averages of DataFrame columns in Python

Question

I can't comment on the original question because my reputation isn't high enough, so I'm asking here. I'm referring to the question "DataFrames - Average Columns", specifically this line of code:

dfgrp= df.iloc[:,2:].groupby((np.arange(len(df.iloc[:,2:].columns)) // 2) + 1, axis=1).mean().add_prefix('ColumnAVg')

As I read it, take all rows from column 2 onwards, group by the length of the same rows and columns something something something on columns, not rows, get the mean of those columns then add to new columns called ColumnAVg1/2/3 etc.

I also know this takes the mean of columns 1&2, 3&4, 5&6 etc., but I don't know how it does it.

And so my question is, what needs to change in the above code to get the mean of columns 1&2, 2&3, 3&4, 4&5 etc. with the results in the same format?

(np.arange(len(df.iloc[:,2:].columns)) // 2) + 1 provides a key for each column, and this key is used to group the columns when axis=1. So basically your columns are labelled [0, 0, 1, 1, 2, 2, ...] (the + 1 then shifts those labels to [1, 1, 2, 2, 3, 3, ...]).
– DaveQ, Commented Nov 11, 2021 at 19:25
Thanks @DaveQ, but that is the part of the code I do not understand. What is it that says to average columns 1&2, 3&4 etc.? And how do I alter it to give averages of [1&2, 2&3, 3&4, ...]? What part of it gives the columns [0, 0, 1, 1, 2, 2, ...], and what does that mean? How would I read that?
– SteveS, Commented Nov 11, 2021 at 20:16
I will just put in an answer, though probably not a really good one.
– DaveQ, Commented Nov 11, 2021 at 22:41
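
To make the grouping key from the comments concrete, here is a minimal sketch of what it evaluates to and how it drives the averaging. The frame, its values and its column names are made up for illustration, and the iloc[:, 2:] slice from the question is left out. The sketch also groups the transposed frame instead of passing axis=1, on the assumption that a recent pandas version is in use, where groupby(axis=1) is deprecated.

```python
import numpy as np
import pandas as pd

# Hypothetical 2x6 frame; the column names are placeholders.
df = pd.DataFrame(np.arange(12.0).reshape(2, 6), columns=list("abcdef"))

# One integer label per column: integer division by 2 pairs up neighbours,
# and the + 1 only shifts the labels so that they start at 1.
key = np.arange(len(df.columns)) // 2 + 1
print(key)  # [1 1 2 2 3 3]

# Columns sharing a label are averaged together. Grouping the transposed
# frame avoids groupby(axis=1), which recent pandas versions deprecate.
pair_means = df.T.groupby(key).mean().T.add_prefix("ColumnAVg")
print(pair_means)
```

Because every column receives exactly one label, it can only land in one group, which is why this key alone cannot produce overlapping pairs such as 1&2 and 2&3.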

DaveQ · Accepted Answer · 2021-11-11 22:50:37Z

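import numpy as np
import pandas as pd

df     = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])
groups = [(1,2),(2,3),(2,3,4),(1,3)]  # 1-based column positions to average together
# Repeat each column once per group it belongs to, side by side
df2    = pd.DataFrame([df.iloc[:, i - 1] for z in groups for i in z]).T
labels = [str(z) for z in groups for _ in z]  # one label per repeated column
# groupby(axis=1) is deprecated in recent pandas, so group the transposed frame instead
result = df2.T.groupby(np.asarray(labels)).mean().T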

Probably not what you were looking for but something like this should work.
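
Applied to the adjacent pairs asked about in the question, the same idea might look like the sketch below. It is a variation on the snippet above, not the answer's exact code: the frame and its values are made up, the groups are generated rather than typed out, and integer labels replace the string labels so the result can take the ColumnAVg prefix.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 rows, 4 columns.
df = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=list("abcd"))

# 1-based positions of every adjacent pair: (1, 2), (2, 3), (3, 4).
groups = [(i, i + 1) for i in range(1, df.shape[1])]

# Lay the member columns of each group side by side (columns may repeat).
cols = [i - 1 for z in groups for i in z]
df2 = df.iloc[:, cols]

# Integer labels keep the groups in order even with ten or more pairs,
# where string labels such as "(10, 11)" would sort before "(2, 3)".
labels = np.array([k for k, z in enumerate(groups, start=1) for _ in z])

# Group the transposed frame rather than relying on groupby(axis=1).
result = df2.T.groupby(labels).mean().T.add_prefix("ColumnAVg")
print(result)
```

Here result has one column per pair (ColumnAVg1 for columns 1&2, ColumnAVg2 for 2&3, and so on), which matches the format of the original one-liner.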

SteveS Over a year ago

Great! Thanks DaveQ

michaelgbj · Accepted Answer · 2021-11-11 21:00:33Z

Unfortunately, you cannot simply tweak that code to get your result, because it works by assigning a single number to each column and grouping the columns that share a number, so a column can only ever belong to one group. However, you can do something cheeky: provide two groupings, get the averages for each grouping, and combine them into a single frame.

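import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])

# Labels 0,0,1,1 pair a&b and c&d; labels 0,1,1,2 pair b&c.
# Grouping the transposed frame avoids groupby(axis=1), deprecated in recent pandas.
d1 = df.T.groupby(np.arange(len(df.columns)) // 2).mean().T
d2 = df.T.groupby((np.arange(len(df.columns) + 1) // 2)[1:]).mean().T

dfo = pd.DataFrame()
for i in range(len(df.columns)-1):
    c = f'average_{df.columns[i]}_{df.columns[i+1]}'
    if i % 2 == 0:
        dfo[c] = d1[d1.columns[i // 2]]  # integer division: a float cannot index
    else:
        dfo[c] = d2[d2.columns[(i+1) // 2]]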

What the original code did was label columns 1,2,3,4 as 1,1,2,2. In our code, d1 is grouped according to 0,0,1,1 and d2 according to 0,1,1,2. The for loop then combines the results.
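
If a groupby is not required, a different technique might be simpler: average each column with its right-hand neighbour by slicing the underlying array. This is not the approach used in either answer, only a sketch under the assumption that all columns are numeric; the data is made up and the column naming just mirrors the loop above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2 rows, 4 columns.
df = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=list("abcd"))

vals = df.to_numpy()
# Average each column with its right-hand neighbour: a&b, b&c, c&d.
means = (vals[:, :-1] + vals[:, 1:]) / 2

# Same naming scheme as the loop above: average_<left>_<right>.
names = [
    f"average_{left}_{right}"
    for left, right in zip(df.columns[:-1], df.columns[1:])
]
dfo = pd.DataFrame(means, index=df.index, columns=names)
print(dfo)
```

One difference to keep in mind: groupby(...).mean() skips missing values, whereas this plain arithmetic returns NaN whenever either column of a pair is NaN.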
