I am unable to comment on the original question as I don't have a high enough reputation, but I refer to the question "DataFrames - Average Columns", specifically this line of code:

dfgrp= df.iloc[:,2:].groupby((np.arange(len(df.iloc[:,2:].columns)) // 2) + 1, axis=1).mean().add_prefix('ColumnAVg')

As I read it, take all rows from column 2 onwards, group by the length of the same rows and columns something something something on columns, not rows, get the mean of those columns then add to new columns called ColumnAVg1/2/3 etc.

I also know this takes the mean of columns 1&2, 3&4, 5&6 etc. but I don't know how it does that.

And so my question is, what needs to change in the above code to get the mean of columns 1&2, 2&3, 3&4, 4&5 etc. with the results in the same format?

3
  • (np.arange(len(df.iloc[:,2:].columns)) // 2) + 1 provides a key for each column, and this key is used to group the columns when axis=1. So basically your columns are labelled [0, 0, 1, 1, 2, 2, ...] (shifted to [1, 1, 2, 2, 3, 3, ...] by the + 1). Commented Nov 11, 2021 at 19:25
  • Thanks @DaveQ but it is that part of the code I do not understand. What is it that says average columns 1&2, 3&4 etc?? And so how do I alter it to give averages of [1&2, 2&3, 3&4...] What part of it gives columns [0,0,1,1,2,2,...] - and what does that mean? How would I read that? Commented Nov 11, 2021 at 20:16
  • i will just put in an answer though not really a good one probably. Commented Nov 11, 2021 at 22:41

2 Answers

1
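import numpy as np
import pandas as pd

df     = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])
# 1-based column positions to average together; a column may appear in several groups
groups = [(1,2),(2,3),(2,3,4),(1,3)]
# Copy each column once per group it belongs to, then label every copy with its group
df2    = pd.DataFrame([df.iloc[:, i - 1] for z in groups for i in z]).T
labels = [str(z) for z in groups for _ in z]
result = df2.groupby(by=labels, axis=1).mean()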

Probably not what you were looking for, but something like this should work. For the adjacent pairs in the question, set groups = [(1,2),(2,3),(3,4)].
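For the adjacent pairs asked about (1&2, 2&3, 3&4), a groupby-free sketch is also possible, because each output column is just the mean of a column and its right-hand neighbour. This is a different technique from the one above, not a modification of it. It also sidesteps groupby(..., axis=1), which pandas deprecated in version 2.1. The example frame and the ColumnAVg naming are assumptions based on the question.

```python
import numpy as np
import pandas as pd

# Assumed example frame with known values so the result is easy to check.
df = pd.DataFrame(np.arange(8.0).reshape(2, 4), columns=list("abcd"))

left = df.iloc[:, :-1]   # columns a, b, c
right = df.iloc[:, 1:]   # columns b, c, d

# Add the raw arrays so values line up by position rather than by column name.
pair_means = pd.DataFrame(
    (left.to_numpy() + right.to_numpy()) / 2,
    index=df.index,
    columns=[f"ColumnAVg{i}" for i in range(1, df.shape[1])],
)

print(pair_means)
```

If the first two columns should be skipped as in the original line of code, the same idea applies to df.iloc[:, 2:] in place of df.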

1 Comment

Great! Thanks DaveQ
1

Unfortunately, you cannot simply tweak that code to get your result, because it works by assigning one label to each column and grouping the columns that share a label, so a column can belong to only one group. However, you can do something cheeky: build two groupings, take the mean within each, and combine them into a single frame.

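import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])

# d1 pairs columns (a, b) and (c, d); d2 shifts the labels by one so (b, c) are paired
d1 = df.groupby((np.arange(len(df.columns)) // 2), axis=1).mean()
d2 = df.groupby((np.arange(len(df.columns) + 1) // 2)[1:], axis=1).mean()

dfo = pd.DataFrame()
for i in range(len(df.columns)-1):
    c = f'average_{df.columns[i]}_{df.columns[i+1]}'
    if i % 2 == 0:
        # Integer division: a float from "/" cannot be used as a positional index
        dfo[c] = d1[d1.columns[i // 2]]
    else:
        dfo[c] = d2[d2.columns[(i+1) // 2]]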

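What the original code does is assign columns 1,2,3,4 the labels 1,1,2,2. In our code, d1 is grouped by the labels 0,0,1,1 (pairing columns 1&2 and 3&4) and d2 by 0,1,1,2 (pairing columns 2&3, with columns 1 and 4 each left alone). The for loop then combines the results, alternating between d1 and d2.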
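To see where those labels come from, here is a small sketch that prints the label arrays on their own. The column count of 4 is an assumption matching the example frame, and the variable names are only illustrative.

```python
import numpy as np

n_cols = 4  # assumed column count, matching the 4-column example frame

# Labels behind d1: integer division by 2 gives each consecutive pair one label.
d1_labels = np.arange(n_cols) // 2
# Labels behind d2: the same sequence shifted by one position.
d2_labels = (np.arange(n_cols + 1) // 2)[1:]
# Labels from the line in the question: the + 1 only renames the groups.
question_labels = (np.arange(n_cols) // 2) + 1

print(d1_labels)        # [0 0 1 1]
print(d2_labels)        # [0 1 1 2]
print(question_labels)  # [1 1 2 2]
```

Columns that receive the same label end up in the same group, which is why a single label array can never put one column into two overlapping pairs.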
