I have a pandas dataframe with 600 columns (df1), and I want to sum the values of the columns in groups of 6. In other words, I want to create a new dataframe (df2) with 100 columns, each being the sum of 6 columns from the input dataframe. For example, each row of the first column in df2 will be the sum of the first six columns in df1 for that row (keeping the rows separate). The column names in my dataframe are strings (represented here by single letters).

For df1:

      A    B    C    D    E    F    G    H    I    J ...   
0     9    6    3    4    7    7    6    0    5    2 ...       
1     8    0    6    6    0    5    6    5    8    7 ...           
2     9    0    7    2    9    5    3    2    1    7 ...            
3     5    2    9    6    7    0    3    8    5    0 ...            
4     7    1    0    7    4    0    2    0    5    8 ...     
5     0    9    2    0    4    9    5    7    6    2 ...       

I would want the first column of df2 to be:

    A    G ... 
0   36  
1   25
2   32
3   29
4   19
5   24

Here each value is the sum of the first six columns of that row. The next column would then be the sum of the next six columns, and so on, with each new column named after the first column in its set of 6. (The first column takes the first column's name, the second takes the seventh column's name, etc.)

I've tried using the column indices to sum the correct columns, but I am having issues finding a way to store the sums in new columns with relevant names.

Is there a pythonic way to create these columns and carry the column names over from df1 into df2?

1 Answer

You can group the columns (axis=1) using group labels created by df.columns // 6, then sum:

print (df)
   0  1  2  3  4  5  6  7  8  9  10  11  12  13
0  9  6  3  4  7  7  6  0  5  2   2   3   7   2
1  8  0  6  6  0  5  6  5  8  7   9   5   5   1
2  9  0  7  2  9  5  3  2  1  7   5   9   6   6
3  5  2  9  6  7  0  3  8  5  0   8   8   9   9
4  7  1  0  7  4  0  2  0  5  8   2   4   4   1
5  0  9  2  0  4  9  5  7  6  2   7   1   5   3

# if the column labels are not ints, convert them first
# df.columns = df.columns.astype(int)
print (df.columns // 6)
Int64Index([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2], dtype='int64')

print (df.groupby(df.columns // 6, axis=1).sum())
    0   1   2
0  36  18   9
1  25  40   6
2  32  27  12
3  29  32  18
4  19  21   5
5  24  28   8
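
As far as I know, recent pandas releases deprecate the axis=1 argument of groupby, so the call above may emit a warning or fail there. A minimal sketch of the same idea that avoids axis=1 is to transpose, group the rows, and transpose back. The sample data is copied from the frame printed above; the name `grouped` is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the printed frame above (integer column labels 0..13).
df = pd.DataFrame(
    [
        [9, 6, 3, 4, 7, 7, 6, 0, 5, 2, 2, 3, 7, 2],
        [8, 0, 6, 6, 0, 5, 6, 5, 8, 7, 9, 5, 5, 1],
        [9, 0, 7, 2, 9, 5, 3, 2, 1, 7, 5, 9, 6, 6],
        [5, 2, 9, 6, 7, 0, 3, 8, 5, 0, 8, 8, 9, 9],
        [7, 1, 0, 7, 4, 0, 2, 0, 5, 8, 2, 4, 4, 1],
        [0, 9, 2, 0, 4, 9, 5, 7, 6, 2, 7, 1, 5, 3],
    ]
)

# One group label per column: 0 for columns 0-5, 1 for columns 6-11, and so on.
labels = np.asarray(df.columns) // 6

# Transpose so the columns become rows, group those rows, sum, transpose back.
grouped = df.T.groupby(labels).sum().T
print(grouped)
```

The last group here holds only two columns, since 14 is not a multiple of 6; with 600 columns every group would be complete.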

EDIT:

If the column names are strings, you can create an Index from range and df.shape[1] (the number of columns) and use it in groupby instead:

idx = pd.Index(range(df.shape[1])) // 6
print (idx)
Int64Index([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2], dtype='int64')

df1 = df.groupby(idx, axis=1).sum()
# if needed, rename each new column to the name of the first column in its group
df1.columns = df.columns[::6]
print (df1)
    A   G   M
0  36  18   9
1  25  40   6
2  32  27  12
3  29  32  18
4  19  21   5
5  24  28   8
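
Putting the pieces together for string column names, here is a self-contained sketch. The helper name `sum_column_blocks` and the letter labels A to N are assumptions for illustration, and it uses a transpose instead of axis=1 because, as far as I know, newer pandas versions deprecate that argument.

```python
import numpy as np
import pandas as pd


def sum_column_blocks(frame, size):
    """Sum every `size` consecutive columns of `frame`.

    Hypothetical helper: each result column is named after the first
    column of its block.
    """
    # Positional group label per column, independent of the column names.
    block_ids = np.arange(frame.shape[1]) // size
    # Group the transposed rows (the original columns), sum, transpose back.
    summed = frame.T.groupby(block_ids).sum().T
    # Every `size`-th original name is the first name of a block.
    summed.columns = frame.columns[::size]
    return summed


data = [
    [9, 6, 3, 4, 7, 7, 6, 0, 5, 2, 2, 3, 7, 2],
    [8, 0, 6, 6, 0, 5, 6, 5, 8, 7, 9, 5, 5, 1],
    [9, 0, 7, 2, 9, 5, 3, 2, 1, 7, 5, 9, 6, 6],
    [5, 2, 9, 6, 7, 0, 3, 8, 5, 0, 8, 8, 9, 9],
    [7, 1, 0, 7, 4, 0, 2, 0, 5, 8, 2, 4, 4, 1],
    [0, 9, 2, 0, 4, 9, 5, 7, 6, 2, 7, 1, 5, 3],
]
df = pd.DataFrame(data, columns=list("ABCDEFGHIJKLMN"))

df2 = sum_column_blocks(df, 6)
print(df2)
```

Because the group labels come from column positions rather than names, the same call should work unchanged on the 600-column frame from the question.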

3 Comments

The issue with this solution is that the column names are strings (names of categories) so I don't think I can use the floor division operator to separate the groups. I will edit my post so this is more clear.
Your edit did it! I'm now looking into the pd.Index functions as well as the dataframe shape function to get a better understanding of how this stuff works. Thanks so much!
Glad I could help. I also added renaming of the new columns to the category names.
