Summing Groups of Columns within a Pandas Dataframe

Question

I have a pandas dataframe (df1) with 600 columns, and I want to sum the values in each row in groups of 6 columns. In other words, I want to create a new dataframe (df2) with 100 columns, each column being the sum of 6 columns from the input dataframe. For example, in each row, the first column of df2 will be the sum of the first six columns of df1 (keeping the rows separate). The column names in my dataframe are strings (represented here by single letters).

For df1:

      A    B    C    D    E    F    G    H    I    J ...   
0     9    6    3    4    7    7    6    0    5    2 ...       
1     8    0    6    6    0    5    6    5    8    7 ...           
2     9    0    7    2    9    5    3    2    1    7 ...            
3     5    2    9    6    7    0    3    8    5    0 ...            
4     7    1    0    7    4    0    2    0    5    8 ...     
5     0    9    2    0    4    9    5    7    6    2 ...

I would want the first column of df2 to be:

      A
0    36
1    25
2    32
3    29
4    19
5    24

Each row of that column is the sum of the first six columns in that row. The next column would be the sum of the next six columns, and so on, with each new column named after the first column in its group of six (the first column takes the first column's name, the second takes the seventh column's name, etc.).

I've tried using the column indices to sum the correct columns, but I am having issues finding a way to store the sums in new columns with relevant names.

Is there a pythonic way to create these columns and pull the column names from df1 into df2?
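A minimal sketch of the index-based approach described above: slice each block of six columns with `iloc`, sum row-wise, and key each sum by the name of the block's first column. The small frame below is assumed for illustration (the first two rows and twelve columns of the answer's example data, with letter names); `group_size` is a hypothetical helper variable.

```python
import pandas as pd

# Assumed stand-in for df1: 12 string-named columns, 2 rows.
df1 = pd.DataFrame(
    [[9, 6, 3, 4, 7, 7, 6, 0, 5, 2, 2, 3],
     [8, 0, 6, 6, 0, 5, 6, 5, 8, 7, 9, 5]],
    columns=list("ABCDEFGHIJKL"),
)

group_size = 6

# For each block start i, sum columns i..i+5 across each row and use the
# name of the block's first column as the new column name.
df2 = pd.DataFrame({
    df1.columns[i]: df1.iloc[:, i:i + group_size].sum(axis=1)
    for i in range(0, df1.shape[1], group_size)
})
print(df2)
```

With 600 columns this builds 100 columns named after columns 1, 7, 13, and so on. A trailing partial block (if the column count were not a multiple of 6) would simply be summed over the columns it has.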

jezrael · Accepted Answer · 2016-07-15 20:37:04Z


You can group by columns (axis=1), using the group labels created by df.columns // 6, and then sum:

print (df)
   0  1  2  3  4  5  6  7  8  9  10  11  12  13
0  9  6  3  4  7  7  6  0  5  2   2   3   7   2
1  8  0  6  6  0  5  6  5  8  7   9   5   5   1
2  9  0  7  2  9  5  3  2  1  7   5   9   6   6
3  5  2  9  6  7  0  3  8  5  0   8   8   9   9
4  7  1  0  7  4  0  2  0  5  8   2   4   4   1
5  0  9  2  0  4  9  5  7  6  2   7   1   5   3

#if the column labels are not int (e.g. numeric strings), convert them first
#df.columns = df.columns.astype(int) 
print (df.columns // 6)
Int64Index([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2], dtype='int64')

print (df.groupby(df.columns // 6, axis=1).sum())
    0   1   2
0  36  18   9
1  25  40   6
2  32  27  12
3  29  32  18
4  19  21   5
5  24  28   8
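Newer pandas versions (2.1 and later) deprecate `groupby(..., axis=1)`. A sketch of an equivalent that avoids it, assuming the same 14-column example frame with integer column labels: transpose, group the former columns (now rows), sum, and transpose back.

```python
import pandas as pd

# The 14-column example frame from the answer (column labels 0..13).
df = pd.DataFrame([
    [9, 6, 3, 4, 7, 7, 6, 0, 5, 2, 2, 3, 7, 2],
    [8, 0, 6, 6, 0, 5, 6, 5, 8, 7, 9, 5, 5, 1],
    [9, 0, 7, 2, 9, 5, 3, 2, 1, 7, 5, 9, 6, 6],
    [5, 2, 9, 6, 7, 0, 3, 8, 5, 0, 8, 8, 9, 9],
    [7, 1, 0, 7, 4, 0, 2, 0, 5, 8, 2, 4, 4, 1],
    [0, 9, 2, 0, 4, 9, 5, 7, 6, 2, 7, 1, 5, 3],
])

# Transpose so columns become rows, group those rows by label // 6,
# sum within each group, then transpose back to the original orientation.
out = df.T.groupby(df.columns // 6).sum().T
print(out)
```

This should print the same 3-column result as the `axis=1` version.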

EDIT:

If the column names are strings, you can create an Index from a range over the number of columns (df.shape[1] gives the column count), floor-divide it by 6, and use it in groupby:

idx = pd.Index(range(df.shape[1])) // 6
print (idx)
Int64Index([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2], dtype='int64')

df1 = df.groupby(idx, axis=1).sum()
#if you need to rename the new columns to the category names
df1.columns = df.columns[::6]
print (df1)
    A   G   M
0  36  18   9
1  25  40   6
2  32  27  12
3  29  32  18
4  19  21   5
5  24  28   8
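When the column count is an exact multiple of the group size (600 / 6 = 100 here), a different technique, a NumPy reshape rather than groupby, can do the same positional grouping. This is a sketch under that assumption; the 12-column frame of consecutive integers and the `group_size` variable are hypothetical.

```python
import numpy as np
import pandas as pd

group_size = 6

# Assumed example: 2 rows x 12 string-named columns holding 0..23.
df = pd.DataFrame(np.arange(24).reshape(2, 12), columns=list("ABCDEFGHIJKL"))

n_rows, n_cols = df.shape
if n_cols % group_size:
    raise ValueError("column count must be a multiple of group_size")

# View each row as (n_groups, group_size) blocks and sum within each block.
sums = df.to_numpy().reshape(n_rows, n_cols // group_size, group_size).sum(axis=2)

# Name each result column after the first column of its block.
df2 = pd.DataFrame(sums, index=df.index, columns=df.columns[::group_size])
print(df2)
```

The reshape works on positions only, so it ignores the column names entirely; the names are reattached afterwards with `df.columns[::group_size]`, just as in the groupby version.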

edited Jul 15, 2016 at 20:37

answered Jul 15, 2016 at 20:08


3 Comments

Nizag Over a year ago

The issue with this solution is that the column names are strings (names of categories) so I don't think I can use the floor division operator to separate the groups. I will edit my post so this is more clear.

Nizag Over a year ago

Your edit did it! I'm now looking into the pd.Index functions as well as the dataframe shape function to get a better understanding of how this stuff works. Thanks so much!

jezrael Over a year ago

Glad I could help. I also added renaming the new columns to the category names.
