How to generate all possible combinations of columns in a pandas dataframe with many columns?

Question

I have the following DataFrame:

   A  B  C
0  1  1  4
1  3  9  6
2  3  4  3

I would like to create every possible unique combination of these columns without repetition, so that I end up with a dataframe containing the following columns: A, B, C, A+B, A+C, B+C, A+B+C. I do not want any column repeated within a combination, e.g. A+A+B+C or A+B+B+C.

I would also like each column in the dataframe to be labelled with the relevant variable names (e.g. for the combination of A + B, the column name should be 'A_B').
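
For reference, itertools.combinations already satisfies the no-repetition requirement (each column label appears at most once per combination), and '_'.join produces names in the requested style. itertools.combinations_with_replacement is shown only for contrast: it is the function that would produce the unwanted A+A style combinations. A minimal sketch on plain labels, assuming columns named A, B and C:

```python
from itertools import combinations, combinations_with_replacement

cols = ['A', 'B', 'C']

# combinations(): every non-empty subset, each label used at most once
names = ['_'.join(c) for k in range(1, len(cols) + 1) for c in combinations(cols, k)]
print(names)  # ['A', 'B', 'C', 'A_B', 'A_C', 'B_C', 'A_B_C']

# combinations_with_replacement(): labels may repeat, which the question rules out
print(list(combinations_with_replacement(cols, 2)))
# [('A', 'A'), ('A', 'B'), ('A', 'C'), ('B', 'B'), ('B', 'C'), ('C', 'C')]
```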

This is the desired DataFrame:

   A  B  C  A_B  A_C  B_C  A_B_C
0  1  1  4    2    5    5      6
1  3  9  6   12    9   15     18
2  3  4  3    7    6    7     10

This is relatively easy with just 3 variables using itertools and I have used the following code to do it:

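    import itertools

    import pandas as pd

    # df is the input DataFrame with columns A, B and C
    # Every pair of columns, summed row-wise and labelled e.g. 'A_B'
    combos_2 = pd.DataFrame({'{}_{}'.format(a, b): df[a] + df[b]
                             for a, b in itertools.combinations(df.columns, 2)})

    # Every triple of columns, labelled e.g. 'A_B_C'
    combos_3 = pd.DataFrame({'{}_{}_{}'.format(a, b, c): df[a] + df[b] + df[c]
                             for a, b, c in itertools.combinations(df.columns, 3)})

    # Original columns followed by the pairwise and three-way sums
    composites = pd.concat([df, combos_2, combos_3], axis=1)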

However, I can't figure out how to extend this code in a pythonic way to a DataFrame with a much larger number of columns. Is there a way of making the code above more pythonic and extending it to work with many columns? Or is there a more efficient way of generating the combinations?
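
Whichever approach is used, the number of output columns grows exponentially: n columns have C(n, 1) + C(n, 2) + ... + C(n, n) = 2^n - 1 non-empty combinations (by the binomial theorem), so 20 columns already yield 1,048,575 combination columns and 30 yield over a billion. A quick check with the standard library (math.comb needs Python 3.8+; the helper name is hypothetical):

```python
from math import comb


def n_combination_columns(n: int) -> int:
    """Number of non-empty column combinations (without repetition) of n columns."""
    return sum(comb(n, k) for k in range(1, n + 1))


for n in (3, 10, 20, 30):
    # The two numbers agree: the sum of C(n, k) for k >= 1 equals 2**n - 1
    print(n, n_combination_columns(n), 2**n - 1)
```

In practice this means the "many columns" case is limited by memory long before the choice between loops and comprehensions matters.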

BENY · Accepted Answer · 2019-11-16 20:57:36Z

3

We first need to create the combinations of the columns, then build the dataframe from them:

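    from itertools import combinations

    import pandas as pd

    cols = df.columns
    # All combinations of every size from 0 to n, each converted to a list
    # (a list of labels makes df[x] select several columns)
    output = sum([list(map(list, combinations(cols, i))) for i in range(len(cols) + 1)], [])
    output
    Out[21]: [[], ['A'], ['B'], ['C'], ['A', 'B'], ['A', 'C'], ['B', 'C'], ['A', 'B', 'C']]
    # Skip the empty combination; sum each remaining one row-wise
    df1 = pd.DataFrame({'_'.join(x): df[x].sum(axis=1) for x in output if x != []})
    df1
    Out[22]:
       A  B  C  A_B  A_C  B_C  A_B_C
    0  1  3  3    4    4    6      7
    1  1  9  4   10    5   13     14
    2  4  6  3   10    7    9     13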
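
A minimal sketch of the same idea packaged as a reusable function, assuming numeric columns with string labels. The name combination_sums and its sep parameter are hypothetical. It differs from the answer in two small ways: sizes start at 1, so there is no empty combination to filter out, and itertools.chain.from_iterable flattens the per-size iterators instead of sum(..., []), which builds a new list at every addition. The sample values are the A, B and C columns from the question's desired output.

```python
from itertools import chain, combinations

import pandas as pd


def combination_sums(df: pd.DataFrame, sep: str = '_') -> pd.DataFrame:
    """Return a new DataFrame with one column per non-empty combination of df's columns.

    Each new column is the row-wise sum of the columns in the combination and is
    named by joining their labels with sep. Assumes numeric columns with string labels.
    """
    cols = list(df.columns)
    # Sizes 1..n only, so no empty combination has to be filtered out afterwards
    combos = chain.from_iterable(combinations(cols, k) for k in range(1, len(cols) + 1))
    # list(combo) so that df[...] selects several columns rather than one tuple label
    return pd.DataFrame({sep.join(combo): df[list(combo)].sum(axis=1) for combo in combos})


df = pd.DataFrame({'A': [1, 3, 3], 'B': [1, 9, 4], 'C': [4, 6, 3]})
print(combination_sums(df))
```

With this sample data the A_B, A_C, B_C and A_B_C columns match the desired DataFrame in the question. Building every column in one dict and one DataFrame constructor call also avoids inserting columns into an existing frame one at a time.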

answered Nov 16, 2019 at 20:57

Eugene Pakhomov · Accepted Answer · 2019-11-16 21:32:59Z

1

You were pretty close:

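    from itertools import chain, combinations

    # Materialize the combinations with list() first: the generator expression
    # calls combinations(df.columns, i) lazily, so without list() it would also
    # pick up the columns added by the loop below.
    combs = list(chain.from_iterable(combinations(df.columns, i)
                                     for i in range(2, len(df.columns) + 1)))
    for cols in combs:
        # list(cols) selects several columns (a tuple could be read as one MultiIndex label)
        df['_'.join(cols)] = df.loc[:, list(cols)].sum(axis=1)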

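A word of caution: if you join column names with '_' while the names themselves can contain '_', you are bound to get name clashes sooner or later. For example, the combination of 'A' and 'B' is named 'A_B', which clashes with an existing column called 'A_B'.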
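
One way to guard against such clashes is to check the generated names for duplicates before building anything and fail loudly. A minimal sketch; check_unique_names is a hypothetical helper, not part of pandas or itertools:

```python
from collections import Counter
from itertools import chain, combinations


def check_unique_names(columns, sep='_'):
    """Raise ValueError if joining column labels with sep yields duplicate names."""
    cols = list(columns)
    combos = chain.from_iterable(combinations(cols, k) for k in range(1, len(cols) + 1))
    counts = Counter(sep.join(combo) for combo in combos)
    dupes = sorted(name for name, n in counts.items() if n > 1)
    if dupes:
        raise ValueError(f'ambiguous combination names: {dupes}')


check_unique_names(['A', 'B', 'C'])  # passes silently

try:
    # The single column 'A_B' and the pair ('A', 'B') both map to 'A_B'
    check_unique_names(['A', 'B', 'A_B'])
except ValueError as err:
    print(err)  # ambiguous combination names: ['A_B']
```

Choosing a separator that cannot occur in the column names, or keeping the combinations as tuples (for example as MultiIndex-style labels), avoids the problem altogether.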

edited Nov 16, 2019 at 21:32

answered Nov 16, 2019 at 20:59

3 Comments

boleneuro Over a year ago

Thank you for answering my question, but this does not give the desired dataframe. Instead, it returns a dataframe with 26 columns (it should only have 7 columns, as in the desired dataframe shown in my original question). I might not have been clear enough in my question, but I only want each unique combination where the original columns are not repeated (i.e. there shouldn't be any columns such as A + A + B + C).

boleneuro Over a year ago

I edited my question to specify that I only want combinations without repetitions of individual columns. Apologies for the confusion!

Eugene Pakhomov Over a year ago

Oops, sorry about that: I omitted a call to list, thinking it wouldn't be needed because the combinations are only iterated over. But I missed that the generator would see the changed dataframe on each iteration. I've edited the answer. An alternative would be to create a new dataframe instead of changing the existing one.
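
To see why the missing list call mattered: the generator expression evaluates combinations(df.columns, i) lazily, so for i = 3 it reads df.columns after the loop has already added A_B, A_C and B_C, giving 3 + 3 + C(6, 3) = 26 columns. A minimal reproduction, followed by the non-mutating alternative mentioned above (here built with pd.concat), assuming a small numeric DataFrame whose values are taken from the question's desired output:

```python
from itertools import chain, combinations

import pandas as pd

base = pd.DataFrame({'A': [1, 3, 3], 'B': [1, 9, 4], 'C': [4, 6, 3]})

# Buggy: the lazy generator is consumed while df is being mutated
df = base.copy()
lazy = chain.from_iterable(combinations(df.columns, i) for i in range(2, len(df.columns) + 1))
for cols in lazy:
    df['_'.join(cols)] = df.loc[:, list(cols)].sum(axis=1)
print(df.shape[1])  # 26

# Alternative: compute all new columns from the unmodified frame, then concatenate once
combos = chain.from_iterable(combinations(base.columns, i) for i in range(2, len(base.columns) + 1))
new_cols = {'_'.join(cols): base[list(cols)].sum(axis=1) for cols in combos}
result = pd.concat([base, pd.DataFrame(new_cols)], axis=1)
print(result.shape[1])  # 7
```

Because the alternative never modifies base, laziness is harmless there and no list call is needed.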
