All possible combinations of columns in dataframe -pandas/python

Question

I'm trying to take one dataframe and create another that contains every pairwise combination of its columns, where each new column holds the difference between the corresponding values. For example, on 11-apr column AB should be (B - A) = 0, and so on.

For example, starting with

        Dt              A           B           C          D
        11-apr          1           1           1          1
        10-apr          2           3           1          2

how do I get a new frame that looks like this:

I have come across the post below, but have not been able to adapt it to work on columns rather than rows.

Aggregate all dataframe row pair combinations using pandas
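For anyone who wants to run the answers below, here is a minimal sketch that rebuilds the sample frame from the question. The variable names `indexed` and `ab` are only illustrative and are not part of the original post.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Dt": ["11-apr", "10-apr"],
    "A": [1, 2],
    "B": [1, 3],
    "C": [1, 1],
    "D": [1, 2],
})

# The desired AB column is B - A, row by row
indexed = df.set_index("Dt")
ab = indexed["B"] - indexed["A"]
print(ab)
```

The answers below all start from `df` in this shape and move `Dt` into the index first.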

Any thoughts on how to do this for 3 columns? Say I want to compute 2*B - A - C in the above example.
– S.Peters, Commented Jul 9, 2018 at 12:23

Michael Dorner · Accepted Answer · 2019-05-03 10:24:45Z

18

You can use:

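from itertools import combinations
import pandas as pd

df = df.set_index('Dt')

# every unordered pair of columns: (A, B), (A, C), ...
cc = list(combinations(df.columns, 2))
# second column minus first, one Series per pair, keyed by the pair
df = pd.concat([df[c[1]].sub(df[c[0]]) for c in cc], axis=1, keys=cc)
# flatten the (A, B) tuple labels into 'AB'
df.columns = df.columns.map(''.join)
print(df)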
        AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1

edited May 3, 2019 at 10:24

Michael Dorner

answered Apr 11, 2017 at 14:01

jezrael


7 Comments

S.Peters Over a year ago

Thanks for this, works perfectly. Any thoughts on how to modify this for 3 combinations, e.g ABC, ABD, BCD etc and then rather than (B-A) having 2* B - C - A.

jezrael Over a year ago

do you think cc = list(combinations(df.columns,3)) ?

jezrael Over a year ago

and then df.columns = df.columns.map('-'.join) ?

S.Peters Over a year ago

I've got the list working no problem, but on pd.concat([df[c[2]].sub(df[c[1]]) I'm struggling to work in a third reference.

constiii Over a year ago

How can I do the same with all more variables (more combinations) AND add the numbers (or strings alternatively) instead of subtract them? e.g. A B C D E AB AC AD ..... ABCDE ? @jezrael
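The two follow-up questions in these comments are not answered in the thread. One possible sketch, under the assumption that "2*B - C - A" means twice the middle column of each triple minus the outer two, and that "add instead of subtract" means a row-wise sum over each combination. The names `triples` and `sums` are hypothetical.

```python
from itertools import combinations
import pandas as pd

# Sample frame from the question, with Dt already as the index
df = pd.DataFrame(
    {"A": [1, 2], "B": [1, 3], "C": [1, 1], "D": [1, 2]},
    index=pd.Index(["11-apr", "10-apr"], name="Dt"),
)

# Triples: for (A, B, C) compute 2*B - A - C, and likewise for ABD, ACD, BCD
triples = pd.DataFrame({
    "".join(c): 2 * df[c[1]] - df[c[0]] - df[c[2]]
    for c in combinations(df.columns, 3)
})
print(triples)

# Every combination size from 1 to all columns, summing instead of subtracting
sums = pd.DataFrame({
    "".join(c): df[list(c)].sum(axis=1)
    for r in range(1, len(df.columns) + 1)
    for c in combinations(df.columns, r)
})
print(sums)
```

The number of columns in `sums` grows as 2**n - 1 for n input columns, so this is only practical for a small number of columns.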

piRSquared · Accepted Answer · 2017-04-11 17:06:33Z

Make sure your index is Dt

df = df.set_index('Dt')

Using numpy's np.tril_indices and slicing (see below for an explanation of np.tril_indices):

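import numpy as np

v = df.values  # underlying 2-D array
# positions of the strict lower triangle, i.e. every column pair with i > j
i, j = np.tril_indices(len(df.columns), -1)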

We can create a pd.MultiIndex for the columns. This makes it more generalizable for column names that are longer than one character.

pd.DataFrame(
    v[:, i] - v[:, j],
    df.index,
    [df.columns[j], df.columns[i]]
)

        A     B  A  B  C
        B  C  C  D  D  D
Dt                      
11-apr  0  0  0  0  0  0
10-apr  1 -1 -2  0 -1  1

But we can also do

pd.DataFrame(
    v[:, i] - v[:, j],
    df.index,
    df.columns[j] + df.columns[i]
)

        AB  AC  BC  AD  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1  -2   0  -1   1

np.tril_indices explained

This is a numpy function that returns two arrays which, used together, give the row and column positions of the lower triangle of a square matrix. With an offset of -1 the diagonal is excluded, so each unordered pair of positions appears exactly once. That makes it handy for working with all pairwise combinations of things.

Consider the dataframe d for illustration

d = pd.DataFrame(np.array(list('abcdefghijklmnopqrstuvwxy')).reshape(-1, 5))
d

   0  1  2  3  4
0  a  b  c  d  e
1  f  g  h  i  j
2  k  l  m  n  o
3  p  q  r  s  t
4  u  v  w  x  y

The triangle indices, when viewed as coordinate pairs, look like this

i, j = np.tril_indices(5, -1)
list(zip(i, j))

[(1, 0),
 (2, 0),
 (2, 1),
 (3, 0),
 (3, 1),
 (3, 2),
 (4, 0),
 (4, 1),
 (4, 2),
 (4, 3)]

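I can manipulate d's values with i and j. This writes through d.values, so it relies on .values being a writable view; with pandas Copy-on-Write enabled, .values is read-only and the assignment raises an error.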

d.values[i, j] = 'z'
d

   0  1  2  3  4
0  a  b  c  d  e
1  z  g  h  i  j
2  z  z  m  n  o
3  z  z  z  s  t
4  z  z  z  z  y

And you can see it targeted just that lower triangle
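A small sketch, not part of the original answer, relating these triangle indices to itertools.combinations. It shows that the lower triangle holds the same pairs as combinations (in a different order, which is why the column order above is AB, AC, BC, AD, ... rather than AB, AC, AD, BC, ...), and that np.triu_indices with offset 1 gives them in the same order. The names `tril_pairs`, `triu_pairs` and `combo_pairs` are only illustrative.

```python
from itertools import combinations
import numpy as np

n = 4  # number of columns in the question's frame (A, B, C, D)

# Strict lower triangle: pairs (i, j) with i > j
i, j = np.tril_indices(n, -1)
tril_pairs = list(zip(j.tolist(), i.tolist()))  # reorder as (smaller, larger)

# Strict upper triangle: pairs (r, c) with r < c, in row-major order
r, c = np.triu_indices(n, 1)
triu_pairs = list(zip(r.tolist(), c.tolist()))

combo_pairs = list(combinations(range(n), 2))

print(tril_pairs)   # same pairs as combinations, different order
print(triu_pairs)   # same pairs, same order as combinations
print(combo_pairs)
```

If the column order of the itertools-based answers matters, swapping in np.triu_indices(len(df.columns), 1) and using v[:, c] - v[:, r] should reproduce it.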

languitar · Accepted Answer · 2017-04-11 14:00:34Z

1

itertools.combinations will help you. With Dt set as the index:

import itertools
import pandas as pd

# b - a, so that column AB holds B - A as in the question
pd.DataFrame({'{}{}'.format(a, b): df[b] - df[a] for a, b in itertools.combinations(df.columns, 2)})

Which results in:

        AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1

answered Apr 11, 2017 at 14:00

languitar

1 Comment

user11186769 Over a year ago

This one works well if you have additional conditions such as df = pd.DataFrame({'{}{}'.format(a, b): df[a] & df[b] for a, b in itertools.combinations(df.columns, 2) if (df[a] & df[b]).any() }). The column labels won't get messed up like the previous answers.
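To make the condition in this comment concrete, here is a minimal sketch on a made-up boolean frame. The frame `flags` and its values are hypothetical and do not come from the thread.

```python
import itertools
import pandas as pd

# Hypothetical boolean data, only for illustration
flags = pd.DataFrame({
    "A": [True, False, True],
    "B": [True, True, False],
    "C": [False, False, False],
})

# Keep a pair only if both columns are True in at least one row
pairs = pd.DataFrame({
    "{}{}".format(a, b): flags[a] & flags[b]
    for a, b in itertools.combinations(flags.columns, 2)
    if (flags[a] & flags[b]).any()
})
print(pairs)
```

Because each label is built alongside its values in the dict comprehension, filtered-out pairs simply never get a column, so labels and data cannot drift apart.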

Nipun Batra · Accepted Answer · 2017-04-11 14:07:25Z

1

The itertools module should help you create the required combinations/permutations.

from itertools import combinations
import pandas as pd

# Assumes Dt is already the index, e.g. df = df.set_index('Dt')
# Create a new, empty DataFrame that shares df's index
new_df = pd.DataFrame(index=df.index)

# List of columns
columns = df.columns

# Create all combinations of length 2, e.g. AB, BC, etc.
for combination in combinations(columns, 2):
    combination_string = "".join(combination)
    new_df[combination_string] = df[combination[1]] - df[combination[0]]

print(new_df)


         AB  AC  AD  BC  BD  CD
Dt                            
11-apr   0   0   0   0   0   0
10-apr   1  -1   0  -2  -1   1

edited Apr 11, 2017 at 14:07

answered Apr 11, 2017 at 13:58

Nipun Batra

1 Comment

veg2020 Over a year ago

Although slower than Languitar's answer from above, this is much more readable. Thank you @Nipun for your excellent answer.
