Pandas dataframe: how to permute rows and create new groups of combinations

Question

I have the following pandas DataFrame df with 10 rows and 4 columns, where each cell holds one of 3 categorical values:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.choice(["dog", "cat", "mice"], size=(10, 4)))

I would like to know all possible permutations of the values in each row, and to create a new DataFrame that groups the rows by their combination of values: for example, a group where the same value appears twice in a row (such as cat cat dog mice), or a group where all four values are the same (such as dog dog dog dog), etc. I have tried with itertools without success. Could someone give me some pointers? Thanks

The output would be a table with multiple rows and 2 columns: the first column would contain the different groups (rows with identical values, pairs, or all-unique combinations), and the second column would contain how often each group is seen (its frequency).
– Jess BR, Commented Apr 26, 2021 at 9:11
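The two-column table described in the comment can be sketched without itertools at all. This is a minimal sketch, not taken from the thread; it assumes the order of values within a row is irrelevant and labels each group by its sorted values (for example "cat cat dog mice"). The fixed random seed is also an assumption, added only to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Same shape as the question: 10 rows, 4 columns, 3 categorical values.
# Assumption: a fixed seed so the example is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

# Column order inside a row is ignored: sort each row's values and join
# them into a label such as "cat cat dog mice".
groups = pd.Series(
    [" ".join(sorted(row)) for row in df.itertuples(index=False, name=None)]
)

# Count how many rows fall into each group.
result = (
    groups.value_counts()
    .rename_axis("group")
    .reset_index(name="frequency")
)
print(result)
```

Because the label is built from sorted values, rows like "dog cat cat mice" and "mice cat dog cat" land in the same group.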

Andrej Kesely · Accepted Answer · 2021-04-26 10:41:51Z


I hope I've understood your question correctly. This example creates a Series whose index is the combination (a frozenset of (value, count) pairs) and whose values are the number of permutations that produce that combination:

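from collections import Counter
from itertools import permutations

print(
    df.assign(
        # For every row, map each of its permutations to a frozenset of
        # (value, count) pairs, e.g. {("cat", 2), ("dog", 1), ("mice", 1)}
        items=df.apply(
            lambda x: [
                frozenset(Counter(p).items()) for p in permutations(x, len(x))
            ],
            axis=1,
        )
    )
    # One line per permutation, then count how often each combination occurs
    .explode("items")
    .groupby("items")
    .size()
)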

Prints (for example):

items
((mice, 2), (dog, 2))              48
((cat, 1), (dog, 2), (mice, 1))    48
((cat, 3), (mice, 1))              24
((mice, 3), (cat, 1))              24
((dog, 1), (mice, 3))              48
((dog, 1), (cat, 2), (mice, 1))    24
((mice, 4))                        24
dtype: int64
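Every permutation of a row contains exactly the same values, so each row contributes 4! = 24 identical entries. That is why every count above is a multiple of 24 (they sum to 240 = 10 × 24). The sketch below is not part of the original answer; it checks this for one row and then counts one combination per row instead. The fixed seed is an assumption for reproducibility.

```python
from collections import Counter
from itertools import permutations

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: fixed seed for a reproducible example
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

# All 4! = 24 permutations of a row hold the same values, so they all
# collapse to a single frozenset of (value, count) pairs.
first_row = tuple(df.iloc[0])
keys = {frozenset(Counter(p).items()) for p in permutations(first_row)}
print(len(keys))  # → 1

# Hence one key per row is enough; this counts rows, not permutations.
per_row = Counter(
    frozenset(Counter(row).items())
    for row in df.itertuples(index=False, name=None)
)
for combo, n_rows in per_row.items():
    # Multiplying by 24 reproduces the permutation-based counts.
    print(dict(combo), n_rows, n_rows * 24)
```

Skipping the permutations avoids generating 24 lists per row, which matters if the number of columns grows (n! permutations per row).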

EDIT: To get a dataframe:

x = (
    df.assign(
        items=df.apply(
            lambda x: [
                frozenset(Counter(p).items()) for p in permutations(x, len(x))
            ],
            axis=1,
        )
    )
    .explode("items")
    .groupby("items")
    .size()
)
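# Each index entry is a frozenset of (value, count) pairs: turn it into a dict,
# add the total as "count", and fill values missing from a combination with 0
df_out = (
    pd.DataFrame([dict(i, count=v) for i, v in zip(x.index, x)])
    .fillna(0)
    .astype(int)
)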
print(df_out)

Prints:

   dog  mice  cat  count
0    1     1    2     24
1    2     2    0     72
2    2     1    1     24
3    0     2    2     48
4    4     0    0     24
5    0     3    1     24
6    1     3    0     24
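A similar DataFrame (one column per value plus a count) can also be built with pandas alone, counting rows rather than permutations. This is a minimal sketch, not from the answer; it assumes pandas >= 1.1 (for melt(ignore_index=False) and DataFrame.value_counts), and the fixed seed is an assumption. Multiplying the row count by 24 gives the count column shown above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

# Long format: one line per cell, with the original row label in "index".
long = df.melt(ignore_index=False).reset_index()

# One row per original row, one column per value: occurrences in that row.
per_row_counts = pd.crosstab(long["index"], long["value"])

# Count rows sharing the same composition (pandas >= 1.1).
df_out = per_row_counts.value_counts().reset_index(name="rows")

# Same scale as the permutation-based answer: 4! = 24 permutations per row.
df_out["count"] = df_out["rows"] * 24
print(df_out)
```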

edited Apr 26, 2021 at 10:41

answered Apr 26, 2021 at 9:57



3 Comments

Jess BR Over a year ago

Great, I think it is! Thanks Andrej!! Do you know how I could create a new df from the output so I could plot some combination of the items?

Jess BR Over a year ago

Much appreciated! Thanks a lot for your kind help!!

Jess BR Over a year ago

Is it possible to count the number of unique combinations from the initial df?
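The thread leaves this question open, and "unique combinations" can be read two ways. A minimal sketch, assuming either reading, counts both: distinct multisets of values (column order ignored) and distinct rows (column order matters). The fixed seed is an assumption for reproducibility.

```python
import math

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # assumption: fixed seed for reproducibility
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))

rows = list(df.itertuples(index=False, name=None))

# Order-insensitive: "cat dog cat mice" and "mice cat cat dog" count once.
n_multisets = len({tuple(sorted(row)) for row in rows})

# Order-sensitive: rows are duplicates only if they match column by column.
n_exact = len(set(rows))

# Upper bound for 3 values in 4 cells: C(3 + 4 - 1, 4) = 15 possible multisets.
max_multisets = math.comb(3 + 4 - 1, 4)
print(n_multisets, n_exact, max_multisets)
```

Under the order-insensitive reading, n_multisets equals the number of rows in the answer's df_out for the same df.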
