I have the following pandas dataframe df with 10 rows and 4 columns, where each cell holds one of 3 categorical values:

df = pd.DataFrame(np.random.choice(["dog", "cat", "mice"], size=(10, 4)))

I would like to know all possible permutations of the values in the rows and create a new dataframe that groups the row combinations, for example a group of rows containing the same value twice (such as cat cat dog mice) or the same value four times (such as mice mice mice mice). I have tried itertools without success. Could someone give me some pointers? Thanks

  • It will be easier to answer if you have an expected output. Commented Apr 26, 2021 at 8:57
  • The output would be a table with multiple rows and 2 columns: the first column would contain the different groups (identical rows, pairs, or unique combinations), and the second column would contain the number of times each group is seen (its frequency). Commented Apr 26, 2021 at 9:11

1 Answer

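I hope I've understood your question correctly. This example creates a Series whose index is the combination of values found in a row (each value paired with how often it occurs) and whose values are the number of times that combination is seen, counted over all permutations of every row: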

from collections import Counter
from itertools import permutations

print(
    df.assign(
        items=df.apply(
            lambda x: [
                frozenset(Counter(p).items()) for p in permutations(x, len(x))
            ],
            axis=1,
        )
    )
    .explode("items")
    .groupby("items")
    .size()
)

Prints (for example):

items
((mice, 2), (dog, 2))              48
((cat, 1), (dog, 2), (mice, 1))    48
((cat, 3), (mice, 1))              24
((mice, 3), (cat, 1))              24
((dog, 1), (mice, 3))              48
((dog, 1), (cat, 2), (mice, 1))    24
((mice, 4))                        24
dtype: int64
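
Note that `Counter` ignores order, so all 4! = 24 permutations of a row produce the same key, and every count above is 24 times the number of rows with that composition. If only the composition of each row matters, the rows can be counted directly. The following is a minimal sketch under that assumption, not the method used above; the helper name `row_key` and the fixed seed are illustrative choices:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Reproducible sample data; the seed is an arbitrary choice for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))


def row_key(row):
    """Return an order-independent key such as (('cat', 1), ('dog', 2), ('mice', 1))."""
    return tuple(sorted(Counter(row).items()))


# Count how many rows share each composition (no permutations needed).
composition_counts = Counter(
    row_key(row) for row in df.itertuples(index=False, name=None)
)

for key, n_rows in composition_counts.most_common():
    print(dict(key), n_rows)

# Number of distinct compositions present in the data.
print(len(composition_counts))
```

Multiplying each of these row counts by 24 should reproduce the sizes from the permutation-based version for the same df, and the final `len(...)` gives the number of unique combinations.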

EDIT: To get a dataframe:

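# Series: index = frozenset of (value, occurrences) pairs, values = group size
counts = (
    df.assign(
        items=df.apply(
            lambda row: [
                frozenset(Counter(p).items())
                for p in permutations(row, len(row))
            ],
            axis=1,
        )
    )
    .explode("items")
    .groupby("items")
    .size()
)
# One output row per combination: a column per value plus the group size;
# values absent from a combination become 0.
df_out = (
    pd.DataFrame([dict(i, count=v) for i, v in counts.items()])
    .fillna(0)
    .astype(int)
)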
print(df_out)

Prints:

   dog  mice  cat  count
0    1     1    2     24
1    2     2    0     72
2    2     1    1     24
3    0     2    2     48
4    4     0    0     24
5    0     3    1     24
6    1     3    0     24
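
If the goal is to group rows by pattern regardless of which values are involved (for example "one pair" or "four of a kind", as in the question), one option is to keep only the sorted multiplicities of each row. This is a sketch under that reading of the question; `row_shape` and `shape_df` are hypothetical names, and the seed is arbitrary:

```python
from collections import Counter

import numpy as np
import pandas as pd

# Reproducible sample data; the seed is an arbitrary choice for illustration.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.choice(["dog", "cat", "mice"], size=(10, 4)))


def row_shape(row):
    """Multiplicities sorted descending, e.g. cat cat dog mice -> (2, 1, 1)."""
    return tuple(sorted(Counter(row).values(), reverse=True))


# Count how many rows fall into each pattern.
shape_counts = Counter(
    row_shape(row) for row in df.itertuples(index=False, name=None)
)

# Two-column table: the pattern (as text) and its frequency.
shape_df = pd.DataFrame(
    {
        "shape": [str(shape) for shape in shape_counts],
        "count": list(shape_counts.values()),
    }
).sort_values("count", ascending=False, ignore_index=True)
print(shape_df)
```

With 3 possible values and 4 columns, the only patterns that can occur are (4,), (3, 1), (2, 2) and (2, 1, 1). A table in this form can be plotted with, for example, `shape_df.plot.bar(x="shape", y="count")`, assuming matplotlib is installed.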

3 Comments

Great, I think it is! Thanks Andrej!! Do you know how I could create a new df from the output so I could plot some combination of the items?
Much appreciated! Thanks a lot for your kind help!!
Is it possible to count the number of unique combinations from the initial df?
