8

Given the below DataFrame,

df = pd.DataFrame({'Student':['Siri','Alexa'], 'Class':['6', '7'], 'Section':['A','B'], 'Teacher':['Ravi','Mark'], 'School':['C','D']})

I would like to return a table with all possible combinations of the values of all columns. I achieved this, using the below code

df['key'] = 1

df1 = pd.merge(df.loc[:, ['key','Student']], df.loc[:,['key','Class']], how='outer')

df2 = df1.merge(df.loc[:,['key','Section']], how='outer')

df3 = df2.merge(df.loc[:,['key','Teacher']], how='outer')

df4 = df3.merge(df.loc[:,['key','School']], how='outer')

df4.drop(columns='key', inplace=True)

What is the simplest way to get this done, as I have 15 columns and through the above method, it would result in 14 merges and inefficient code?

1
  • This answer may help show you how to set it up as an loop Commented Apr 23, 2018 at 12:30

2 Answers 2

10

You meen like:

from itertools import product

df = pd.DataFrame({'Student':['Siri','Alexa'], 'Class':['6', '7'], 'Section':['A','B'], 'Teacher':['Ravi','Mark'], 'School':['C','D']})

uniques = [df[i].unique().tolist() for i in df.columns ]
pd.DataFrame(product(*uniques), columns = df.columns)

that results in a Cartesian product of all unique entries in each columns.

   Student Class Section Teacher School
0     Siri     6       A    Ravi      C
1     Siri     6       A    Ravi      D
2     Siri     6       A    Mark      C
3     Siri     6       A    Mark      D
4     Siri     6       B    Ravi      C
5     Siri     6       B    Ravi      D
6     Siri     6       B    Mark      C
7     Siri     6       B    Mark      D
8     Siri     7       A    Ravi      C
9     Siri     7       A    Ravi      D
10    Siri     7       A    Mark      C
11    Siri     7       A    Mark      D
12    Siri     7       B    Ravi      C
13    Siri     7       B    Ravi      D
14    Siri     7       B    Mark      C
15    Siri     7       B    Mark      D
16   Alexa     6       A    Ravi      C
17   Alexa     6       A    Ravi      D
18   Alexa     6       A    Mark      C
19   Alexa     6       A    Mark      D
20   Alexa     6       B    Ravi      C
21   Alexa     6       B    Ravi      D
22   Alexa     6       B    Mark      C
23   Alexa     6       B    Mark      D
24   Alexa     7       A    Ravi      C
25   Alexa     7       A    Ravi      D
26   Alexa     7       A    Mark      C
27   Alexa     7       A    Mark      D
28   Alexa     7       B    Ravi      C
29   Alexa     7       B    Ravi      D
30   Alexa     7       B    Mark      C
31   Alexa     7       B    Mark      D
Sign up to request clarification or add additional context in comments.

5 Comments

Nice solution, but what if one of the columns in df has more unique values than the other columns?
It will still work, he's building a list of lists, not a matrix, so they can have different lengths.
I got this error - raise TypeError("data argument can't be an iterator") TypeError: data argument can't be an iterator
@omer not if you use the code from my answer.
How would you change this to create larger combination sizes instead of 2?
4

You are looking for a Cartesian product. This is possible via itertools.product:

from itertools import product

prod = product(df['Class'].unique(), df['Section'].unique())
student_cols = [x for x in df.columns if x not in ('Class', 'Section')]
students = df[student_cols].drop_duplicates().values.tolist()

res = pd.DataFrame([s + list(p) for p in prod for s in students],
                   columns=list(student_cols+['Class', 'Section']))\
        .sort_values(list(student_cols+['Class', 'Section']))

print(res)

#   School Student Teacher Class Section
# 0      C    Siri    Ravi     6       A
# 2      C    Siri    Ravi     6       B
# 4      C    Siri    Ravi     7       A
# 6      C    Siri    Ravi     7       B
# 1      D   Alexa    Mark     6       A
# 3      D   Alexa    Mark     6       B
# 5      D   Alexa    Mark     7       A
# 7      D   Alexa    Mark     7       B

6 Comments

Thanks, this will work, but I have 15 columns in my DataFrame and hence, it would be a lengthy process even through this method, if at all I didn't get it wrong.
@NarendraGadidasu, In that case, you should edit your question to include, say, 5 columns - and mention you have 15 columns. Then people will be aware of this fact and we can account for this.
Edited the question. Thanks!
@NarendraGadidasu, So can you update your dataframe with 5 columns and also add desired output?
Added an example with 5 columns
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.