Creating all possible combinations of rows in dataframe in python

Question

I have a dataset like the following:

   Survived  PassengerId  Pclass  
    1.0          1.0       1.0
    20.0        179.0      1.5
    39.0        357.0      2.0
    58.0        535.0      2.5
    77.0        713.0      NaN
    96.0         NaN       NaN
    NaN          NaN       NaN
    NaN          NaN       NaN
    NaN          NaN       NaN
    NaN          NaN       NaN

I want to create all possible row-wise combinations of these values. Something like:

   Survived  PassengerId  Pclass  
    1.0          1.0       1.0
    1.0          1.0       1.5
    1.0          1.0       2.0
    1.0          1.0       2.5
    20.0        179.0      1.0
    20.0        179.0      1.5
    20.0        179.0      2.0
    20.0        179.0      2.5
      .           .         .  
      .           .         .
      .           .         .
    1.0         713.0      2.5
    20.0        713.0      2.5
    39.0        713.0      2.5
    58.0        713.0      2.5
    77.0        713.0      2.5
    96.0        713.0      2.5

Since the columns have 6, 5 and 4 unique observations respectively, the new dataframe will have 6*5*4 = 120 rows.
R has a similar function called expand.grid; however, I need this purely in Python. Does anyone have a similar function written in Python? Thank you.

indra.firmansyah · Accepted Answer · 2020-02-02 11:17:51Z

3

You can do this with itertools.product from the standard library together with pandas:

from itertools import product
import pandas as pd

Survived = [1.0, 20.0, 39.0, 58.0, 77.0, 96.0]
PassengerId = [1.0, 179.0, 357.0, 535.0, 713.0]
Pclass = [1.0, 1.5, 2.0, 2.5]

result = pd.DataFrame(product(Survived, PassengerId, Pclass), columns=['Survived', 'PassengerId', 'Pclass'])

The variable result is then a dataframe with 120 rows:

>>> len(result) # this prints the length of the dataframe
120
>>> result.head() # this shows the first 5 records
   Survived  PassengerId  Pclass
0       1.0          1.0     1.0
1       1.0          1.0     1.5
2       1.0          1.0     2.0
3       1.0          1.0     2.5
4       1.0        179.0     1.0
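If the values already live in a dataframe like the one in the question, the three lists do not have to be typed by hand. The following is a minimal sketch, assuming the padding values are NaN and that every non-NaN unique value in a column should take part in the grid; the names df, levels and grid are illustrative and not from the original answer.

```python
from itertools import product

import numpy as np
import pandas as pd

# Sample data shaped like the question's table; NaN pads the shorter columns.
df = pd.DataFrame({
    "Survived": [1.0, 20.0, 39.0, 58.0, 77.0, 96.0],
    "PassengerId": [1.0, 179.0, 357.0, 535.0, 713.0, np.nan],
    "Pclass": [1.0, 1.5, 2.0, 2.5, np.nan, np.nan],
})

# One array of unique, non-NaN values per column, in column order.
levels = [df[col].dropna().unique() for col in df.columns]

# product(*levels) yields one tuple per combination; the last column varies fastest.
grid = pd.DataFrame(list(product(*levels)), columns=list(df.columns))

print(len(grid))
print(grid.head())
```

Because the column names are taken from df.columns, the same code should work unchanged for any number of columns.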


3 Comments

Sarvesh Singh Over a year ago

This answers my question to an extent. The further problem I am facing is that these columns need to be dynamic, so I have this code: keys_all = tuple(generated_var_dict.keys()); values_all = tuple(generated_var_dict.values()); from itertools import product; result = pd.DataFrame(product(values_all), columns=keys_all). However, it gives this error for values_all: AssertionError: 28 columns passed, passed data had 1 columns

indra.firmansyah Over a year ago

You can try the following: result = pd.DataFrame(product(*values_all), columns=keys_all). You have to add the asterisk (*) before values_all so that each list is passed to product as a separate argument.
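To see why the asterisk matters, here is a small sketch. The contents of generated_var_dict are made up for illustration (the commenter's real dictionary has 28 keys and is not shown), and keys_all is built as a list here rather than a tuple.

```python
from itertools import product

import pandas as pd

# Hypothetical stand-in for the commenter's generated_var_dict.
generated_var_dict = {
    "Survived": [1.0, 20.0],
    "PassengerId": [1.0, 179.0, 357.0],
    "Pclass": [1.0, 1.5],
}
keys_all = list(generated_var_dict.keys())
values_all = tuple(generated_var_dict.values())

# Without the asterisk, product receives a single iterable (the tuple of lists)
# and yields 1-tuples, so each row has only one element.
without_star = list(product(values_all))

# With the asterisk, each list becomes its own argument and product
# yields one tuple per combination, with one element per key.
with_star = list(product(*values_all))

result = pd.DataFrame(with_star, columns=keys_all)
print(len(without_star[0]), len(with_star[0]), len(result))
```

The single-element rows produced without the asterisk are what lead to the "passed data had 1 columns" message when 28 column names are supplied.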

indra.firmansyah Over a year ago

Hi @SarveshSingh, if my answer solves your problem, please accept it, thank you. If you still have a problem, please let me know and I will see whether I can help.
