I have a dataset like the following:

   Survived  PassengerId  Pclass  
    1.0          1.0       1.0
    20.0        179.0      1.5
    39.0        357.0      2.0
    58.0        535.0      2.5
    77.0        713.0      NaN
    96.0         NaN       NaN
    NaN          NaN       NaN
    NaN          NaN       NaN
    NaN          NaN       NaN
    NaN          NaN       NaN

I want to create all possible row-wise combinations of the values in these columns. Something like:

   Survived  PassengerId  Pclass  
    1.0          1.0       1.0
    1.0          1.0       1.5
    1.0          1.0       2.0
    1.0          1.0       2.5
    20.0        179.0      1.0
    20.0        179.0      1.5
    20.0        179.0      2.0
    20.0        179.0      2.5
      .           .         .  
      .           .         .
      .           .         .
    1.0         713.0      2.5
    20.0        713.0      2.5
    39.0        713.0      2.5
    58.0        713.0      2.5
    77.0        713.0      2.5
    96.0        713.0      2.5

Since the columns contain 6, 5 and 4 unique non-NaN values respectively, the new dataframe will have 6*5*4 = 120 rows.
R has a similar function called expand.grid, but I need this purely in Python. Does anyone have an equivalent function written in Python? Thank you.

1 Answer

You can do this with itertools.product from the standard library together with pandas:

from itertools import product
import pandas as pd

Survived = [1.0, 20.0, 39.0, 58.0, 77.0, 96.0]
PassengerId = [1.0, 179.0, 357.0, 535.0, 713.0]
Pclass = [1.0, 1.5, 2.0, 2.5]

result = pd.DataFrame(product(Survived, PassengerId, Pclass), columns=['Survived', 'PassengerId', 'Pclass'])

Once you have the variable result, it should be a dataframe with 120 rows:

>>> len(result) # this prints the length of the dataframe
120
>>> result.head() # this shows the first 5 records
   Survived  PassengerId  Pclass
0       1.0          1.0     1.0
1       1.0          1.0     1.5
2       1.0          1.0     2.0
3       1.0          1.0     2.5
4       1.0        179.0     1.0
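
The lists above are typed out by hand. If the values already sit in a NaN-padded dataframe like the one in the question, a minimal sketch along the following lines could build the lists from the columns instead. The frame `df` here is a shortened, hypothetical stand-in for the question's data, and `dropna().unique()` is assumed to be an acceptable way to collect each column's values.

```python
from itertools import product

import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's table: shorter columns are padded with NaN.
df = pd.DataFrame({
    "Survived": [1.0, 20.0, 39.0, 58.0, 77.0, 96.0],
    "PassengerId": [1.0, 179.0, 357.0, 535.0, 713.0, np.nan],
    "Pclass": [1.0, 1.5, 2.0, 2.5, np.nan, np.nan],
})

# For each column, drop the NaN padding and keep the unique values in first-seen order.
values = [df[col].dropna().unique().tolist() for col in df.columns]

# Cartesian product of the per-column value lists, one row per combination.
grid = pd.DataFrame(list(product(*values)), columns=list(df.columns))

print(len(grid))
```

Note that itertools.product varies the last column fastest, whereas R's expand.grid varies the first factor fastest, so the row order differs from R even though the set of rows is the same.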

3 Comments

This answers my question to an extent. The further problem I am facing is that these columns need to be dynamic, so I have this code:

    keys_all = tuple(generated_var_dict.keys())
    values_all = tuple(generated_var_dict.values())
    from itertools import product
    result = pd.DataFrame(product(values_all), columns=keys_all)

However, it gives this error for values_all:

    AssertionError: 28 columns passed, passed data had 1 columns
You can try the following: result = pd.DataFrame(product(*values_all), columns=keys_all). You have to add an asterisk (*) before values_all so that each list of values is passed to product as a separate argument.
Hi @SarveshSingh, if my answer solves your problem, please consider accepting it, thank you. If you still have a problem, please let me know and I will see whether I can help.
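
For anyone reading the comments later, here is a small sketch of the difference between calling product on the tuple itself and unpacking it with `*`. The dictionary contents are made up for illustration; only the name generated_var_dict comes from the comment above.

```python
from itertools import product

import pandas as pd

# Hypothetical stand-in for generated_var_dict; the real one has 28 keys.
generated_var_dict = {
    "a": [1, 2],
    "b": [10, 20, 30],
}

keys_all = tuple(generated_var_dict.keys())
values_all = tuple(generated_var_dict.values())

# Without unpacking, product receives a single iterable (the outer tuple)
# and yields 1-tuples, one per value list, hence "passed data had 1 columns".
wrong = list(product(values_all))

# With unpacking, each value list becomes its own argument,
# so every tuple has one entry per key.
right = list(product(*values_all))

result = pd.DataFrame(right, columns=list(keys_all))
print(result.shape)
```

With two keys holding 2 and 3 values, the unpacked version gives 2*3 = 6 rows and 2 columns.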
