Creating dummy variables using pd.get_dummies in a for loop in Python

Question

I want to convert a particular categorical variable into dummy variables using pd.get_dummies() for both the train and test data. Instead of doing it for each one separately, I used a for loop. However, the following code does not work: .head() returns the same, unchanged dataset.

combine = [train_data, test_data]
for dataset in combine:
    dummy_col = pd.get_dummies(dataset['targeted_sex'])
    dataset = pd.concat([dataset, dummy_col], axis = 1)
    dataset.drop('targeted_sex', axis = 1, inplace = True)

train_data.head() # does not change

Even if I loop over the list indices instead, like this, it still doesn't work:

for i in range(len(combine)):

Can I get some help? Also, Pandas get_dummies() doesn't provide an inplace option.
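To see why neither loop changes train_data, here is a small self-contained sketch with made-up data (the age column and all values are hypothetical; only targeted_sex comes from the question). Assigning to dataset inside the loop only rebinds the loop variable to a new DataFrame, and inplace=True then acts on that new frame, not on train_data. Writing the result back into combine[i] does update the list, but the name train_data still points at the old object until the list is unpacked again.

```python
import pandas as pd

# Hypothetical stand-in frames; only 'targeted_sex' comes from the question.
train_data = pd.DataFrame({"age": [25, 31, 47], "targeted_sex": ["M", "F", "M"]})
test_data = pd.DataFrame({"age": [52, 19], "targeted_sex": ["F", "M"]})

combine = [train_data, test_data]
for dataset in combine:
    # pd.concat returns a NEW DataFrame; this only rebinds the loop variable,
    # so neither combine[0] nor train_data is affected.
    dataset = pd.concat([dataset, pd.get_dummies(dataset["targeted_sex"])], axis=1)
    dataset.drop("targeted_sex", axis=1, inplace=True)

print("targeted_sex" in train_data.columns)  # True: train_data is unchanged

# Index-based fix: store each new frame back into the list, then unpack it
# so the names train_data / test_data refer to the transformed frames.
for i in range(len(combine)):
    dummies = pd.get_dummies(combine[i]["targeted_sex"])
    combine[i] = pd.concat([combine[i], dummies], axis=1).drop(columns="targeted_sex")
train_data, test_data = combine

print(train_data.columns.tolist())  # ['age', 'F', 'M']
```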

anky · Accepted Answer · 2019-12-11 15:33:25Z

1

For easy referencing, I would use a dict:

Create a dictionary of train and test:

combine = {'train_data': train_data, 'test_data': test_data}

Use this code which uses a dict comprehension:

new_combine = {k: pd.concat([dataset, pd.get_dummies(dataset['targeted_sex'])], axis=1)
                  .drop(columns='targeted_sex') for k, dataset in combine.items()}

Print test and train now by referencing the keys:

print(new_combine['train_data']) #same for test
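A self-contained version of the dict approach, assuming hypothetical sample data (the age column and the values are made up). It uses drop(columns=...) because recent pandas (2.x) no longer accepts the axis as a positional argument to drop. Note that the original frames in combine are left untouched; the transformed ones live only in new_combine.

```python
import pandas as pd

# Hypothetical sample data; only 'targeted_sex' comes from the question.
train_data = pd.DataFrame({"age": [25, 31, 47], "targeted_sex": ["M", "F", "M"]})
test_data = pd.DataFrame({"age": [52, 19], "targeted_sex": ["F", "M"]})

combine = {"train_data": train_data, "test_data": test_data}

# Build a new dict whose values are the transformed frames.
new_combine = {
    k: pd.concat([dataset, pd.get_dummies(dataset["targeted_sex"])], axis=1)
         .drop(columns="targeted_sex")
    for k, dataset in combine.items()
}

print(new_combine["train_data"].columns.tolist())  # ['age', 'F', 'M']
print(new_combine["test_data"].columns.tolist())   # ['age', 'F', 'M']
```

If a category appears in only one of the two frames, the resulting dummy columns will differ between train and test, so it is worth comparing the column lists before modelling.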


talatccan · Accepted Answer · 2019-12-11 15:00:28Z

0

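Inside the loop, dataset = pd.concat(...) only rebinds the name dataset to a new DataFrame, so train_data itself never changes (after the loop, dataset holds just the transformed test data). Instead, you can use this function and assign its return value back to your variables: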

df: the DataFrame
todummy_list: list of column names to convert into dummy variables

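import pandas as pd

def dummy_df(df, todummy_list):
    for x in todummy_list:
        # One-hot encode column x, prefixing the new columns with its name
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        # Keyword form: pandas 2.x rejects the positional axis in drop(x, 1)
        df = df.drop(columns=x)
        df = pd.concat([df, dummies], axis=1)
    return df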
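For comparison, pd.get_dummies also accepts a whole DataFrame together with a columns list; it drops each listed column and appends its dummy columns (prefixed with the column name), much like dummy_df above. A minimal sketch on hypothetical sample data, reassigning the results so that both names point at the new frames:

```python
import pandas as pd

# Hypothetical sample data; only 'targeted_sex' comes from the question.
train_data = pd.DataFrame({"age": [25, 31, 47], "targeted_sex": ["M", "F", "M"]})
test_data = pd.DataFrame({"age": [52, 19], "targeted_sex": ["F", "M"]})

# get_dummies returns a new DataFrame, so the results must be assigned back.
train_data, test_data = (
    pd.get_dummies(df, columns=["targeted_sex"], dummy_na=False)
    for df in (train_data, test_data)
)

print(train_data.columns.tolist())  # ['age', 'targeted_sex_F', 'targeted_sex_M']
```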
