I want to convert a particular categorical variable into dummy variables using pd.get_dummies() for both the test and train data. Instead of doing it for each dataset separately, I used a for loop. However, the following code does not work: train_data.head() returns the same, unchanged dataset.

combine = [train_data, test_data]
for dataset in combine:
    dummy_col = pd.get_dummies(dataset['targeted_sex'])
    dataset = pd.concat([dataset, dummy_col], axis = 1)
    dataset.drop('targeted_sex', axis = 1, inplace = True)

train_data.head() # does not change

Even if I use an iterator which traverses the index like this, it still doesn't work.

for i in range(len(combine)):

Can I get some help? Also, Pandas get_dummies() doesn't provide an inplace option.

2 Answers

1

To keep a reference to each result, I would use a dict:

Create a dictionary of train and test:

combine={'train_data':train_data,'test_data':test_data}

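Then build the transformed dataframes with a dict comprehension (note that drop needs axis=1 as a keyword argument in recent pandas versions):

new_combine={k:pd.concat([dataset, pd.get_dummies(dataset['targeted_sex'])], axis = 1)
                            .drop('targeted_sex', axis = 1) for k,dataset in combine.items()}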

Print test and train now by referencing the keys:

print(new_combine['train_data']) #same for test
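
To see why the original loop has no visible effect: `dataset = pd.concat(...)` rebinds the loop variable to a brand-new DataFrame, so `train_data` keeps pointing at the old one, and the later in-place drop only touches the new copy. Below is a minimal sketch of one way around this, assigning the results back to the original names. The toy data and the helper name `add_dummies` are assumptions made for illustration; they are not from the question.

```python
import pandas as pd

# Hypothetical toy data standing in for the question's train_data and test_data
train_data = pd.DataFrame({"age": [25, 32, 47],
                           "targeted_sex": ["male", "female", "male"]})
test_data = pd.DataFrame({"age": [51, 19],
                          "targeted_sex": ["female", "male"]})

def add_dummies(dataset):
    """Return a new frame with 'targeted_sex' replaced by dummy columns."""
    dummy_col = pd.get_dummies(dataset["targeted_sex"])
    return pd.concat([dataset, dummy_col], axis=1).drop("targeted_sex", axis=1)

# Rebind the original names to the transformed frames
train_data, test_data = [add_dummies(d) for d in (train_data, test_data)]

print(train_data.head())
```

The same unpacking idea works with the dict above, e.g. `train_data = new_combine['train_data']`.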
0

You need to print dataset.head() instead of train_data.head().

You can use this function:

df: the dataframe
todummy_list: list of column names to convert into dummies

def dummy_df(df, todummy_list):
    for x in todummy_list:
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(x, axis=1)  # axis must be passed by keyword in pandas 2.x
        df = pd.concat([df, dummies], axis=1)
    return df
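
A possible way to use this function on both datasets is sketched below. Since it returns a new dataframe, the result has to be assigned back to each name. The toy data is hypothetical, and the final `reindex` step is an extra precaution that is not part of the answer: it assumes you want the test columns to match the train columns when a category happens to be missing from the test set.

```python
import pandas as pd

def dummy_df(df, todummy_list):
    # Same function as above, repeated so this snippet runs on its own
    for x in todummy_list:
        dummies = pd.get_dummies(df[x], prefix=x, dummy_na=False)
        df = df.drop(x, axis=1)
        df = pd.concat([df, dummies], axis=1)
    return df

# Hypothetical toy data; the test set deliberately lacks the 'male' category
train_data = pd.DataFrame({"age": [25, 32, 47],
                           "targeted_sex": ["male", "female", "male"]})
test_data = pd.DataFrame({"age": [51, 19],
                          "targeted_sex": ["female", "female"]})

# Assign the returned frames back to the original names
train_data = dummy_df(train_data, ["targeted_sex"])
test_data = dummy_df(test_data, ["targeted_sex"])

# Optional: give test the same columns as train, filling missing dummies with 0
test_data = test_data.reindex(columns=train_data.columns, fill_value=0)

print(test_data.head())
```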
