
My task is to drop all rows containing NaNs and encode all the categorical variables in data.

I wrote a function that looks like this:

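from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):
    # drop every row that contains at least one NaN
    data = data.dropna()
    le = LabelEncoder()
    # replace the 'car name' strings with integer labels
    data['car name'] = le.fit_transform(data['car name'])

    return data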

It takes a DataFrame and returns the processed data. Running this function gives me a warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I don't quite get which part of my code is causing this and how to fix it.
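A general way to find out which statement triggers a warning is to escalate warnings to exceptions, so that the traceback points at the offending line. The sketch below uses only Python's standard warnings and traceback modules; the helper name find_warning_source is hypothetical and not part of pandas.

```python
import traceback
import warnings


def find_warning_source(func, *args, **kwargs):
    """Call func with every warning escalated to an exception (hypothetical helper).

    Returns None if no warning was raised; otherwise returns the formatted
    traceback, whose last frames point at the statement that warned.
    """
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # turn every warning into an exception
        try:
            func(*args, **kwargs)
        except Warning:
            return traceback.format_exc()
    return None
```

For example, print(find_warning_source(preprocess_data, df)), where df is whatever DataFrame you normally pass in; a result of None means no warning was raised for that input.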

Comment: What are you passing to preprocess_data? (Jan 19, 2018 at 21:32)

2 Answers


Make sure you tell pandas that data is its own DataFrame (and not a slice of another one) by calling .copy():

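from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):
    # .copy() makes the result an independent DataFrame,
    # not something pandas treats as a slice of the input
    data = data.dropna().copy()
    le = LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])

    return data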

A more detailed explanation is here: https://github.com/pandas-dev/pandas/issues/17476
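A minimal sketch to check the .copy() fix on an input that actually contains a NaN, so that dropna() really removes a row. It assumes pandas and NumPy only: pd.factorize(..., sort=True) stands in for LabelEncoder so the sketch runs without scikit-learn (both number the sorted unique labels 0, 1, 2, ...), and the mpg column and example values are made up for illustration.

```python
import warnings

import numpy as np
import pandas as pd


def preprocess_data(data):
    # .copy() makes the result an independent DataFrame
    data = data.dropna().copy()
    # Stand-in for LabelEncoder: factorize with sort=True numbers the
    # sorted unique labels, which matches LabelEncoder's codes.
    data["car name"] = pd.factorize(data["car name"], sort=True)[0]
    return data


# Hypothetical input with a missing value, so dropna() removes the 'dacia' row.
raw = pd.DataFrame({
    "car name": ["nissan", "dacia", "ford"],
    "mpg": [18.0, np.nan, 25.0],
})

# Record every warning raised while preprocessing.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = preprocess_data(raw)

# Match by class name so this works across pandas versions.
copy_warnings = [w for w in caught if "SettingWithCopy" in w.category.__name__]
print(result)
print("SettingWithCopy warnings:", len(copy_warnings))
```

Because the function works on its own copy, the caller's raw DataFrame keeps its original string labels.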


You may need to give more information; the problem may not be in this function at all. The following code does not produce the warning:

import pandas as pd
from sklearn import preprocessing

def preprocess_data(data):
    data = data.dropna()
    le = preprocessing.LabelEncoder()
    data['car name'] = le.fit_transform(data['car name'])
    return data


print(preprocess_data(pd.DataFrame({'car name': ['nissan', 'dacia'], 'car mode': ['juke', 'logan']})))

#   car mode  car name
# 0     juke         1
# 1    logan         0
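Note that the example frame above contains no NaNs, so dropna() has no rows to remove. Since the comment on the question asks what is passed to preprocess_data, one way to compare inputs is to record whether the warning fires for a frame with and without a missing value. This is a sketch assuming pandas and NumPy; pd.factorize(..., sort=True) replaces LabelEncoder so it runs without scikit-learn, the helper warns_setting_with_copy is hypothetical, and the printed result can differ between pandas versions, so no particular outcome is claimed here.

```python
import warnings

import numpy as np
import pandas as pd


def preprocess_data(data):
    # Same shape as the function above (no .copy()); pd.factorize with
    # sort=True stands in for LabelEncoder.
    data = data.dropna()
    data["car name"] = pd.factorize(data["car name"], sort=True)[0]
    return data


def warns_setting_with_copy(df):
    """Return True if preprocess_data(df) emits a SettingWithCopy warning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        preprocess_data(df)
    return any("SettingWithCopy" in w.category.__name__ for w in caught)


no_nans = pd.DataFrame({"car name": ["nissan", "dacia"], "car mode": ["juke", "logan"]})
with_nans = pd.DataFrame({"car name": ["nissan", "dacia"], "car mode": ["juke", np.nan]})

print("input without NaNs:", warns_setting_with_copy(no_nans))
print("input with NaNs:   ", warns_setting_with_copy(with_nans))
```

If the two lines differ on your pandas version, the data being passed in (not the function body) is what decides whether the warning appears.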
