I am trying to accomplish something I thought would be easy: Take three columns from my dataframe, use a label encoder to encode them, and simply replace the current values with the new values.
I have a dataframe that looks like this:
| Order_Num | Part_Num | Site | BUILD_ID |
|---|---|---|---|
| MO100161015 | PPT-100K39 | BALT | A001 |
| MO100203496 | MDF-925R36 | BALT | A001 |
| MO100203498 | PPT-825R34 | BALT | A001 |
| MO100244071 | MDF-323DCN | BALT | A001 |
| MO100244071 | MDF-888888 | BALT | A005 |
I am essentially trying to use sklearn's LabelEncoder() to convert my string columns to numeric ones. Currently, I have a function, str_to_num, that takes a column and returns an array of the encoded values. It works great.
However, I am struggling to remove the old data from my dataframe and put the new data in its place. My script is below:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
import pandas as pd
import numpy as np
# Convert the passed in column
def str_to_num(arr):
    le = preprocessing.LabelEncoder()
    array_of_parts = []
    for x in arr:
        array_of_parts.append(x)
    new_arr = le.fit_transform(array_of_parts)
    return new_arr
# read in data from csv
data = pd.read_csv('test.csv')
print(data)
# Create the new data
converted_column = str_to_num(data['Order_Num'])
print(converted_column)
# How can I replace data['Order_Num'] with the values in converted_column?
# Drop the old data
dropped = data.drop('Order_Num', axis=1)
# Add the new_data column to the place where the old data was?
Given my current script, how can I replace the values in the 'Order_Num' column with those in converted_column? I have tried [pandas.DataFrame.replace][1], but that replaces specific values, and I don't know how to map that to the returned data.
My expected output is:
| Order_Num | Part_Num | Site | BUILD_ID |
|---|---|---|---|
| 0 | PPT-100K39 | BALT | A001 |
| 1 | MDF-925R36 | BALT | A001 |
| 2 | PPT-825R34 | BALT | A001 |
| 3 | MDF-323DCN | BALT | A001 |
| 3 | MDF-888888 | BALT | A005 |
My python --version returns
3.6.7
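
For reference, here is a minimal sketch of the replacement described above. It assumes the sample rows are built inline instead of being read from test.csv, and it assumes that assigning the encoded array straight back to the existing column name is an acceptable way to do the replacement. The `encoder` name is only illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Sample rows copied from the table in the question (assumption: the real
# script would load these from test.csv instead).
data = pd.DataFrame({
    "Order_Num": ["MO100161015", "MO100203496", "MO100203498",
                  "MO100244071", "MO100244071"],
    "Part_Num": ["PPT-100K39", "MDF-925R36", "PPT-825R34",
                 "MDF-323DCN", "MDF-888888"],
    "Site": ["BALT", "BALT", "BALT", "BALT", "BALT"],
    "BUILD_ID": ["A001", "A001", "A001", "A001", "A005"],
})

# Assigning to an existing column name overwrites its values and keeps the
# column in its original position, so no drop/re-insert step is needed.
encoder = LabelEncoder()
data["Order_Num"] = encoder.fit_transform(data["Order_Num"])

print(data)
```

The same assignment could be repeated in a loop over the three column names, with a fresh LabelEncoder per column so that each encoder's `classes_` can later map the integers back to the original strings.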