I am reading data from a CSV file to perform feature elimination. Here is what the data looks like:
   shift_id  user_id  status  organization_id  location_id  department_id  open_positions      city      zip  role_id  specialty_id  latitude  longitude  years_of_experience
0         2        9       S                1            1             19               1  brooklyn  48001.0      2.0           9.0    42.643    -82.583                  NaN
1         6       60       S               12           19             20               1      test  68410.0      3.0           7.0    40.608    -95.856                  NaN
2         9       61       S               12           19             20               1  new york  48001.0      1.0           7.0    42.643    -82.583                  NaN
3        10       60       S               12           19             20               1      test  68410.0      3.0           7.0    40.608    -95.856                  NaN
4        21        3       S                1            1             19               1      pune  48001.0      1.0           2.0    46.753    -89.584                  0.0
Here is my code:
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

dataset = pd.read_csv("data.csv", header=0)
data = pd.read_csv("data.csv", header=1)
target = dataset.location_id
# dataset.head()
svm = LinearSVC()
rfe = RFE(svm, n_features_to_select=3)
rfe = rfe.fit(data, target)
print(rfe.support_)
print(rfe.ranking_)
But I am getting this error:
ValueError: could not convert string to float: '1,141'
There is no string like this in my data.
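One way to check whether such a value really is absent is to list the cells in a column that cannot be parsed as numbers. This is only a minimal sketch: the inline CSV text, the helper name find_non_numeric, and the choice of the open_positions column are assumptions made for illustration, not taken from the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV; the second row carries a
# quoted number that uses a comma as thousands separator.
csv_text = 'shift_id,open_positions,city\n2,1,brooklyn\n6,"1,141",test\n'
df = pd.read_csv(io.StringIO(csv_text))


def find_non_numeric(frame, column):
    """Return the non-empty cells of `column` that cannot be parsed as numbers."""
    converted = pd.to_numeric(frame[column], errors="coerce")
    # Keep cells that were present originally but became NaN after coercion.
    mask = converted.isna() & frame[column].notna()
    return frame.loc[mask, column]


bad = find_non_numeric(df, "open_positions")
print(bad)
```

Running the helper over every column that is expected to be numeric should show where a value such as '1,141' comes from.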
There are some empty cells, so I tried:
result.fillna(0, inplace=True)
which gave this error:
ValueError: Expected 2D array, got scalar array instead:
array=None.
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
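The array=None in this message suggests that None, rather than a DataFrame, reached the estimator. One possible cause (an assumption, since the exact assignment is not shown above) is storing the return value of fillna(..., inplace=True), which is None in pandas 1.x and 2.x. A minimal sketch with a made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the real data (an assumption for illustration).
frame = pd.DataFrame({
    "years_of_experience": [np.nan, np.nan, 0.0],
    "role_id": [2.0, np.nan, 1.0],
})

# Pitfall: with inplace=True the frame is modified directly, and the return
# value is not a filled copy (it is None in pandas 1.x and 2.x).
scratch = frame.copy()
returned = scratch.fillna(0, inplace=True)
print(type(returned))

# Safer: keep the filled copy that fillna returns without inplace=True.
filled = frame.fillna(0)
print(filled)
```

If the real code passes something like `returned` to RFE.fit, switching to the second form should at least remove this particular error.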
Any suggestions on how to preprocess this data correctly?
Here is a link to sample data: https://gist.github.com/karimkhanp/6db4f9f9741a16e46fc294b8e2703dc7
Comments:
float('1,141'.replace(",", ".")) should do it.
There is no 1,141 anywhere in my data.
What does cat prod_data_for_ML.csv | grep 141 show when executed in the folder where your file is, assuming you're on Linux?
Do you want 1,141 to be 1141 or 1.141?
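The two readings of 1,141 raised in the comments lead to different conversions. The sketch below shows both on a plain string, plus the thousands option of pandas.read_csv for the 1141 reading. The inline CSV text is a made-up stand-in, and which reading is right for the real data is not known from the question.

```python
import io

import pandas as pd

raw = "1,141"

# Reading 1: the comma is a thousands separator, so drop it.
as_thousands = float(raw.replace(",", ""))

# Reading 2: the comma is a decimal mark, so turn it into a dot.
as_decimal = float(raw.replace(",", "."))

print(as_thousands, as_decimal)

# For reading 1, read_csv can strip the separator while parsing.
# The CSV text here is hypothetical, not the real file.
csv_text = 'shift_id,open_positions\n2,1\n6,"1,141"\n'
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed["open_positions"].tolist())
```

Parsing the separator at read time keeps the column numeric from the start, so nothing has to be converted by hand before fitting.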