I'm studying for a Data Science Olympiad competition and I have run into a small problem. All I've done is convert the values in one column, which range from 2 to 8, into "good" or "bad" using bins, then use LabelEncoder to turn them into 1 or 0.

When running this code:

import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder

#load our data file
data = pd.read_csv("data.csv", delimiter=";")

#classify wines as good or bad
bins = (1,5,8)
group_names = ['bad', "good"]
data["quality"] = pd.cut(data["quality"], bins=bins, labels=group_names)
print(data["quality"].unique())

#list the labels as good or bad to 1 or 0
label_quality = LabelEncoder()
data["quality"] = label_quality.fit_transform(data["quality"])

#create our feature ad result sets
X = data.drop(data["quality"], axis=1)
y = data["quality"]

#create our training sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=10)

print(data.head(100))

I run into this error:

Traceback (most recent call last):
  File "main.py", line 21, in <module>
    X = data.drop(data["quality"], axis=1)
  File "/home/runner/.local/share/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/frame.py", line 3990, in drop
    return super().drop(
  File "/home/runner/.local/share/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/generic.py", line 3936, in drop
    obj = obj._drop_axis(labels, axis, level=level, errors=errors)
  File "/home/runner/.local/share/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/generic.py", line 3970, in _drop_axis
    new_axis = axis.drop(labels, errors=errors)
  File "/home/runner/.local/share/virtualenvs/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5018, in drop
    raise KeyError(f"{labels[mask]} not found in axis")
KeyError: '[0 0 0 ... 1 0 1] not found in axis'

It says my row values aren't found in the axis, but I already specified axis=1, so shouldn't it drop the column?

Comments
  • Check the syntax for drop() again. It takes the name of a column, not the full Series ('quality', not data['quality']). Commented Feb 24, 2020 at 17:15
  • For the drop command, try X = data.drop(['quality'], axis=1) or X = data.drop(columns=['quality']). Commented Feb 24, 2020 at 17:21

1 Answer

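There is a mistake in your Python code: drop() takes column names (a single label or a list of labels), not the column itself. When you pass data["quality"], pandas treats the values in that Series (the 0s and 1s) as the labels to drop, finds no columns with those names, and raises the KeyError. Try the code below; it should work fine: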

#create our feature and result sets
y = data["quality"]
X = data.drop(["quality"], axis=1)

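One more thing: drop() returns a new DataFrame and leaves data unchanged unless you pass inplace=True, so y = data["quality"] works whether it comes before or after the drop. If you do drop in place, copy the column into y first; otherwise the 'quality' column will already be gone.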
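To see the difference in isolation, here is a minimal sketch. It assumes a tiny made-up DataFrame in place of data.csv: the column names "alcohol" and "pH" and all the values are hypothetical, and only "quality" comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for data.csv; only the "quality" column name
# comes from the question, everything else is invented for illustration.
data = pd.DataFrame({
    "alcohol": [9.4, 9.8, 10.5, 11.2],
    "pH": [3.51, 3.20, 3.26, 3.16],
    "quality": [0, 0, 1, 1],
})

# Passing the Series itself: pandas looks for columns labelled 0 and 1,
# finds none, and raises KeyError, as in the traceback above.
try:
    data.drop(data["quality"], axis=1)
    raised = False
except KeyError:
    raised = True

# Passing the column label: returns a new DataFrame without "quality".
X = data.drop(columns=["quality"])
y = data["quality"]

print(raised)                      # whether the first call raised KeyError
print(list(X.columns))             # remaining feature columns
print("quality" in data.columns)   # original frame still has the column
```

Because drop() is not in place here, data still contains "quality" afterwards, which is why the order of the X and y assignments does not matter in this form.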
