I am trying to run an SVM linear kernel using a generated dataset. My dataset has 5000 rows and 4 columns:

CL_scaled.head() [screenshot of data frame]

I split the data into 20% test and 80% training:

train, test = train_test_split(CL_scaled, test_size=0.2)

This gives a shape of (4000, 4) for train and (1000, 4) for test.

However, when I fit the SVM on the training and test data, I get the following error:

svclassifier = SVC(kernel='linear', C = 5)
svclassifier.fit(train, test)
ValueError                                Traceback (most recent call last)
<ipython-input-81-4c4a7bdcbe85> in <module>

----> 1 svclassifier.fit(train, test)

~/anaconda3/lib/python3.7/site-packages/sklearn/svm/base.py in fit(self, X, y, sample_weight)
    144         X, y = check_X_y(X, y, dtype=np.float64,
    145                          order='C', accept_sparse='csr',
--> 146                          accept_large_sparse=False)
    147         y = self._validate_targets(y)
    148 

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    722                         dtype=None)
    723     else:
--> 724         y = column_or_1d(y, warn=True)
    725         _assert_all_finite(y)
    726     if y_numeric and y.dtype.kind == 'O':

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/validation.py in column_or_1d(y, warn)
    758         return np.ravel(y)
    759 
--> 760     raise ValueError("bad input shape {0}".format(shape))
    761 
    762 

ValueError: bad input shape (1000, 4)

Can someone please let me know what is wrong with my code or data? Thanks in advance!

train.head()
             0         1         2         3
2004  1.619999  1.049560  1.470708 -1.323666
1583  1.389370 -0.788002 -0.320337 -0.898712
1898 -1.436903  0.994719  0.326256  0.495565
892   1.419123  1.522091  1.378514 -1.731400
4619  0.063095  1.527875 -1.285816 -0.823347

test.head()
             0         1         2         3
1118 -1.152435 -0.484851 -0.996602  1.617749
4347 -0.519430 -0.479388  1.483582 -0.413985
2220 -0.966766 -1.459475 -0.827581  0.849729
204   1.759567 -0.113363 -1.618555 -1.383653
3578  0.329069  1.151323 -0.652328  1.666561


print(test.shape)
print(train.shape)
(1000, 4)
(4000, 4)

2 Answers

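The error comes from this line: train, test = train_test_split(CL_scaled, test_size=0.2). It splits the rows into two sets but never separates the features from the target, so svclassifier.fit(train, test) receives the (1000, 4) test set as the target.

First, separate the features from the output variable and pass both to train_test_split.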

# I am assuming your last column is the output variable
train_test_split(CL_scaled.iloc[:, :-1], CL_scaled.iloc[:, -1], test_size=0.2)

Given features and a target, train_test_split returns four parts: X_train, X_test, y_train, y_test.
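To make the four-way split concrete, here is a minimal sketch. The random arrays X and y are hypothetical stand-ins for the feature columns and label column of CL_scaled, and random_state is only there to make the run repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 5000 rows, 3 feature columns, 1 binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))
y = rng.integers(0, 2, size=5000)

# Two inputs go in, four outputs come out: each input is split into
# a training part and a test part, using the same row assignment.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)  # feature matrices (2-D)
print(y_train.shape, y_test.shape)  # label vectors (1-D)
```

The label arrays come back one-dimensional, which is the shape fit expects for its second argument.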

Furthermore, svclassifier.fit takes the independent variables and the output variable as parameters, so you need to pass X_train and y_train.

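So your code should be

from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Features: every column but the last. Target: the last column.
X_train, X_test, y_train, y_test = train_test_split(CL_scaled.iloc[:, :-1], CL_scaled.iloc[:, -1], test_size=0.2)

svclassifier = SVC(kernel='linear', C=5)

svclassifier.fit(X_train, y_train)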

For more details, refer to the scikit-learn documentation.

You are missing the basic concept of supervised machine learning.

In a classification problem you have features X, and you want to use them to predict a class y. For example:

X                    y
Height Weight        class
170    50            1
180    60            1
10     10            0

The idea for algorithms is that they have a training part (you go to the soccer training to train) and a test part (you test your skills on the field on the weekend).

Therefore you need to split your data into a training set and a test set.

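from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(CL_scaled.iloc[:, :-1], CL_scaled.iloc[:, -1], test_size=0.2)

CL_scaled.iloc[:, :-1] is your X (every column except the last), and CL_scaled.iloc[:, -1] is your y (the last column, assuming it holds the class).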
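One caveat when selecting X and y from a pandas DataFrame: plain [:-1] slices rows, not columns, and [-1] looks up a column labelled -1. A small sketch of the difference, using a hypothetical 6x4 frame df with integer column labels like the one in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical 6x4 frame with integer column labels 0..3.
df = pd.DataFrame(np.arange(24).reshape(6, 4))

rows_dropped = df[:-1]   # slices ROWS: drops the last row, keeps all 4 columns
X = df.iloc[:, :-1]      # all rows, every column except the last
y = df.iloc[:, -1]       # all rows, last column only (a 1-D Series)

print(rows_dropped.shape)
print(X.shape, y.shape)

# df[-1] asks for a column whose label is -1, which does not exist here.
try:
    df[-1]
except KeyError:
    print("df[-1] raises KeyError")
```

Positional selection with .iloc keeps the row count of X and y identical, which train_test_split requires.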

Then you use the training set to fit your classifier (the training part):

svclassifier = SVC(kernel='linear', C = 5)
svclassifier.fit(X_train, y_train)

And then you can test it:

prediction = svclassifier.predict(X_test)

This returns the predictions for your test set (prediction), which you can then measure against y_test.
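One way to measure the predictions against y_test is plain accuracy. The sketch below is an assumption-laden illustration, not the asker's actual pipeline: make_classification generates synthetic data in place of CL_scaled, and accuracy_score is just one possible metric. Note that predict takes only the features.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 3 features and a binary class label.
X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                           n_redundant=0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Training part: fit on the training features and labels.
svclassifier = SVC(kernel='linear', C=5)
svclassifier.fit(X_train, y_train)

# Test part: predict from features only, then compare with the true labels.
prediction = svclassifier.predict(X_test)
accuracy = accuracy_score(y_test, prediction)
print(accuracy)
```

SVC expects discrete class labels as the target. If the last column of CL_scaled holds continuous scaled values rather than classes, fit will reject it, so it is worth checking which column, if any, is the label.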
