I'm trying to classify rows with multiple text features into a state. The data consists of messages (errors and warnings) from different servers, together with the affected component, and each message is assigned a state. For example:

ServerName     Name     Description                               Severity   State
-------------- -------- ----------------------------------------- ---------- -------------
QWERT-XY-123   MySQL    Service not available on target machine   error      important
QWERT-XY-146   Oracle   Service caused an error                   warning    unimportant
...    

This is a part of the vectorizing:

from sklearn.feature_extraction.text import HashingVectorizer

vectorizer = HashingVectorizer()

X_Servername = df["ServerName"].values
X_Name = df["Name"].values
X_Description = df["Description"].values
X_Severity = df["Severity"].values
y = df["State"].values

X_Servername = vectorizer.transform(X_Servername)
X_Name = vectorizer.transform(X_Name)
X_Description = vectorizer.transform(X_Description)

features=list(zip(X_Servername,X_Name,X_Description,X_Severity))

Now I want to fit the Model:

from sklearn.svm import SVC

model = SVC(kernel = "linear", probability=True)
model.fit(features, y)

And the result is the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-183-71455dd49f0b> in <module>()
  2 
  3 model = SVC(kernel = "linear", probability=True)
----> 4 model.fit(features, y)
  5 
  6 #print(model.score(X_test, y))

D:\Enviroment\Anaconda3\lib\site-packages\sklearn\svm\base.py in fit(self, X, y, sample_weight)
147         self._sparse = sparse and not callable(self.kernel)
148 
149 -->     X, y = check_X_y(X, y, dtype=np.float64, order='C', accept_sparse='csr')
150         y = self._validate_targets(y)
151 

D:\Enviroment\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
571     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
572                     ensure_2d, allow_nd, ensure_min_samples,
573 -->                 ensure_min_features, warn_on_dtype, estimator)
574     if multi_output:
575         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

D:\Enviroment\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431                                       force_all_finite)
432     else:
433 -->     array = np.array(array, dtype=dtype, order=order, copy=copy)
434 
435         if ensure_2d:

ValueError: setting an array element with a sequence.

So my question is: how can I use multiple features with the HashingVectorizer? Or is the only way to concatenate all features into a single line of text?

Thanks for your help.

Update

The mistake was in how I built the vectorized feature list. Instead of:

features=list(zip(X_Servername,X_Name,X_Description,X_Severity))

I now use this function, where extracted is a list of all the vectorized values (X_Servername, X_Name, ...):

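import numpy as np
from scipy import sparse

def combine(extracted):
    # Stack the per-column feature blocks side by side (one row per sample).
    if any(sparse.issparse(fea) for fea in extracted):
        stacked = sparse.hstack(extracted).tocsr()
        # Convert to a dense array; this can use a lot of memory for wide hashed features.
        stacked = stacked.toarray()
    else:
        stacked = np.hstack(extracted)

    return stacked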
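
As a rough end-to-end sketch of the same idea: each text column is hashed separately and the resulting blocks are stacked horizontally, so every sample ends up as a single row. The toy DataFrame, the `text_columns` list, and `n_features=2**10` are assumptions made for illustration, not part of the original data. The sketch also hashes Severity like the other columns and keeps the matrix sparse, since SVC accepts CSR input.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import SVC

# Toy data modelled on the table in the question (rows are made up).
df = pd.DataFrame({
    "ServerName": ["QWERT-XY-123", "QWERT-XY-146", "QWERT-XY-123", "QWERT-XY-146"],
    "Name": ["MySQL", "Oracle", "MySQL", "Oracle"],
    "Description": [
        "Service not available on target machine",
        "Service caused an error",
        "Service not available on target machine",
        "Service caused an error",
    ],
    "Severity": ["error", "warning", "error", "warning"],
    "State": ["important", "unimportant", "important", "unimportant"],
})

text_columns = ["ServerName", "Name", "Description", "Severity"]

# HashingVectorizer is stateless, so one instance can transform every column.
# n_features=2**10 is an arbitrary choice that keeps the example small.
vectorizer = HashingVectorizer(n_features=2**10)
blocks = [vectorizer.transform(df[col]) for col in text_columns]

# Stack the per-column matrices side by side: one row per sample,
# 4 * 1024 columns in total. The result stays sparse.
X = sparse.hstack(blocks).tocsr()
y = df["State"].values

model = SVC(kernel="linear")
model.fit(X, y)
predictions = model.predict(X)
```

The original `zip(...)` produced a list of tuples of sparse rows, which NumPy cannot convert into a 2-D float array; stacking yields one 2-D matrix that `fit` can validate.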
  • You never fit your vectorizer before you attempt to transform your data. I'm guessing your output isn't what you think it is before you try to fit the SVC Commented Feb 19, 2019 at 17:03
  • Hi @G.Anderson thanks for your reply. I fit the vectorizer with fit_transform but there is still the same error Commented Feb 19, 2019 at 17:15
  • Possible duplicate of ValueError: setting an array element with a sequence. while using SVM in scikit-learn Commented Feb 19, 2019 at 17:49

1 Answer

Please try the code below:

from sklearn_pandas import DataFrameMapper, gen_features
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.preprocessing import LabelEncoder

cat_features = ["ServerName", "Name", "Description", "Severity"]
gf = gen_features(cat_features, [HashingVectorizer])
mapper = DataFrameMapper(gf)
cat_features_transformed = mapper.fit_transform(df)

target_name_encoded = LabelEncoder().fit_transform(df["State"])

from sklearn.svm import SVC

model = SVC(kernel = "linear", probability=True)
model.fit(cat_features_transformed, target_name_encoded)
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
  kernel='linear', max_iter=-1, probability=True, random_state=None,
  shrinking=True, tol=0.001, verbose=False)

### For test/prediction part ###

test_features_transformed = mapper.transform(df_test)
predictions = model.predict(test_features_transformed)

Note, you may need to run

pip install sklearn-pandas

if you do not have sklearn-pandas installed on your machine.

The solution above lets you (1) transform your data into a suitable format and (2) later apply the same fitted transformations to your test data via the transform method.

Please let us know if this helps

4 Comments

Is there an advantage to using sklearn-pandas over building a solution based on ColumnTransformer or FeatureUnion and incorporating these into a pipeline?
Seems to solve my problem. The model can be fit. I will test it tomorrow :-)
@KRKirov DataFrameMapper and ColumnTransformer are basically the same; the code using gen_features is neater. But you can always achieve the same by writing the sequence of transformations explicitly.
@SergeyBushmanov, thanks for the response. Pardon me for saying this, but I find the solution based on sklearn-pandas somewhat untidy. It would have probably been easier to read a solution based on a pipeline using the standard sklearn transformers.
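
For comparison, the pipeline-based alternative raised in these comments might look roughly like the sketch below, using only standard scikit-learn transformers. The toy DataFrame, the `text_columns` list, the step names, and `n_features=2**10` are assumptions for illustration. `probability=True` is left out because the toy data is too small for probability calibration to be meaningful.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy data modelled on the table in the question (rows are made up).
df = pd.DataFrame({
    "ServerName": ["QWERT-XY-123", "QWERT-XY-146", "QWERT-XY-123", "QWERT-XY-146"],
    "Name": ["MySQL", "Oracle", "MySQL", "Oracle"],
    "Description": [
        "Service not available on target machine",
        "Service caused an error",
        "Service not available on target machine",
        "Service caused an error",
    ],
    "Severity": ["error", "warning", "error", "warning"],
    "State": ["important", "unimportant", "important", "unimportant"],
})

text_columns = ["ServerName", "Name", "Description", "Severity"]

# One HashingVectorizer per column. Selecting a column by a plain string
# (not a list) passes a 1-D sequence of strings to the vectorizer,
# which is the input shape text vectorizers expect.
preprocess = ColumnTransformer(
    [(col, HashingVectorizer(n_features=2**10), col) for col in text_columns]
)

clf = Pipeline([
    ("features", preprocess),
    ("svc", SVC(kernel="linear")),
])

clf.fit(df[text_columns], df["State"])
predictions = clf.predict(df[text_columns])
```

Because the vectorizers live inside the pipeline, calling `clf.predict` on a test DataFrame with the same columns applies the same transformations automatically.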
