I am trying to get the top 5 features of my dataframe df, using X_train and y_train.

import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2

bestfeatures = SelectKBest(score_func=chi2, k=5)  # k=5 means select the top 5 features
fit = bestfeatures.fit(X_train, y_train)
dfscores = pd.DataFrame(fit.scores_)
dfcolumns = pd.DataFrame(X_train.columns)
# concatenate the two dataframes for better visualization
featureScores = pd.concat([dfcolumns, dfscores], axis=1)
featureScores.columns = ['Features', 'Score']  # name the dataframe columns
print(featureScores.nlargest(5, 'Score'))  # print the 5 best features

Error

ValueError                                Traceback (most recent call last)
<ipython-input-54-47286ab0e6e9> in <module>
      6 
      7 bestfeatures = SelectKBest(score_func=chi2, k=5)
----> 8 fit = bestfeatures.fit(X_train,y_train)
   
    ValueError: Unknown label type: (array([23.5, 35, 38.......
   .......]),)      

P.S. My y_train contains values such as 23.5, 35, 38 and so on, as shown in the ValueError.

How can I solve this?

1 Answer

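Your score function is chi2, which is a classification score function, so scikit-learn treats y_train as a set of class labels rather than a regression target. The labels must therefore come from a finite set of discrete values (for example strings or integers). Continuous floats such as 23.5 are only valid as regression targets, which is why fit raises "Unknown label type".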
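
If the target really is continuous, one possible fix is to swap chi2 for a regression score function such as f_regression. The sketch below is only an illustration under assumptions: X_train and y_train are replaced by synthetic stand-in data, and the column names f0 to f7 are made up, since the original dataframe is not shown.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

# Synthetic stand-in for the asker's data (hypothetical): 8 features, float target
rng = np.random.default_rng(0)
X_train = pd.DataFrame(rng.random((200, 8)), columns=[f"f{i}" for i in range(8)])
y_train = 3.0 * X_train["f0"] + 2.0 * X_train["f3"] + rng.normal(0, 0.1, 200)

# f_regression accepts a continuous target, unlike chi2
selector = SelectKBest(score_func=f_regression, k=5)
selector.fit(X_train, y_train)

# Pair each column name with its score and keep the 5 highest
feature_scores = pd.DataFrame({"Features": X_train.columns, "Score": selector.scores_})
top5 = feature_scores.nlargest(5, "Score")
print(top5)
```

If you want to keep chi2 instead, the target would first have to be turned into discrete classes (for example by binning it with pd.cut), which changes the problem from regression to classification.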
