I am trying to build a pipeline on my own dataset, following the "Column Transformer with Mixed Types" example from the scikit-learn documentation, but I get the error: ValueError: could not convert string to float: 'Male'.

from sklearn.compose import ColumnTransformer, make_column_transformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.compose import make_column_selector as selector
from sklearn.linear_model import LogisticRegression

numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
    ])

categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='most_frequent')),
    ('onehot',  OneHotEncoder())
    ])


preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["category",'object'])),
    ('cat', categorical_transformer, selector(dtype_include=["category",'object']))
])

X = train.drop(['Loan_Status', 'Loan_ID'], axis=1)
y = train['Loan_Status']


x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=101)

pipeline = Pipeline(steps = [('preprocessor', preprocessor),
                    ('classifier',LogisticRegression())
                  ])

pipeline.fit(x_train, y_train)
score = clf.score(x_test, y_test)

I have read other posts about the same error, but in all of them it occurs during fit; mine occurs when evaluating the score.

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-19-df09dd400283> in <module>()
      4 
      5 pipeline.fit(x_train, y_train)
----> 6 score = clf.score(x_test, y_test)

4 frames

/usr/local/lib/python3.6/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
     83 
     84     """
---> 85     return array(a, dtype, copy=False, order=order)
     86 
     87 

ValueError: could not convert string to float: 'Male'

This is an overview of the dataset dtypes:

Loan_ID               object
Gender                object
Married               object
Dependents            object
Education             object
Self_Employed         object
ApplicantIncome        int64
CoapplicantIncome    float64
LoanAmount           float64
Loan_Amount_Term     float64
Credit_History       float64
Property_Area         object
Loan_Status           object
dtype: object

Also, the first few rows of the dataset: (image not reproduced here)

  • This is surely a dataset issue; you need a numeric conversion. It also helps to post the first few rows of the dataset. Commented May 10, 2020 at 22:26
  • @ZabirAlNazi How can I post the first few rows of the dataset? Commented May 11, 2020 at 9:38
  • Use train.head(10) and post the output here. Commented May 11, 2020 at 9:54
  • @ZabirAlNazi I have added the dataset first rows. Commented May 11, 2020 at 10:13

1 Answer

Change

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=["category",'object'])),
    ('cat', categorical_transformer, selector(dtype_include=["category",'object']))
])

to

preprocessor = ColumnTransformer(transformers=[
    ('num', numeric_transformer, selector(dtype_exclude=object)),
    ('cat', categorical_transformer, selector(dtype_include=object))
])
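
To check which columns each selector actually picks up, here is a minimal sketch on a hypothetical toy frame. The column names echo the question, but the values are made up, and the string column is forced to object dtype so the result does not depend on the pandas version:

```python
import pandas as pd
from sklearn.compose import make_column_selector as selector

# Hypothetical toy data; values are illustrative only.
df = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female", "Male"], dtype=object),
    "ApplicantIncome": [5000, 4000, 3000],
    "LoanAmount": [120.0, float("nan"), 60.0],
})

# A selector is a callable: given a DataFrame, it returns the matching column names.
num_cols = selector(dtype_exclude=object)(df)
cat_cols = selector(dtype_include=object)(df)

print(num_cols)  # numeric columns, in their original order
print(cat_cols)  # object (string) columns
```

If a string column shows up in num_cols on the real data, it would be sent to the numeric branch and could trigger the same conversion error.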

Reference: https://scikit-learn.org/stable/modules/generated/sklearn.compose.make_column_selector.html
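
One more thing that may be worth checking: the traceback points at clf.score, while the object that was fitted is named pipeline, and clf is not defined anywhere in the posted code. If clf is a leftover from an earlier cell (an assumption, it cannot be confirmed from the question), it would receive the raw string columns without any preprocessing. A minimal sketch that fits and scores the same pipeline object, on hypothetical toy data with a manual split instead of train_test_split:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, make_column_selector as selector
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; values are illustrative only.
train = pd.DataFrame({
    "Gender": pd.Series(["Male", "Female"] * 4, dtype=object),
    "ApplicantIncome": [5000, 4000, 3000, 2500, 6000, 5500, 2300, 3100],
    "LoanAmount": [np.nan, 128.0, 66.0, 120.0, 141.0, 267.0, 95.0, 158.0],
    "Loan_Status": pd.Series(["Y", "N", "Y", "Y", "Y", "Y", "Y", "N"], dtype=object),
})

numeric_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="median")),
    ("scaler", StandardScaler()),
])
categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="most_frequent")),
    # handle_unknown="ignore" is an addition here, so unseen categories do not fail at score time
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer(transformers=[
    ("num", numeric_transformer, selector(dtype_exclude=object)),
    ("cat", categorical_transformer, selector(dtype_include=object)),
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression()),
])

X = train.drop(columns=["Loan_Status"])
y = train["Loan_Status"]
x_train, x_test = X.iloc[:6], X.iloc[6:]
y_train, y_test = y.iloc[:6], y.iloc[6:]

pipeline.fit(x_train, y_train)
# Score with the fitted pipeline so x_test goes through the same preprocessing.
score = pipeline.score(x_test, y_test)
print(score)
```

Calling score on the pipeline means the test data is imputed, scaled, and one-hot encoded before it reaches the classifier.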