I am trying to use CatBoostRegressor in my code for the first time, and it has been a struggle. I have run into several errors and solved them, but this last one persists whatever I try.

To check whether the problem lies in the input data, I removed almost every feature from my dataset while debugging. What remains is a few numerical columns, listed in num_cols, and one categorical column (containing strings, not numbers), listed in cat_cols. The error still persists.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 395 entries, 0 to 394
Data columns (total 5 columns):
T_CUST_TRI 395 non-null int32
TRIESTE_CNT 395 non-null int32
LANECNT 395 non-null int32
TRADELANE 395 non-null category
TIME_DUE 395 non-null int32
dtypes: category(1), int32(4)

I am consistently getting this error at the end. Thanks for your help and time:

  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 650, in fit
    X, y, groups = indexable(X, y, groups)
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 248, in indexable
    check_consistent_length(*result)
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 208, in check_consistent_length
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 208, in <listcomp>
    lengths = [_num_samples(X) for X in arrays if X is not None]
  File "C:\Continuum\anaconda3\lib\site-packages\sklearn\utils\validation.py", line 152, in _num_samples
    " a valid collection." % x)

TypeError: Singleton array array(<catboost.core.Pool object at 0x0000025CF69CFD68>, dtype=object) cannot be considered a valid collection.

if feature_selection == 1:

    models = dict()
    
    paramsrf = {
            'est__max_depth':[5, 9, 18, 32],
            'est__n_estimators': [10, 50, 100, 200],
            'est__min_samples_split': [0.1, 1.0, 2],
            'est__min_samples_leaf': [0.1, 0.5, 1]
            }
    
    paramscat = {
            'est__depth': np.linspace(4,10,4,endpoint=True),
            'est__iterations':[250,100,500,1000],
            'est__learning_rate':[0.001,0.01,0.1,0.3],
            'est__bagging_temperature': [0,5,10,25,50],
            'est__border_count':[5,10,20,50,100]
            }
    
    #models['rf'] = [RandomForestRegressor(), paramsrf]
    models['catb'] = [CatBoostRegressor(cat_features = cat_cols, verbose = 0), paramscat]
    
    for key, value in models.items():
                
        start_time = timeit.default_timer()
        
        scorer = ['neg_mean_squared_error', 'neg_mean_absolute_error', 'r2']
        
        if key == 'catb':
            
            preprocessor = ColumnTransformer(transformers = [('num', MinMaxScaler(feature_range = (0,1)), num_cols)])
            
            all_pipe = Pipeline(steps = [('prep', preprocessor), ('est', value[0])])
        
            search_space = value[1]
                        
            pooled = Pool(data = FeaturesData(
                                                num_feature_data = np.array(df_x[num_cols].values, dtype = np.float32), 
                                                cat_feature_data = np.array(df_x[cat_cols].values, dtype= object), 
                                                num_feature_names = num_cols, 
                                                cat_feature_names = cat_cols),
                         label =  np.array(df_y.values.ravel(), dtype = np.float32))
            
            grid_search = GridSearchCV(all_pipe, search_space, cv=5, verbose=1, refit = 'neg_mean_squared_error', scoring = scorer, return_train_score = True, n_jobs = -1)

            grid_search.fit(pooled)
            
  • Please format your error trace as code, not as bold text Commented Jun 19, 2020 at 15:34

1 Answer

This error can happen for a number of reasons. For instance:

  1. A variable definition masking your function declaration.
  2. Passing a positional argument as a keyword argument.
  3. A column name in your data being the same as an attribute or method of the object containing the data.

I am inclined to think your error most likely comes from the second point: somewhere in your code you may be defining a kwarg that is not needed. I would recommend working by trial and error, adding or removing lines of code one at a time to identify where the error stems from.
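One way to narrow this down is to reproduce the failing check from the traceback in isolation. The traceback shows `GridSearchCV.fit` calling `indexable(X, y, groups)` before anything else, and the error message shows the `Pool` wrapped in a zero-dimensional object array. The sketch below is only an illustration under assumptions: `OpaqueDataset` is a hypothetical stand-in for a container like `catboost.Pool` (it is not the real class), and the random data is made up. It suggests that plain array-likes pass this check, while an opaque container is rejected with a `TypeError`.

```python
import numpy as np
from sklearn.utils import indexable


class OpaqueDataset:
    """Hypothetical stand-in for a container such as catboost.Pool:
    it wraps data but exposes no __len__, __getitem__ or shape."""

    def __init__(self, data, label):
        self._data = data
        self._label = label


rng = np.random.default_rng(0)
X = rng.random((20, 4))  # 20 rows, 4 numeric features (made-up data)
y = rng.random(20)

# Array-likes pass the same check that GridSearchCV.fit runs first.
checked = indexable(X, y, None)

# An opaque container is wrapped into a 0-d object array and rejected.
pool_error = None
try:
    indexable(OpaqueDataset(X, y), None, None)
except TypeError as exc:
    pool_error = exc

print(type(pool_error).__name__, pool_error)
```

If the same thing is happening in the question, it would point towards calling `grid_search.fit(df_x, df_y)` instead of `grid_search.fit(pooled)`, so that the search can split the rows itself. Whether the rest of the pipeline then runs cleanly is a separate matter.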

You can also look for solutions here
