
I am using xgboost in Python.

import pandas as pd
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

df=pd.read_csv('442.csv')
y=df.columnone
X=df.columnfive

X_train,X_test,Y_train,Y_test=train_test_split(X,y,test_size=0.2)
dtrain = xgb.DMatrix(X_train, label=Y_train)
dtest = xgb.DMatrix(X_test, label=Y_test)

The label shape seems to match the training set:

X_train.shape
>(405020,)
Y_train.shape
>(405020,)

param = {
   'eta': 0.3,
   'max_depth': 3, 
   'objective': 'multi:softprob', 
   'num_class': 2}
steps = 20  # The number of training iterations

But training fails with this error:

model = xgb.train(param, dtrain, steps)
>XGBoostError: Check failed: labels_.Size() == num_row_ (405020 vs. 1) : Size of labels must equal to number of rows.

When I run

dtrain.num_row()
>1
dtrain.num_col()
>405020

This might be related to the error, but I have no idea how it could have happened: my initial X and y variables both have the correct number of rows and one column each.

3 Answers


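XGBoost expects a 2-D array of inputs (one row per sample) and a vector of labels. You are giving it two vectors, so it reads X as a single row with 405020 columns, which is exactly what dtrain.num_row() and dtrain.num_col() report. Using X=df[["columnfive"]] for the input (double brackets return a one-column DataFrame) should work.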
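A minimal sketch of the shape difference, using a tiny made-up DataFrame in place of 442.csv (only the column names come from the question). The feature in the question is `columnfive`, so that is the column that needs the double brackets; the label can stay a 1-D Series:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for 442.csv: column names from the question, values made up.
df = pd.DataFrame({
    "columnone": [0, 1, 0, 1, 1, 0],               # binary label
    "columnfive": [0.5, 1.2, 3.3, 0.7, 2.1, 1.8],  # single feature
})

X_1d = df.columnfive       # pandas Series: a plain vector, shape (6,)
X_2d = df[["columnfive"]]  # one-column DataFrame: rows x features, shape (6, 1)
y = df.columnone           # a 1-D vector is the expected form for labels

print(X_1d.shape)  # (6,)
print(X_2d.shape)  # (6, 1)

# Equivalent fix with NumPy: reshape the vector into a single column.
# -1 lets NumPy infer the number of rows.
X_np = df["columnfive"].to_numpy().reshape(-1, 1)
print(X_np.shape)  # (6, 1)
```

With X = df[["columnfive"]] passed through train_test_split, X_train stays a one-column DataFrame, so `xgb.DMatrix(X_train, label=Y_train)` should report one row per sample. Passing an explicit 2-D input also avoids depending on how a given XGBoost version interprets a bare vector.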


1 Comment

Thanks. After changing that, though, I still get the same error. I changed it to y=df[["columnone"]]; the only difference is that Y_train.shape now returns (405020, 1). I also added another piece of information to the original post, which may or may not be relevant.

I got the same error, but in my case it was caused by a bug in my code:

x = trainset[feature_cols + dummy_var_cols]
y = testset[[label_column]]

dtrain = xgb.DMatrix(x, y)

Can you spot it? My y variable was coming from my test set and not my train set!
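A minimal sketch of a pre-flight check that would flag this mix-up, and also the 1-D input from the question, before `xgb.DMatrix` is built. The helper `check_features_labels` and the toy data are hypothetical; the index comparison assumes both inputs are pandas objects sliced from the same original DataFrame (train_test_split keeps the original index when given pandas input):

```python
import pandas as pd


def check_features_labels(X, y) -> None:
    """Hypothetical check to run right before xgb.DMatrix(X, label=y)."""
    if X.ndim != 2:
        raise ValueError(f"features must be 2-D, got shape {X.shape}")
    if len(X) != len(y):
        raise ValueError(f"{len(X)} feature rows but {len(y)} labels")
    # Rows from the same split share index labels; labels pulled from a
    # different split (e.g. the test set) normally do not.
    if isinstance(y, (pd.Series, pd.DataFrame)) and not X.index.equals(y.index):
        raise ValueError("feature and label indices differ: are they from the same split?")


# Made-up data: rows 0-2 act as the training split, rows 3-5 as the test split.
df = pd.DataFrame({"f": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0], "label": [0, 1, 0, 1, 1, 0]})
trainset, testset = df.iloc[:3], df.iloc[3:]

check_features_labels(trainset[["f"]], trainset[["label"]])  # passes silently

try:
    # The bug from the answer: labels taken from the test split.
    check_features_labels(trainset[["f"]], testset[["label"]])
    caught = False
except ValueError as err:
    caught = True
    print(err)  # feature and label indices differ: are they from the same split?
```

The split here is 3/3, so the row-count check alone would pass; it is the index comparison that flags the test-set labels. A bare Series as X, as in the question, fails the first check instead.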


For me, the error came from the subset of rows I was using to test more quickly: that subset didn't contain at least one example of each label, so it raised this error.
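A minimal sketch of checking a quick-test subset for missing classes, and of building the subset per label instead so each class is represented. The helper `missing_labels` and the data are hypothetical; `DataFrameGroupBy.sample` needs pandas 1.1 or later:

```python
import pandas as pd


def missing_labels(full_labels: pd.Series, subset_labels: pd.Series) -> list:
    """Return label values present in the full data but absent from the subset."""
    return sorted(set(full_labels.tolist()) - set(subset_labels.tolist()))


# Made-up data: label 1 only appears at the end, so a naive head() misses it.
df = pd.DataFrame({"f": range(10), "label": [0] * 8 + [1] * 2})

quick = df.head(5)
print(missing_labels(df["label"], quick["label"]))  # [1]

# Sample the same fraction from each label group instead of taking the first rows.
# Very small classes can still round down to zero rows, so re-check afterwards.
stratified = df.groupby("label").sample(frac=0.5, random_state=0)
print(missing_labels(df["label"], stratified["label"]))  # []
```

Taking a fixed fraction per label keeps the class mix of the full data, which is usually what you want when a subset is only meant to speed up experiments.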

