I am trying to build a data frame that stores the coefficient values of each variable after every iteration. I can plot the graph after each iteration, but when I try to insert the values into the data frame inside the loop, I get this error:

None of [Int64Index([ 3169,  3170,  3171,  3172,  3173,  3174,  3175,  3176,  3177,  3178,
            ...
            31671, 31672, 31673, 31674, 31675, 31676, 31677, 31678, 31679, 31680],
           dtype='int64', length=28512)] are in the [columns]

This is the code I use:

import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

kf = KFold(n_splits=10)
cvlasso = Lasso(alpha=0.001)
count = 1

var = pd.DataFrame()


for train, _ in kf.split(X, Y):
    cvlasso.fit(X.iloc[train, :], Y.iloc[train])
    importances_index_desc = cvlasso.coef_
    feature_labels = list(X.columns.values)
    importance = pd.Series(importances_index_desc, feature_labels)
    plt.figure()
    plt.bar(feature_labels, importances_index_desc)
    plt.xticks(feature_labels, rotation='vertical')
    plt.ylabel('Importance')
    plt.xlabel('Features')
    plt.title('Fold {}'.format(count))
    count = count + 1
    var[train] = importances_index_desc  # this is the line that raises the error above

plt.show()

One more thing: there are 33000 observations in total in my dataset, but at the end of the loop train contains only 28512 values. Does anyone know why train does not contain 33000 values?

3 Answers

1

train is the array of row indices of the training data returned by KFold. Writing var[train] uses those indices as column labels, which causes the error because none of them is a column of the DataFrame.
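This also bears on the second part of the question. With n_splits=10, KFold holds out roughly one tenth of the rows as the test fold on every iteration, so train only ever contains about nine tenths of the data. A minimal sketch, assuming the dataset has 31681 rows (the count the asker mentions in a comment, which matches the labels 0..31680 in the error message); X_demo is a placeholder array, not the asker's data:

```python
import numpy as np
from sklearn.model_selection import KFold

# Assumption: 31681 rows, as suggested by the error message and a later comment.
n_samples = 31681
X_demo = np.zeros((n_samples, 1))  # placeholder; only the number of rows matters

kf = KFold(n_splits=10)

# Each iteration yields two arrays of integer row positions: (train, test).
fold_sizes = [(len(train), len(test)) for train, test in kf.split(X_demo)]

first_train, first_test = next(iter(kf.split(X_demo)))

print(fold_sizes[0])     # sizes of the first fold's train and test sets
print(first_train[:3])   # first training positions in the first fold
print(first_test[:3])    # first held-out positions in the first fold
```

Under that assumption the first fold holds out rows 0 to 3168 and trains on the remaining 28512 rows, which is exactly the index shown in the error message.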

IMO, using a complicated value as the index is not a good idea. Just use a simple value, such as the fold number, for example:

var.loc[count] = importances_index_desc
count += 1
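For the row assignment above to work, var needs columns whose number matches the length of the coefficient array; with a completely empty pd.DataFrame() it appears to fail with "cannot set a row with mismatched columns". A minimal sketch of the whole loop, assuming one column per feature; the synthetic X and Y below are hypothetical stand-ins, since the asker's data is not shown:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# Hypothetical stand-in data; replace with the real X (DataFrame) and Y (Series).
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["f1", "f2", "f3", "f4"])
Y = pd.Series(2.0 * X["f1"] - 1.0 * X["f3"] + rng.normal(scale=0.1, size=200))

kf = KFold(n_splits=10)
cvlasso = Lasso(alpha=0.001)

# One column per feature, so every assigned row has a matching length.
var = pd.DataFrame(columns=X.columns, dtype=float)

for count, (train, _) in enumerate(kf.split(X, Y), start=1):
    cvlasso.fit(X.iloc[train, :], Y.iloc[train])
    var.loc[count] = cvlasso.coef_  # row label is the fold number

print(var.shape)
```

Each row of var then holds the coefficients of one fold, so var.describe() or var.plot() can be used to compare folds.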

0

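Another solution could be to use pandas.DataFrame.append with a DataFrame built from the coefficients. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so this only works on older versions:

importances_df = pd.DataFrame([importances_index_desc])  # one row per fold
var = var.append(importances_df)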
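On pandas 2.0 and later, the same idea can be expressed with pd.concat. The sketch below is only an illustration: feature_labels and the coefficient values are made up, and in the real loop each array would come from cvlasso.coef_.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and made-up coefficient arrays, one per fold.
feature_labels = ["f1", "f2", "f3"]
fold_coefs = [
    np.array([0.5, 0.0, -1.2]),
    np.array([0.4, 0.1, -1.1]),
]

# Wrap each array in a list so it becomes a single row, then stack the rows.
rows = [pd.DataFrame([coefs], columns=feature_labels) for coefs in fold_coefs]
var = pd.concat(rows, ignore_index=True)

print(var)
```

Collecting the per-fold frames in a list and concatenating once at the end avoids copying the growing DataFrame on every iteration.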

Let me know if this helps!

4 Comments

One more thing: there are 33000 observations in total in my dataset, but at the end of the loop train contains only 28512 values. Does anyone know why train does not contain 33000 values?
It's possible that duplicate rows are removed on .append() - try .append(ignore_index=True)
Yes, but the train values should start from 0 and go all the way up to the number of observations in the dataset, i.e. 31681, right? My question is why there are only 28512 of them.
From the error cited in your question, it looks like it starts from 3169. It should start from zero. I think kf.split() gives a single value - like for train in kf.split(X, Y)

0

Try the following.

Instead of,

var = pd.DataFrame()

create a DataFrame with a named column:

var = pd.DataFrame(columns=['impt_idx_desc'])

Then, inside the loop, assign each row with loc:

var.loc[count] = [importances_index_desc]

where count is incremented by 1 on each pass through the loop.

4 Comments

Hi, using train as the access index here will cause an error. You can check this for more detail: pandas.pydata.org/pandas-docs/stable/reference/api/…
Sorry, I overlooked the data type returned. Editing the answer now. Thanks!
I am still getting an error: cannot set a row with mismatched columns
Try var.loc[count] = [importances_index_desc]
