How to fill pandas dataframe columns in for loop

Question

I'm trying to fill pandas dataframe columns in a for loop. The column name is parametric and assigned by loop value. This is my code:

for k in range (-1, -4, -1):
    df_orj = pd.read_csv('something.csv', sep= '\t') 

    df_train = df_orj.head(11900)   
    df_test = df_orj.tail(720) 

    SHIFT = k

    df_train.trend = df_train.trend.shift(SHIFT)
    df_train = df_train.dropna()
    df_test.trend = df_test.trend.shift(SHIFT)
    df_test = df_test.dropna()

    drop_list = some_list

    df_out = df_test[['date','price']]
    df_out.index = np.arange(0, len(df_out)) # start index from 0
    df_out["pred-1"] = np.nan
    df_out["pred-2"] = np.nan
    df_out["pred-3"] = np.nan

    df_train.drop(drop_list, 1, inplace = True )
    df_test.drop(drop_list, 1, inplace = True )

    # some processes here

    rf = RandomForestClassifier(n_estimators = 10)
    rf.fit(X_train,y_train)
    y_pred = rf.predict(X_test)
    print("accuracy score: " , rf.score(X_test, y_test))


    X_test2 = sc.transform(df_test.drop('trend', axis=1))
    y_test2 = df_test['trend'].values

    y_pred2  = rf.predict(X_test2)
    print("accuracy score: ",rf.score(X_test2, y_test2))


    name = "pred{0}".format(k)
    for i in range (0, y_test2.size):
        df_out[name][i] = y_pred2[i]

df_out.head(20)

And this is my output:

                time_period_start  price_open  pred-1  pred-2  pred-3
697  2018-10-02T02:00:00.0000000Z       86.80     NaN     NaN     1.0
698  2018-10-02T03:00:00.0000000Z       86.65     NaN     NaN     1.0
699  2018-10-02T04:00:00.0000000Z       86.32     NaN     NaN     1.0

As you can see, only pred-3 is filled. How can I fill all 3 pre-defined columns?

You're re-initializing those columns to null on every pass through your for loop. Move your df_out["pred-1"] = np.nan lines to before the for loop.
– chitown88, Commented Dec 10, 2018 at 21:45
@chitown88 Oh, that's silly of me. Since I re-initialize the columns, I lose the information in the first 2 columns. Can you post the correct code as an answer so that I can accept it?
– iso_9001_, Commented Dec 10, 2018 at 21:48
Yup. No worries. Easy brain fart...happens all the time. I think we all do that at one time or another. I can guarantee I'll make that mistake again too in the future.
– chitown88, Commented Dec 10, 2018 at 22:00

Vikika · Accepted Answer · 2018-12-10 21:45:02Z

If I understand correctly, your issue is that only pred-3 gets filled while the other two columns stay NaN. That's because df_out is created inside the loop, so you only see the result of the last iteration. Define it outside the loop so that the information from the other two iterations isn't lost.
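A minimal sketch of that idea, assuming a small synthetic frame in place of something.csv and a placeholder array in place of rf.predict(...) (both are hypothetical stand-ins, not the asker's data or model). The output frame is built once before the loop, and each iteration only adds its own column:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the test slice; column names follow the question.
df_test = pd.DataFrame({
    "date": ["2018-10-02T02:00", "2018-10-02T03:00", "2018-10-02T04:00",
             "2018-10-02T05:00", "2018-10-02T06:00", "2018-10-02T07:00"],
    "price": [86.80, 86.65, 86.32, 86.10, 86.45, 86.90],
})

# Build the output frame once, before the loop, with an index starting at 0.
df_out = df_test[["date", "price"]].reset_index(drop=True)

for k in range(-1, -4, -1):
    # Hypothetical placeholder for rf.predict(...): shifting by k and
    # dropping NaN rows leaves |k| fewer rows to predict on.
    y_pred = np.ones(len(df_test) + k)

    name = "pred{0}".format(k)
    # Assigning a Series aligns on the index; rows without a prediction get NaN.
    df_out[name] = pd.Series(y_pred)

print(df_out)
```

Because the assignment aligns on the index, rows that have no prediction for a given shift are left as NaN instead of raising a length mismatch.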

Thank you, your answer is the same as @chitown88's.
– iso_9001_, Commented over a year ago

chitown88 · Accepted Answer · 2018-12-10 21:55:20Z

You're setting those 3 columns to null on every pass through the loop, so the values written in earlier iterations are lost. Either move those initializing lines to before the loop, or initialize only the column that belongs to the current iteration, building its name from the loop variable.

Replace

df_out["pred-1"] = np.nan
df_out["pred-2"] = np.nan
df_out["pred-3"] = np.nan

with lines that initialize just the individual column as it loops:

name = "pred{0}".format(k)
df_out[name] = np.nan

So full code:

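# Read the file and build the output frame once, before the loop;
# rebuilding df_out inside the loop would discard earlier pred columns.
df_orj = pd.read_csv('something.csv', sep='\t')

df_out = df_orj.tail(720)[['date', 'price']].copy()
df_out.index = np.arange(0, len(df_out))  # start index from 0

for k in range(-1, -4, -1):
    # .copy() so the shifts below do not write into slices of df_orj
    df_train = df_orj.head(11900).copy()
    df_test = df_orj.tail(720).copy()

    SHIFT = k

    df_train['trend'] = df_train['trend'].shift(SHIFT)
    df_train = df_train.dropna()
    df_test['trend'] = df_test['trend'].shift(SHIFT)
    df_test = df_test.dropna()

    drop_list = some_list

    # initialize only the column for this iteration
    name = "pred{0}".format(k)
    df_out[name] = np.nan

    # axis must be passed by keyword in recent pandas versions
    df_train = df_train.drop(drop_list, axis=1)
    df_test = df_test.drop(drop_list, axis=1)

    # some processes here

    rf = RandomForestClassifier(n_estimators=10)
    rf.fit(X_train, y_train)
    y_pred = rf.predict(X_test)
    print("accuracy score: ", rf.score(X_test, y_test))

    X_test2 = sc.transform(df_test.drop('trend', axis=1))
    y_test2 = df_test['trend'].values

    y_pred2 = rf.predict(X_test2)
    print("accuracy score: ", rf.score(X_test2, y_test2))

    # A negative shift leaves NaN in the last |k| rows, which dropna() removes,
    # so row i of df_test still matches row i of df_out (assuming the file has
    # no other missing values); the trailing rows stay NaN.
    for i in range(0, y_test2.size):
        df_out.loc[i, name] = y_pred2[i]  # .loc instead of chained indexing

df_out.head(20)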
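
The question's inner loop fills the column with `df_out[name][i] = ...`, which is chained indexing: pandas may perform the write on a temporary copy and typically emits SettingWithCopyWarning. A minimal sketch of the same fill done as one `.loc` assignment follows. The four-row frame and the `y_pred2` values are made up for illustration only:

```python
import numpy as np
import pandas as pd

# Hypothetical output frame with a default 0..n-1 index.
df_out = pd.DataFrame({"price": [86.80, 86.65, 86.32, 86.10]})

# Hypothetical predictions, one row shorter than the frame
# (as happens after a shift of -1 followed by dropna()).
y_pred2 = np.array([1.0, 0.0, 1.0])

name = "pred-1"
df_out[name] = np.nan

# One .loc assignment replaces the element-by-element loop.
# Label slices in .loc include the end label, hence the "- 1".
df_out.loc[: len(y_pred2) - 1, name] = y_pred2

print(df_out)
```

The last row keeps its NaN because no prediction exists for it.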
