
I have a training set and a test set for machine learning, but the training set contains too many rows and the test set too few. I calculated that I need to move 245 rows from the training set to the test set to produce a better split. How can I do this? The training set has 5116 rows in total.

First I randomized the rows of the training set using this

train_df = train_df.sample(n = len(train_df)).reset_index(drop=True)

And then I wanted to grab the last 245 rows and move them to test_df

I found these two solutions here

Pandas dataframe - move rows from one dataframe to another

and

Pandas move rows from 1 DF to another DF

However, both select the rows based on a condition, which I don't have. I would like to do it the way you would in Python with a slice on an array, if that's possible.

Maybe something like this (the last 245 of the 5116 rows, and all columns starting from 0):

transferdata_df = train_df.iloc[5115 - 245:, 0:]

Then append that to the test set like

test_df.append(transferdata_df)

I'm not sure if this is the correct way or not.

1 Answer


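You can do it like this:

# take the last 245 rows (all columns)
transferdata_df = train_df.iloc[-245:, 0:]

# add them to the test set
test_df = test_df.append(transferdata_df)

# remove the same rows from the training set by index label
train_df = train_df.drop(transferdata_df.index)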
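
On pandas 2.0 and later, DataFrame.append is no longer available, so the same three steps can be written with pd.concat. The following is a minimal sketch, not a definitive implementation: the toy frames, the column names x and y, the n_move variable and the random_state seed are all assumptions made for illustration (the question uses 245 rows from a 5116-row training set). It also shows why a negative slice start is safer than hard-coding the row count: with 5116 rows, iloc[5115 - 245:] starts at position 4870 and would select 246 rows, not 245.

```python
import pandas as pd

# Hypothetical toy frames standing in for the real train_df / test_df
train_df = pd.DataFrame({"x": range(20), "y": range(20, 40)})
test_df = pd.DataFrame({"x": range(100, 105), "y": range(200, 205)})

n_move = 4  # assumption for the toy data; 245 in the question

# Shuffle the training rows; random_state is added only for reproducibility
train_df = train_df.sample(frac=1, random_state=0).reset_index(drop=True)

# Last n_move rows; a negative start avoids hard-coding the total row count
transferdata_df = train_df.iloc[-n_move:]

# pd.concat takes the place of DataFrame.append (removed in pandas 2.0);
# ignore_index=True renumbers the combined test set from 0
test_df = pd.concat([test_df, transferdata_df], ignore_index=True)

# Drop the moved rows by index label; this relies on the index being unique,
# which reset_index(drop=True) above guarantees
train_df = train_df.drop(transferdata_df.index)

print(len(train_df), len(test_df))  # 16 9
```

Passing ignore_index=True covers the index reset on the test set, so a separate reset_index call is not needed there.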

2 Comments

Thanks, it's working. I also added test_df = test_df.reset_index(drop=True) at the end to reset the index in the test dataframe.
@erotavlas yw :-) happy holiday
