
I have just started learning pandas. I want to select some rows of a dataframe based on the ids stored in another dataframe. Here is my code:

import pandas as pd
from sklearn.model_selection import train_test_split

f_data="data.tsv"
all_data = pd.read_csv(f_data,delimiter='\t',encoding='utf-8',header=None)
x_data = all_data[[0,1,3]]
y_data = all_data[[2]]

# Split train and test sets
x_train,x_test,y_train,y_test = train_test_split(x_data,y_data,test_size=0.1)

all_data has 12 columns in total. I use three of them in x_data and one in y_data.

Once I have created x_train and x_test, I would like to write these instances to TSV files, but with all 12 columns stored in all_data. To do that, I need to match the instances in x_train and x_test with the rows of all_data. How can I do that?

EDIT

Here is what my data looks like:

all_data

        0                                                  1                              2    3   ...                                                8                      9     10    11
0       35  Auch in Großbritannien, wo 19 Atomreaktoren in...                       Ausstieg -1.0  ...                                      Sunday Times           Sunday Times   NaN     1

# continues like that

x_train

         0                                                  1    3
939   2074  Die CSU verlangt von der schwarz-gelben Koalit...  1.0

So what I want to do is get the rows of all_data whose index labels are 939, 710, 288, 854, 433 and write them to a file.

  • Perhaps using all_data.loc[x_data.index]? You haven't shown us your data though. Commented Aug 5, 2018 at 10:19
  • @JohnZwinck I edited my question by adding some data. Commented Aug 5, 2018 at 10:22
  • @JohnZwinck your edited comment works. Thanks. Commented Aug 5, 2018 at 10:27

1 Answer

train_test_split keeps the original index labels on each split, so the index of x_train or x_test can be used to look up the full rows in all_data (assuming the index is unique):

all_data.loc[x_train.index]
all_data.loc[x_test.index]
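
To also write the matched rows out as TSV, something like the following might work. It is only a minimal sketch under stated assumptions: the small all_data frame is made up for illustration (5 columns instead of 12), DataFrame.sample and DataFrame.drop stand in for train_test_split (both keep the original index labels, as train_test_split does), and an in-memory buffer stands in for a real output file.

```python
import io

import pandas as pd

# Hypothetical stand-in for all_data: 10 rows, 5 columns instead of 12.
all_data = pd.DataFrame(
    [[i, f"text {i}", "label", float(i % 2), f"extra {i}"] for i in range(10)]
)
x_data = all_data[[0, 1, 3]]

# Stand-in for train_test_split: the sampled rows keep their original
# index labels, and the remaining rows form the training set.
x_test = x_data.sample(frac=0.2, random_state=0)
x_train = x_data.drop(x_test.index)

# The lookup is only unambiguous when the index labels are unique.
assert all_data.index.is_unique

# Recover every column of the original rows via the index labels.
train_full = all_data.loc[x_train.index]
test_full = all_data.loc[x_test.index]

# Write all columns as tab-separated values. A buffer is used here;
# passing a file path to to_csv instead writes to disk.
buf = io.StringIO()
test_full.to_csv(buf, sep="\t", header=False, index=False)
```

With the real data, the same to_csv call with a path such as "test.tsv" (a hypothetical file name) in place of buf should write the full 12-column rows.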