I am working with the following data frames. The original data frames are quite large, with thousands of rows, so for illustration I am using much simpler ones.

My first df is the following:

        ID      value
    0   3       7387
    1   8       4784
    2   11      675
    3   21      900

And there is another, much larger df, say df2:

        x            y          final_id
    0   -7.35        2.09       3
    1   -6.00        2.76       3
    2   -5.89        1.90       4
    3   -4.56        2.67       5
    4   -3.46        1.34       8
    5   -4.67        1.23       8
    6   -1.99        3.44       8
    7   -5.67        2.40       11
    8   -7.56        1.66       11
    9   -9.00        3.12       21
    10  -8.01        3.11       21 
    11  -7.90        3.19       22

From the first df, I want to take only the "ID" column and match its values against the "final_id" column in the second data frame (df2).

I want to create another df that contains only the filtered rows of df2, i.e. only the rows whose "final_id" is 3, 8, 11 or 21 (as per the "ID" column of df1).

Below is the desired resulting df:

         x            y         final_id
    0   -7.35        2.09       3
    1   -6.00        2.76       3
    2   -3.46        1.34       8
    3   -4.67        1.23       8
    4   -1.99        3.44       8
    5   -5.67        2.40       11
    6   -7.56        1.66       11
    7   -9.00        3.12       21
    8   -8.01        3.11       21

We can see that rows 2, 3 and 11 of df2 have been removed from the resulting df.

Please help.

1 Answer

You can use isin to create a mask and then use the boolean mask to subset your df2:

mask = df2["final_id"].isin(df["ID"])
print(df2[mask])

        x      y    final_id
0   -7.35   2.09    3
1   -6.00   2.76    3
4   -3.46   1.34    8
5   -4.67   1.23    8
6   -1.99   3.44    8
7   -5.67   2.40    11
8   -7.56   1.66    11
9   -9.00   3.12    21
10  -8.01   3.11    21
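
For a self-contained check, the sample frames from the question can be rebuilt and filtered the same way. This is only a minimal sketch: the name `result` and the `reset_index(drop=True)` step are additions that are not part of the answer above, included on the assumption that the renumbered index (0 to 8) shown in the question's expected output is wanted.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"ID": [3, 8, 11, 21], "value": [7387, 4784, 675, 900]})
df2 = pd.DataFrame({
    "x": [-7.35, -6.00, -5.89, -4.56, -3.46, -4.67,
          -1.99, -5.67, -7.56, -9.00, -8.01, -7.90],
    "y": [2.09, 2.76, 1.90, 2.67, 1.34, 1.23,
          3.44, 2.40, 1.66, 3.12, 3.11, 3.19],
    "final_id": [3, 3, 4, 5, 8, 8, 8, 11, 11, 21, 21, 22],
})

# True for every row of df2 whose final_id appears in df["ID"].
mask = df2["final_id"].isin(df["ID"])

# Assumption: renumber the surviving rows from 0, as in the question's
# expected output. Drop this call to keep the original df2 index.
result = df2[mask].reset_index(drop=True)
print(result)
```

Without `reset_index`, the filtered rows keep their original labels (0, 1, 4, ..., 10), as in the output printed above.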

3 Comments

Thanks @pansen . This is what I required.
Instead of printing, though, I am creating a new df with the following: mask = df2["final_id"].isin(df["ID"]); new_df = pd.DataFrame(df2[mask]); new_df.head()
@Liza You can simplify it to new_df = df2[mask]. You don't need to call the DataFrame constructor.
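
If the filtered frame is going to be modified afterwards, taking an explicit copy is a common precaution against pandas' SettingWithCopyWarning. The sketch below uses a cut-down version of the data; the `.copy()` call and the extra column `x_abs` are illustrative additions, not something suggested in the comments above.

```python
import pandas as pd

# Cut-down, made-up subset of the question's data.
df = pd.DataFrame({"ID": [3, 8]})
df2 = pd.DataFrame({"x": [-7.35, -5.89, -3.46], "final_id": [3, 4, 8]})

mask = df2["final_id"].isin(df["ID"])

# Boolean indexing already returns a DataFrame, so no constructor is needed.
# .copy() makes new_df independent of df2 before it is modified.
new_df = df2[mask].copy()

# Hypothetical follow-up step: add a column to the filtered frame only.
new_df["x_abs"] = new_df["x"].abs()
print(new_df)
```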
