How to create new df based on columns of two different data frames?

Question

I am working with the following data frames. The original data frames are quite large, with thousands of rows, so for illustration I am using much simpler ones.

My first df is the following:

        ID      value
    0   3       7387
    1   8       4784
    2   11      675
    3   21      900

And there is another, much larger df, say df2:

        x            y          final_id
    0   -7.35        2.09       3
    1   -6.00        2.76       3
    2   -5.89        1.90       4
    3   -4.56        2.67       5
    4   -3.46        1.34       8
    5   -4.67        1.23       8
    6   -1.99        3.44       8
    7   -5.67        2.40       11
    8   -7.56        1.66       11
    9   -9.00        3.12       21
    10  -8.01        3.11       21 
    11  -7.90        3.19       22

Now, from the first df, I want to consider only the "ID" column and match its values against the "final_id" column in the second data frame (df2).

I want to create another df that contains only the filtered rows of df2, i.e. only the rows whose "final_id" is 3, 8, 11 or 21 (as per the "ID" column of df1).

Below would be the resultant df:

         x            y         final_id
    0   -7.35        2.09       3
    1   -6.00        2.76       3
    2   -3.46        1.34       8
    3   -4.67        1.23       8
    4   -1.99        3.44       8
    5   -5.67        2.40       11
    6   -7.56        1.66       11
    7   -9.00        3.12       21
    8   -8.01        3.11       21

We can see that rows 2, 3 and 11 from df2 have been removed from the resultant df.

Please help.

pansen · Accepted Answer · 2017-04-06 17:10:35Z

You can use isin to create a mask and then use the boolean mask to subset your df2:

    mask = df2["final_id"].isin(df["ID"])
    print(df2[mask])

            x      y    final_id
    0   -7.35   2.09    3
    1   -6.00   2.76    3
    4   -3.46   1.34    8
    5   -4.67   1.23    8
    6   -1.99   3.44    8
    7   -5.67   2.40    11
    8   -7.56   1.66    11
    9   -9.00   3.12    21
    10  -8.01   3.11    21
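
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the sample data in the question. The `reset_index(drop=True)` step is an addition that is not part of the answer above; it is only needed if you want the 0..8 row labels shown in the question's expected output rather than the original df2 labels. The names `filtered` and `result` are placeholders chosen for illustration.

```python
import pandas as pd

# Sample frames rebuilt from the question's data.
df = pd.DataFrame({"ID": [3, 8, 11, 21], "value": [7387, 4784, 675, 900]})
df2 = pd.DataFrame({
    "x": [-7.35, -6.00, -5.89, -4.56, -3.46, -4.67,
          -1.99, -5.67, -7.56, -9.00, -8.01, -7.90],
    "y": [2.09, 2.76, 1.90, 2.67, 1.34, 1.23,
          3.44, 2.40, 1.66, 3.12, 3.11, 3.19],
    "final_id": [3, 3, 4, 5, 8, 8, 8, 11, 11, 21, 21, 22],
})

# Boolean Series: True where final_id appears anywhere in df["ID"].
mask = df2["final_id"].isin(df["ID"])

# Rows keep their original df2 labels (0, 1, 4, ..., 10).
filtered = df2[mask]

# Optional: renumber the rows from 0, discarding the old labels.
result = filtered.reset_index(drop=True)
print(result)
```

Note that `isin` only tests membership, so duplicate values in `df["ID"]` would not duplicate rows in the result.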


3 Comments

Liza Over a year ago

Thanks @pansen . This is what I required.

Liza Over a year ago

Though instead of printing, I am creating a new df with the following:

    mask = df2["final_id"].isin(df["ID"])
    new_df = pd.DataFrame(df2[mask])
    new_df.head()

pansen Over a year ago

@Liza You can simplify it to cluster5 = df2[mask]. You don't need to call the dataframe constructor.
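
A related point, offered as a sketch rather than as part of the original exchange: if the new frame is going to be modified afterwards, a common pattern is to add `.copy()` to the selection so that it is explicitly independent of df2. The small frames and the `scaled` column below are made up for illustration.

```python
import pandas as pd

# Reduced stand-ins for the question's frames (illustrative data only).
df = pd.DataFrame({"ID": [3, 8]})
df2 = pd.DataFrame({"x": [-7.35, -5.89, -3.46], "final_id": [3, 4, 8]})

mask = df2["final_id"].isin(df["ID"])

# Plain selection is enough; .copy() makes the result an independent frame.
new_df = df2[mask].copy()

# Hypothetical follow-up edit: it touches new_df only, df2 stays unchanged.
new_df["scaled"] = new_df["x"] * 2
print(new_df.head())
```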
