How to extract a pandas DataFrame from another DataFrame based on multiple columns?

Question

I have two pandas DataFrames, as shown below:

df1

Type      season    name        qty
Fruit     summer    Mango        12
Fruit     summer    watermelon   23
Fruit     summer    blueberries  200
vegetable summer    Peppers      24


df2

Availability       season          name      city
  Yes              summer          Mango     Pune
  Yes              summer          Peppers   Mumbai
  Yes              summer          Tomatoes  Mumbai

I want to compare the season and name columns of df2 with those of df1 and add an extra column called status to df1, where 1 means the row's (season, name) pair also appears in df2 and 0 means it does not. The expected result is shown below.

df1
Type       season    name        qty   status
Fruit      summer    Mango        12     1
Fruit      summer    watermelon   23     0
Fruit      summer    blueberries  200    0
vegetable  summer    Peppers      24     1

perl · Accepted Answer · 2021-04-05 07:55:12Z

4

Here's another option using merge with how='left':

df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)

Output:

        Type  season         name  qty  status
0      Fruit  summer        Mango   12     1.0
1      Fruit  summer   watermelon   23     0.0
2      Fruit  summer  blueberries  200     0.0
3  vegetable  summer      Peppers   24     1.0
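
If an integer status column is needed, a minimal sketch along the same lines might look like the following. The explicit `on=` list, the `drop_duplicates()` call, and the `astype(int)` cast are additions that are not part of the answer above; the frames are rebuilt from the question's sample data.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer"] * 4,
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "Availability": ["Yes", "Yes", "Yes"],
    "season": ["summer"] * 3,
    "name": ["Mango", "Peppers", "Tomatoes"],
    "city": ["Pune", "Mumbai", "Mumbai"],
})

keys = ["season", "name"]

# Deduplicate the key pairs so the left merge cannot multiply rows of df1.
lookup = df2[keys].drop_duplicates().assign(status=1)

result = df1.merge(lookup, on=keys, how="left")

# Unmatched rows get NaN from the merge; fill only the status column, then cast.
result["status"] = result["status"].fillna(0).astype(int)

print(result)
```

Filling only the status column avoids overwriting genuine missing values in other columns of df1, which a frame-wide fillna(0) would also replace.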

answered Apr 5, 2021 at 7:55

perl


Pablo C · Accepted Answer · 2021-04-06 10:46:41Z

0

You can use .isin in the following way:

# Build one (season, name) tuple per row, then test each tuple against df2's pairs.
df1["status"] = list(zip(df1["season"], df1["name"]))
df1["status"] = df1["status"].isin(list(zip(df2["season"], df2["name"])))

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12    True
1      Fruit  summer   watermelon   23   False
2      Fruit  summer  blueberries  200   False
3  vegetable  summer      Peppers   24    True
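
The question asks for 1/0 rather than True/False. One possible variation, sketched here as an assumption rather than as part of the answer, builds the key pairs with `pd.MultiIndex.from_frame` instead of `zip` and casts the boolean mask to `int`. The frames are rebuilt from the question's sample data.

```python
import pandas as pd

# Sample frames rebuilt from the question (only the columns needed here for df2).
df1 = pd.DataFrame({
    "Type": ["Fruit", "Fruit", "Fruit", "vegetable"],
    "season": ["summer"] * 4,
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
    "qty": [12, 23, 200, 24],
})
df2 = pd.DataFrame({
    "season": ["summer"] * 3,
    "name": ["Mango", "Peppers", "Tomatoes"],
})

keys = ["season", "name"]

# Each MultiIndex entry is one (season, name) pair, so membership is tested pairwise.
left_pairs = pd.MultiIndex.from_frame(df1[keys])
right_pairs = pd.MultiIndex.from_frame(df2[keys])

# isin returns a boolean array; cast it to 1/0 as the question requests.
df1["status"] = left_pairs.isin(right_pairs).astype(int)

print(df1)
```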

Performance (vs. @perl's answer)

data = {'Type': {0: 'Fruit', 1: 'Fruit', 2: 'Fruit', 3: 'vegetable'},
 'season': {0: 'summer', 1: 'summer', 2: 'summer', 3: 'summer'},
 'name': {0: 'Mango', 1: 'watermelon', 2: 'blueberries', 3: 'Peppers'},
 'qty': {0: 12, 1: 23, 2: 200, 3: 24}}

# @perl's answer
%%timeit
df1 = pd.DataFrame(data)
df1.merge(
    df2[['season', 'name']].assign(status=1),
    how='left').fillna(0)

# 5.44 ms ± 56.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# my answer
%%timeit
df1["status"] = list(zip(df1.season, df1.name))
df1["status"].isin(list(zip(df2.season, df2.name)))

# 434 µs ± 4.96 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Old (and wrong) answer

You can use .isin with .to_dict:

cols = ['season', 'name']
df1['status'] = df1[cols].isin(df2[cols].to_dict('list')).all(1).astype('int')

Output

df1
        Type  season         name  qty  status
0      Fruit  summer        Mango   12       1
1      Fruit  summer   watermelon   23       0
2      Fruit  summer  blueberries  200       0
3  vegetable  summer      Peppers   24       1
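
To see why this version is wrong: `DataFrame.isin` with a dict tests each column against its own list of values independently, so a row passes when its season appears anywhere in df2's seasons and its name appears anywhere in df2's names, even if that pair never occurs together. A small sketch using the counterexample from the comments (Mango set to winter in df1, Peppers set to winter in df2); the trimmed frames and the variable names `per_column` and `pairwise` are illustrative only.

```python
import pandas as pd

# Counterexample: no (season, name) pair of df1 occurs in df2.
df1 = pd.DataFrame({
    "season": ["winter", "summer", "summer", "summer"],
    "name": ["Mango", "watermelon", "blueberries", "Peppers"],
})
df2 = pd.DataFrame({
    "season": ["summer", "winter", "summer"],
    "name": ["Mango", "Peppers", "Tomatoes"],
})
cols = ["season", "name"]

# Column-by-column check: each column is compared with its own value list.
per_column = df1[cols].isin(df2[cols].to_dict("list")).all(axis=1).astype(int)

# Pairwise check: each (season, name) tuple is compared as a unit.
df1_pairs = pd.Series(list(zip(df1["season"], df1["name"])))
df2_pairs = list(zip(df2["season"], df2["name"]))
pairwise = df1_pairs.isin(df2_pairs).astype(int)

print(per_column.tolist())  # [1, 0, 0, 1]: false positives for Mango and Peppers
print(pairwise.tolist())    # [0, 0, 0, 0]: no pair matches
```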

edited Apr 6, 2021 at 10:46

answered Apr 5, 2021 at 7:38

Pablo C


3 Comments

perl Over a year ago

This is not a correct answer and, IMO, should not be accepted. Set Mango's season to 'winter' in df1 and Peppers' season to 'winter' in df2: there should then be no matches, i.e. status = 0 for all rows, yet this method returns status = 1 for both Mango and Peppers.

Pablo C Over a year ago

@perl I didn't realize, I'm so sorry. I updated it and added a new approach that actually works

perl Over a year ago

OK, sure. Regarding performance, it's kind of useless to test it on a sample with 4 rows. Hardly anyone will care about the difference in execution time between 5ms and 0.4ms. Of course, it is more important on large datasets, but if you run it on larger samples, merge performs better than isin(list(...))
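
To check this claim on a larger input, a rough timing harness along these lines could be used. The row counts, the random key generator, and the helper names `make_frame`, `with_merge`, and `with_isin` are all hypothetical; no timings are stated here because they depend on the pandas version and the hardware.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000  # hypothetical size, chosen only for illustration


def make_frame(rows):
    """Build a frame with random (season, name) keys."""
    return pd.DataFrame({
        "season": rng.choice(["summer", "winter", "spring", "autumn"], size=rows),
        "name": rng.integers(0, 10_000, size=rows).astype(str),
    })


df1 = make_frame(n)
df2 = make_frame(n // 10)
keys = ["season", "name"]


def with_merge():
    # Left merge against the deduplicated key pairs of df2.
    lookup = df2[keys].drop_duplicates().assign(status=1)
    merged = df1.merge(lookup, on=keys, how="left")
    return merged["status"].fillna(0).astype(int)


def with_isin():
    # Tuple-based membership test, as in the isin answer.
    pairs = pd.Series(list(zip(df1["season"], df1["name"])))
    return pairs.isin(list(zip(df2["season"], df2["name"]))).astype(int)


# Both approaches should agree before their timings are compared.
assert with_merge().tolist() == with_isin().tolist()

for fn in (with_merge, with_isin):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms per run")
```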
