
I have two data frames, df1 (35k records) and df2 (100k records). df1['col1'] and df2['col3'] both contain unique IDs. I want to match df1['col1'] against df2['col3']: if an ID matches, a new column df1['Match'] should be True for that row, and False otherwise. The True/False value must be attached to the matching and non-matching records themselves.

I am using the .isin() function. I get the correct match and non-match counts, but I am not able to map them to the right rows.

Match = df1['col1'].isin(df2['col3'])
df1['match'] = Match

I have also tried the merge function, passing the parameter how='right', but did not get the expected results.
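For reference, a left merge (rather than a right merge) with indicator=True is one way to produce the same flags through merge. This is a minimal sketch with made-up sample data; the column names follow the question, and it assumes col3 holds unique IDs so the merge does not duplicate any rows of df1.

```python
import pandas as pd

# Hypothetical sample data; 'col1' and 'col3' hold unique IDs as in the question.
df1 = pd.DataFrame({"col1": [104, 101, 103, 102]})
df2 = pd.DataFrame({"col3": [102, 104, 105]})

# A left merge keeps every row of df1 in its original order;
# indicator=True adds a '_merge' column with 'both' or 'left_only'.
merged = df1.merge(
    df2[["col3"]], how="left", left_on="col1", right_on="col3", indicator=True
)

# merged has a fresh RangeIndex, so assign by position rather than by label.
df1["Match"] = (merged["_merge"] == "both").to_numpy()
print(df1)
```

If col3 could contain duplicates, the merged frame would have more rows than df1 and the positional assignment would fail, which is one reason isin is the simpler tool here.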

  • What do you mean by not mapping correctly? Using your syntax, df1['match'] = df1['col1'].isin(df2['col3']) seems to work for your described goal. Rows of df1 whose col1 value is found in df2['col3'] will be True, otherwise False. Commented Jan 27, 2019 at 8:27
  • @kentwait After doing df1['match'] = Match, if 10 records match, the True values in df1 are just filled in sequentially rather than on the exact records that match. Commented Jan 27, 2019 at 8:35
  • The number of rows returned by df1['col1'].isin(df2['col3']) equals the number of rows of df1, regardless of how many matching "True" records are found. You can try @crazyGamer's answer, but your code should work fine. Maybe something else is wrong. Commented Jan 27, 2019 at 8:40
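To illustrate the point about row counts: isin returns one boolean per row of df1, carrying df1's own index labels, and assignment aligns on those labels. The sketch below uses hypothetical data with an unsorted, non-default index. The second part shows one possible cause of "misplaced" flags, which is only an assumption about what may have gone wrong: flags computed on a sorted or re-indexed copy no longer share df1's labels.

```python
import pandas as pd

# Hypothetical df1 with an unsorted, non-default index.
df1 = pd.DataFrame({"col1": [30, 10, 20]}, index=[7, 3, 5])
df2 = pd.DataFrame({"col3": [20, 30]})

match = df1["col1"].isin(df2["col3"])
print(match)  # one flag per df1 row, indexed 7, 3, 5

df1["match"] = match  # aligned by index label, so each row gets its own flag
print(df1)

# Pitfall (assumed scenario): flags built from a sorted, re-indexed copy have
# labels 0..2, which do not exist in df1, so the assignment yields all NaN.
flags = df1["col1"].sort_values().reset_index(drop=True).isin(df2["col3"])
df1["bad"] = flags
print(df1)
```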

2 Answers


You can simply do as follows:

df1['Match'] = df1['col1'].isin(df2['col3'])

For instance:

import pandas as pd
data1 = [1, 2, 3, 4, 5]
data2 = [2, 3, 5]
df1 = pd.DataFrame(data1, columns=['a'])
df2 = pd.DataFrame(data2, columns=['c'])
print(df1)
print(df2)
df1['Match'] = df1['a'].isin(df2['c'])  # True if the value appears in df2['c'], else False
print(df1)

Output:

   a
0  1
1  2
2  3
3  4
4  5

   c
0  2
1  3
2  5

   a  Match
0  1  False
1  2   True
2  3   True
3  4  False
4  5   True

2 Comments

This is the same as what the OP posted without the intervening variable. There must be something else the OP is encountering.
@Ranjith This is also not working. In your example all the values are in sorted order; if they are not sorted, it will not work. The code you shared is what I am already doing. Please check my code.
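Sorting does not affect isin, because it tests each value for membership in the other column instead of comparing rows by position. Here is the answer's example again, with both columns deliberately shuffled (the values are made up for illustration):

```python
import pandas as pd

# Same idea as the answer's example, but neither column is sorted.
df1 = pd.DataFrame({"a": [5, 1, 4, 2, 3]})
df2 = pd.DataFrame({"c": [3, 5, 2]})

# Each value of df1['a'] is checked for membership in df2['c'];
# the order of either frame does not matter.
df1["Match"] = df1["a"].isin(df2["c"])
print(df1)
```

Rows holding 5, 2 and 3 are flagged True and rows holding 1 and 4 are flagged False, wherever they sit in either frame.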

Use df.loc indexing:

df1['Match'] = False
df1.loc[df1['col1'].isin(df2['col3']), 'Match'] = True
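A runnable version of this two-step pattern, using hypothetical string IDs, might look like the following. It produces the same column as assigning the isin result directly.

```python
import pandas as pd

# Hypothetical sample data with unique IDs in 'col1' and 'col3'.
df1 = pd.DataFrame({"col1": ["x3", "x1", "x2"]})
df2 = pd.DataFrame({"col3": ["x2", "x9", "x3"]})

df1["Match"] = False                      # default every row to False
mask = df1["col1"].isin(df2["col3"])      # boolean Series, one entry per df1 row
df1.loc[mask, "Match"] = True             # flip only the matching rows to True
print(df1)
```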

2 Comments

@crazyGamer Thanks, it works for me. Can I also map the df2['col3'] value for the True cases?
Yes, you can repeat this pattern for df2 and col3. Here is how it works: the first line creates a new column and sets all values to False. The second line indexes the rows with a boolean Series and sets those rows to True.
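A sketch of repeating the pattern for df2, with made-up data. The last step shows one possible reading of "map the df2['col3'] value for the True cases", which is an assumption: copy the matched ID into a new df1 column and leave NaN where there is no match.

```python
import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3, 4]})
df2 = pd.DataFrame({"col3": [4, 2, 7]})

# Same two-step pattern, the other way round:
# flag rows of df2 whose col3 ID also appears in df1['col1'].
df2["Match"] = False
df2.loc[df2["col3"].isin(df1["col1"]), "Match"] = True

# Assumed interpretation: keep the matching ID in df1, NaN otherwise.
df1["col3"] = df1["col1"].where(df1["col1"].isin(df2["col3"]))
print(df2)
print(df1)
```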
