this seems like it should be an easy problem to solve, but I've been battling with it and cannot seem to find a solution.
I have two dataframes of different sizes and different column names. I am trying to compare a column in dataframe A, with another in dataframe B, and only retain the rows in dataframe A, if the strings are an EXACT match with B. I am not looking for a partial match or sub string, but an EXACT FULL STRING match. I have checked close to 40 questions now, and still keep getting partial matches.
Dataframe A
ID Text score
0 1 admin 10.4
1 2 Care 2.0
2 3 Office 5.0
3 4 Doctor 0.5
4 5 be 0.2
5 6 to 0.9
6 7 in 0.8
And lets assume that the second dataframe is:
Dataframe B
ID Labels Places
0 1 Office Upper
1 2 administration Lower
2 3 Doctor Internal
3 4 Reception Outer
4 5 Tone Outer
5 6 Skin Internal
6 7 Behaviour Outer
7 8 Injury Upper
My desired output is
Dataframe A
ID Text score
2 3 Office 5.0
3 4 Doctor 0.5
This will be from comparing DataframeA['Text'] with DataframeB['Labels'] and only keeping the exact matches.
I have tried
df_A_new = df_A[df_A['Text'].isin(df_b['Labels'])]
and the output it gives me is
ID Text score
0 1 admin 10.4
2 3 Office 5.0
3 4 Doctor 0.5
4 5 be 0.2
5 6 to 0.9
6 7 in 0.8
it maps the substring admin and administration. Merge has not helped either.
df_a_new = df_a.merge(df_b, left_on='Text', right_on='Lables')[['ID', 'Text', 'score']]
I have checked so many answers on stackoverflow, but they all seem to be matching substrings! Can anyone help with this?
df_A_new=(df_A[df_A['Text'].isin(df_B['Labels'])])is correct and gives the desired result.ID Text score 2 3 Office 5.0 3 4 Doctor 0.5what is your setup in versions or where are you loading your data from?