I have a dataframe (df) and I'm trying to append data to a specific row:

Index  Fruit   Rank
0      banana  1
1      apple   2
2      mango   3
3      Melon   4

The goal is to compare the Fruit at Rank 1 to the Fruit at each other rank and then append the resulting value to that row. I'm using difflib.SequenceMatcher to make the comparison. Right now I'm able to append to df, but I end up appending the same value to every row. I'm struggling with the loop and the append. Any pointers would be much appreciated.

Here is some of my code:

new_entry = df[(df.Rank ==1)]
new_fruit = new_entry['Fruit']

prev_entry = df[(df.Rank ==2)]
prev_fruit = prev_entry['Fruit']


similarity_score = difflib.SequenceMatcher(None, str(new_fruit).lower(), str(prev_fruit).lower()).ratio()

df['similarity_score'] = similarity_score

The result is something like this:

Index  Fruit   Rank  similarity_score
0      banana  1     0.3
1      apple   2     0.3
2      mango   3     0.3
3      Melon   4     0.3

The desired result is:

Index  Fruit   Rank  similarity_score
0      banana  1     n/a
1      apple   2     0.4
2      mango   3     0.5
3      Melon   4     0.6

Thanks.

1 Answer

This doesn't reproduce the exact scores in your desired output, but it calculates the SequenceMatcher ratio between the rank 1 value ('banana') and the Fruit in each row, and adds the result as a new column.

import pandas as pd
import difflib

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'melon'],
                   'Rank': [1, 2, 3, 4]})

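# Take the rank 1 fruit as a plain string (.iloc[0] is positional, so it
# does not depend on the index labels).
top = df.loc[df.Rank == 1, 'Fruit'].iloc[0]
# Compare every fruit against it and store the ratio as a new column.
df['similarity_score'] = df['Fruit'].apply(
    lambda x: difflib.SequenceMatcher(None, top, x).ratio())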
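
As for why the code in the question puts the same number in every row: df['similarity_score'] = similarity_score assigns a single scalar, and pandas broadcasts a scalar to all rows. Also, str(new_fruit) is the printed form of a whole Series rather than the fruit name, so the ratio there is not really comparing 'banana' with 'apple'. If you also want the case-insensitive comparison from the question and n/a on the rank 1 row, something like the following might work. It is only a sketch: the similarity helper is a name I made up, and using NaN to stand in for n/a is my assumption.

```python
import difflib

import numpy as np
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

# Pull out the reference fruit as a plain string; .iloc[0] is positional,
# so it works whatever the index labels are.
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0]


def similarity(a, b):
    """Hypothetical helper: case-insensitive SequenceMatcher ratio."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Score every row against the reference fruit, one value per row.
df['similarity_score'] = df['Fruit'].apply(lambda fruit: similarity(top, fruit))

# Assumption: NaN stands in for "n/a" on the reference row, since a string
# compared with itself always scores 1.0.
df.loc[df['Rank'] == 1, 'similarity_score'] = np.nan

print(df)
```

Because apply calls the function once per row, each row gets its own score instead of one broadcast value.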