Python dataframes

Question

I have a dataframe (df) and I'm trying to append data to a specific row.

Index  Fruit   Rank
0      banana  1
1      apple   2
2      mango   3
3      Melon   4

The goal is to compare the Fruit at Rank 1 to the Fruit at each other rank and then append the resulting value to that row. I'm using difflib.SequenceMatcher to make the comparison. Right now I'm able to append to df, but I end up appending the same value to every row. I'm struggling with the loop and the append. Any pointers would be much appreciated.

Here is some of my code:

new_entry = df[(df.Rank ==1)]
new_fruit = new_entry['Fruit']

prev_entry = df[(df.Rank ==2)]
prev_fruit = prev_entry['Fruit']


similarity_score = difflib.SequenceMatcher(None, str(new_fruit).lower(), str(prev_fruit).lower()).ratio()

df['similarity_score'] = similarity_score

The result is something like this:

Index  Fruit   Rank  similarity_score
0      banana  1     0.3
1      apple   2     0.3
2      mango   3     0.3
3      Melon   4     0.3

The desired result is:

Index  Fruit   Rank  similarity_score
0      banana  1     n/a
1      apple   2     0.4
2      mango   3     0.5
3      Melon   4     0.6

Thanks.

bananafish · Accepted Answer · 2014-06-24 22:59:20Z


This doesn't give the similarity score order you want, but it calculates the SequenceMatcher ratio between the rank 1 value ('banana') and the Fruit in each row, and adds it as a column.

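import difflib

import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'melon'],
                   'Rank': [1, 2, 3, 4]})

# Extract the rank-1 fruit as a plain string. .iloc[0] selects by position,
# so this works even if the rank-1 row does not have index label 0.
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0]

# Compare every fruit with the rank-1 fruit and store the ratio per row.
df['similarity_score'] = df['Fruit'].apply(
    lambda x: difflib.SequenceMatcher(None, top, x).ratio())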
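
If you also want the rank 1 row to show n/a and the comparison to ignore case (as the .lower() calls in the question suggest), something along these lines might work. It is only a sketch: the helper name similarity_to_top is made up for illustration, using NaN to stand in for "n/a" is my assumption, and the 0.4/0.5/0.6 values in the question look like placeholders, so the real ratios will differ.

```python
import difflib

import numpy as np
import pandas as pd

df = pd.DataFrame({'Fruit': ['banana', 'apple', 'mango', 'Melon'],
                   'Rank': [1, 2, 3, 4]})

# Pull the rank-1 fruit out as a plain string; .iloc[0] is positional,
# so it does not depend on the index labels.
top = df.loc[df['Rank'] == 1, 'Fruit'].iloc[0]


def similarity_to_top(fruit):
    """Case-insensitive SequenceMatcher ratio between `top` and `fruit`."""
    # Hypothetical helper, not part of pandas or difflib.
    return difflib.SequenceMatcher(None, top.lower(), fruit.lower()).ratio()


# apply() calls the helper once per row, so each row gets its own score.
df['similarity_score'] = df['Fruit'].apply(similarity_to_top)

# The rank-1 row would only be compared with itself, so mark it as missing
# (assumption: NaN is an acceptable stand-in for "n/a").
df.loc[df['Rank'] == 1, 'similarity_score'] = np.nan

print(df)
```

The original attempt assigned a single float to df['similarity_score'], which pandas broadcasts to every row. It also passed str() of a whole Series to SequenceMatcher, so the compared text included the index and dtype lines rather than just the fruit name.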

