I have missing non-numeric data in a pandas DataFrame. Is there a way to replace a NaN with the value from a different row when another column matches? E.g.:

import numpy as np
import pandas

tdf = pandas.DataFrame({
    "id": [np.nan, 22, 22, 45, 45, 81],
    "item": ["apple", "apple", "apple", "orange", "orange", "banana"]
})

    id  item
0   NaN apple
1   22  apple
2   22  apple
3   45  orange
4   45  orange
5   81  banana

So I would want to replace the id in the first row with 22 because the item is the same as row 1 or 2.

1 Answer

You can group by 'item', passing as_index=False, and then call bfill, which fills each NaN with the next valid value that appears later in the same group:

In [424]:

tdf.groupby('item', as_index=False).bfill()
Out[424]:
   id    item
0  22   apple
1  22   apple
2  22   apple
3  45  orange
4  45  orange
5  81  banana
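
Depending on the pandas version, calling bfill on the whole grouped frame may not return the 'item' column, so here is a minimal sketch that fills only the 'id' column and assigns it back. The names `filled`, `first_id` and `filled_any` are illustrative and not part of the original answer. The second variant is an alternative technique, not bfill: it takes the first non-null id in each group, which should also cover a NaN that sits in the last row of its group.

```python
import numpy as np
import pandas as pd

tdf = pd.DataFrame({
    "id": [np.nan, 22, 22, 45, 45, 81],
    "item": ["apple", "apple", "apple", "orange", "orange", "banana"],
})

# Variant 1: back-fill "id" within each "item" group.
# Only fills a NaN when a valid id appears later in the same group.
filled = tdf.copy()
filled["id"] = filled.groupby("item")["id"].bfill()

# Variant 2 (alternative to bfill): broadcast the first non-null id
# of each group to every row, then use it only where "id" is missing.
first_id = tdf.groupby("item")["id"].transform("first")
filled_any = tdf.assign(id=tdf["id"].fillna(first_id))

print(filled)
print(filled_any)
```

Note that the 'id' column stays float because it originally held a NaN; cast it with `astype(int)` afterwards if every gap has been filled.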

1 Comment

Thank you! Exactly what I was looking for but didn't come across this possibility yet.
