
Basically I have a DataFrame with a lot of columns, but the main ones are ITEM_ID and PRICE.

For example:

ID  ITEM_ID  ITEM     PRICE
1      1      potato    20
2      1      potato    20
3      1      potato    25
4      2      tomato    50
5      2      tomato    55
 

And I want to delete the rows where the ITEM_ID and PRICE pair is duplicated, so the output would be this:

ID  ITEM_ID  ITEM     PRICE
1      1      potato    20
2      1      potato    25
3      2      tomato    50
4      2      tomato    55
 

I am computing the average price using

df['AVG'] = df.groupby('ITEM_ID')['PRICE'].transform('mean')

But I realised that I am averaging over the duplicate values as well, so the average is not right.
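For illustration, here is a minimal reproduction of the example above (column names taken from the table; the real DataFrame has more columns):

import pandas as pd

# Reproduce the sample data from the example above
df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'ITEM_ID': [1, 1, 1, 2, 2],
    'ITEM': ['potato', 'potato', 'potato', 'tomato', 'tomato'],
    'PRICE': [20, 20, 25, 50, 55],
})

# The duplicate row (ID 1 and ID 2) is included in the mean, so for ITEM_ID 1
# this gives (20 + 20 + 25) / 3 ≈ 21.67 instead of the intended (20 + 25) / 2 = 22.5
df['AVG'] = df.groupby('ITEM_ID')['PRICE'].transform('mean')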

Can anybody help?

EDIT:

After trying the suggested

df.drop_duplicates(subset=['item_id', 'price'])

the duplicate rows are still there; even keep=False doesn't do anything.
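As it turned out in the comments below, drop_duplicates returns a new DataFrame by default and leaves df untouched unless the result is assigned back or inplace=True is passed. A minimal sketch, using the column names from the suggested call:

# drop_duplicates returns a new DataFrame; either reassign the result...
df = df.drop_duplicates(subset=['item_id', 'price'])
# ...or modify the existing DataFrame in place
df.drop_duplicates(subset=['item_id', 'price'], inplace=True)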

  • looks like you want to drop duplicates? df.drop_duplicates(subset=['item_id', 'price']) Commented Aug 4, 2021 at 10:12
  • Does this answer your question? Drop all duplicate rows across multiple columns in Python Pandas Commented Aug 4, 2021 at 10:12
  • Doesn't seem to work, the rows are still there. Commented Aug 4, 2021 at 10:27
  • Now it's working, I had to add inplace=True. Commented Aug 4, 2021 at 10:50
  • Can you add the solution as an answer and mark it accepted? Commented Aug 4, 2021 at 12:20

1 Answer


The solution to this problem is:

df.drop_duplicates(subset=['item_id', 'price'], inplace=True)
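Putting this together with the averaging step from the question, one possible sketch (column names as in the answer above; whether the duplicate rows should actually be removed from df, or only excluded from the average, depends on the use case):

# Option A: drop the duplicated (item_id, price) rows, then average what is left
df = df.drop_duplicates(subset=['item_id', 'price'])
df['AVG'] = df.groupby('item_id')['price'].transform('mean')

# Option B: keep every row, but compute the average from de-duplicated data
avg = df.drop_duplicates(subset=['item_id', 'price']).groupby('item_id')['price'].mean()
df['AVG'] = df['item_id'].map(avg)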