2

i have a table in pandas df.

id    prod1    prod2    count
1      10       30        100
2      10       20        200
3      20       10        200
4      30       10        100
5      30       40        300

also i am having another table in df2

product  price   master_product
  1000       1          10
  5000       2          10
  2000       2          20
  9000       5          20
  8000       1          20
  30         3          0
  4000       4          50

Check if prod1 and prod2 belongs to the values in master_product,

if yes i want to replace prod1 and prod2 in my first df with the cheapest product in my master_product.

if the prod1 and prod2 values donot match with the values in master_product, leave the values as it is.

i am looking for the final table as.

id    prod1    prod2    count
1      1000    4000        100
2      1000    8000        200
3      8000    1000        200
4      30       1000        100   #since 30 is not in master_product,leave as it is
5      30       40        300

i was trying to use .map function to achieve this but i could only reach to this.

df['prod1'] = df['prod1'].map(df2.set_index('master_product')['product'])
df['prod2'] = df['prod2'].map(df2.set_index('master_product')['product'])

but it will try to replace every values in prod1 and prod2 with matching values in master_product from df2.

Any ideas how to achieve this?

1 Answer 1

2

You can first modify df1 for minimal price by master_product by groupby with idxmin - get all indices with minimal price:

df1 = df1.loc[df1.groupby('master_product')['price'].idxmin()]
print (df1)
   product  price  master_product
5       30      3               0
0     1000      1              10
4     8000      1              20
6     4000      4              50

Create dict for mapping:

d = df1.set_index('master_product')['product'].to_dict()
print (d)
{0: 30, 10: 1000, 20: 8000, 50: 4000}

Last map and if value is missing add it by combine_first:

df.prod1 = df.prod1.map(d).combine_first(df.prod1)
df.prod2 = df.prod2.map(d).combine_first(df.prod2)
print (df)
   id   prod1   prod2  count
0   1  1000.0    30.0    100
1   2  1000.0  8000.0    200
2   3  8000.0  1000.0    200
3   4    30.0  1000.0    100
4   5    30.0    40.0    300
Sign up to request clarification or add additional context in comments.

12 Comments

working! but say my df count is very large, how do i check whether it has changed to the least price product?
very hard question for me. But one idea - check random sample 10 - 20 values from your DataFrame - use boolean indexing for filtering L = [10,0,50] df2 = df2[df2.master_product.isin(L)] print (df2)
type of elements in d is of numpy.int64 and the values in my dataframe df is of str type. your code is running fine but not giving the right results dont know why? do we have to do a set_index on df.prod1 before mapping it with d>?
Yes, there has to be problem. If not int but string it map only NaN. try cast to int by df['col_name'] = df['col_name'].astype(str)
Yes, you can use df[['col1','col2','col3']] = df[['col1','col2','col3']],astype(str) or df = df.astype(str)
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.