Complex Map operation on two dataframes in pandas with condition

Question

i have a table in pandas df.

id    prod1    prod2    count
1      10       30        100
2      10       20        200
3      20       10        200
4      30       10        100
5      30       40        300

also i am having another table in df2

product  price   master_product
  1000       1          10
  5000       2          10
  2000       2          20
  9000       5          20
  8000       1          20
  30         3          0
  4000       4          50

Check if prod1 and prod2 belongs to the values in master_product,

if yes i want to replace prod1 and prod2 in my first df with the cheapest product in my master_product.

if the prod1 and prod2 values donot match with the values in master_product, leave the values as it is.

i am looking for the final table as.

id    prod1    prod2    count
1      1000    4000        100
2      1000    8000        200
3      8000    1000        200
4      30       1000        100   #since 30 is not in master_product,leave as it is
5      30       40        300

i was trying to use .map function to achieve this but i could only reach to this.

df['prod1'] = df['prod1'].map(df2.set_index('master_product')['product'])
df['prod2'] = df['prod2'].map(df2.set_index('master_product')['product'])

but it will try to replace every values in prod1 and prod2 with matching values in master_product from df2.

Any ideas how to achieve this?

jezrael · Accepted Answer · 2016-12-14 12:38:04Z

2

You can first modify df1 for minimal price by master_product by groupby with idxmin - get all indices with minimal price:

df1 = df1.loc[df1.groupby('master_product')['price'].idxmin()]
print (df1)
   product  price  master_product
5       30      3               0
0     1000      1              10
4     8000      1              20
6     4000      4              50

Create dict for mapping:

d = df1.set_index('master_product')['product'].to_dict()
print (d)
{0: 30, 10: 1000, 20: 8000, 50: 4000}

Last map and if value is missing add it by combine_first:

df.prod1 = df.prod1.map(d).combine_first(df.prod1)
df.prod2 = df.prod2.map(d).combine_first(df.prod2)
print (df)
   id   prod1   prod2  count
0   1  1000.0    30.0    100
1   2  1000.0  8000.0    200
2   3  8000.0  1000.0    200
3   4    30.0  1000.0    100
4   5    30.0    40.0    300

answered Dec 14, 2016 at 12:38

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

12 Comments

Shubham R Over a year ago

working! but say my df count is very large, how do i check whether it has changed to the least price product?

jezrael Over a year ago

very hard question for me. But one idea - check random sample 10 - 20 values from your DataFrame - use boolean indexing for filtering L = [10,0,50] df2 = df2[df2.master_product.isin(L)] print (df2)

Shubham R Over a year ago

type of elements in d is of numpy.int64 and the values in my dataframe df is of str type. your code is running fine but not giving the right results dont know why? do we have to do a set_index on df.prod1 before mapping it with d>?

jezrael Over a year ago

Yes, there has to be problem. If not int but string it map only NaN. try cast to int by df['col_name'] = df['col_name'].astype(str)

jezrael Over a year ago

Yes, you can use df[['col1','col2','col3']] = df[['col1','col2','col3']],astype(str) or df = df.astype(str)

|

Collectives™ on Stack Overflow

Complex Map operation on two dataframes in pandas with condition

1 Answer 1

12 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

12 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related