Subtract values matching index in other dataframe

Question

This is somewhat basic, but I couldn't find a simple answer. In Python, I have a dataframe A like this:

  ItemId      Price
  -------   -------
0   a1         10.0
1   a1         15.0
2   a2          8.0
3   a3          7.0

And a second one, B, like this, where item ids appear only once and are the index:

ItemId   Discount
------  ---------
a1            0.2
a2            0.5
a4            0.3

I want to subtract 'Discount' values (from B) from 'Price' of dataframe A, by matching Item Ids, like this:

ItemId   Price
-------  -----
a1         9.8
a1        14.8
a2         7.5
a3         7.0

How can I do this in an efficient way, taking into account that the actual dataframes have thousands of rows and many other columns?

@SeaBean as in the question: "And a second one, B, like this, where item ids appear only once, they are index"
– thatOldITGuy, Commented Jul 23, 2021 at 21:31

tdy · Accepted Answer · 2021-07-23 22:26:33Z

2

reindex() the discounts on A's ItemId values with fill_value=0, so that items missing from B get a discount of 0:

A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)

# ItemId
# a1     9.8
# a1    14.8
# a2     7.5
# a3     7.0
# dtype: float64
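The result above is indexed by ItemId rather than by A's original row labels. A minimal sketch of writing it back into A, assuming the sample frames from the question (the `NewPrice` column name is made up for illustration):

```python
import pandas as pd

# Sample frames from the question; B is indexed by ItemId.
A = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                  "Price": [10.0, 15.0, 8.0, 7.0]})
B = pd.DataFrame({"Discount": [0.2, 0.5, 0.3]},
                 index=pd.Index(["a1", "a2", "a4"], name="ItemId"))

# One discount per row of A, in A's row order; ids missing from B get 0.
discount = B["Discount"].reindex(A["ItemId"], fill_value=0)

# discount is labelled by ItemId, so drop the labels before assigning
# to avoid index alignment against A's 0..n-1 index.
A["NewPrice"] = A["Price"] - discount.to_numpy()
print(A)
```

Using `.to_numpy()` relies on the reindexed discounts being in the same row order as A, which holds here because A's own ItemId column is the reindex target.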

Timings of the current answers:

map_ = lambda A, B: A.Price - A.ItemId.map(B.Discount).fillna(0)
reindex_ = lambda A, B: A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)
merge_ = lambda A, B: A.merge(B, on='ItemId', how='left').eval('Price - Discount.fillna(0)')
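To reproduce a comparison like this locally, something along these lines might work. It is only a sketch: the data is synthetic, the sizes and the random seed are arbitrary assumptions, and the merge variant is rewritten without `eval` (and with `B.reset_index()`) to keep it easy to follow, so it is not the exact expression timed above.

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (assumed sizes, far smaller than a real benchmark).
rng = np.random.default_rng(0)
n_rows, n_items = 100_000, 1_000
item_ids = np.array([f"a{i}" for i in range(n_items)])

A = pd.DataFrame({
    "ItemId": rng.choice(item_ids, size=n_rows),
    "Price": rng.uniform(1, 100, size=n_rows),
})
# B covers only half of the ids, so some rows of A have no discount.
B = pd.DataFrame(
    {"Discount": rng.uniform(0, 1, size=n_items // 2)},
    index=pd.Index(item_ids[: n_items // 2], name="ItemId"),
)

map_ = lambda A, B: A.Price - A.ItemId.map(B.Discount).fillna(0)
reindex_ = lambda A, B: A.set_index("ItemId").Price - B.Discount.reindex(A.ItemId, fill_value=0)

def merge_(A, B):
    # Same idea as the merge answer, spelled out without eval.
    merged = A.merge(B.reset_index(), on="ItemId", how="left")
    return merged["Price"] - merged["Discount"].fillna(0)

results = {}
for name, func in [("map", map_), ("reindex", reindex_), ("merge", merge_)]:
    results[name] = func(A, B).to_numpy()
    # Best of three single runs, in seconds.
    seconds = min(timeit.repeat(lambda: func(A, B), number=1, repeat=3))
    print(f"{name:8s} {seconds:.4f} s")
```

Absolute numbers depend heavily on the machine, the pandas version, and the ratio of rows in A to rows in B, so treat any single run as indicative only.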

edited Jul 23, 2021 at 22:26

answered Jul 23, 2021 at 21:07

tdy


4 Comments

thatOldITGuy Over a year ago

Aren't those expensive operations for large dataframes? Do you know how they compare to the map solution posted by @not_speshal above?

thatOldITGuy Over a year ago

you're right about a3. To be fair, I edited the question to include it after the answer was posted. Anyway, just adding fillna would do the trick: dfA["Price"]-dfA["ItemId"].map(dfB["Discount"]).fillna(0)

tdy Over a year ago

@thatOldITGuy so i've pushed the benchmarks as far as my laptop can handle (10^8 rows in A and 10^4 rows in B), and it seems map/reindex are pretty similar for larger dfs

thatOldITGuy Over a year ago

very interesting. Since yours is the most complete answer I'm accepting it as solution.

not_speshal · Answer · 2021-07-23 20:39:24Z

1

You can just use map:

>>> dfA["Price"]-dfA["ItemId"].map(dfB["Discount"])
0     9.8
1    14.8
2     7.5
dtype: float64
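With the question's current sample data, which includes an item (a3) that has no entry in the discount frame, plain `map` yields NaN for that row. A minimal sketch of the `fillna(0)` variant mentioned in the comments, assuming the frames are built as below:

```python
import pandas as pd

# Sample frames from the question; dfB is indexed by ItemId.
dfA = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                    "Price": [10.0, 15.0, 8.0, 7.0]})
dfB = pd.DataFrame({"Discount": [0.2, 0.5, 0.3]},
                   index=pd.Index(["a1", "a2", "a4"], name="ItemId"))

# Look up each row's discount by ItemId; ids absent from dfB become NaN.
discount = dfA["ItemId"].map(dfB["Discount"])

# Treat a missing discount as 0 so the original price is kept.
dfA["Price"] = dfA["Price"] - discount.fillna(0)
print(dfA)
```

Because `map` returns a Series on dfA's own index, the result can be assigned back to the column directly.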

answered Jul 23, 2021 at 20:39

not_speshal


Cameron Riddell · Answer · 2021-07-23 21:13:48Z

0

You can use a merge to align the frames on the "ItemId" column and eval to operate on those aligned columns:

>>> df1.merge(df2, on="ItemId", how="left").eval("Price - Discount.fillna(0)")
0     9.8
1    14.8
2     7.5
dtype: float64
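A self-contained sketch of the merge approach on the question's current sample data follows. It is a variant under stated assumptions rather than the exact expression above: it skips `eval`, calls `reset_index()` so that ItemId is an ordinary column in the discount frame, and adds `validate="many_to_one"` as an optional guard against duplicated ids.

```python
import pandas as pd

# Sample frames from the question; df2 is indexed by ItemId.
df1 = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                    "Price": [10.0, 15.0, 8.0, 7.0]})
df2 = pd.DataFrame({"Discount": [0.2, 0.5, 0.3]},
                   index=pd.Index(["a1", "a2", "a4"], name="ItemId"))

# Left merge keeps every row of df1 in order; unmatched ids get NaN.
# validate raises MergeError if an ItemId appears more than once in df2.
merged = df1.merge(df2.reset_index(), on="ItemId", how="left",
                   validate="many_to_one")

new_price = merged["Price"] - merged["Discount"].fillna(0)

# Assign by position, since merged carries its own fresh index.
df1["Price"] = new_price.to_numpy()
print(df1)
```

The merge materializes an intermediate frame with all of df1's columns, which is worth keeping in mind when df1 is wide.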

answered Jul 23, 2021 at 21:13

Cameron Riddell
