0

This is somewhat basic, but I couldn't find a simple answer. In Python, I have a dataframe A like this:

  ItemId      Price
  -------   -------
0   a1         10.0
1   a1         15.0
2   a2          8.0
3   a3          7.0

And a second one, B, like this, where each item id appears only once and ItemId is the index:

ItemId   Discount
------  ---------
a1            0.2
a2            0.5
a4            0.3

I want to subtract the 'Discount' values (from B) from the 'Price' column of dataframe A, matching on ItemId, like this:

ItemId   Price
-------  -----
a1         9.8
a1        14.8
a2         7.5
a3         7.0

How can I do this in an efficient way, taking into account that the actual dataframes have thousands of rows and many other columns?

2
  • In B, is ItemId a column or the index? Commented Jul 23, 2021 at 21:27
  • 2
    @SeaBean as in the question: "And a second one, B, like this, where item ids appear only once, they are index" Commented Jul 23, 2021 at 21:31

3 Answers

2

Use reindex() to align the discounts in B to the ItemId values of A, with fill_value=0 for ids that have no discount:

A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)

# ItemId
# a1     9.8
# a1    14.8
# a2     7.5
# a3     7.0
# dtype: float64
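Since the question mentions many other columns, here is a minimal sketch of writing the discounted price back into A. It assumes B has a unique ItemId index, as the question states. The intermediate name `discount` is only for illustration.

```python
import pandas as pd

# Frames from the question; B is indexed by ItemId.
A = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                  "Price": [10.0, 15.0, 8.0, 7.0]})
B = pd.DataFrame({"Discount": [0.2, 0.5, 0.3]},
                 index=pd.Index(["a1", "a2", "a4"], name="ItemId"))

# One discount per row of A, in A's row order; ids missing from B get 0.
discount = B["Discount"].reindex(A["ItemId"], fill_value=0)

# The reindexed Series is labelled by ItemId, not by A's row index, so
# assign positionally via to_numpy() to sidestep index alignment.
A["Price"] = A["Price"] - discount.to_numpy()
print(A)
```

Going through `to_numpy()` keeps A's original row index and leaves every other column untouched.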

Timings of the current answers:

[Figure: timings of map vs reindex vs merge]

map_ = lambda A, B: A.Price - A.ItemId.map(B.Discount).fillna(0)
reindex_ = lambda A, B: A.set_index('ItemId').Price - B.Discount.reindex(A.ItemId, fill_value=0)
merge_ = lambda A, B: A.merge(B, on='ItemId', how='left').eval('Price - Discount.fillna(0)')
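For anyone who wants to rerun the comparison, here is a rough sketch of a timing harness. The helper `make_frames`, the frame sizes, and the repeat count are assumptions for illustration, not the setup behind the figure. The merge variant is also written without `eval` here and uses `right_index=True`, which is a swap from the lambda above.

```python
import timeit

import numpy as np
import pandas as pd


def make_frames(n_rows, n_items, seed=0):
    """Build a synthetic A (repeated ids) and B (unique ids as index)."""
    rng = np.random.default_rng(seed)
    ids = np.array([f"a{i}" for i in range(n_items)])
    A = pd.DataFrame({"ItemId": rng.choice(ids, size=n_rows),
                      "Price": rng.uniform(1, 100, size=n_rows)})
    # Only every other id gets a discount, so some rows of A have no match.
    discounted = ids[::2]
    B = pd.DataFrame({"Discount": rng.uniform(0, 1, size=len(discounted))},
                     index=pd.Index(discounted, name="ItemId"))
    return A, B


def map_(A, B):
    return A.Price - A.ItemId.map(B.Discount).fillna(0)


def reindex_(A, B):
    return A.set_index("ItemId").Price - B.Discount.reindex(A.ItemId, fill_value=0)


def merge_(A, B):
    merged = A.merge(B, left_on="ItemId", right_index=True, how="left")
    return merged.Price - merged.Discount.fillna(0)


A, B = make_frames(n_rows=100_000, n_items=1_000)
for fn in (map_, reindex_, merge_):
    seconds = timeit.timeit(lambda: fn(A, B), number=5) / 5
    print(f"{fn.__name__:10s} {seconds * 1000:8.2f} ms per call")
```

Absolute numbers will depend on the machine and pandas version, so treat the output as a relative comparison only.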

4 Comments

Aren't those expensive operations for large dataframes? Do you know how they compare to the map solution posted by @not_speshal above?
You're right about a3. To be fair, I edited the question to include it after the answer was posted. Anyway, just adding fillna would do the trick: dfA["Price"]-dfA["ItemId"].map(dfB["Discount"]).fillna(0)
@thatOldITGuy So I've pushed the benchmarks as far as my laptop can handle (10^8 rows in A and 10^4 rows in B), and it seems map and reindex perform similarly for larger dfs
very interesting. Since yours is the most complete answer I'm accepting it as solution.
1

You can just use map:

>>> dfA["Price"]-dfA["ItemId"].map(dfB["Discount"]).fillna(0)
0     9.8
1    14.8
2     7.5
3     7.0
dtype: float64
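To make the handling of unmatched ids explicit, here is a small sketch using the frames from the question. The names `matched` and `result` are only for illustration, and dfB is assumed to have a unique ItemId index.

```python
import pandas as pd

dfA = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                    "Price": [10.0, 15.0, 8.0, 7.0]})
dfB = pd.DataFrame({"Discount": [0.2, 0.5, 0.3]},
                   index=pd.Index(["a1", "a2", "a4"], name="ItemId"))

# map() looks each ItemId up in dfB's index; ids that are absent
# from dfB (here a3) come back as NaN.
matched = dfA["ItemId"].map(dfB["Discount"])
print(matched.isna().tolist())

# Treat "no discount" as 0 so those prices stay unchanged.
result = dfA["Price"] - matched.fillna(0)
print(result)
```

Without the `fillna(0)` step, the price for a3 would come out as NaN rather than 7.0.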


0

You can use a merge to align the frames on the "ItemId" column and eval to operate on those aligned columns:

>>> df1.merge(df2, on="ItemId", how="left").eval("Price - Discount.fillna(0)")
0     9.8
1    14.8
2     7.5
3     7.0
dtype: float64
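If the goal is to keep the rest of the frame, a merge-based sketch could look like the following. The `Qty` column is a hypothetical stand-in for the "many other columns" in the question, and `validate="many_to_one"` is an optional guard that raises if B ever contains a duplicated ItemId. This version uses `right_index=True` because B is indexed by ItemId.

```python
import pandas as pd

A = pd.DataFrame({"ItemId": ["a1", "a1", "a2", "a3"],
                  "Price": [10.0, 15.0, 8.0, 7.0],
                  "Qty": [1, 2, 3, 4]})  # hypothetical extra column
B = pd.DataFrame({"Discount": [0.2, 0.5, 0.3]},
                 index=pd.Index(["a1", "a2", "a4"], name="ItemId"))

# Left join keeps every row of A; rows with no match in B get NaN Discount.
merged = A.merge(B, left_on="ItemId", right_index=True,
                 how="left", validate="many_to_one")

# Apply the discount, then drop the helper column again.
merged["Price"] = merged["Price"] - merged["Discount"].fillna(0)
result = merged.drop(columns="Discount")
print(result)
```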

