5

given is the following code. I want to copy values from "price" to "newprice" when there is no value provided.

import pandas as pd
articles = {"number": ["111", "222", "333"],
          "price": [12.46, 33.66, 43.35],
          "name": ["Article1", "Article2", "Article3"],
          "newprice": [11.24, None, None]}
df = pd.DataFrame(articles)
print(df)

# Select empty entries
mask = df['newprice'].isnull()
# Version 1 (not working)
df.loc[mask, ['newprice']] = df.loc[mask, ['price']]
print(df)

# Output
#  number  price      name  newprice
# 0    111  12.46  Article1     11.24
# 1    222  33.66  Article2       NaN
# 2    333  43.35  Article3       NaN
# Version 2 (working)
df.loc[mask, ['newprice']] = df['price']
print(df)

# Output
#  number  price      name  newprice
# 0    111  12.46  Article1     11.24
# 1    222  33.66  Article2     33.66
# 2    333  43.35  Article3     43.35

In Version 1 there are still Nans in the "newprice" col, while on Version 2 the values are copyed as intended. I filter both sides of the assignment by the same mask, so there should an exact match. I mean to know that both sides of the assignemnt have to provide the exact same amount of results. So why is Version 1 not working?

Thanks in advance for your help.

2 Answers 2

4

I think here is problem if [] in both loc, so you assign one column DataFrame. There is different names price vs newprice, so failed.

Solution is remove [] for assign Series:

df.loc[mask, 'newprice'] = df.loc[mask, 'price']
print(df)
  number  price      name  newprice
0    111  12.46  Article1     11.24
1    222  33.66  Article2     33.66
2    333  43.35  Article3     43.35

Assign one column DataFrame is similar with no condition (failed):

df.loc[mask, 'newprice'] = df[['price']]

vs assign Series (working):

df.loc[mask, 'newprice'] = df['price']
Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for the hint, this is working as expected now.
-1

Using the np.where is about 4-5 times faster that the loc with mask & is much more easier to understand.

df['newprice'] = np.where(df['newprice'].isnull(), df['price'], df['newprice'])

Output

number  price   name    newprice
0   111     12.46   Article1    11.24
1   222     33.66   Article2    33.66
2   333     43.35   Article3    43.35

enter image description here

3 Comments

Do you have proof that np.where is 4-5 times faster? Because I just tested different methods for assigning conditional columns and .loc was faster than np.where. Seems highly unlikely what you are saying
Pic added from %timeit on my machine
hmmm, testing in 3 rows? Rather not, outputs are different if large data

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.