7

I am trying to multiply two columns (ActualSalary * FTE) in the dataframe OPR to create a new column (FTESalary), but the result is only filled in up to row 21357 and every row after that is NaN. I don't understand what went wrong or how to fix it. Both columns come from importing a CSV file with: OPR = pd.read_csv('OPR.csv', encoding='latin1')

[In] OPR
[out]
ActualSalary    FTE
44600           1
58,000.00       1
70,000.00       1
17550           1
34693           1
15674           0.4

[In] OPR["FTESalary"] = OPR["ActualSalary"].str.replace(",", "").astype("float")*OPR["FTE"]
[In] OPR
[out]
ActualSalary    FTE FTESalary
44600           1   44600
58,000.00       1   58000
70,000.00       1   70000
17550           1   NaN
34693           1   NaN
15674           0.4 NaN

I am not expecting any NaN values in the output at all, and I am really struggling with this. I would really appreciate the help. Many thanks in advance! (I am new to both coding and this site, so please let me know if I have made mistakes or can improve the way I post questions here.)

Sharing the data @oppresiveslayer

[In] OPR[0:6].to_dict()
[out]
{'ActualSalary': {0: '44600',
1: '58,000.00',
2: '70,000.00',
3: '39,780.00',
4: '0.00',
5: '78,850.00'},
 'FTE': {0: 1.0, 1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0, 5: 1.0}}

For more information on the two columns @charlesreid1

[in] OPR['ActualSalary'].astype
[out]
Name: ActualSalary, Length: 21567, dtype: object>

[in] OPR['FTE'].astype
[out]
Name: FTE, Length: 21567, dtype: float64>

The version I am using: python: 3.7.3, pandas: 0.25.1 on jupyter Notebook 6.0.0

  • Possibly a duplicate of stackoverflow.com/questions/14059094/… Commented Dec 12, 2019 at 23:46
  • It is a good read for me, a lot to learn, but unfortunately not quite the same issue I am experiencing here. Thanks though @charlesreid1 Commented Dec 13, 2019 at 0:45
  • By construct the dataframes, are you referring to how I have those data in the first place? I have loaded it from a csv file. The ActualSalary is dtype: object, whereas FTE is dtype: float64. I will amend my question to include this now Commented Dec 13, 2019 at 1:24

3 Answers

5

I believe that your ActualSalary column is a mix of strings and integers. That is the only way I've been able to recreate your error:

df = pd.DataFrame(
    {'ActualSalary': ['44600', '58,000.00', '70,000.00', 17550, 34693, 15674],
     'FTE': [1, 1, 1, 1, 1, 0.4]})

>>> df['ActualSalary'].str.replace(',', '').astype(float) * df['FTE']
0    44600.0
1    58000.0
2    70000.0
3        NaN
4        NaN
5        NaN
dtype: float64

The issue arises when you try to remove the commas:

>>> df['ActualSalary'].str.replace(',', '')
0       44600
1    58000.00
2    70000.00
3         NaN
4         NaN
5         NaN
Name: ActualSalary, dtype: object
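
If you want to confirm the mix before fixing it, a quick check is to count the Python type of each value in the column. This is only a sketch on the toy frame above; `type_counts`, `is_string` and `non_string_rows` are names I made up for illustration.

```python
import pandas as pd

# Toy frame from above: three strings followed by three integers.
df = pd.DataFrame(
    {'ActualSalary': ['44600', '58,000.00', '70,000.00', 17550, 34693, 15674],
     'FTE': [1, 1, 1, 1, 1, 0.4]})

# Count how many values of each Python type the object column holds.
type_counts = df['ActualSalary'].map(lambda v: type(v).__name__).value_counts()
print(type_counts)

# Rows whose salary is not a string; .str.replace returns NaN for exactly these.
is_string = df['ActualSalary'].map(lambda v: isinstance(v, str))
non_string_rows = df[~is_string]
print(non_string_rows)
```

On your real data, running the same two checks on OPR['ActualSalary'] should show where the non-string values start.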

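The .str accessor returns NaN for any element that is not a string, so the integers in the column become NaN. Convert the whole column to strings first, then remove the commas and convert to floats.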

fte_salary = (
    df['ActualSalary'].astype(str).str.replace(',', '')  # Remove commas in string, e.g. '55,000.00' -> '55000.00'
    .astype(float)  # Convert string column to floats.
    .mul(df['FTE'])  # Multiply the cleaned salary by the Full-Time-Equivalent (FTE) column.
)
>>> df.assign(FTESalary=fte_salary)  # Assign new column to dataframe.
      ActualSalary  FTE  FTESalary
    0        44600  1.0    44600.0
    1    58,000.00  1.0    58000.0
    2    70,000.00  1.0    70000.0
    3        17550  1.0    17550.0
    4        34693  1.0    34693.0
    5        15674  0.4     6269.6
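
An alternative is to avoid the mixed column at load time. One plausible cause (an assumption on my part, since the CSV itself isn't shown) is that read_csv inferred the column type piece by piece on a large file, leaving strings in some rows and integers in others. The sketch below uses a small in-memory stand-in for OPR.csv and the `dtype` argument of read_csv to force ActualSalary to be read as strings; `csv_text`, `opr` and `salary` are illustrative names.

```python
import io
import pandas as pd

# Stand-in for OPR.csv (hypothetical contents modelled on the question's sample).
csv_text = (
    'ActualSalary,FTE\n'
    '44600,1\n'
    '"58,000.00",1\n'
    '"70,000.00",1\n'
    '17550,1\n'
    '34693,1\n'
    '15674,0.4\n'
)

# dtype forces every ActualSalary value to be read as a string.
opr = pd.read_csv(io.StringIO(csv_text), dtype={'ActualSalary': str})

# Every value is now a string, so .str.replace works on all rows.
salary = opr['ActualSalary'].str.replace(',', '', regex=False).astype(float)
opr['FTESalary'] = salary * opr['FTE']
print(opr)
```

read_csv also has a thousands=',' parameter that may let pandas parse these values as numbers directly; I have not checked it against your file.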

6 Comments

I have previously used the .mul method as well but yielded the same problem I am facing, unfortunately.
@SyLviA I cannot replicate your error. What version of python and pandas are you using?
python: 3.7.3, pandas: 0.25.1 I am using the jupyter Notebook 6.0.0 @Alexander
I think the issue is that the ActualSalary column is initially a mix of integers and strings. Try first to cast it to strings and then continue as above, ie. df['ActualSalary'].astype(str).str.replace(',', '').astype(float).mul(df['FTE'])
Thank you so much @Alexander, that fixed the issue!! The whole having to cast it to strings and then continue with my original code! I am so happy, however I don't understand why I have to cast it to string first if it's a mixture? Thank you once again!
0

This should work:

OPR['FTESalary'] = OPR.apply(lambda x: pd.to_numeric(x['ActualSalary'].replace(",", ""), errors='coerce') * x['FTE'], axis=1)

output

  ActualSalary  FTE  FTESalary
0        44600  1.0    44600.0
1    58,000.00  1.0    58000.0
2    70,000.00  1.0    70000.0
3        17550  1.0    17550.0
4        34693  1.0    34693.0
5        15674  0.4     6269.6

OK, I think you need to do this:

OPR['FTESalary'] = OPR.reset_index().apply(lambda x: pd.to_numeric(x['ActualSalary'].replace(",", ""), errors='coerce') * x['FTE'], axis=1).to_numpy().tolist()

7 Comments

I have tried applying your code to mine but have this error below (I am still early stage of learning the meaning of these messages...) AttributeError: ("'int' object has no attribute 'replace'", 'occurred at index 20480')
@sylvia, what is the output of pd.__version__? I think I need to install your version to see the error message. I don't mind doing so, so that I can get a working version.
it is '0.25.1' (thanks for teaching me how to check the version) @oppressionslayer
@SyLviA OK, I added an update; can you try that? I got the same error as you, so I think I fixed it. It's actually not a bug; we need to reset_index since you already have an index set.
I still have the same error: AttributeError: ("'int' object has no attribute 'replace'", 'occurred at index 20480') Is this because I don't know how to 'clean' the raw csv file? @oppressionslayer
0

I was able to do it in a couple of steps, but with a list comprehension, which might be less readable for a beginner. It creates an intermediate column that holds the float conversion, since your ActualSalary column is full of strings at the start.

OPR["X"] = [float(x.replace(",","")) for x in OPR["ActualSalary"]]
OPR["FTESalary"] = OPR["X"]*OPR["FTE"]
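
If the column also holds real integers (as the AttributeError reported under another answer suggests), the comprehension above would fail on them, because an int has no .replace method. A hedged variant that converts each value to str first, shown on a small made-up stand-in for OPR:

```python
import pandas as pd

# Small stand-in frame (illustrative values, mixing strings and integers).
OPR = pd.DataFrame(
    {"ActualSalary": ["44600", "58,000.00", 17550, 15674],
     "FTE": [1, 1, 1, 0.4]})

# str(x) makes .replace safe for both strings and integers.
OPR["X"] = [float(str(x).replace(",", "")) for x in OPR["ActualSalary"]]
OPR["FTESalary"] = OPR["X"] * OPR["FTE"]
print(OPR)
```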

1 Comment

I have used your code above and unfortunately I am still having the same issue.
