I'm trying to convert cells that contain only the string '-' (which denotes absent data) into np.nan, while leaving strings that represent negative floats/ints untouched, so that I can then convert the columns to floats.

I've tried using .applymap() for this as I want to apply this to the whole dataframe, but it doesn't work.

Here is the line of code:

dataframe.applymap(lambda x: None if (x[-1] == '-'))

Here is a sample of the dataframe:

Metric              2020    2019    2018     2017   
Revenue Growth %    344.17  -14.88  107.11   -
Shares Change %     0.23    0       -        -
Gross Margin %      87.7    84      89.3     84.9
Operating Margin %  -17.1   -167.2  -42.2    -99.5

2 Answers

Use replace with the regex parameter, so that only cells consisting of a single '-' are matched:

>>> df.replace(r'^-$', np.nan, regex=True)
               Metric    2020    2019    2018   2017
0    Revenue Growth %  344.17  -14.88  107.11    NaN
1     Shares Change %    0.23    0.00     NaN    NaN
2      Gross Margin %   87.70   84.00    89.3   84.9
3  Operating Margin %  -17.10 -167.20   -42.2  -99.5

If you want to convert to float:

>>> df.filter(regex=r'\d+').replace(r'^-$', np.nan, regex=True).astype(float)
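
Putting both steps together, here is a minimal self-contained sketch. The sample frame is an assumption: it stores every year column as strings, which is what you often end up with when '-' placeholders are present in the raw data.

```python
import numpy as np
import pandas as pd

# Assumed sample data: every year column holds strings, with '-' marking missing values.
df = pd.DataFrame({
    'Metric': ['Revenue Growth %', 'Shares Change %', 'Gross Margin %'],
    '2020': ['344.17', '0.23', '87.7'],
    '2019': ['-14.88', '0', '84'],
    '2018': ['107.11', '-', '89.3'],
    '2017': ['-', '-', '84.9'],
})

# r'^-$' matches only cells that consist of a single '-', so '-14.88' is left alone.
# filter(regex=r'\d+') keeps the columns whose label contains a digit (the years).
years = df.filter(regex=r'\d+').replace(r'^-$', np.nan, regex=True).astype(float)
```

Note that years contains only the year columns; the Metric column is dropped by filter and would need to be joined back if you want it alongside the numbers.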

We can use DataFrame.mask to replace cells that are equal to '-' with NaN:

df = df.mask(df.eq('-'))

df:

               Metric    2020    2019    2018  2017
0    Revenue Growth %  344.17  -14.88  107.11   NaN
1     Shares Change %    0.23    0.00     NaN   NaN
2      Gross Margin %   87.70   84.00   89.30  84.9
3  Operating Margin %  -17.10 -167.20  -42.20 -99.5
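
To see why this leaves negative numbers alone: df.eq('-') is an exact, element-wise equality test, so only cells whose entire value is '-' are masked. A small sketch on an assumed two-column frame (the name toy and its values are illustrative, not from the question):

```python
import pandas as pd

# Assumed toy data: one numeric column and one string column containing a '-' placeholder.
toy = pd.DataFrame({
    '2019': [-14.88, 0.0, 84.0],
    '2018': ['107.11', '-', '-42.2'],
})

# eq('-') is True only where a cell equals the string '-' exactly.
is_placeholder = toy.eq('-')

# mask() replaces the True cells with NaN and keeps everything else untouched.
cleaned = toy.mask(is_placeholder)
```

The string '-42.2' survives because it is not equal to '-', and the numeric column compares as False everywhere.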

We can select columns with loc and convert them to float with astype (in this example, all columns from 2020 onward):

df.loc[:, '2020':] = df.loc[:, '2020':].astype('float')

Or, if 2020 is a number rather than a string:

df.loc[:, 2020:] = df.loc[:, 2020:].astype('float')

df.info():

 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Metric  4 non-null      object 
 1   2020    4 non-null      float64
 2   2019    4 non-null      float64
 3   2018    3 non-null      float64
 4   2017    2 non-null      float64
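
One caveat, stated as an assumption about pandas versions rather than something from the original answer: on recent releases (2.0 and later, as far as I am aware), assigning through df.loc[:, cols] = ... tries to write into the existing columns in place, so object columns may stay object and df.info() can differ from the output above. Assigning to the selected columns directly replaces them instead. A minimal sketch with assumed sample data:

```python
import numpy as np
import pandas as pd

# Assumed sample: '2018' still holds strings plus a NaN left over from masking '-'.
df = pd.DataFrame({
    'Metric': ['Revenue Growth %', 'Shares Change %'],
    '2020': [344.17, 0.23],
    '2018': ['107.11', np.nan],
})

# Label-based slice of the column index: everything from '2020' onward.
year_cols = df.loc[:, '2020':].columns

# Assigning with df[cols] = ... replaces the columns, so the new dtype is kept.
df[year_cols] = df[year_cols].astype(float)
```

If the loc form already gives float64 on your version, both spellings are equivalent; checking df.dtypes afterwards is the quickest way to confirm.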

Sample DataFrame used above:

import pandas as pd

df = pd.DataFrame({
    'Metric': ['Revenue Growth %', 'Shares Change %', 'Gross Margin %',
               'Operating Margin %'],
    '2020': [344.17, 0.23, 87.7, -17.1],
    '2019': [-14.88, 0.0, 84.0, -167.2],
    '2018': ['107.11', '-', '89.3', '-42.2'],
    '2017': ['-', '-', '84.9', '-99.5']
})
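
As a footnote on the original attempt: the lambda in the question is a syntax error because a conditional expression needs an else branch, and x[-1] would also raise on float cells. Below is a hedged sketch of a repaired element-wise version. The helper name dash_to_nan and the small frame are illustrative assumptions; DataFrame.map is the newer name for applymap in recent pandas, so the code picks whichever exists.

```python
import numpy as np
import pandas as pd

# Assumed toy data mixing floats and strings, with '-' as the placeholder.
df = pd.DataFrame({
    '2019': [-14.88, 0.0],
    '2018': ['107.11', '-'],
})

def dash_to_nan(cell):
    # Only the exact placeholder string becomes NaN; floats and '-42.2' pass through.
    return np.nan if isinstance(cell, str) and cell == '-' else cell

# DataFrame.map exists on newer pandas; older versions only have applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
cleaned = elementwise(dash_to_nan)
```

This calls a Python function once per cell, so mask or replace is usually the better choice on large frames.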
