Pandas - How to replace string with zero values in a DataFrame series?

Question

I'm importing some CSV data into a Pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings. These have been left over from previous formatting. If I just import the series, Pandas reports its dtype as 'object'.

What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a series (which is predominantly numerical), with a numerical value, and convert the series to a floating point type?

Steve

I've re-posted this as a more general question about handling accounting-format data. See: Convert a Pandas Series in Accounting Format to a Numeric Series?
– Steve Maughan, Commented Oct 31, 2015 at 20:42

tmdavison · Accepted Answer · 2025-04-25 10:54:49Z

12

Updated answer, April 2025:

pd.to_numeric converts its argument to a numeric type. With errors='coerce', any value that cannot be parsed is set to NaN. However, it only works on 1D objects (i.e. scalar, list, tuple, 1-d array, or Series). Therefore, to use it on a DataFrame, we need df.apply to convert each column individually. Note that any **kwargs given to apply are passed on to the function, so we can still set errors='coerce'.

Using pd.to_numeric along with df.apply will set any strings to NaN. If we want to convert those to 0 values, we can then use .fillna(0) on the resulting DataFrame.

For example (note that the strings from the original question, "$-" and "($24)", are handled too: both are coerced to NaN and then become 0, so "($24)" is not interpreted as -24):

import pandas as pd

df = pd.DataFrame({
    'a': (1, 'sd', 1),
    'b': (2., 2., 'fg'),
    'c': (4, "$-", "($24)")
    })

print(df)

#     a    b      c
# 0   1  2.0      4
# 1  sd  2.0     $-
# 2   1   fg  ($24)

df = df.apply(pd.to_numeric, errors='coerce').fillna(0)

print(df)

#      a    b    c
# 0  1.0  2.0  4.0
# 1  0.0  2.0  0.0
# 2  1.0  0.0  0.0
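
The question is about a single column rather than a whole DataFrame. Since pd.to_numeric accepts a Series directly, df.apply is not needed in that case. A minimal sketch, assuming a hypothetical column named 'amount' (the name and the sample values are made up for illustration):

```python
import pandas as pd

# Hypothetical data: 'amount' is an illustrative column name, not from the question.
df = pd.DataFrame({'amount': [10, '$-', 2.5, '$-']})

# Coerce unparseable strings to NaN, then replace the NaNs with 0.
df['amount'] = pd.to_numeric(df['amount'], errors='coerce').fillna(0)

print(df['amount'].tolist())  # [10.0, 0.0, 2.5, 0.0]
print(df['amount'].dtype)     # float64
```

Only the one column is touched, so any genuinely textual columns in the same DataFrame keep their values.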

My original answer from 2015, which is now deprecated

You can use the convert_objects method of the DataFrame, with convert_numeric=True, to change the strings to NaN:

From the docs:

convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.

In [17]: df
Out[17]: 
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5

In [18]: df2 = df.convert_objects(convert_numeric=True)

In [19]: df2
Out[19]: 
    a   b  c
0   1   2  4
1 NaN   2  4
2   1 NaN  5

Finally, if you want to convert those NaNs to 0's, you can use df.fillna

In [20]: df2.fillna(0)
Out[20]: 
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5
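
Note that in current pandas a missing value is a floating-point NaN, not the string 'NaN', so replace('NaN', 0) leaves it untouched, whereas fillna(0) fills it. A minimal sketch on a hypothetical frame shaped like df2 above:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with real missing values, similar in shape to df2.
df2 = pd.DataFrame({'a': [1.0, np.nan, 1.0], 'b': [2.0, 2.0, np.nan]})

# Replacing the *string* 'NaN' does not touch real missing values.
still_missing = int(df2.replace('NaN', 0).isna().sum().sum())

# fillna targets missing values directly.
filled = df2.fillna(0)
remaining = int(filled.isna().sum().sum())

print(still_missing, remaining)  # 2 0
```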

edited Apr 25 at 10:54

answered Oct 30, 2015 at 16:15

tmdavison


4 Comments

DSM Over a year ago

Note that pd.to_numeric is the new hotness; convert_objects has been deprecated.

tmdavison Over a year ago

ah, thanks. I hadn't upgraded to 0.17, so that option wasn't in my pandas. I'll update my answer...

tmdavison Over a year ago

@DSM it appears that only works on 1D objects though, so converting a DataFrame is more involved... or am I missing something?

Steve Maughan Over a year ago

Thanks - but it looks like my data is a little more polluted. It works for one series but not the other. The series which trips it up contains "$-" and "($24)" values. After the pd.to_numeric it still shows as an object type

adiro · Accepted Answer · 2021-01-31 09:21:28Z

6

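Use pd.to_numeric to convert the strings to numeric values (the errors option 'coerce' sets strings that cannot be parsed to NaN). It works on one Series at a time, so for a DataFrame apply it column by column:

df = df.apply(pd.to_numeric, errors='coerce')

and then convert the NaN values to zeros using fillna (replace('NaN', 0) would look for the literal string 'NaN' and leave real NaN values untouched):

df = df.fillna(0)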

answered Jan 31, 2021 at 9:21

adiro


1 Comment

tmdavison Apr 25 at 10:42

pd.to_numeric will not work on DataFrames directly, you need to use df.apply to use it on each column individually
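
The point in this comment can be checked with a short sketch (the sample frame is made up for illustration): calling pd.to_numeric on a whole DataFrame raises a TypeError, while applying it per column works.

```python
import pandas as pd

# Hypothetical sample frame with stray strings in otherwise numeric columns.
df = pd.DataFrame({'a': [1, 'sd', 1], 'b': [2.0, 2.0, 'fg']})

# pd.to_numeric only accepts 1-D input, so a DataFrame is rejected.
try:
    pd.to_numeric(df, errors='coerce')
    raised = False
except TypeError:
    raised = True

# Column-by-column conversion via apply succeeds.
converted = df.apply(pd.to_numeric, errors='coerce')

print(raised)            # True
print(converted.dtypes)  # both columns are float64
```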

William Miller · Accepted Answer · 2020-12-15 10:48:28Z

5

Use Series.str.replace and Series.astype

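import pandas as pd

df = pd.Series(['2$-32$-4','123$-12','00123','44'])
# regex=False treats "$-" as a literal string rather than a regular expression
df.str.replace('$-', '0', regex=False).astype(float)

0    203204.0
1    123012.0
2       123.0
3        44.0
dtype: float64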

edited Dec 15, 2020 at 10:48

William Miller


answered Oct 30, 2015 at 16:13

hellpanderr


5 Comments

Steve Maughan Over a year ago

Thanks - this almost works but trips up on "($24)" values.

hellpanderr Over a year ago

If you want to keep only the digits, you can use df.str.replace(r'[^0-9]+', '', regex=True). Note that this also strips decimal points and minus signs.

Steve Maughan Over a year ago

Thanks - but how do you parse the parentheses to a negative number i.e. "$(24)" to -24?

hellpanderr Over a year ago

Do you mean there can be a separate minuses? Can you post an example of your data?

Steve Maughan Over a year ago

Hi @hellpanderrr I posted a more general question here: stackoverflow.com/questions/33456364/… which has a solution - thanks!
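
For readers who want parenthesised values such as "($24)" to become -24, as asked in the comments above, here is a minimal sketch. It is not necessarily the solution from the linked question. The sample values and the rule "any value containing a parenthesis is negative" are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample values in the formats mentioned in the comments.
s = pd.Series(['$-', '($24)', '$1,200.50', '17', '$(24)'])

# Assumption: any value containing '(' is treated as a negative amount.
negative = s.str.contains('(', regex=False)

# Drop currency symbols, thousands separators and parentheses.
cleaned = s.str.replace(r'[$,()]', '', regex=True)

# The bare dash left over from "$-" cannot be parsed, so it becomes NaN and then 0.
values = pd.to_numeric(cleaned, errors='coerce').fillna(0)

# Flip the sign of the parenthesised entries.
result = values.where(~negative, -values)

print(result.tolist())  # [0.0, -24.0, 1200.5, 17.0, -24.0]
```

Recording the sign before stripping characters avoids having to rewrite "(" as "-" inside the string itself.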
