9

I'm importing some CSV data into a pandas DataFrame (in Python). One series is meant to be all numerical values. However, it also contains some spurious "$-" elements represented as strings, left over from previous formatting. If I just import the series, pandas reports its dtype as 'object'.

What's the best way to replace these "$-" strings with zeros? Or more generally, how can I replace all the strings in a predominantly numerical series with a numerical value and convert the series to a floating-point type?

  • Steve

3 Answers

12

Updated answer, April 2025:

pd.to_numeric converts its argument to a numeric type. With errors='coerce', any value that cannot be parsed as a number is set to NaN. However, it only works on 1D objects (i.e. scalar, list, tuple, 1-d array, or Series), so to use it on a DataFrame we need df.apply to convert each column individually. Note that any **kwargs given to apply are passed on to the function, so we can still set errors='coerce'.

Using pd.to_numeric along with df.apply will set any strings that cannot be parsed as numbers to NaN. If we want to convert those to 0 values, we can then use .fillna(0) on the resulting DataFrame.

For example (note that this also handles the strings "$-" and "($24)" mentioned in the comments, although both simply become 0; "($24)" is not parsed as -24):

import pandas as pd

df = pd.DataFrame({
    'a': (1, 'sd', 1),
    'b': (2., 2., 'fg'),
    'c': (4, "$-", "($24)")
    })

print(df)

#     a    b      c
# 0   1  2.0      4
# 1  sd  2.0     $-
# 2   1   fg  ($24)

df = df.apply(pd.to_numeric, errors='coerce').fillna(0)

print(df)

#      a    b    c
# 0  1.0  2.0  4.0
# 1  0.0  2.0  0.0
# 2  1.0  0.0  0.0
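Since the question is about a single series, here is a minimal sketch of the same idea without df.apply. The series s and its values are made up for illustration; they stand in for the column described in the question.

```python
import pandas as pd

# Hypothetical data standing in for the mostly numeric column in the question.
s = pd.Series([1.5, "$-", 3, "$-", "7.25"])

# Entries that cannot be parsed become NaN; fillna(0) then turns them into zeros.
cleaned = pd.to_numeric(s, errors="coerce").fillna(0)

print(cleaned.dtype)
print(cleaned.tolist())
```

Numeric strings such as "7.25" are parsed rather than zeroed, so only the genuinely non-numeric entries end up as 0.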

My original answer from 2015, which relies on the now-deprecated convert_objects method:

You can use the convert_objects method of the DataFrame, with convert_numeric=True, to change the strings to NaNs.

From the docs:

convert_numeric: If True, attempt to coerce to numbers (including strings), with unconvertible values becoming NaN.

In [17]: df
Out[17]: 
    a   b  c
0  1.  2.  4
1  sd  2.  4
2  1.  fg  5

In [18]: df2 = df.convert_objects(convert_numeric=True)

In [19]: df2
Out[19]: 
    a   b  c
0   1   2  4
1 NaN   2  4
2   1 NaN  5

Finally, if you want to convert those NaNs to 0's, you can use df.fillna:

In [20]: df2.fillna(0)
Out[20]: 
   a  b  c
0  1  2  4
1  0  2  4
2  1  0  5

4 Comments

Note that pd.to_numeric is the new hotness; convert_objects has been deprecated.
ah, thanks. I hadn't upgraded to 0.17, so that option wasn't in my pandas. I'll update my answer...
@DSM it appears that only works on 1D objects though, so converting a DataFrame is more involved... or am I missing something?
Thanks - but it looks like my data is a little more polluted. It works for one series but not the other. The series which trips it up contains "$-" and "($24)" values. After the pd.to_numeric it still shows as an object type
6

Use pd.to_numeric to convert the strings to numeric (set unparseable strings to NaN using the errors option 'coerce'), applying it to each column:

df = df.apply(pd.to_numeric, errors='coerce')

and then convert the NaN values to zeros using fillna:

df = df.fillna(0)

1 Comment

pd.to_numeric will not work on DataFrames directly, you need to use df.apply to use it on each column individually
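To illustrate the point in this comment, here is a small sketch (the example frame is made up) showing that passing a whole DataFrame to pd.to_numeric raises a TypeError even with errors='coerce', while the column-wise df.apply version converts cleanly.

```python
import pandas as pd

# Hypothetical frame with stray strings in otherwise numeric columns.
df = pd.DataFrame({"a": [1, "sd", 1], "b": [2.0, 2.0, "fg"]})

try:
    pd.to_numeric(df, errors="coerce")
    raised = False
except TypeError:
    # pd.to_numeric only accepts scalars, lists, tuples, 1-d arrays and Series.
    raised = True

# Column by column, the conversion works as expected.
converted = df.apply(pd.to_numeric, errors="coerce").fillna(0)

print(raised)
print(converted.dtypes)
```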
5

Use Series.str.replace and Series.astype

df = pd.Series(['2$-32$-4','123$-12','00123','44'])
df.str.replace('$-', '0', regex=False).astype(float)

0    203204.0
1    123012.0
2       123.0
3        44.0
dtype: float64

5 Comments

Thanks - this almost works but trips up on "($24)" values.
If you want to keep only the digits you can use df.str.replace(r'[^0-9]+', '', regex=True)
Thanks - but how do you parse the parentheses to a negative number, e.g. "($24)" to -24?
Do you mean there can be separate minus signs? Can you post an example of your data?
Hi @hellpanderrr I posted a more general question here: stackoverflow.com/questions/33456364/… which has a solution - thanks!
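For readers who land here with the same "($24)" problem, this is a rough sketch of one way to handle it. It is not taken from the linked question. The helper name parse_accounting and the sample values are hypothetical, and it assumes that "$-" means zero, that parentheses mean a negative amount, and that "$" and "," are the only symbols to strip.

```python
import pandas as pd


def parse_accounting(series: pd.Series) -> pd.Series:
    """Hypothetical helper: turn accounting-style strings into floats."""
    text = series.astype(str).str.strip()
    # Assumption: "$-" is the accounting placeholder for zero.
    text = text.where(text != "$-", "0")
    # Assumption: a value wrapped in parentheses is negative, "($24)" -> "-$24".
    text = text.str.replace(r"^\((.*)\)$", r"-\1", regex=True)
    # Drop currency symbols and thousands separators.
    text = text.str.replace(r"[$,]", "", regex=True)
    # Anything still unparseable becomes NaN instead of raising.
    return pd.to_numeric(text, errors="coerce")


raw = pd.Series([4, "$-", "($24)", "$1,250.50", "n/a"])
parsed = parse_accounting(raw)
print(parsed.tolist())
```

Leftover junk such as "n/a" is kept as NaN here, so you can decide separately whether fillna(0) is appropriate for your data.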
