replace value by using regex to np.nan

Question

I have a dataframe as below :

import numpy as np
import pandas as pd

data1 = {"first":["alice", "bob", "carol"],
         "last":["foo", "bar", "baz"]}
df = pd.DataFrame(data1)

For example, I want to replace every 'o' character with 'a', so I do:

df.replace({"o":"a"},regex=True)
Out[668]: 
   first last
0  alice  faa
1    bab  bar
2  caral  baz

It gives back what I need.

However, when I want to replace 'o' with np.nan, it changes the entire string to np.nan. Is there any explanation for this in the pandas documentation? The only information I can find is in the source code.

More information (the whole string is changed to np.nan):

df.replace({"o":np.nan},regex=True)
Out[669]: 
   first last
0  alice  NaN
1    NaN  bar
2    NaN  baz

@ShiheZhang There is no desired result; I am just asking why replace with regex behaves this way. I cannot find any documentation on it, and the only way is to read through the source code.
– BENY, Commented Oct 26, 2017 at 2:22
What version of pandas are you on? This actually happens with any non-string object, as far as I can tell; try passing it object().
– juanpa.arrivillaga, Commented Oct 26, 2017 at 2:23
@juanpa.arrivillaga My pandas version: pd.__version__ gives Out[692]: '0.20.3'
– BENY, Commented Oct 26, 2017 at 2:24
You will need to use for index, row in df.iterrows(): to loop through the df, and something like if(s.contains("0")) to check whether it does. Then update the whole value rather than the character.
– Jesse, Commented Oct 26, 2017 at 2:25

Andy Hayden · Accepted Answer · 2017-10-26 02:39:45Z

4

NaN is consistently used as a placeholder for missing data, so replacing part of a string with "missing" can only mean the entire entry is compromised. I've heard this called NaN pollution (or similar, will see if I can find some references): if NaN touches the data, the data is compromised.

That said, that's not always the case:

In [11]: s = pd.Series([1, 2, np.nan, 4])

In [12]: s.sum()
Out[12]: 7.0

In [13]: s.sum(skipna=False)
Out[13]: nan

In some languages you'll see skipna=False as the default behaviour, and some vehemently argue that NaN should always pollute all data. Pandas takes a somewhat more pragmatic approach...
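As a small illustration of the two defaults, NumPy is one library where a plain reduction lets NaN propagate, while pandas skips it unless told otherwise. This is only a sketch; the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, np.nan, 4.0]

# A plain NumPy reduction lets NaN "pollute" the result.
numpy_total = np.sum(np.array(values))

# NumPy's NaN-aware variant skips missing values explicitly.
numpy_nan_total = np.nansum(np.array(values))

# pandas skips NaN by default (skipna=True).
pandas_total = pd.Series(values).sum()

print(numpy_total, numpy_nan_total, pandas_total)
```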

The real question is what do you expect it to do in the case of NaN?
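Depending on the answer, the intent can be spelled out explicitly. The sketch below assumes the question's dataframe with columns named first and last (as in the printed output) and shows two possible readings: dropping only the matched character, or marking every entry that contains it as missing. The names stripped, contains_o and masked are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"first": ["alice", "bob", "carol"],
                   "last": ["foo", "bar", "baz"]})

# Reading 1: remove only the matched character by substituting an empty string.
stripped = df.replace({"o": ""}, regex=True)

# Reading 2: treat the whole entry as missing whenever it contains "o".
# Build a boolean frame column by column, then mask the True cells with NaN.
contains_o = df.apply(lambda col: col.str.contains("o"))
masked = df.mask(contains_o)

print(stripped)
print(masked)
```

The second reading should give the same pattern of missing values as the replace call with np.nan in the question, but it makes the whole-entry behaviour visible in the code.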


3 Comments

BENY Over a year ago

Just like what I do in R with sum(., na.rm=T)

BENY Over a year ago

For me, I just do not understand why they do not raise an error and simply replace the entire string with np.nan. It should at least give a warning, right?

Shihe Zhang Over a year ago

Because in Python it is the correct result, so there is no need to raise a warning.

Shihe Zhang · Accepted Answer · 2017-10-26 02:55:03Z

In Python there are cmath.nan and math.nan. The documentation for the math module says:

CPython implementation detail: The math module consists mostly of thin wrappers around the platform C math library functions. Behavior in exceptional cases follows Annex F of the C99 standard where appropriate. The current implementation will raise ValueError for invalid operations like sqrt(-1.0) or log(0.0) (where C99 Annex F recommends signaling invalid operation or divide-by-zero), and OverflowError for results that overflow (for example, exp(1000.0)). A NaN will not be returned from any of the functions above unless one or more of the input arguments was a NaN; in that case, most functions will return a NaN, but (again following C99 Annex F) there are some exceptions to this rule, for example pow(float('nan'), 0.0) or hypot(float('nan'), float('inf')).

In short, when your input arguments contain NaN, most functions will return NaN.

And also:

Note that Python makes no effort to distinguish signaling NaNs from quiet NaNs, and behavior for signaling NaNs remains unspecified. Typical behavior is to treat all NaNs as though they were quiet.
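The rule quoted from the math documentation, including its two named exceptions, can be checked directly. This is a minimal sketch that covers float functions in the math module only, not pandas string replacement; the variable names are illustrative.

```python
import math

nan = float("nan")

# Most math functions return NaN when given a NaN argument.
propagated = math.sqrt(nan)

# Documented exceptions: these return a number even with a NaN input.
exception_pow = math.pow(nan, 0.0)
exception_hypot = math.hypot(nan, math.inf)

print(propagated, exception_pow, exception_hypot)
```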
