
I'm converting a .ods spreadsheet to a Pandas DataFrame. I have whole columns and rows I'd like to drop because they contain only "None". As "None" is a str, I have:

pandas.DataFrame.replace("None", numpy.nan)

...on which I call: .dropna(how='all')

Is there a pandas equivalent to numpy.nan?

Is there a way to use .dropna() with the *string "None" rather than NaN?
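Here's a minimal sketch of that approach (the frame below is just a stand-in for my imported .ods data):

import numpy as np
import pandas as pd

# stand-in for the imported spreadsheet
df = pd.DataFrame({
    "a": ["None", "None", "None"],   # entire column is "None"
    "b": ["None", 1, 2],             # row 0 is all "None"
    "c": ["None", 3, 4],
})

# replace the string "None" with a real missing value, then drop
# rows and columns that are missing everywhere
cleaned = (df.replace("None", np.nan)
             .dropna(how="all")            # drops row 0
             .dropna(axis=1, how="all"))   # drops column "a"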

  • Pandas uses numpy.nan. Pandas uses a lot of NumPy data structures and algorithms under the covers.
  • How would I use it without importing numpy?
  • If you set a value in a float64 column to None, pandas will interpret it as a missing value and replace it with numpy.nan (see the sketch after these comments).
  • I don't know how you're creating your DataFrames, but there are ways of letting pandas know that you want to interpret some input values as NaNs. See, for example, pd.read_csv's na_values argument.
  • I think you can do pandas.np.nan
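A quick illustration of the float64 behavior mentioned in the comments above:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1.0, 2.0, 3.0])   # float64 dtype
>>> s[1] = None                      # None is coerced to NaN on assignment
>>> s
0    1.0
1    NaN
2    3.0
dtype: float64
>>> np.isnan(s[1])
True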

2 Answers

You can use float('nan') if you really want to avoid importing things from the numpy namespace:

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s[1] = float('nan')
>>> s
0    1.0
1    NaN
2    3.0
dtype: float64
>>> 
>>> s.dropna()
0    1.0
2    3.0
dtype: float64

Moreover, if you have a string value "None", you can .replace("None", float("nan")):

>>> s[1] = "None"
>>> s
0       1
1    None
2       3
dtype: object
>>> 
>>> s.replace("None", float("nan"))
0    1.0
1    NaN
2    3.0
dtype: float64
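
Chaining .dropna() onto the replaced Series then drops the offending row:

>>> s.replace("None", float("nan")).dropna()
0    1.0
2    3.0
dtype: float64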

3 Comments

  • Thanks. Is there a way to use dropna on a value other than NaN? Or is there some other method that does this?
  • Like, .replace("None", np.nan)?
  • I'd rather avoid .replace() altogether. I'm only using it because I want to use dropna.

If you want to directly drop the rows containing a "None" string value (without first converting those "None" cells to NaN), I think it can be done without using replace + dropna.

Considering a DataFrame like:

In [3]: df = pd.DataFrame({
            "foo": [1,2,3,4],
            "bar": ["None",5,5,6],
            "baz": [8, "None", 9, 10]
            })

In [4]: df
Out[4]: 
    bar   baz  foo
0  None     8    1
1     5  None    2
2     5     9    3
3     6    10    4

Using replace and dropna returns:

In [5]: df.replace('None', float("nan")).dropna()
Out[5]: 
   bar   baz  foo
2  5.0   9.0    3
3  6.0  10.0    4

This can also be obtained by simply selecting the rows you need:

In [7]: df[df.eval("foo != 'None' and bar != 'None' and baz != 'None'")]
Out[7]: 
  bar baz  foo
2   5   9    3
3   6  10    4

You can also use your DataFrame's drop method, selecting the targeted axis/labels appropriately:

In [9]: df.drop(df[(df.baz == "None") |
                   (df.bar == "None") |
                   (df.foo == "None")].index)
Out[9]: 
  bar baz foo
2   5   9   3
3   6  10   4

These two methods are more or less interchangeable, since you can also write, for example:
df[(df.baz != "None") & (df.bar != "None") & (df.foo != "None")]
(One caveat: a comparison like df.some_column == "Some string" is only possible if the column's dtype allows it. Unlike with eval, before running these last two examples I had to do df = df.astype(object), because the foo column was of type int64.)
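
As an aside not in the original answer: if you'd rather not enumerate every column, one variant (assuming DataFrame.isin behaves as shown on these mixed object columns) is to build a single mask over the whole frame:

In [10]: df[~df.isin(["None"]).any(axis=1)]
Out[10]: 
  bar baz foo
2   5   9   3
3   6  10   4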
