
I'm converting a .ods spreadsheet to a Pandas DataFrame. I have whole columns and rows I'd like to drop because they contain only "None". As "None" is a str, I have:

pandas.DataFrame.replace("None", numpy.nan)

...on which I call: .dropna(how='all')

Is there a pandas equivalent to numpy.nan?

Is there a way to use .dropna() with the *string "None" rather than NaN?
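Here's a minimal sketch of that approach (the frame below is just a stand-in for my imported .ods data):

import numpy as np
import pandas as pd

# stand-in for the imported spreadsheet
df = pd.DataFrame({
    "a": ["None", "None", "None"],   # entire column is "None"
    "b": ["None", 1, 2],             # row 0 is all "None"
    "c": ["None", 3, 4],
})

# replace the string "None" with a real missing value, then drop
# rows and columns that are missing everywhere
cleaned = (df.replace("None", np.nan)
             .dropna(how="all")            # drops row 0
             .dropna(axis=1, how="all"))   # drops column "a"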

  • Pandas uses numpy.nan. Pandas uses a lot of NumPy data structures and algorithms under the covers.
  • How would I use it without importing numpy?
  • If you set a value in a float64 column to None, pandas will interpret it as a missing value and replace it with numpy.nan (see the sketch after these comments).
  • I don't know how you're creating your DataFrames, but there are ways of letting pandas know that you want to interpret some input values as NaNs. See, for example, pd.read_csv's na_values argument.
  • I think you can do pandas.np.nan
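A quick illustration of the float64 behavior mentioned in the comments above:

>>> import numpy as np
>>> import pandas as pd
>>> s = pd.Series([1.0, 2.0, 3.0])   # float64 dtype
>>> s[1] = None                      # None is coerced to NaN on assignment
>>> s
0    1.0
1    NaN
2    3.0
dtype: float64
>>> np.isnan(s[1])
True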

2 Answers

You can use float('nan') if you really want to avoid importing things from the numpy namespace:

>>> import pandas as pd
>>> s = pd.Series([1, 2, 3])
>>> s[1] = float('nan')
>>> s
0    1.0
1    NaN
2    3.0
dtype: float64
>>> 
>>> s.dropna()
0    1.0
2    3.0
dtype: float64

Moreover, if you have a string value "None", you can .replace("None", float("nan")):

>>> s[1] = "None"
>>> s
0       1
1    None
2       3
dtype: object
>>> 
>>> s.replace("None", float("nan"))
0    1.0
1    NaN
2    3.0
dtype: float64
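
Chaining .dropna() onto the replaced Series then drops the offending row:

>>> s.replace("None", float("nan")).dropna()
0    1.0
2    3.0
dtype: float64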

3 Comments

  • Thanks. Is there a way to use dropna on a value other than NaN? Or is there some other method that does this?
  • Like, .replace("None", np.nan)?
  • I'd rather avoid .replace() altogether. I'm only using it because I want to use dropna.

If you want to directly drop the rows containing a "None" string value (without first converting those "None" cells to NaN), I think it can be done without using replace + dropna.

Considering a DataFrame like:

In [3]: df = pd.DataFrame({
            "foo": [1,2,3,4],
            "bar": ["None",5,5,6],
            "baz": [8, "None", 9, 10]
            })

In [4]: df
Out[4]: 
    bar   baz  foo
0  None     8    1
1     5  None    2
2     5     9    3
3     6    10    4

Using replace and dropna returns:

In [5]: df.replace('None', float("nan")).dropna()
Out[5]: 
   bar   baz  foo
2  5.0   9.0    3
3  6.0  10.0    4

This can also be obtained by simply selecting the rows you need:

In [7]: df[df.eval("foo != 'None' and bar != 'None' and baz != 'None'")]
Out[7]: 
  bar baz  foo
2   5   9    3
3   6  10    4

You can also use your DataFrame's drop method, selecting the targeted axis/labels appropriately:

In [9]: df.drop(df[(df.baz == "None") |
                   (df.bar == "None") |
                   (df.foo == "None")].index)
Out[9]: 
  bar baz foo
2   5   9   3
3   6  10   4

These two methods are more or less interchangeable, since you can also write, for example:
df[(df.baz != "None") & (df.bar != "None") & (df.foo != "None")]
(One caveat: a comparison like df.some_column == "Some string" is only possible if the column's dtype allows it. Unlike with eval, before running these last two examples I had to do df = df.astype(object), because the foo column was of type int64.)
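
As an aside not in the original answer: if you'd rather not enumerate every column, one variant (assuming DataFrame.isin behaves as shown on these mixed object columns) is to build a single mask over the whole frame:

In [10]: df[~df.isin(["None"]).any(axis=1)]
Out[10]: 
  bar baz foo
2   5   9   3
3   6  10   4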
