I'm not sure why this happens

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.arange(15).reshape(5,3),columns=list('ABC'))
>>> df
    A   B   C
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14

Assigning None to a new last row (index 5) fills it with NaN NaN NaN:

>>> df.ix[5,:] = None
>>> df
    A   B   C
0   0   1   2
1   3   4   5
2   6   7   8
3   9  10  11
4  12  13  14
5 NaN NaN NaN

Change the first two elements in the last column to the string 'nan':

>>> df.ix[:1,2] = 'nan'
>>> df
    A   B    C
0   0   1  nan
1   3   4  nan
2   6   7    8
3   9  10   11
4  12  13   14
5 NaN NaN  NaN

Now assigning None to the last row again gives NaN NaN None:

>>> df.ix[5,:] = None
>>> df
    A   B     C
0   0   1   nan
1   3   4   nan
2   6   7     8
3   9  10    11
4  12  13    14
5 NaN NaN  None

3 Comments

  • You probably want np.nan Commented Sep 11, 2016 at 9:56
  • @DavidArenburg I could have used some other string in place of 'nan', the effect is the same Commented Sep 11, 2016 at 9:58
  • If you use df.ix[:1,2] = np.nan, then df.ix[5,:] = None will work as expected, because the C column will stay float, so I'm not sure what you mean. It seems like MaxU edited it in his accepted answer too... Commented Sep 11, 2016 at 10:05

1 Answer

It's because your dtypes are being changed after each assignment:

In [7]: df = pd.DataFrame(np.arange(15).reshape(5,3),columns=list('ABC'))

In [8]: df.dtypes
Out[8]:
A    int32
B    int32
C    int32
dtype: object

In [9]: df.loc[5,:] = None

In [10]: df.dtypes
Out[10]:
A    float64
B    float64
C    float64
dtype: object

In [11]: df.loc[:1,'C'] = 'nan'

After that last assignment, the C column has been implicitly converted to object dtype (it now holds a mix of floats and strings):

In [12]: df.dtypes
Out[12]:
A    float64
B    float64
C     object
dtype: object
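
The same sequence can be sketched in a form that should also run on recent pandas versions. This is only a minimal sketch under a few assumptions of mine, not the original session: the frame starts as float64 so no int-to-float upcast is involved, the C column is cast to object explicitly with astype (newer pandas versions warn or raise when a string is assigned into a float column), and None is assigned to the existing row 4 instead of enlarging the frame with a new row 5.

```python
import numpy as np
import pandas as pd

# Assumption: start with float64 data so only the object conversion matters.
df = pd.DataFrame(np.arange(15, dtype="float64").reshape(5, 3),
                  columns=list("ABC"))

# All columns are float64, so None is stored as NaN everywhere.
df.loc[4, :] = None
row_all_float = df.loc[4].tolist()

# Assumption: cast explicitly instead of relying on an implicit upcast.
df["C"] = df["C"].astype(object)
df.loc[:1, "C"] = "nan"          # a plain string, not a missing value

# Same assignment as before, but C is now an object column.
df.loc[4, :] = None

print(row_all_float)
print(df.dtypes)
print(df)
```

After the second assignment the float columns A and B still hold NaN in row 4, while the object column C holds None itself.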

@ayhan has written a very neat answer as a comment:

I think the main reason is for numerical columns, when you insert None or np.nan, it is converted to np.nan to have a Series of type float. For objects, it takes whatever is passed (if None, it uses None; if np.nan, it uses np.nan - docs)

(c) ayhan

Here is a corresponding demo:

In [39]: df = pd.DataFrame(np.arange(15).reshape(5,3),columns=list('ABC'))

In [40]: df.loc[4, 'A'] = None

In [41]: df.loc[4, 'C'] = np.nan

In [42]: df
Out[42]:
     A   B     C
0  0.0   1   2.0
1  3.0   4   5.0
2  6.0   7   8.0
3  9.0  10  11.0
4  NaN  13   NaN

In [43]: df.dtypes
Out[43]:
A    float64
B      int32
C    float64
dtype: object

In [44]: df.loc[0, 'C'] = 'a string'

In [45]: df
Out[45]:
     A   B         C
0  0.0   1  a string
1  3.0   4         5
2  6.0   7         8
3  9.0  10        11
4  NaN  13       NaN

In [46]: df.dtypes
Out[46]:
A    float64
B      int32
C     object
dtype: object

Now we can store both None and np.nan in the object-dtype column:

In [47]: df.loc[1, 'C'] = None

In [48]: df.loc[2, 'C'] = np.nan

In [49]: df
Out[49]:
     A   B         C
0  0.0   1  a string
1  3.0   4      None
2  6.0   7       NaN
3  9.0  10        11
4  NaN  13       NaN
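
If the mix of None and NaN in an object column is a problem, one possible way to deal with it is sketched below. The Series name col and its values are made up for illustration. The sketch assumes the column holds only numbers and missing markers; isna() and pd.to_numeric() are regular pandas functions.

```python
import numpy as np
import pandas as pd

# Hypothetical object column holding both kinds of missing marker.
col = pd.Series([2.0, None, np.nan, 11.0], dtype=object)

# isna() treats None and np.nan alike.
missing = col.isna()

# Converting back to a numeric dtype leaves NaN as the only missing marker.
numeric = pd.to_numeric(col)

print(missing.tolist())
print(numeric.dtype)
```

For a column that also contains real strings, pd.to_numeric(col, errors="coerce") would turn those into NaN as well, which may or may not be what you want.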

UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated in favor of the stricter .iloc and .loc indexers; it was removed entirely in pandas 1.0.
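
For reference, here is a small sketch of how the question's df.ix[:1,2] selection could be written with the strict indexers. With this frame, .ix used labels for the integer row index and fell back to a position for the column, so there are two equivalent spellings. The variable names are mine.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(15).reshape(5, 3), columns=list("ABC"))

# Label-based: the row slice includes the end label 1.
by_label = df.loc[:1, "C"]

# Position-based: the row slice excludes the end position 2.
by_position = df.iloc[:2, 2]

print(by_label.equals(by_position))
```

Note the off-by-one difference between the two slices: .loc[:1] and .iloc[:2] select the same two rows here.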
