I think this must be a bug in pandas (I see it in both v0.18.1 and v0.19). When I assign a date to a new label in a pandas Series, the first assignment stores it as an int (wrong), while assigning the same value a second time stores it as a datetime (correct). I cannot understand why.

For instance with this code:

import datetime as dt
import pandas as pd
series = pd.Series(list('abc'))
date = dt.datetime(2016, 10, 30, 0, 0)
# First assignment: creates the new label "Date_column"
series["Date_column"] = date
print("The date is {} and the type is {}".format(series["Date_column"], type(series["Date_column"])))
# Second assignment: overwrites the existing label
series["Date_column"] = date
print("The date is {} and the type is {}".format(series["Date_column"], type(series["Date_column"])))

The output is:

The date is 1477785600000000000 and the type is <class 'int'>
The date is 2016-10-30 00:00:00 and the type is <class 'datetime.datetime'>

As you can see, the first time it always sets the value as int instead of datetime.

Could someone help me? Thank you very much in advance. Javi.

  • I don't know what causes this behaviour, but you should be careful when adding a date to a column of strings. You are aware that you are adding a row, not a column, right? Commented Nov 21, 2016 at 9:14
  • This smells like a bug to me. Series support mixed dtypes, so it looks like the datetime is being coerced to int on the initial assignment, whereas overwriting the same index label afterwards yields the expected behaviour. I'd post an issue on GitHub. Commented Nov 21, 2016 at 9:34

1 Answer

The reason is that your Series has the 'object' dtype. Each column of a pandas DataFrame (and each Series) has a single dtype shared by all of its elements. You can inspect it with dtype (or DataFrame.dtypes):

In [3]: series = pd.Series(list('abc'))
series
Out[3]:
0    a
1    b
2    c
dtype: object

In [15]: date = dt.datetime(2016, 10, 30, 0, 0)
date
Out[15]: datetime.datetime(2016, 10, 30, 0, 0)

In [18]: print(date)
2016-10-30 00:00:00

In [17]: type(date)
Out[17]: datetime.datetime

In [19]: series["Date_column"] = date
In [20]: series

Out[20]:
0                                a
1                                b
2                                c
Date_column    1477785600000000000
dtype: object

In [22]: series.dtype

Out[22]: dtype('O')
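
The integer shown in Out[20] appears to be the same date expressed as nanoseconds since the Unix epoch, which is how pandas represents datetime64[ns] values internally. A minimal sketch to check that reading follows. It only converts the number back and does not try to reproduce the coercion itself, which may depend on the pandas version; the names raw and recovered are chosen for this illustration:

```python
import datetime as dt
import pandas as pd

date = dt.datetime(2016, 10, 30, 0, 0)
raw = 1477785600000000000  # the value printed in the question's output

# pd.Timestamp interprets a bare integer as nanoseconds since 1970-01-01.
recovered = pd.Timestamp(raw)

print(recovered)
print(recovered.to_pydatetime() == date)
```

If the two values compare equal, no information was lost: the date was stored in its integer form rather than as a datetime object.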

Only the generic 'object' dtype can hold arbitrary Python objects, which is why the Series stays 'object' when you insert a datetime.datetime into it.

Moreover, pandas Series are built on NumPy arrays, each of which has a single dtype. Mixing types forces the 'object' dtype, which defeats the computational benefit of using pandas DataFrames and Series, or NumPy.
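
To make the dtype point concrete, here is a small sketch contrasting a mixed Series with a datetime-only one. The explicit dtype=object and the names mixed and dates_only are choices made for this illustration, not something from the question:

```python
import datetime as dt
import pandas as pd

date = dt.datetime(2016, 10, 30, 0, 0)

# Mixed contents: every element is stored as a reference to a Python object.
mixed = pd.Series(
    ["a", "b", "c", date],
    index=[0, 1, 2, "Date_column"],
    dtype=object,
)
print(mixed.dtype)

# Homogeneous contents: pandas infers a native datetime64 dtype instead.
dates_only = pd.Series([date, date])
print(dates_only.dtype)
print(pd.api.types.is_datetime64_any_dtype(dates_only))
```

Only the second Series gets the vectorized datetime machinery (for example the .dt accessor); the first one is essentially a labelled container of Python objects.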

Could you use a Python list instead, or a DataFrame?
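
A rough sketch of both alternatives, assuming the goal is simply to keep the date next to the letters. The column name "letter" is hypothetical, and whether a list or a DataFrame fits better depends on what you do with the data afterwards:

```python
import datetime as dt
import pandas as pd

date = dt.datetime(2016, 10, 30, 0, 0)

# Option 1: a plain Python list keeps each element exactly as given.
row = list("abc") + [date]
print(type(row[-1]))

# Option 2: a DataFrame with the date in its own column.
# Assigning a scalar to a new column repeats it on every row.
df = pd.DataFrame({"letter": list("abc")})
df["Date_column"] = date
print(df.dtypes)
```

In the DataFrame variant the date lives in a column of its own, so it does not share a dtype with the strings.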
