Timestamp conversion to datetime Python, Pandas

Question

I'm trying to develop a personal stock screening tool, but I keep getting the "year is out of range" error while converting a column of timestamps into a readable datetime format. I'll be iterating this code over thousands of CSVs. In theory I can deal with this date issue later, but the fact that I can't get it working now is quite annoying.

The code submitted below is the majority of the function I'm working with. It will navigate to the file location, check that the file isn't empty, then begin working on it.

I'm sure there are more elegant ways to navigate to the directory and grab the intended files, but I'm currently only concerned with the inability to convert the timestamps.

I've seen solutions to this issue when the timestamps were in a series, i.e.:

dates =['1449866579','1449866580','1449866699'...]

I can't seem to get the solution to work on a dataframe.

This is a sample of the CSV file:

1449866579,113.2100,113.2700,113.1600,113.2550,92800
1449866580,113.1312,113.2200,113.0700,113.2200,135800
1449866699,113.1150,113.1500,113.0668,113.1300,106000
1449866700,113.1800,113.2000,113.1200,113.1200,125800
1449866764,113.1200,113.1800,113.0700,113.1490,130900
1449866821,113.0510,113.1223,113.0500,113.1200,110400
1449866884,113.1000,113.1400,113.0100,113.0800,388000
1449866999,113.0900,113.1200,113.0700,113.0900,116700
1449867000,113.2000,113.2100,113.0770,113.1000,191500
1449867119,113.2250,113.2300,113.1400,113.2000,114400
1449867120,113.1300,113.2500,113.1000,113.2300,146700
1449867239,113.1300,113.1800,113.1250,113.1300,108300
1449867299,113.0930,113.1300,113.0700,113.1300,166600
1449867304,113.0850,113.1100,113.0300,113.1000,167000
1449867360,113.0300,113.1100,113.0200,113.0800,204300
1449867479,113.0700,113.0800,113.0200,113.0300,197100
1449867480,113.1600,113.1700,113.0500,113.0700,270200
1449867540,113.1700,113.2900,113.1300,113.1500,3882400
1449867600,113.1800,113.1800,113.1800,113.1800,3500

import sys
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import time
import os
def analysis():
    try:
        # training_1d is the path to the data directory, defined elsewhere
        os.chdir(training_1d)
        for i in os.listdir(os.getcwd()):
            if i.endswith('.txt'):
                if os.stat(i).st_size > 0:
                    print i+" is good for analysis..."
                    try:
                        df = pd.read_csv(i, header=None, names=['date', 'open', 'high', 'low', 'close', 'volume'])
                        print df.head()
                        print df.columns
                        # This is the line that raises "year is out of range"
                        df['date'] = pd.to_datetime(df['date'],unit='s')
                        print df.head()
                    except Exception, e:
                        print str(e),"Analysis Failed..."

                elif os.stat(i).st_size == 0:
                    print i+" is an empty file"
                    continue
    except Exception, e:
        # sys.exc_info() gives the traceback of the exception being handled
        print str(e),"Something went wrong here...check: "+str(sys.exc_info()[2].tb_lineno)

Here's the output error...

AAPL.txt is good for analysis...
       date     open     high      low     close  volume
    0  1449865921  113.090  113.180  113.090  113.1601   89300
    1  1449865985  113.080  113.110  113.030  113.0900   73100
    2  1449866041  113.250  113.280  113.050  113.0900  101800
    3  1449866100  113.240  113.305  113.205  113.2400  199900
    4  1449866219  113.255  113.300  113.190  113.2500   96700
    Index([u'date', u'open', u'high', u'low', u'close', u'volume'], dtype='object')

    year is out of range Analysis Failed...

Any help is greatly appreciated... Thank you.

Thanks to EdChum, as noted in the comments, the following replacement provides the necessary relief:

Replacing:

df['date'] = pd.to_datetime(df['date'],unit='s')

With:

df['date'] = pd.to_datetime(df['date'].astype(int), unit='s')

What does it think the dtype is for date? For me, pd.to_datetime(df['date'], unit='s') works. What are your pandas and numpy versions?
– EdChum, Commented Dec 15, 2015 at 14:33
It's reading the dtype of the date as object. I recently updated both numpy and pandas to the most recent versions. Double-checked: pandas is 0.17.1 and numpy is 1.10.1.
– WilliamP, Commented Dec 15, 2015 at 15:07
Unclear how this is happening, but try df['date'] = pd.to_datetime(df['date'].astype(int), unit='s')
– EdChum, Commented Dec 15, 2015 at 15:09
Wow, that worked!! That was simpler than expected... Thank you so very much. I can't believe I couldn't figure this out after going through the pandas documentation for hours over the past few days.
– WilliamP, Commented Dec 15, 2015 at 15:14

EdChum · Accepted Answer · 2015-12-15 15:15:50Z

3

It's unclear to me why your date column is being parsed as a string, but to create a datetime from epoch time the dtype needs to be int. Once it is, your code will work:

df['date'] = pd.to_datetime(df['date'].astype(int), unit='s')

On your data I get:

In [83]:
pd.to_datetime(df[0], unit='s')

Out[83]:
0    2015-12-11 20:42:59
1    2015-12-11 20:43:00
2    2015-12-11 20:44:59
3    2015-12-11 20:45:00
4    2015-12-11 20:46:04
5    2015-12-11 20:47:01
6    2015-12-11 20:48:04
7    2015-12-11 20:49:59
8    2015-12-11 20:50:00
9    2015-12-11 20:51:59
10   2015-12-11 20:52:00
11   2015-12-11 20:53:59
12   2015-12-11 20:54:59
13   2015-12-11 20:55:04
14   2015-12-11 20:56:00
15   2015-12-11 20:57:59
16   2015-12-11 20:58:00
17   2015-12-11 20:59:00
18   2015-12-11 21:00:00
Name: 0, dtype: datetime64[ns]
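
For anyone who wants to reproduce the fix end to end, here is a minimal sketch. It assumes the date column arrives as strings, so it forces that with dtype={'date': str} to imitate the object dtype reported in the comments; the in-memory csv_text stands in for one of the real files. It casts to 'int64' rather than plain int so the width is explicit.

```python
import io
import pandas as pd

# Three rows copied from the sample CSV in the question.
csv_text = """1449866579,113.2100,113.2700,113.1600,113.2550,92800
1449866580,113.1312,113.2200,113.0700,113.2200,135800
1449866699,113.1150,113.1500,113.0668,113.1300,106000
"""

cols = ['date', 'open', 'high', 'low', 'close', 'volume']

# Assumption: read the date column as str to imitate the object dtype
# the asker reported; a real file may end up there for other reasons.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=cols,
                 dtype={'date': str})

# Cast the strings to 64-bit integers, then treat them as seconds
# since the Unix epoch.
df['date'] = pd.to_datetime(df['date'].astype('int64'), unit='s')
print(df['date'])
```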

answered Dec 15, 2015 at 15:15

4 Comments

WilliamP Over a year ago

Slightly off topic, do you think the parsing of the date column as a string will have significant impact on processing time? I do intend to run a version of this on 25,000+ csv's...

EdChum Over a year ago

Ideally you want to parse the dtype on reading. I don't know why it's being parsed as a string.
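
One way to parse the dtype on reading is the dtype argument of read_csv. This is only a sketch using rows from the question's sample, with an in-memory csv_text standing in for a real file. If a file contains a value that is not an integer, the read is expected to fail instead of silently falling back to object dtype.

```python
import io
import pandas as pd

# Three rows copied from the sample CSV in the question.
csv_text = """1449866579,113.2100,113.2700,113.1600,113.2550,92800
1449866580,113.1312,113.2200,113.0700,113.2200,135800
1449866699,113.1150,113.1500,113.0668,113.1300,106000
"""

cols = ['date', 'open', 'high', 'low', 'close', 'volume']

# Declare the date column as a 64-bit integer at read time, so no
# later astype() call is needed.
df = pd.read_csv(io.StringIO(csv_text), header=None, names=cols,
                 dtype={'date': 'int64'})

# The column is already numeric, so the original conversion works as is.
df['date'] = pd.to_datetime(df['date'], unit='s')
print(df.dtypes)
```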

WilliamP Over a year ago

It's quite possible this should be posed as a new question... however, after attempting the above on the full CSV file, I'm now running into an error resulting from the data being parsed as a string... I've painstakingly scrolled through 3,000+ lines just to make sure that no oddities are in place. Could this be a pandas issue?

WilliamP Over a year ago

edit: furthermore, attempting to specify dtypes of each column individually results in error message 'cannot safely convert passed user dtype of <i4 for object dtyped data in column'
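
Rather than scrolling through the file by hand, the rows that cannot be read as numbers can be located programmatically. The sketch below is an assumption about the cause, not a diagnosis: the bad row is made up for illustration, and the real file may contain something different. It uses pd.to_numeric with errors='coerce', which turns unparseable values into NaN.

```python
import io
import pandas as pd

# Two rows from the question's sample plus one made-up bad row
# (hypothetical; it only illustrates a non-numeric timestamp).
csv_text = """1449866579,113.2100,113.2700,113.1600,113.2550,92800
not_a_timestamp,113.1312,113.2200,113.0700,113.2200,135800
1449866699,113.1150,113.1500,113.0668,113.1300,106000
"""

cols = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)

# errors='coerce' replaces anything that is not a number with NaN
# instead of raising.
as_number = pd.to_numeric(df['date'], errors='coerce')

# Rows where coercion failed are the ones that push the column to
# object dtype.
bad_rows = df[as_number.isna()]
print(bad_rows)

# Keep only the clean rows and convert their timestamps.
clean = df[as_number.notna()].copy()
clean['date'] = pd.to_datetime(
    as_number[as_number.notna()].astype('int64'), unit='s')
print(clean)
```

Printing bad_rows for each file shows what, if anything, needs to be skipped or cleaned before the conversion.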

Ali Nikneshan · Answer · 2015-12-15 14:28:52Z

-1

Replace this line:

df['date'] = pd.to_datetime(df['date'],unit='s')

with this:

df['date'] = pd.to_datetime(int(df['date']),unit='s')

This will convert the epoch timestamp to a standard Python timestamp.

edited Dec 15, 2015 at 14:28

answered Dec 15, 2015 at 14:22

2 Comments

WilliamP Over a year ago

New error arises: 'cannot convert the series to <type 'float'> Analysis Failed...' Thank you, not sure if it's better or worse than the year out of range error, but perhaps I can work around this...

Ali Nikneshan Over a year ago

I checked to_datetime; it expects an int epoch value. Your data is epoch, just maybe not int.
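
A short sketch of why the int(...) call fails: the built-in int() converts a single value, so it cannot be applied to a whole column, whereas Series.astype converts element by element. The Series s below is made up from timestamps in the question, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Timestamps from the question, held as strings.
s = pd.Series(['1449866579', '1449866580', '1449866699'])

# int() works on one value, not on a multi-row Series, so this raises
# TypeError.
try:
    int(s)
    int_call_failed = False
except TypeError as exc:
    int_call_failed = True
    print(exc)

# astype converts every element, which is what to_datetime needs.
converted = pd.to_datetime(s.astype('int64'), unit='s')
print(converted)
```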
