2

I'm trying to develop a personal stock screening tool, but I keep getting a "year is out of range" error when converting a column of epoch timestamps into a readable datetime format. I'll be running this code over thousands of CSVs. In theory I can deal with the date issue later, but not being able to get it working now is quite annoying.

The code submitted below is the majority of the function I'm working with. It will navigate to the file location, check that the file isn't empty, then begin working on it.

I'm sure there are more elegant ways to navigate to the directory and grab the intended files, but I'm currently only concerned with the inability to convert the timestamps.

I've seen solutions to this issue when the timestamps were in a list, e.g.:

dates =['1449866579','1449866580','1449866699'...]

I can't seem to get the solution to work on a dataframe.

This is a sample of the CSV file:

1449866579,113.2100,113.2700,113.1600,113.2550,92800
1449866580,113.1312,113.2200,113.0700,113.2200,135800
1449866699,113.1150,113.1500,113.0668,113.1300,106000
1449866700,113.1800,113.2000,113.1200,113.1200,125800
1449866764,113.1200,113.1800,113.0700,113.1490,130900
1449866821,113.0510,113.1223,113.0500,113.1200,110400
1449866884,113.1000,113.1400,113.0100,113.0800,388000
1449866999,113.0900,113.1200,113.0700,113.0900,116700
1449867000,113.2000,113.2100,113.0770,113.1000,191500
1449867119,113.2250,113.2300,113.1400,113.2000,114400
1449867120,113.1300,113.2500,113.1000,113.2300,146700
1449867239,113.1300,113.1800,113.1250,113.1300,108300
1449867299,113.0930,113.1300,113.0700,113.1300,166600
1449867304,113.0850,113.1100,113.0300,113.1000,167000
1449867360,113.0300,113.1100,113.0200,113.0800,204300
1449867479,113.0700,113.0800,113.0200,113.0300,197100
1449867480,113.1600,113.1700,113.0500,113.0700,270200
1449867540,113.1700,113.2900,113.1300,113.1500,3882400
1449867600,113.1800,113.1800,113.1800,113.1800,3500

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime
import time
import os
import sys
def analysis():
    try:
        # training_1d is the directory holding the data files (defined elsewhere)
        os.chdir(training_1d)
        for i in os.listdir(os.getcwd()):
            if i.endswith('.txt'):
                if os.stat(i).st_size > 0:
                    print i+" is good for analysis..."
                    try:
                        df = pd.read_csv(i, header=None, names=['date', 'open', 'high', 'low', 'close', 'volume'])
                        print df.head()
                        print df.columns
                        # this is the line that raises "year is out of range"
                        df['date'] = pd.to_datetime(df['date'],unit='s')
                        print df.head()
                    except Exception, e:
                        print str(e),"Analysis Failed..."

                elif os.stat(i).st_size == 0:
                    print i+" is an empty file"
                    continue
    except Exception, e:
        print str(e),"Something went wrong here...check: "+str(sys.exc_info()[2].tb_lineno)

Here's the output error...

AAPL.txt is good for analysis...
             date     open     high      low     close  volume
    0  1449865921  113.090  113.180  113.090  113.1601   89300
    1  1449865985  113.080  113.110  113.030  113.0900   73100
    2  1449866041  113.250  113.280  113.050  113.0900  101800
    3  1449866100  113.240  113.305  113.205  113.2400  199900
    4  1449866219  113.255  113.300  113.190  113.2500   96700
    Index([u'date', u'open', u'high', u'low', u'close', u'volume'], dtype='object')

    year is out of range Analysis Failed...

Any help is greatly appreciated... Thank you.

Thanks to EdChum, as noted in the comments, the following replacement provides the necessary relief:

Replacing:

df['date'] = pd.to_datetime(df['date'],unit='s')

With:

df['date'] = pd.to_datetime(df['date'].astype(int), unit='s')

4 Comments

  • What does it think the dtype is for date? For me, pd.to_datetime(df['date'], unit='s') works. What are your pandas and numpy versions? Commented Dec 15, 2015 at 14:33
  • It's reading the dtype of the date column as object. I recently updated both numpy and pandas to the most recent versions; I double-checked, and pandas is 0.17.1 and numpy is 1.10.1. Commented Dec 15, 2015 at 15:07
  • Unclear how this is happening but try df['date'] = pd.to_datetime(df['date'].astype(int), unit='s') Commented Dec 15, 2015 at 15:09
  • Wow, that worked!! That was simpler than expected... Thank you, thank you so very much, I can't believe I couldn't figure this out after going through the pandas documentation for hours over the past few days. Commented Dec 15, 2015 at 15:14

2 Answers

3

It's unclear to me why your date column is being parsed as a string, but to create a datetime from epoch time the dtype needs to be int. Cast the column first and your code will work:

df['date'] = pd.to_datetime(df['date'].astype(int), unit='s')

On your data I get:

In [83]:
pd.to_datetime(df[0], unit='s')

Out[83]:
0    2015-12-11 20:42:59
1    2015-12-11 20:43:00
2    2015-12-11 20:44:59
3    2015-12-11 20:45:00
4    2015-12-11 20:46:04
5    2015-12-11 20:47:01
6    2015-12-11 20:48:04
7    2015-12-11 20:49:59
8    2015-12-11 20:50:00
9    2015-12-11 20:51:59
10   2015-12-11 20:52:00
11   2015-12-11 20:53:59
12   2015-12-11 20:54:59
13   2015-12-11 20:55:04
14   2015-12-11 20:56:00
15   2015-12-11 20:57:59
16   2015-12-11 20:58:00
17   2015-12-11 20:59:00
18   2015-12-11 21:00:00
Name: 0, dtype: datetime64[ns]
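
As a minimal sketch of the cast described above, the snippet below reads two rows of the sample data from an in-memory buffer and forces the date column to be read as strings to mimic the problem. The `dtype={'date': str}` argument and the `sample` buffer are assumptions made for illustration; they are not part of the original function.

```python
import io

import pandas as pd

# Two rows copied from the sample CSV in the question.
sample = io.StringIO(
    "1449866579,113.2100,113.2700,113.1600,113.2550,92800\n"
    "1449866580,113.1312,113.2200,113.0700,113.2200,135800\n"
)
cols = ['date', 'open', 'high', 'low', 'close', 'volume']

# Assumption: read 'date' as strings to reproduce a non-integer column.
df = pd.read_csv(sample, header=None, names=cols, dtype={'date': str})
print(df['date'].dtype)  # a string-like dtype, not an integer dtype

# Cast to a 64-bit integer first, then interpret the values as epoch seconds.
df['date'] = pd.to_datetime(df['date'].astype('int64'), unit='s')
print(df['date'])
```

Checking `df['date'].dtype` right after `read_csv` is a quick way to see whether the cast is needed for a given file.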

4 Comments

Slightly off topic, do you think the parsing of the date column as a string will have significant impact on processing time? I do intend to run a version of this on 25,000+ csv's...
Ideally you want to set the dtype on reading; I don't know why it's being parsed as a string.
It's quite possible this should be posed as a new question... however, after attempting the above on the full CSV file, I'm now running into an error resulting from the data being parsed as a string... I've painstakingly scrolled through 3,000+ lines just to make sure that no oddities are in place. Could this be a pandas issue?
edit: furthermore, attempting to specify dtypes of each column individually results in error message 'cannot safely convert passed user dtype of <i4 for object dtyped data in column'
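
One possible cause of an object-dtyped column, and of the 'cannot safely convert' message, is a row whose first field is not a number. This is only a guess about the full file, which is not shown here. A rough sketch for locating such rows with `pd.to_numeric(..., errors='coerce')` follows; the stray header row in `raw` is hypothetical and exists only to trigger the situation.

```python
import io

import pandas as pd

# Hypothetical file contents: the middle row is an invented stray header line.
raw = io.StringIO(
    "1449866579,113.2100,113.2700,113.1600,113.2550,92800\n"
    "timestamp,open,high,low,close,volume\n"
    "1449866580,113.1312,113.2200,113.0700,113.2200,135800\n"
)
cols = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv(raw, header=None, names=cols)

# Values that cannot be parsed as numbers become NaN instead of raising.
numeric_dates = pd.to_numeric(df['date'], errors='coerce')

# Rows that failed to parse; inspect these instead of scrolling the file.
bad_rows = df[numeric_dates.isna()]
print(bad_rows)

# Keep only the parseable rows and convert their epoch seconds.
good = numeric_dates.notna()
clean = df[good].copy()
clean['date'] = pd.to_datetime(numeric_dates[good].astype('int64'), unit='s')
print(clean['date'])
```

If `bad_rows` comes back empty on the real file, the cause lies elsewhere and this sketch does not apply.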
-1

Replace this line:

df['date'] = pd.to_datetime(df['date'],unit='s')

with this:

df['date'] = pd.to_datetime(int(df['date']),unit='s')

this will convert epoch timestamp to python standard timestamp.

2 Comments

New error arises: 'cannot convert the series to <type 'float'> Analysis Failed...' Thank you, not sure if it's better or worse than the year out of range error, but perhaps I can work around this...
I checked to_datetime: with unit='s' it expects an integer epoch value. Your data is epoch, just maybe not int.
