
I would like to select the rows that fall between two dates. The date format is shown in the code below. I have seen a lot of examples for this, but couldn't find one that handles this format.

Dataset = Data_have[
    (Data_have['OrderDate'] >= "1SEP2007:00:00:00") & 
    (Data_have['OrderDate'] <= "31MAR2008:00:00:00")
]

Thanks

  • Pintang, how did you do that? Commented Oct 11, 2017 at 22:16

1 Answer

Assuming you are using pandas, and that data_have is a DataFrame, the problem here is that pandas does not automatically recognize strings like '1SEP2007:00:00:00' as date-strings.

But it can if you give it a little help by spelling out the format:

import pandas as pd

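# Sample data: one row per month end.
# (freq='M' is the month-end alias; pandas 2.2+ prefers the spelling 'ME'.)
data_have = pd.DataFrame(
    {'OrderDate': pd.date_range('2007-1-1', '2009-1-1', freq='M')})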

start, end = pd.to_datetime(["1SEP2007:00:00:00", "31MAR2008:00:00:00"], 
                            format='%d%b%Y:%H:%M:%S')
dataset = data_have[
    (data_have['OrderDate'] >= start) & 
    (data_have['OrderDate'] <= end)
]
print(dataset)

prints

    OrderDate
8  2007-09-30
9  2007-10-31
10 2007-11-30
11 2007-12-31
12 2008-01-31
13 2008-02-29
14 2008-03-31
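The same inclusive range check can also be written with Series.between, which some readers find easier to scan than two comparisons joined with &. This is only a sketch: the five sample dates below are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data (not from the question): dates on both sides
# of the range, plus the two boundary dates themselves.
data_have = pd.DataFrame({'OrderDate': pd.to_datetime(
    ['2007-08-31', '2007-09-01', '2007-12-15', '2008-03-31', '2008-04-01'])})

# Parse the two boundary strings with an explicit format.
start, end = pd.to_datetime(["1SEP2007:00:00:00", "31MAR2008:00:00:00"],
                            format='%d%b%Y:%H:%M:%S')

# Series.between includes both endpoints by default.
mask = data_have['OrderDate'].between(start, end)
dataset = data_have[mask]
print(dataset)
```

Both boundary dates are kept, which matches the >= and <= comparisons used above.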

Above, I'm also assuming that OrderDate contains values of dtype datetime64[ns] (parsed dates, as opposed to date strings). If OrderDate contains date-strings, then you'll need to parse them too:

data_have['OrderDate'] = pd.to_datetime(data_have['OrderDate'], 
                                        format='%d%b%Y:%H:%M:%S')
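If some of the strings might not match the format, pd.to_datetime can be told to return NaT for them instead of raising. A minimal sketch, assuming a small made-up column with one deliberately malformed value:

```python
import pandas as pd

# Hypothetical data: OrderDate stored as strings, one of them malformed.
data_have = pd.DataFrame(
    {'OrderDate': ['1SEP2007:00:00:00', '15JAN2008:12:30:00', 'not a date']})

# errors='coerce' turns values that do not match the format into NaT
# instead of raising an exception.
data_have['OrderDate'] = pd.to_datetime(
    data_have['OrderDate'], format='%d%b%Y:%H:%M:%S', errors='coerce')

# Count how many rows failed to parse so they can be inspected.
n_bad = int(data_have['OrderDate'].isna().sum())
print(data_have)
print("unparsable rows:", n_bad)
```

Comparisons against NaT are always False, so rows that failed to parse simply drop out of a date-range filter. Checking the count first helps avoid losing rows silently.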

Pandas uses dateutil to parse date-strings. You can test if a date-string is automatically parsable by pandas by passing it to pd.Timestamp:

In [45]: pd.Timestamp("1SEP2007:00:00:00")
ValueError: could not convert string to Timestamp

whereas, for example, "1 SEP2007 00:00:00" is automatically parsable:

In [49]: pd.Timestamp("1 SEP2007 00:00:00")
Out[49]: Timestamp('2007-09-01 00:00:00')
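
When a string is not automatically parsable, the explicit format can be tried out on a single value before it is applied to a whole column. The sketch below uses the standard library's datetime.strptime, which understands the same % directives. It assumes an English (or C) locale, since %b matches locale-dependent month abbreviations.

```python
from datetime import datetime

# Format matching strings such as "1SEP2007:00:00:00":
# day, abbreviated month name, 4-digit year, then ":HH:MM:SS".
fmt = '%d%b%Y:%H:%M:%S'

# strptime matches month names case-insensitively, so "SEP" is accepted
# (assuming an English or C locale).
parsed = datetime.strptime("1SEP2007:00:00:00", fmt)
print(parsed)
```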

8 Comments

Might be useful to also include: data_have['OrderDate'] = data_have.apply(lambda x:pd.to_datetime(x['OrderDate'], format='%d%b%Y:%H:%M:%S'), axis=1)
@cwallenpoole: Thanks for the suggestion. I've added something similar, but using pd.to_datetime instead of apply.
Good deal. I'm new to Pandas so I didn't know about that shortcut.
Thank you very much for such comprehensive information and the link on datetime dtypes. I was really struggling to format the column "OrderDate". I used the code below as you suggested and that solved the problem: data_have['OrderDate'] = pd.to_datetime(data_have['OrderDate'], format='%d%b%Y:%H:%M:%S') I had 18484 rows that I needed for predictive modelling. Your quick reply really helped. Thanks again.
Sorry to bother you again, how did you select the codes in the highlighted lines?