Given a list of values or strings, how can I detect whether they are dates, datetimes, or neither?

I have used the pandas API to infer data types, but it doesn't work well with dates. See the example:

import pandas as pd

def get_redshift_dtype(values):
    dtype = pd.api.types.infer_dtype(values)
    return dtype

This is the result that I'm looking for. Any suggestions on better methods?

# Should return "date"
values_1 = ['2018-10-01', '2018-02-14', '2017-08-01']

# Should return "date"
values_2 = ['2018-10-01 00:00:00', '2018-02-14 00:00:00', '2017-08-01 00:00:00']

# Should return "datetime"
values_3 = ['2018-10-01 02:13:00', '2018-02-14 11:45:00', '2017-08-01 00:00:00']

# Should return "None"
values_4 = ['123098', '213408', '801231']

2 Answers

You can write a function to return values dependent on conditions you specify:

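import pandas as pd

def return_date_type(s):
    # Unparseable entries become NaT instead of raising an error.
    s_dt = pd.to_datetime(s, errors='coerce')
    if s_dt.isnull().any():
        return 'None'
    # normalize() resets every time to midnight; if nothing changes,
    # the values carry no time information.
    if s_dt.normalize().equals(s_dt):
        return 'date'
    return 'datetime'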

return_date_type(values_1)  # 'date'
return_date_type(values_2)  # 'date'
return_date_type(values_3)  # 'datetime'
return_date_type(values_4)  # 'None'

You should be aware that a pandas datetime series always includes a time component. Internally, the values are stored as integers, and if a time is not specified it will be set to 00:00:00.
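To see this for yourself, here is a small sketch (the variable names are just for illustration). It parses date-only strings and shows that the result still carries a midnight time and converts to plain 64-bit integers. I'm deliberately not showing the integer values, since the unit they count in can depend on your pandas version.

```python
import pandas as pd

# Date-only strings, as in values_1 from the question.
idx = pd.to_datetime(["2018-10-01", "2018-02-14"])

# Each element is a full timestamp; the missing time defaults to midnight.
first_as_text = str(idx[0])  # '2018-10-01 00:00:00'

# normalize() sets the time to midnight, so it changes nothing here.
midnight_only = idx.normalize().equals(idx)

# The values can be viewed as 64-bit integers (an offset from the Unix epoch).
as_ints = idx.astype("int64")
print(first_as_text, midnight_only, as_ints.dtype)
```

This is why the function above compares against normalize() rather than looking for a separate "date" dtype.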

Here's something that'll give you exactly what you asked for, using re:

import re

# Raw strings keep \d from being treated as an invalid string escape.
classify_dict = {
    'date': r'^\d{4}(-\d{2}){2}$',
    'date_again': r'^\d{4}(-\d{2}){2} 00:00:00$',
    'datetime': r'^\d{4}(-\d{2}){2} \d{2}(:\d{2}){2}$',
}

def classify(mylist):
    key = 'None'
    for k, v in classify_dict.items():
        if all(re.match(v, e) for e in mylist):
            key = k
            break
    if key == 'date_again':
        key = 'date'
    return key

classify(values_2)  # 'date'

The function tries each regex pattern in turn against every item in the list. A key is returned only if all items match that pattern; otherwise the result is 'None'. This works for all of the example lists you've given.

For now, the regex patterns do not reject out-of-range values (e.g. 25:00:00), but that would be relatively straightforward to implement.
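One possible way to get range checking without writing more regex is to let datetime.strptime do the validation. This is only a sketch under some assumptions: classify_strict and FORMATS are names I made up, and only the two layouts from your examples are supported. Note that it behaves a little differently from the regex version. A list mixing date-only and datetime strings comes back as 'datetime' rather than 'None', and strptime also accepts fields that are not zero-padded.

```python
from datetime import datetime, time

# Hypothetical list of accepted layouts; extend as needed.
FORMATS = ("%Y-%m-%d", "%Y-%m-%d %H:%M:%S")

def classify_strict(values):
    """Return 'date', 'datetime', or 'None' for a list of strings."""
    if not values:
        return "None"
    parsed = []
    for value in values:
        for fmt in FORMATS:
            try:
                # strptime raises ValueError for out-of-range fields
                # such as hour 25 or month 13.
                parsed.append(datetime.strptime(value, fmt))
                break
            except ValueError:
                continue
        else:
            # No layout matched this value, so the list is not all dates.
            return "None"
    # Every value at midnight means there is no real time information.
    if all(p.time() == time(0, 0) for p in parsed):
        return "date"
    return "datetime"

print(classify_strict(["2018-10-01 02:13:00", "2017-08-01 00:00:00"]))
print(classify_strict(["2018-10-01 25:00:00"]))
```

The first call should print 'datetime' and the second 'None', since hour 25 fails to parse.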
