
I am reading a CSV file using pandas:

str,date,float,time,datetime
a,10/11/19,1.1,10:30:00,10/11/19 10:30
b,10/11/19,1.2,10:00:00,10/11/19 10:30
c,10/11/19,1.3,11:10:11,10/11/19 10:30
df = pd.read_csv(file)

My business requirement is to tell, for each column, whether it is a pure date field, a pure time field, or a complete datetime field. For a particular column, my code is:

try:
    dt = pd.to_datetime(df[col])
    dates = [obj.date() for obj in dt]
    times = [obj.time() for obj in dt]

    if dates and (set(times) == set([datetime.time(0, 0)])):
        # It's a pure date field
    elif <something>:
        # It's a pure time field
    else:
        # It's a datetime field
except:
    # It's not a date field

The problem with my code is that when a column contains only times, pd.to_datetime fills in today's date by default, so I cannot distinguish it from a datetime column. Is there an easy solution? Please help me fill in "something" in the code above.

  • Please add some of that data too, so we can try to reproduce your issue. Commented Nov 13, 2019 at 7:11
  • I have added the sample data @AKX Commented Nov 13, 2019 at 7:37

1 Answer

To test for times: pandas fills in today's date by default when a value contains only a time, so one possible solution is to compare Series.dt.date with today's date (Timestamp.date) and use Series.all to check that every value in the column matches.

I also added another check for dates: test whether the values stay the same after the times are removed with Series.dt.floor:

import pandas as pd

df = pd.DataFrame({'a':['2019-01-01 12:23:10',
                        '2019-01-02 12:23:10'],
                   'b':['2019-01-01',
                        '2019-01-02'],
                   'c':['12:23:10',
                        '15:23:10'],
                   'd':['a','b']})
print (df)
                     a           b         c  d
0  2019-01-01 12:23:10  2019-01-01  12:23:10  a
1  2019-01-02 12:23:10  2019-01-02  15:23:10  b

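def check(col):
    try:
        dt = pd.to_datetime(df[col])

        # flooring to the day changes nothing -> every time is 00:00:00
        if (dt.dt.floor('D') == dt).all():
            return ('Its a pure date field')
        # every date equals today -> pandas filled in the default date
        elif (dt.dt.date == pd.Timestamp('now').date()).all():
            return ('Its a pure time field')
        else:
            return ('Its a Datetime field')
    except Exception:
        # the column could not be parsed as datetimes
        return ('its not a datefield')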


print (check('a'))
print (check('b'))
print (check('c'))
print (check('d'))
Its a Datetime field
Its a pure date field
Its a pure time field
its not a datefield

Another idea is to test for numeric columns first and return 'its not a datefield' for them, which prevents numeric values from being cast to datetimes. Also, because a real datetime column may contain only today's dates (column f), the test for times is changed to use Series.str.contains on the original strings, matching the pattern HH:MM:SS or H:MM:SS:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a':['2019-01-01 12:23:10',
                        '2019-01-02'],
                   'b':['2019-01-01',
                        '2019-01-02'],
                   'c':['12:23:10',
                        '15:23:10'],
                   'd':['a','b'],
                   'e':[1,2],
                   'f':['2019-11-13 12:23:10',
                        '2019-11-13'],})
print (df)
                     a           b         c  d  e                    f
0  2019-01-01 12:23:10  2019-01-01  12:23:10  a  1  2019-11-13 12:23:10
1           2019-01-02  2019-01-02  15:23:10  b  2           2019-11-13

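def check(col):
    # numeric columns are never treated as dates
    if np.issubdtype(df[col].dtype, np.number):
        return ('its not a datefield')

    try:
        dt = pd.to_datetime(df[col])
        # flooring to the day changes nothing -> every time is 00:00:00
        if (dt.dt.floor('D') == dt).all():
            return ('Its a pure date field')
        # test the original strings, not the parsed values
        elif df[col].str.contains(r"^\d{1,2}:\d{2}:\d{2}$").all():
            return ('Its a pure time field')
        else:
            return ('Its a Datetime field')
    except Exception:
        # the column could not be parsed as datetimes
        return ('its not a datefield')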


print (check('a'))
print (check('b'))
print (check('c'))
print (check('d'))
print (check('e'))
print (check('f'))
Its a Datetime field
Its a pure date field
Its a pure time field
its not a datefield
its not a datefield
Its a Datetime field
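
A different approach, not part of the answer above, is to avoid format guessing altogether and try explicit formats against the raw strings. The following is only a minimal sketch: the three format constants and the helper names parses_fully and classify are hypothetical, and the formats are a guess at the question's sample CSV (day first, two-digit year), so adjust them to your data.

```python
import io
import pandas as pd

# Assumed formats for the sample CSV in the question; adjust to your data.
DATE_FORMAT = "%d/%m/%y"
TIME_FORMAT = "%H:%M:%S"
DATETIME_FORMAT = "%d/%m/%y %H:%M"

def parses_fully(values, fmt):
    """True if every value matches fmt exactly (nothing coerced to NaT)."""
    return pd.to_datetime(values, format=fmt, errors="coerce").notna().all()

def classify(series):
    # Numeric columns are never treated as dates.
    if pd.api.types.is_numeric_dtype(series):
        return "not a date field"
    values = series.astype(str)
    if parses_fully(values, TIME_FORMAT):
        return "pure time field"
    if parses_fully(values, DATE_FORMAT):
        return "pure date field"
    if parses_fully(values, DATETIME_FORMAT):
        return "datetime field"
    return "not a date field"

csv_text = """str,date,float,time,datetime
a,10/11/19,1.1,10:30:00,10/11/19 10:30
b,10/11/19,1.2,10:00:00,10/11/19 10:30
c,10/11/19,1.3,11:10:11,10/11/19 10:30
"""
df = pd.read_csv(io.StringIO(csv_text))
result = {col: classify(df[col]) for col in df.columns}
print(result)
```

Because each format is matched against the original strings, the default date that pandas fills in never enters the decision. The trade-off is that the formats must be known in advance, or extended to a list of candidates.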
1 Comment

Thanks for the answer, it is working. My doubt is: if my "datetime" column contains today's date with some time, will it return 'Its a pure time field'? Am I right?
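
This doubt can be checked with a small sketch. It builds already-parsed datetimes that all fall on today's date (hypothetical data, constructed directly so no string parsing is involved) and applies the two conditions from the first version of check, followed by the string pattern from the second version.

```python
import pandas as pd

now = pd.Timestamp("now")
today = now.normalize()

# Full datetimes that happen to fall on today's date (hypothetical data).
col = pd.Series([today + pd.Timedelta(hours=10, minutes=30),
                 today + pd.Timedelta(hours=11, minutes=10)])

# Condition 1 of the first check: are all times midnight?
is_pure_date = (col.dt.normalize() == col).all()

# Condition 2 of the first check: do all dates equal today's date?
looks_like_time = (col.dt.date == now.date()).all()

# The second check looks at the raw strings instead of the parsed dates.
raw = col.dt.strftime("%Y-%m-%d %H:%M:%S")
matches_time_pattern = raw.str.contains(r"^\d{1,2}:\d{2}:\d{2}$").all()

print(is_pure_date, looks_like_time, matches_time_pattern)
```

The date comparison alone cannot tell such a column apart from a time-only column, so the first version would report a pure time field. The string pattern does not match full datetime strings, which is why the second version of check handles this case.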
