I have a dataframe with multiple columns. One of the columns contains dates in the format %m/%d/%Y, or null values. I need to check that every date in that column is in the correct format (mentioned above).

What I am trying to do is:

pd.to_datetime(df['DOB'], format='%m/%d/%Y', errors='coerce').all(skipna=True)

to check that the column has the correct date format, ignoring empty values, but I am getting this error:

TypeError: invalid_op() got an unexpected keyword argument 'skipna'

Kindly let me know how to do this, or what other logic I can apply.

EDIT 1: Suppose the data has 3 DOBs and 1 null value:

data = {"Name": ["James", "Alice", "Phil", "Jacob"], "DOB": ["07-01-1997", "06-02-1995", "", "03-07-2002"]}

I modify the DOB column to convert the dates to my format and replace empty fields with NaN:

df['DOB']=pd.to_datetime(df['DOB']).apply(lambda cell: cell.strftime(DATE_IN_MDY) if not pd.isnull(cell) else np.nan)

In this case I want the result to be True.

1 Answer

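The idea is to build one mask for values that are empty strings or (|) missing (via Series.isna), and a second mask for values that become missing when to_datetime is called with errors='coerce'. If the two masks are equal, every non-empty value was parsed as a valid date: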

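import pandas as pd

data = {"Name": ["James", "Alice", "Phil", "Jacob"],
        "DOB": ["07-01-1997", "06-02-1995", "", "03-07-2002"]}

df = pd.DataFrame(data)

# m1: values that were empty or missing to begin with
m1 = df['DOB'].eq('') | df['DOB'].isna()
# m2: values that are missing after parsing (unparseable strings become NaT)
m2 = pd.to_datetime(df['DOB'], errors='coerce').isna()

print(m1.eq(m2).all())
True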

Sample that returns False because of an invalid date (day 97):

data = {"Name": ["James", "Alice", "Phil", "Jacob"],
        "DOB": ["07-01-1997", "06-02-1995", "", "03-97-2002"]}

df = pd.DataFrame(data)

m1 = df['DOB'].eq('') | df['DOB'].isna()
m2 = pd.to_datetime(df['DOB'], errors='coerce').isna()

print(m1.eq(m2).all())
False
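
If the separator and field order also need to be enforced, the same mask comparison can be combined with an explicit format string. The following is only a sketch under assumptions: the target format is taken to be %m/%d/%Y as stated in the question, whitespace-only strings are treated as blank, and the names DATE_FORMAT, invalid_dob_mask and dob_column_is_valid are hypothetical helpers, not part of pandas.

```python
import numpy as np
import pandas as pd

DATE_FORMAT = "%m/%d/%Y"  # assumed target format, taken from the question


def invalid_dob_mask(dob: pd.Series, fmt: str = DATE_FORMAT) -> pd.Series:
    """Flag entries that are neither blank/null nor a valid date in fmt."""
    # Intentionally empty entries: NaN/None, or empty / whitespace-only strings.
    blank = dob.isna() | dob.astype(str).str.strip().eq("")
    # Strict parse: anything that does not match fmt becomes NaT.
    parsed = pd.to_datetime(dob, format=fmt, errors="coerce")
    # Invalid = failed to parse and was not blank to begin with.
    return parsed.isna() & ~blank


def dob_column_is_valid(dob: pd.Series, fmt: str = DATE_FORMAT) -> bool:
    """True when every non-blank entry parses with fmt."""
    return not invalid_dob_mask(dob, fmt).any()


if __name__ == "__main__":
    df = pd.DataFrame({
        "Name": ["James", "Alice", "Phil", "Jacob"],
        "DOB": ["07/01/1997", "06-02-1995", np.nan, "03/97/2002"],
    })
    print(dob_column_is_valid(df["DOB"]))
    # Show the offending rows rather than only a single boolean.
    print(df[invalid_dob_mask(df["DOB"])])
```

Returning the mask as well as the boolean makes it possible to report which rows failed, which tends to be more useful than a bare True/False when cleaning data.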

4 Comments

The column also contains some null values that are there intentionally. These nulls are not missing values, which is why the above solution will give False every time.
The data has 3 DOBs and 1 null value: data = {"Name": ["James", "Alice", "Phil", "Jacob"], "DOB": ["07-01-1997", "06-02-1995", "", "03-07-2002"]}. I modify the DOB column to convert the dates to my format and replace empty fields with NaN: df['DOB']=pd.to_datetime(df['DOB']).apply(lambda cell: cell.strftime(DATE_IN_MDY) if not pd.isnull(cell) else np.nan). In this case I want the result to be True.
It should be False when the date separator differs from the required one, when the format is incorrect, or when the column contains any value other than a date or null.
@TechFukrey - OK, please check now; I hope it works as you need.
