I have the following code:
from datetime import datetime
import numpy as np
import pandas as pd
datetime_const = datetime(2021, 3, 31)
tmp_df1['datetime2'] = pd.to_datetime(tmp_df1['datetime1'], format='%Y-%m-%d')
tmp_df1['test_col_1'] = (tmp_df1['value1'] < 0.0002) & (tmp_df1['datetime2'] < (datetime_const + pd.DateOffset(months=12)))
tmp_df1['test_col_2'] = (tmp_df1['value1'] >= 0.0002) & ((((tmp_df1['datetime2'] - datetime_const).dt.days / 365) * tmp_df1['value1']) < 0.0002)
tmp_df1['test_col_3'] = datetime_const + pd.DateOffset(months=12)
tmp_df1['test_col_4'] = datetime_const + pd.to_timedelta(((0.0002/tmp_df1['value1'])*365).round(), unit='D')
tmp_df1['test_col_5'] = tmp_df1['datetime2']
tmp_df1['datetime3'] = np.select(
    [
        (tmp_df1['value1'] < 0.0002) & (tmp_df1['datetime2'] < (datetime_const + pd.DateOffset(months=12))),
        (tmp_df1['value1'] >= 0.0002) & ((((tmp_df1['datetime2'] - datetime_const).dt.days / 365) * tmp_df1['value1']) < 0.0002),
    ],
    [
        datetime_const + pd.DateOffset(months=12),
        datetime_const + pd.to_timedelta(((0.0002 / tmp_df1['value1']) * 365).round(), unit='D'),
    ],
    default=tmp_df1['datetime2'],
)
datetime1 is an object-dtype column, so I converted it to datetime64 and assigned the result to datetime2.
value1 is a float64 column of small decimal numbers, and it contains NaNs.
I created test_col_1 through test_col_5 to check the individual conditions and choices used in my np.select call. They all look correct when assigned as individual DataFrame columns.
However, the datetime3 column produced by np.select has object dtype and contains large integers such as 1628121600000000000. I would expect it to hold datetime64 values, taken either from one of the two choices or from the default datetime2 column.
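For anyone who wants to experiment, here is a minimal sketch of the same logic on a tiny frame. The rows are made up for illustration (they are not my real data), and `demo`, `one_year_on`, `cond1`, `cond2` and `choice2` are names I introduced just for this example. It expresses the selection with chained `Series.mask` calls instead of `np.select`, on the assumption that staying inside pandas keeps the datetime64 dtype; I have not confirmed that this is the recommended approach.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Made-up rows for illustration only; these are NOT my real data.
demo = pd.DataFrame({
    'datetime1': ['2021-06-30', '2021-04-08', '2024-06-07', '2021-05-31'],
    'value1': [0.0001, 0.00474, 0.0086, np.nan],
})
demo['datetime2'] = pd.to_datetime(demo['datetime1'], format='%Y-%m-%d')

datetime_const = datetime(2021, 3, 31)
one_year_on = datetime_const + pd.DateOffset(months=12)

# Same two conditions as in the np.select call.
cond1 = (demo['value1'] < 0.0002) & (demo['datetime2'] < one_year_on)
cond2 = (demo['value1'] >= 0.0002) & (
    ((demo['datetime2'] - datetime_const).dt.days / 365) * demo['value1'] < 0.0002
)
# Second choice; rows where value1 is NaN become NaT here.
choice2 = datetime_const + pd.to_timedelta(((0.0002 / demo['value1']) * 365).round(), unit='D')

# np.select uses the first condition that matches, so apply cond2 first
# and let cond1 override it afterwards. Unmatched rows keep datetime2.
result = demo['datetime2'].mask(cond2, choice2).mask(cond1, one_year_on)
print(result.dtype)
print(result)
```

The order of the two `mask` calls matters: reversing them would give cond2 priority over cond1, which is not what the `np.select` call does.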
Please see the .info() output and the first ten rows of the DataFrame below:
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   datetime2   26558 non-null  datetime64[ns]
 1   value1      25438 non-null  float64
 2   test_col_1  26558 non-null  bool
 3   test_col_2  26558 non-null  bool
 4   test_col_3  26558 non-null  datetime64[ns]
 5   test_col_4  25438 non-null  datetime64[ns]
 6   test_col_5  26558 non-null  datetime64[ns]
 7   datetime3   26558 non-null  object
dtypes: bool(2), datetime64[ns](4), float64(1), object(1)
memory usage: 1.5+ MB
    datetime2   value1  test_col_1  test_col_2  test_col_3  test_col_4  test_col_5            datetime3
0  2021-06-30  0.00058       False        True  2022-03-31  2021-08-05  2021-06-30  1628121600000000000
1  2022-03-31  0.00044       False       False  2022-03-31  2021-09-13  2022-03-31  1648684800000000000
2  2024-06-07  0.00860       False       False  2022-03-31  2021-04-08  2024-06-07  1717718400000000000
3  2021-09-30  0.00867       False       False  2022-03-31  2021-04-08  2021-09-30  1632960000000000000
4  2021-08-31  0.00144       False       False  2022-03-31  2021-05-21  2021-08-31  1630368000000000000
5  2021-08-31  0.00144       False       False  2022-03-31  2021-05-21  2021-08-31  1630368000000000000
6  2021-04-08  0.00474       False        True  2022-03-31  2021-04-15  2021-04-08  1618444800000000000
7  2023-10-01  0.11506       False       False  2022-03-31  2021-04-01  2023-10-01  1696118400000000000
8  2023-09-29  0.12067       False       False  2022-03-31  2021-04-01  2023-09-29  1695945600000000000
9  2021-05-31  0.02508       False       False  2022-03-31  2021-04-03  2021-05-31  1622419200000000000
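One thing I noticed while staring at these numbers: they look like they could be nanoseconds since the Unix epoch. The sketch below is only a quick check of that guess, using integers copied from the datetime3 column of rows 0, 1 and 2 above; `raw` and `decoded` are names I made up for the example.

```python
import pandas as pd

# Integers copied from the datetime3 column of rows 0, 1 and 2 above.
raw = pd.Series([1628121600000000000, 1648684800000000000, 1717718400000000000])

# Assumption: the integers are nanoseconds since 1970-01-01 (the Unix epoch).
decoded = pd.to_datetime(raw, unit='ns')
print(decoded)
```

If that reading is right, row 0 decodes to its test_col_4 date (the row where test_col_2 is True) and rows 1 and 2 decode to their datetime2 dates. That would suggest the selection itself is correct and only the datetime64 dtype is being lost, but I do not understand why.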
I am completely baffled by this behavior. Please enlighten me!
Thank you all in advance!