I am trying to compare two dates, but I get the error "Can only compare identically-labeled Series objects". I also tried using iloc and .values, since some other questions were answered that way, but those give me various other errors. I am not sure what to do. The issue is where I write:
elif group[1]["dtstart"] <= endDate
Below is my full sample code.
Note that this is not the actual data I am working with; I tried to make it very similar. Without any changes, both the fake and the real data give the same error ("Can only compare identically-labeled Series objects").
When I add .values to this code (with the fake data), like so: group[1]["dtstart"] <= endDate.values, I get the error: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." When I add .values in the same place with the real data, I get the error: "Lengths must match to compare",
which is why I tried iloc, still without success. I am not even sure whether iloc or .values is the way to go, since the fake data and the real data don't produce the same error when I include either. Leaving everything as is produces the same error in both the fake and the real data, which is
"Can only compare identically-labeled Series objects"
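For reference, all three messages can be reproduced in isolation with small Series. This is only a minimal sketch, assuming the cause is that each group's column is a multi-row Series carrying its own index labels. The names `first`, `second`, `third`, and `error_message` are made up for illustration and do not appear in my real code.

```python
import pandas as pd

# Hypothetical stand-ins for a date column taken from three different groups.
first = pd.Series(pd.to_datetime(["2018-01-06", "2018-04-15", "2018-05-06"]), index=[0, 4, 5])
second = pd.Series(pd.to_datetime(["2018-01-30", "2018-07-03", "2018-07-17"]), index=[1, 6, 7])
third = pd.Series(pd.to_datetime(["2018-03-01", "2018-03-14"]), index=[2, 3])


def error_message(check):
    """Run check() and return the text of the ValueError it raises, or None."""
    try:
        check()
    except ValueError as exc:
        return str(exc)
    return None


# Same length but different index labels: pandas refuses to compare them.
msg_labels = error_message(lambda: second <= first)

# .values drops the labels, so the comparison runs element-wise, but the
# result is a boolean Series, which `if`/`elif` cannot reduce to one bool.
msg_truth = error_message(lambda: bool(second <= first.values))

# .values with a different number of rows cannot be compared element-wise.
msg_length = error_message(lambda: third <= first.values)

print(msg_labels)
print(msg_truth)
print(msg_length)
```

If that assumption holds, it would also explain why the fake and real data differ once .values is added: in the fake data the first two groups happen to have the same number of rows, while in the real data they presumably do not.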
Any help is appreciated. Thank you!
import pandas as pd
from datetime import datetime
import numpy as np

pd.set_option('display.max_columns', None)

# Create a DataFrame
d = {
    'ID': [1, 2, 3, 3, 1, 1, 2, 2, 4, 4],
    'dtstart': [pd.Timestamp('2018-01-01'), pd.Timestamp('2018-01-30'), pd.Timestamp('2018-03-01'), pd.Timestamp('2018-03-14'),
                pd.Timestamp('2018-04-08'), pd.Timestamp('2018-04-27'), pd.Timestamp('2018-07-03'), pd.Timestamp('2018-07-17'), pd.Timestamp('2018-07-17'), pd.Timestamp('2018-01-20')],
    'dtend': [pd.Timestamp('2018-01-06'), pd.Timestamp('2018-02-15'), pd.Timestamp('2018-03-05'), pd.Timestamp('2018-03-22'),
              pd.Timestamp('2018-04-15'), pd.Timestamp('2018-05-06'), pd.Timestamp('2018-07-07'), pd.Timestamp('2018-07-28'), pd.Timestamp('2018-01-18'), pd.Timestamp('2018-01-22')]}
df = pd.DataFrame(d)

grouped = df.groupby(['ID'])
grouped.apply(lambda _df: _df.sort_values(by=['dtstart']))

count = 0
df_CE = pd.DataFrame(columns=['ID', 'dtstart', 'dtEnd'])

for group in grouped:
    months_enrolled = len(group)
    if count == 0:
        print("group[1][dtstart]===", group[1]["dtstart"])
        startDate = group[1]["dtstart"]
        endDate = group[1]["dtend"]
        count += 1
        # print("endDate==",TEST_endDate.dtypes)
    elif group[1]["dtstart"] <= endDate:  # this line raises the error
        print("yes")
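
What I think I am after is a comparison between single dates rather than whole columns. Below is a rough sketch of that idea, not a confirmed fix. It assumes the intent is to compare each group's earliest dtstart with the previous group's latest dtend. The variable names `previous_end`, `first_start`, and `overlaps`, and the shortened four-row data, are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "dtstart": pd.to_datetime(["2018-01-01", "2018-04-08", "2018-01-30", "2018-07-03"]),
    "dtend": pd.to_datetime(["2018-01-06", "2018-04-15", "2018-02-15", "2018-07-07"]),
})

previous_end = None  # latest dtend of the previous group (a single Timestamp)
overlaps = []        # IDs whose earliest start is on or before previous_end

# Sort once up front, then iterate over (key, sub-DataFrame) pairs.
for group_id, rows in df.sort_values("dtstart").groupby("ID"):
    first_start = rows["dtstart"].min()  # scalar Timestamp, not a Series
    if previous_end is not None and first_start <= previous_end:
        overlaps.append(group_id)
    previous_end = rows["dtend"].max()

print(overlaps)
```

Because `.min()` and `.max()` each return one Timestamp, the `<=` yields a plain True/False that `if` can use. If every row has to satisfy the condition instead, reducing the element-wise result with `.all()` or `.any()` might be the alternative.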