I am trying to compare two dates, but I get the error "Can only compare identically-labeled Series objects". I also tried using iloc and .values, since other questions were answered that way, but those give me various other errors. I am not sure what to do. The issue is where I write:

 elif group[1]["dtstart"] <= endDate:

Below is my full sample code.

Note that this is not the actual data I am working with; I tried to make it very similar. I get the same error with both (Can only compare identically-labeled Series objects).

However, when I add .values to this code (with the fake data), like so: group[1]["dtstart"] <= endDate.values, I get the error: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."

When I add .values in the same place with the real data, I get the error "Lengths must match to compare", which is why I tried iloc, still without success. I am not sure whether iloc or .values is the way to go, and the fake and real data don't produce the same error when I use either. Leaving everything as is produces the same error with both the fake and the real data, which is

"Can only compare identically-labeled Series objects"

Any help is appreciated. Thank you!

import pandas as pd
from datetime import datetime
import numpy as np

pd.set_option('display.max_columns', None)
#Create a DataFrame
d = {
    'ID':[1,2,3,3,1,1,2,2,4,4],
   'dtstart':[pd.Timestamp('2018-01-01'), pd.Timestamp('2018-01-30'), pd.Timestamp('2018-03-01'), pd.Timestamp('2018-03-14'),
               pd.Timestamp('2018-04-08'), pd.Timestamp('2018-04-27'), pd.Timestamp('2018-07-03'), pd.Timestamp('2018-07-17'),pd.Timestamp('2018-07-17'),pd.Timestamp('2018-01-20')],
   'dtend':[pd.Timestamp('2018-01-06'), pd.Timestamp('2018-02-15'), pd.Timestamp('2018-03-05'), pd.Timestamp('2018-03-22'),
               pd.Timestamp('2018-04-15'), pd.Timestamp('2018-05-06'), pd.Timestamp('2018-07-07'), pd.Timestamp('2018-07-28'),pd.Timestamp('2018-01-18'),pd.Timestamp('2018-01-22')]}
df = pd.DataFrame(d)

grouped = df.groupby(['ID'])
grouped.apply(lambda _df: _df.sort_values(by=['dtstart']))
count=0
df_CE = pd.DataFrame(columns=['ID', 'dtstart', 'dtEnd'])
for group in grouped:
    months_enrolled=len(group)
    if count == 0:
        print("group[1][dtstart]===",group[1]["dtstart"])

        startDate = group[1]["dtstart"]
        endDate   = group[1]["dtend"] 
        count += 1
#    print("endDate==",TEST_endDate.dtypes)
    elif group[1]["dtstart"] <= endDate:
        print("yes")

1 Answer

You never assign the result of grouped.apply(lambda _df: _df.sort_values(by=['dtstart'])) to anything, so the sorted output is discarded. If you want to sort it and keep it sorted, change it to

grouped = grouped.apply(lambda _df: _df.sort_values(by=['dtstart']))

That makes grouped a MultiIndexed DataFrame, and you will need to iterate over it as such. Assuming you didn't want to do that, you are getting the error because you are comparing two pd.Series of different lengths, whose index labels therefore cannot match. I ran your code, and at the line where you get that error, the comparison was made between

(4,    ID      dtend    dtstart
8   4 2018-01-18 2018-07-17
9   4 2018-01-22 2018-01-20)
>>> g2
(2,    ID      dtend    dtstart
1   2 2018-02-15 2018-01-30
6   2 2018-07-07 2018-07-03
7   2 2018-07-28 2018-07-17)
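
To illustrate the point above, here is a minimal sketch with made-up data (not the asker's real data). It reproduces the error by comparing columns from two groups with different index labels, then shows two ways to get a usable result: reduce each side to a single Timestamp first, or compare a Series against a scalar and collapse the result with .any() or .all(). Using .min() and .max() here is an assumption about the intent (earliest start versus latest previous end); substitute whatever reduction matches what you actually want to compare.

```python
import pandas as pd

# Toy data, assumed for illustration only
df = pd.DataFrame({
    "ID": [1, 2, 2],
    "dtstart": pd.to_datetime(["2018-01-01", "2018-01-30", "2018-07-03"]),
    "dtend": pd.to_datetime(["2018-01-06", "2018-02-15", "2018-07-07"]),
})

g1 = df[df["ID"] == 1]  # one row, index label 0
g2 = df[df["ID"] == 2]  # two rows, index labels 1 and 2

# Series vs. Series with different labels: pandas refuses to compare
try:
    g2["dtstart"] <= g1["dtend"]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Option 1: reduce both sides to scalars, then compare
prev_end = g1["dtend"].max()       # a single Timestamp
next_start = g2["dtstart"].min()   # a single Timestamp
overlaps = bool(next_start <= prev_end)

# Option 2: Series vs. scalar is element-wise; collapse it before using it in `if`
mask = g2["dtstart"] <= prev_end
any_overlap = bool(mask.any())

if overlaps:
    print("yes")
else:
    print("no")
```

A boolean Series such as mask cannot be used directly in if or elif; that is what triggers "The truth value of a Series is ambiguous", so it has to be collapsed with .any() or .all() first.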