Comparing two dates in grouped variable

Question

I am trying to compare two dates, but I get the error "Can only compare identically-labeled Series objects". I also tried using iloc and .values, since other questions were answered with those, but they give me various other errors. I am not sure what to do. The issue is where I write:

 elif group[1]["dtstart"] <= endDate

Below is my full sample code.

Note that this is not the actual data I am working with, but I tried to make it very similar. Left as is, the code produces the same error with both the fake and the real data:

"Can only compare identically-labeled Series objects"

When I add .values in this code (with the fake data), like so: group[1]["dtstart"] <= endDate.values, I get the error: "The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()." When I add .values in the same place with the real data, I get the error "Lengths must match to compare", which is why I tried iloc, also without success. I am not even sure whether iloc or .values is the way to go, since the fake and the real data don't produce the same error with either.

Any help is appreciated. Thank you!

import pandas as pd
from datetime import datetime
import numpy as np

pd.set_option('display.max_columns', None)
#Create a DataFrame
d = {
    'ID':[1,2,3,3,1,1,2,2,4,4],
   'dtstart':[pd.Timestamp('2018-01-01'), pd.Timestamp('2018-01-30'), pd.Timestamp('2018-03-01'), pd.Timestamp('2018-03-14'),
               pd.Timestamp('2018-04-08'), pd.Timestamp('2018-04-27'), pd.Timestamp('2018-07-03'), pd.Timestamp('2018-07-17'),pd.Timestamp('2018-07-17'),pd.Timestamp('2018-01-20')],
   'dtend':[pd.Timestamp('2018-01-06'), pd.Timestamp('2018-02-15'), pd.Timestamp('2018-03-05'), pd.Timestamp('2018-03-22'),
               pd.Timestamp('2018-04-15'), pd.Timestamp('2018-05-06'), pd.Timestamp('2018-07-07'), pd.Timestamp('2018-07-28'),pd.Timestamp('2018-01-18'),pd.Timestamp('2018-01-22')]}
df = pd.DataFrame(d)

grouped = df.groupby(['ID'])
grouped.apply(lambda _df: _df.sort_values(by=['dtstart']))
count=0
df_CE = pd.DataFrame(columns=['ID', 'dtstart', 'dtEnd'])
for group in grouped:
    months_enrolled=len(group)
    if count == 0:
        print("group[1][dtstart]===",group[1]["dtstart"])

        startDate = group[1]["dtstart"]
        endDate   = group[1]["dtend"] 
        count += 1
#    print("endDate==",TEST_endDate.dtypes)
    elif group[1]["dtstart"] <= endDate:
        print("yes")

alexshchep · Accepted Answer · 2019-02-09 17:55:47Z

You never assign the result of grouped.apply(lambda _df: _df.sort_values(by=['dtstart'])) to anything. If you want to sort the groups and keep the sorted result, change it to

grouped = grouped.apply(lambda _df: _df.sort_values(by=['dtstart']))

That makes grouped a multi-indexed DataFrame, and you will need to iterate over it as such. Assuming you didn't want to do that, you are getting the error because you are comparing two pd.Series of different lengths. I ran your code, and at the line where you get that error, the comparison was made between

(4,    ID      dtend    dtstart
8   4 2018-01-18 2018-07-17
9   4 2018-01-22 2018-01-20)
>>> g2
(2,    ID      dtend    dtstart
1   2 2018-02-15 2018-01-30
6   2 2018-07-07 2018-07-03
7   2 2018-07-28 2018-07-17)
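To make the three errors from the question concrete, here is a minimal sketch using the question's sample data. It assumes the intent is to compare one date per group (the earliest dtstart of each later group against the latest dtend of the first group); that intent is a guess, and the names first_end and matches are made up for illustration. Note that in the sample data the first two groups have the same length but different index labels, which is already enough to trigger the "identically-labeled" error.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 3, 1, 1, 2, 2, 4, 4],
    "dtstart": pd.to_datetime([
        "2018-01-01", "2018-01-30", "2018-03-01", "2018-03-14", "2018-04-08",
        "2018-04-27", "2018-07-03", "2018-07-17", "2018-07-17", "2018-01-20",
    ]),
    "dtend": pd.to_datetime([
        "2018-01-06", "2018-02-15", "2018-03-05", "2018-03-22", "2018-04-15",
        "2018-05-06", "2018-07-07", "2018-07-28", "2018-01-18", "2018-01-22",
    ]),
})

a = df.loc[df["ID"] == 1, "dtend"]    # index labels 0, 4, 5
b = df.loc[df["ID"] == 2, "dtstart"]  # index labels 1, 6, 7

# 1) Series vs. Series with different labels: pandas refuses to align them.
try:
    b <= a
    label_error = ""
except ValueError as exc:
    label_error = str(exc)

# 2) .values drops the labels, so the comparison runs element by element
#    and returns a boolean Series. Using that Series directly in if/elif
#    raises "The truth value of a Series is ambiguous"; reduce it with
#    .any() or .all() first. If the two groups had different lengths,
#    this step would instead raise "Lengths must match to compare".
mask = b <= a.values
any_overlap = bool(mask.any())

# 3) Assumed intent: compare scalars, one date per group.
df_sorted = df.sort_values(["ID", "dtstart"])  # sort once, before grouping
first_end = None
matches = []
for group_id, rows in df_sorted.groupby("ID"):
    if first_end is None:
        first_end = rows["dtend"].max()        # a single Timestamp
    elif rows["dtstart"].min() <= first_end:   # Timestamp <= Timestamp
        matches.append(int(group_id))
```

Grouping by "ID" rather than ['ID'] keeps group_id a plain scalar, and sorting the DataFrame before grouping avoids the multi-indexed result that grouped.apply would return.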
