Compare pandas dataframe rows based on condition

Question

I have a dataframe (df) as follows:

import pandas as pd

d = {'Item': ['x', 'y', 'z', 'x', 'z'], 'Count': ['10', '11', '12', '9', '10'], 'Date': pd.to_datetime(['2018-8-14', '2018-8-14', '2018-8-14', '2018-8-13', '2018-8-13'])}

df = pd.DataFrame(data=d)


Item       Count        Date
x          10           2018-08-14
y          11           2018-08-14
z          12           2018-08-14
x          9            2018-08-13
x          9            2018-08-12
z          10           2018-08-13

I want to compare rows based on the following: For each item, compare the count of max(Date) with max(Date) - 1.

That is, for item x it should compare the counts for 2018-08-13 and 2018-08-14. If the count for max(Date) is greater, it should select that row and store it in a different dataframe.

Same for item z, it should compare the counts for dates 2018-08-13 and 2018-08-14 and because the count is greater it should select the row for item z with count 12.

Output: df2

Item     Count     Date
x        10        2018-08-14
z        12        2018-08-14

I've tried the following:

if ((df.Item == df.Item) and
        (df.Date > df.Date) and (df.Count > df.Count)):
    print("we met the conditions!")

BENY · Accepted Answer · 2018-08-14 15:37:30Z


Using merge with key Item

df.loc[df.reset_index().merge(df,on='Item').loc[lambda x : (x['Count_x']>x['Count_y'])&(x['Date_x']>x['Date_y'])]['index'].unique()]
Out[49]: 
  Item  Count       Date
0    x     10 2018-08-14
2    z     12 2018-08-14
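
The one-liner above can be easier to follow when split into steps. The following is a minimal sketch, not the answerer's own code: the names pairs, mask and df2 are introduced here for illustration, and Count is converted to integers on the assumption that a numeric comparison is intended (the question stores it as strings).

```python
import pandas as pd

# Sample data from the question; Count is stored as strings there,
# so it is converted to integers before comparing (an assumption about intent).
df = pd.DataFrame({
    'Item': ['x', 'y', 'z', 'x', 'z'],
    'Count': ['10', '11', '12', '9', '10'],
    'Date': pd.to_datetime(['2018-8-14', '2018-8-14', '2018-8-14',
                            '2018-8-13', '2018-8-13']),
})
df['Count'] = df['Count'].astype(int)

# Self-merge on Item: every row is paired with every row of the same item.
# reset_index() keeps the original row label in a column named 'index'.
pairs = df.reset_index().merge(df, on='Item')

# Keep pairs where the left row is both later and larger than the right row.
mask = (pairs['Count_x'] > pairs['Count_y']) & (pairs['Date_x'] > pairs['Date_y'])

# Map the surviving pairs back to the original rows.
df2 = df.loc[pairs.loc[mask, 'index'].unique()]
print(df2)
```

Note that this pairs a row with every earlier date for the same item, not only with max(Date) - 1, so it matches the request only when each item has at most those two dates.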


5 Comments

Sravee Over a year ago

Thank you for this, but I am getting an empty data frame with column names on running this line of code.

BENY Over a year ago

@Sravee Try adding this: df.Date = pd.to_datetime(df.Date)

Sravee Over a year ago

I had added the to_datetime function while defining the dataframe itself. Will that make a difference?

BENY Over a year ago

@Sravee Strings cannot provide the comparison.

Sravee Over a year ago

I was able to break down your condition into a more basic version, as I am not yet familiar with lambda functions. Take a look!
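
The remark about strings in the comments above can be illustrated with a small example. One reading (an assumption, since the comment does not say which column is meant) is that Count holds strings, which compare character by character rather than numerically. The variable names below are made up for the illustration.

```python
import pandas as pd

counts = pd.Series(['10', '9'])

# As strings, the comparison is lexicographic: '1' sorts before '9'.
as_strings = counts[0] > counts[1]
print(as_strings)

# After conversion the comparison is numeric.
numeric = pd.to_numeric(counts)
as_numbers = bool(numeric[0] > numeric[1])
print(as_numbers)
```

With the question's data this would matter for item x, where the counts are '10' and '9'.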

Sravee · Accepted Answer · 2018-08-14 18:18:37Z

Thanks to @Wen, I was able to break down his step into a more basic version.

Create temporary dataframes that hold the rows for max(Date) and max(Date) - 1

t_day = df[df.Date == df.Date.max()]
y_day = df[df.Date == df.Date.max() - pd.to_timedelta(1, unit='d')]

Merge the temporary dataframes into a single temporary dataframe

temp = t_day.merge(y_day, on = 'Item', how='outer')
temp = temp.dropna()

Defining function to create the required condition

def func(row):
    # Use `and` here: `&` binds tighter than `>`, so it would be applied to Count_y first.
    if int(row['Count_x']) > int(row['Count_y']) and row['Date_x'] > row['Date_y']:
        return '1'
    else:
        return '0'
temp['cond'] = temp.apply(func, axis=1)

Keeping the rows that meet the condition and dropping unused columns

temp = temp[temp['cond'] == '1'].drop(columns=['Count_y', 'Date_y', 'cond'])

print(temp)

Now it returns:

Count_x      Date_x     Item   
10         2018-08-14    x     
12         2018-08-14    z
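
The same steps can also be written without a row-wise apply. The following is a sketch under a few assumptions: Count is converted to integers up front, an inner merge replaces the outer merge plus dropna(), and the suffix '_prev' and the name latest are chosen here for illustration.

```python
import pandas as pd

# Sample data from the question, with Count converted to integers.
df = pd.DataFrame({
    'Item': ['x', 'y', 'z', 'x', 'z'],
    'Count': ['10', '11', '12', '9', '10'],
    'Date': pd.to_datetime(['2018-8-14', '2018-8-14', '2018-8-14',
                            '2018-8-13', '2018-8-13']),
})
df['Count'] = df['Count'].astype(int)

# Rows for the latest date and for the day before it.
latest = df['Date'].max()
t_day = df[df['Date'] == latest]
y_day = df[df['Date'] == latest - pd.Timedelta(days=1)]

# The default inner merge keeps only items present on both days.
temp = t_day.merge(y_day, on='Item', suffixes=('', '_prev'))

# Boolean mask instead of apply; the date check is implied by how
# t_day and y_day were built.
df2 = temp.loc[temp['Count'] > temp['Count_prev'], ['Item', 'Count', 'Date']]
print(df2)
```

Item y drops out because it has no row for the previous day.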
