I have a dataframe (df) as follows:

import pandas as pd

d = {'Item': ['x', 'y', 'z', 'x', 'z'], 'Count': ['10', '11', '12', '9', '10'], 'Date': pd.to_datetime(['2018-8-14', '2018-8-14', '2018-8-14', '2018-8-13', '2018-8-13'])}

df = pd.DataFrame(data=d)


Item       Count        Date
x          10           2018-08-14
y          11           2018-08-14
z          12           2018-08-14
x          9            2018-08-13
x          9            2018-08-12
z          10           2018-08-13

I want to compare rows based on the following: For each item, compare the count of max(Date) with max(Date) - 1.

That is, for item x it should compare the counts for 2018-08-13 and 2018-08-14. If the count for max(Date) is greater, it should select that row and store it in a different dataframe.

The same applies to item z: it should compare the counts for 2018-08-13 and 2018-08-14, and because the count for 2018-08-14 is greater, it should select the row for item z with count 12.

Output: df2

Item     Count     Date
x        10        2018-08-14
z        12        2018-08-14

I've tried the following:

if ((df.Item == df.Item) and
        (df.Date > df.Date) and (df.Count > df.Count)):
    print("we met the conditions!")

2 Answers

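Use a self-merge on the key Item, then keep the rows whose Count and Date are both greater than those of another row for the same item: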

df.loc[df.reset_index().merge(df,on='Item').loc[lambda x : (x['Count_x']>x['Count_y'])&(x['Date_x']>x['Date_y'])]['index'].unique()]
Out[49]: 
  Item  Count       Date
0    x     10 2018-08-14
2    z     12 2018-08-14
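
The one-liner above can be unpacked into named steps. The following is a minimal sketch, not the answerer's own code: the names `pairs` and `mask` are introduced here for illustration, and Count is cast to integers (an assumption, since the question defines it as strings) so that the comparison is numeric.

```python
import pandas as pd

# Sample data from the question.
d = {'Item': ['x', 'y', 'z', 'x', 'z'],
     'Count': ['10', '11', '12', '9', '10'],
     'Date': pd.to_datetime(['2018-8-14', '2018-8-14', '2018-8-14',
                             '2018-8-13', '2018-8-13'])}
df = pd.DataFrame(data=d)

# Assumption: convert Count to integers so '>' compares numbers, not strings.
df['Count'] = df['Count'].astype(int)

# Self-merge on Item: each row is paired with every row of the same item.
# reset_index() keeps the original row label in a column named 'index'.
pairs = df.reset_index().merge(df, on='Item')

# Keep pairs where the left row is both later and larger than the right row.
mask = (pairs['Count_x'] > pairs['Count_y']) & (pairs['Date_x'] > pairs['Date_y'])

# Map the surviving pairs back to rows of the original dataframe.
df2 = df.loc[pairs.loc[mask, 'index'].unique()]
print(df2)
```

Note that this compares each row with every earlier row of the same item, not only with the row for max(Date) - 1, so it may select additional rows when an item has more than two dates.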
5 Comments

Thank you for this, but I am getting an empty data frame with column names on running this line of code.
@Sravee Try adding df.Date = pd.to_datetime(df.Date) before running it.
I had added the to_datetime function while defining the dataframe itself. Will that make a difference?
@Sravee Strings cannot be compared this way; the columns need proper numeric and datetime types.
I was able to break your condition down into a more basic version, as I am not yet familiar with lambda functions. Take a look!
Thanks to @Wen, I was able to break his step down into a more basic version.

Create temporary dataframes holding the rows for max(Date) and max(Date) - 1:

t_day = df[df.Date == df.Date.max()]
y_day = df[df.Date == df.Date.max() - pd.to_timedelta(1, unit='d')]

Merge the temporary dataframes into a single temporary dataframe, keeping only items present on both days:

temp = t_day.merge(y_day, on = 'Item', how='outer')
temp = temp.dropna()

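Define a function that flags the rows meeting the condition:

def func(row):
    # Use `and`, not `&`: `&` binds tighter than `>`, so it would be
    # applied to int(row['Count_y']) before the comparison is made.
    return (int(row['Count_x']) > int(row['Count_y'])) and (row['Date_x'] > row['Date_y'])

temp['cond'] = temp.apply(func, axis=1)

Keep only the flagged rows, then drop the unused columns:

temp = temp[temp['cond']].drop(['Count_y', 'Date_y', 'cond'], axis=1)

print(temp)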

Now it returns:

Count_x      Date_x     Item   
10         2018-08-14    x     
12         2018-08-14    z    
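
For comparison, the same day-over-day check can be sketched without apply, using a boolean mask on the merged frame. This is an illustrative variant under stated assumptions, not the method from either answer: Count is cast to integers up front, the variable `latest` and the `_prev` suffix are names chosen here, and an inner merge is used where the steps above use an outer merge followed by dropna (for this data both keep only items present on both days).

```python
import pandas as pd

# Sample data from the question.
d = {'Item': ['x', 'y', 'z', 'x', 'z'],
     'Count': ['10', '11', '12', '9', '10'],
     'Date': pd.to_datetime(['2018-8-14', '2018-8-14', '2018-8-14',
                             '2018-8-13', '2018-8-13'])}
df = pd.DataFrame(data=d)

# Assumption: convert Count to integers so the comparison is numeric.
df['Count'] = df['Count'].astype(int)

# Rows for the latest date and for the day before it.
latest = df['Date'].max()
t_day = df[df['Date'] == latest]
y_day = df[df['Date'] == latest - pd.Timedelta(days=1)]

# Inner merge on Item; the previous day's columns get the '_prev' suffix,
# while the latest day's columns keep their original names.
temp = t_day.merge(y_day, on='Item', suffixes=('', '_prev'))

# Keep items whose latest count exceeds the previous day's count,
# and return only the original three columns.
df2 = temp.loc[temp['Count'] > temp['Count_prev'], ['Item', 'Count', 'Date']]
print(df2)
```

Because the latest day's columns keep their original names, the result has the Item, Count, Date layout requested in the question rather than Count_x and Date_x.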
