I want to create a function that flags each row based on certain conditions.

It's not working, and I think the cause is the format (dtype) of the columns.

The function is:

tolerance=5

def pmm_m2_rag(data):

    if  data['m2'] == data['TP_M2'] and data['m6p'] + pd.to_timedelta(tolerance,unit='D') <= data['latedate']:
        return 'GREEN'


    elif data['m2']!= data['TP_M2'] and data['m6p'] + pd.to_timedelta(tolerance,unit='D') < data['latedate']:
        return 'AMBER'


    elif data['m2']!= None and data['m6p'] + pd.to_timedelta(tolerance,unit='D') > data['latedate']:
        return 'RED'

The dataframe is:

                m2       TP_M2         m6p          latedate         
0       2019-11-28  2019-10-29  2020-02-21        2020-02-25       
1       2019-11-28  2019-10-29  2020-02-21        2020-02-25       
2       2019-11-28  2019-11-28  2020-02-09        2020-02-17       
3       2019-11-28  2019-11-28  2020-02-29        2020-02-17

The datatypes are:

m2                  object
TP_M2               object
m6p                 object
latedate            object
dtype: object

Expected output:

                m2       TP_M2         m6p          latedate         RAG
0       2019-11-28  2019-10-29  2020-02-21        2020-02-25       AMBER
1       2019-11-28  2019-10-29  2020-02-21        2020-02-25       AMBER
2       2019-11-28  2019-11-28  2020-02-09        2020-02-17       GREEN
3       2019-11-28  2019-11-28  2020-02-29        2020-02-17         RED
  • I debugged my program and it seems that the error comes from this line: data['m6p'] + pd.to_timedelta(tolerance,unit='D') < data['latedeliverydate'] or data['latedeliverydate'] > data['m6p'] + pd.to_timedelta(tolerance,unit='D'). The error is: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Commented Dec 11, 2019 at 12:10
  • data is a complete dataframe, right? The error message tells you that it makes no sense to evaluate a whole dataframe in an if-statement. What you most likely want is to apply pmm_m2_rag to each row of the dataframe. Commented Dec 11, 2019 at 12:19
  • data = data.applymap(pd.to_datetime) Commented Dec 11, 2019 at 12:23
  • Yes, data is a complete dataframe. Commented Dec 11, 2019 at 12:41
  • Your conditions are clearly wrong. Take a look at the amber one: if you test just the filters, only the third one is right. Commented Dec 12, 2019 at 0:18

2 Answers


First of all, something in your code seems to be wrong. This

... unit='D') <= data['latedate'] < data['m6p'] ...

chaining of comparisons is definitely wrong.

Then, in your condition for AMBER, the two clauses of your or are identical. This also makes no sense.

Apart from that, you should convert your columns to datetime, e.g. with:

# column-wise conversion (DataFrame.applymap is deprecated since pandas 2.1)
data = data.apply(pd.to_datetime)

This depends on what the datatype is when you read from your database.
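As a minimal sketch of that conversion, assuming the columns arrive as strings (dtype object) as in the question's dtype listing, pd.to_datetime can be applied to each date column. The sample values are copied from the question; the date_columns list is only an illustrative name. Passing errors='coerce' is an optional choice that turns unparseable entries into NaT instead of raising.

```python
import pandas as pd

# Sample rows from the question, stored as plain strings (dtype: object)
data = pd.DataFrame({
    'm2': ['2019-11-28', '2019-11-28', '2019-11-28', '2019-11-28'],
    'TP_M2': ['2019-10-29', '2019-10-29', '2019-11-28', '2019-11-28'],
    'm6p': ['2020-02-21', '2020-02-21', '2020-02-09', '2020-02-29'],
    'latedate': ['2020-02-25', '2020-02-25', '2020-02-17', '2020-02-17'],
})

# Illustrative list of the columns that hold dates
date_columns = ['m2', 'TP_M2', 'm6p', 'latedate']
for column in date_columns:
    # errors='coerce' converts values that cannot be parsed into NaT instead of raising
    data[column] = pd.to_datetime(data[column], errors='coerce')

# Each column should now report a datetime64 dtype instead of object
print(data.dtypes)
```

Checking data.dtypes after the conversion is a quick way to confirm that later date arithmetic will operate on real datetimes.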

After that, there are basically two options. You can write a function that takes a single row, calculates the value and returns the color. Then apply this function row by row.

The other (faster and preferable) option is to calculate the column 'RAG' for all rows at once (vectorized).

This can be done by using numpy.where with the conditions you have written above. Please note that "and" between dataframe columns has to be written as &, and "or" as |, with each comparison wrapped in parentheses.
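As a small self-contained illustration of the operator point, here is a sketch with made-up boolean Series (a, b and x are illustrative names, not columns from the question). It reproduces the "truth value of a Series is ambiguous" error and shows the element-wise alternative.

```python
import pandas as pd

a = pd.Series([True, True, False])
b = pd.Series([True, False, False])

# `and` asks Python for a single truth value of the whole Series, which is ambiguous
try:
    a and b
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# `&` and `|` combine the Series element by element
both = a & b
either = a | b

# Comparisons need parentheses because & and | bind more tightly than <, > and ==
x = pd.Series([1, 5, 10])
in_range = (x > 2) & (x < 8)

print(error_message)
print(in_range.tolist())
```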

Something like this should work:

import numpy as np
import pandas as pd

tolerance = 5

def pmm_m2_rag(data):
    # m6p shifted by the tolerance, computed once for all rows
    deadline = data.m6p + pd.to_timedelta(tolerance, unit='D')

    green_filter = (data.m2 == data.TP_M2) & (deadline <= data.latedate)
    # AMBER needs only one date comparison (the duplicated "or" clause is omitted)
    amber_filter = (data.m2 != data.TP_M2) & (deadline < data.latedate)
    # "!= pd.NaT" is True for every row, so missing values are tested with notna()
    red_filter = data.m2.notna() & (deadline > data.latedate)

    data['RAG'] = np.where(green_filter, 'GREEN',
                           np.where(amber_filter, 'AMBER',
                                    np.where(red_filter, 'RED', '')))
    return data

The syntax of np.where is

np.where(<CONDITION>, true-clause, false-clause)
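When there are more than two outcomes, nested np.where calls get hard to read. A possible alternative, sketched here under the assumption that the columns are already datetimes, is np.select, which takes a list of conditions and a matching list of values and uses the first condition that is true for each row. The sample data is copied from the question; deadline, conditions and choices are illustrative names.

```python
import numpy as np
import pandas as pd

tolerance = 5

# Sample rows from the question, already converted to datetimes
data = pd.DataFrame({
    'm2': pd.to_datetime(['2019-11-28', '2019-11-28', '2019-11-28', '2019-11-28']),
    'TP_M2': pd.to_datetime(['2019-10-29', '2019-10-29', '2019-11-28', '2019-11-28']),
    'm6p': pd.to_datetime(['2020-02-21', '2020-02-21', '2020-02-09', '2020-02-29']),
    'latedate': pd.to_datetime(['2020-02-25', '2020-02-25', '2020-02-17', '2020-02-17']),
})

# m6p shifted by the tolerance, computed once for all rows
deadline = data['m6p'] + pd.to_timedelta(tolerance, unit='D')

# Conditions are checked in order; the first match decides the value
conditions = [
    (data['m2'] == data['TP_M2']) & (deadline <= data['latedate']),
    (data['m2'] != data['TP_M2']) & (deadline < data['latedate']),
    data['m2'].notna() & (deadline > data['latedate']),
]
choices = ['GREEN', 'AMBER', 'RED']

# Rows matching no condition get an empty string
data['RAG'] = np.select(conditions, choices, default='')
print(data['RAG'].tolist())
```

Note that with a tolerance of 5 days the first two sample rows come out as RED rather than the AMBER shown in the question's expected output, because 2020-02-21 plus 5 days is 2020-02-26, which is later than the latedate 2020-02-25. The conditions themselves would need to change to produce AMBER there.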

3 Comments

Indeed, there was a mistake in my condition, which I have changed in my code.
I changed the code, but it only gives the RED condition.
I added the code I use, updated with the condition, as an answer.

One option is to convert the object values into datetime before doing the datetime comparisons, as below:

from datetime import datetime

import pandas as pd

tolerance = 5

def pmm_m2_rag(data):
    # data is a single row; str() lets strptime accept datetime.date values as well as strings
    m2 = datetime.strptime(str(data['m2']), '%Y-%m-%d')
    m6p = datetime.strptime(str(data['m6p']), '%Y-%m-%d')
    latedate = datetime.strptime(str(data['latedate']), '%Y-%m-%d')
    TP_M2 = datetime.strptime(str(data['TP_M2']), '%Y-%m-%d')

    # m6p plus the tolerance, compared against latedate in every branch
    deadline = m6p + pd.to_timedelta(tolerance, unit='D')

    if m2 == TP_M2 and deadline <= latedate:
        return 'GREEN'
    elif m2 != TP_M2 and deadline < latedate:
        return 'AMBER'
    elif m2 is not None and deadline > latedate:
        return 'RED'

# Apply the function to each row (axis=1)
df['RAG'] = df.apply(pmm_m2_rag, axis=1)
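If some dates can be missing, strptime raises on values such as NaT, because the text 'NaT' does not match the format. A hedged alternative sketch, not part of the original answer, is to parse with pd.to_datetime and skip rows where a comparison is impossible. The name pmm_m2_rag_safe, the choice to return None for such rows, and the small sample frame are all assumptions made for illustration.

```python
import pandas as pd

tolerance = 5

def pmm_m2_rag_safe(row):
    # pd.to_datetime accepts strings and datetime.date values;
    # missing values come back as None or NaT, both of which pd.isna detects
    m2 = pd.to_datetime(row['m2'])
    tp_m2 = pd.to_datetime(row['TP_M2'])
    m6p = pd.to_datetime(row['m6p'])
    latedate = pd.to_datetime(row['latedate'])

    # Without both dates no comparison is meaningful, so return no flag (assumed behaviour)
    if pd.isna(m6p) or pd.isna(latedate):
        return None

    deadline = m6p + pd.Timedelta(days=tolerance)
    if m2 == tp_m2 and deadline <= latedate:
        return 'GREEN'
    elif m2 != tp_m2 and deadline < latedate:
        return 'AMBER'
    elif pd.notna(m2) and deadline > latedate:
        return 'RED'
    return None

# Made-up sample: the last row has missing dates
df = pd.DataFrame({
    'm2': ['2019-11-28', '2019-11-28', None],
    'TP_M2': ['2019-11-28', '2019-11-28', '2019-11-28'],
    'm6p': ['2020-02-09', '2020-02-29', '2020-02-09'],
    'latedate': ['2020-02-17', '2020-02-17', None],
})
df['RAG'] = df.apply(pmm_m2_rag_safe, axis=1)
print(df)
```

Whether a row with missing dates should stay unflagged or receive a specific colour is a business decision; the early return is the place to change it.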

7 Comments

I am getting this error: TypeError: ('strptime() argument 1 must be str, not datetime.date', 'occurred at index 0'). It comes from: m2 = datetime.strptime(data['m2'],'%Y-%m-%d')
So the datatype of the column is datetime. How are you creating the dataframe? Can you paste it in your question?
m2 is an object; I posted the datatypes of the columns. I am importing the dataframe from a database.
If the datatype is datetime, then there is no need to parse it again; you can use it as it is. When I copy and paste your sample data, the datatype of every column shows as object. One workaround is to use str to convert the column values to strings before parsing. I edited my answer.
I keep getting this one now: ValueError: ("time data 'NaT' does not match format '%Y-%m-%d'", 'occurred at index 6')