I want to change the values in one column of a data frame based on conditions and comparisons involving values in other columns.

This is the original data frame:

        start         end diff
0  2016-05-08     unknown  3
1  2016-05-08  2017-09-08  5
2  2018-09-01  2017-09-01  5

This is the data frame that I want:

        start         end diff
0  2016-05-08     unknown  3
1  2016-05-08  2017-09-08  1
2  2018-09-01  2017-09-01  -1

Basically, I want the value in the diff column to stay the same if end is unknown; otherwise, it should be the year of end minus the year of start.

Can anyone suggest a piece of code?

Thanks in advance!

1 Answer

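Here is one way using np.where, after converting the date columns with pd.to_datetime (errors='coerce' turns 'unknown' into NaT). Also, please do not name a column after a DataFrame method such as diff, sum, min, max, or cumsum: attribute access like df.diff then returns the method rather than the column.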

import numpy as np
import pandas as pd

df['start'] = pd.to_datetime(df['start'])
df['end'] = pd.to_datetime(df['end'], errors='coerce')  # 'unknown' becomes NaT
df['diff'] = np.where(df['end'].isnull(), df['diff'], df['end'].dt.year - df['start'].dt.year)
df
Out[135]: 
       start        end  diff
0 2016-05-08        NaT   3.0
1 2016-05-08 2017-09-08   1.0
2 2018-09-01 2017-09-01  -1.0
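
Note that diff comes back as float in the output above, because the year of a NaT is NaN. If integer values are wanted, as in the question, the following is a minimal self-contained sketch of one alternative. It rebuilds the sample frame from the question, assumes the dates are always in YYYY-MM-DD format, and uses fillna in place of np.where; it also leaves start and end as strings, which may or may not be what you want.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "start": ["2016-05-08", "2016-05-08", "2018-09-01"],
    "end": ["unknown", "2017-09-08", "2017-09-01"],
    "diff": [3, 5, 5],
})

# Assumption: dates are YYYY-MM-DD; anything else (e.g. "unknown") becomes NaT.
start = pd.to_datetime(df["start"], format="%Y-%m-%d")
end = pd.to_datetime(df["end"], format="%Y-%m-%d", errors="coerce")

# Year difference is NaN wherever end is NaT.
year_diff = end.dt.year - start.dt.year

# Fall back to the existing value where the difference is missing,
# then cast back to int since no NaN remains.
df["diff"] = year_diff.fillna(df["diff"]).astype(int)

print(df)
```

Bracket access (df["diff"]) is used throughout because df.diff refers to the DataFrame.diff method, not the column.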