I have a DataFrame like the one below, but much bigger. Some of the dates in the lastDate column are incorrect, and they are incorrect only when the fixedDate column in the same row contains a value.

dff = pd.DataFrame(
            {"lastDate":['2016-3-27', '2016-4-11', '2016-3-27', '2016-3-27', '2016-5-25', '2016-5-31'],
             "fixedDate":['2016-1-3', '', '2016-1-18', '2016-4-5', '2016-2-27', ''],
             "analyst":['John Doe', 'Brad', 'John', 'Frank', 'Claud', 'John Doe']
            })

[Two images: the first shows what I have, and the second shows what I'd like to have after the loop.]

1 Answer

First, convert both date columns to datetime dtype, so that the empty strings in fixedDate become NaT:

for col in ['fixedDate', 'lastDate']:
    df[col] = pd.to_datetime(df[col])
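To see why this conversion matters, here is a minimal sketch (the name `raw` is made up for this illustration and is not part of the question's data). It suggests that an empty string counts as "not null" before the conversion and becomes NaT, which counts as null, afterwards:

```python
import pandas as pd

# `raw` is a hypothetical name used only for this illustration.
raw = pd.Series(['2016-1-3', '', '2016-1-18'])

# Empty strings are parsed as NaT (the datetime equivalent of NaN).
converted = pd.to_datetime(raw)

# Before conversion, every entry (including '') is "not null".
print(pd.notnull(raw).tolist())        # [True, True, True]

# After conversion, the empty string has become NaT, which is null.
print(pd.notnull(converted).tolist())  # [True, False, True]
```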

Then build a mask of the rows that have a fixedDate and overwrite lastDate only in those rows:

mask = pd.notnull(df['fixedDate'])
df.loc[mask, 'lastDate'] = df['fixedDate']

For example,

import pandas as pd

df = pd.DataFrame(
    {"lastDate": ['2016-3-27', '2016-4-11', '2016-3-27', '2016-3-27', '2016-5-25', '2016-5-31'],
     "fixedDate": ['2016-1-3', '', '2016-1-18', '2016-4-5', '2016-2-27', ''],
     "analyst": ['John Doe', 'Brad', 'John', 'Frank', 'Claud', 'John Doe']
    })

for col in ['fixedDate', 'lastDate']:
    df[col] = pd.to_datetime(df[col])

mask = pd.notnull(df['fixedDate'])
df.loc[mask, 'lastDate'] = df['fixedDate']
print(df)

yields

    analyst  fixedDate   lastDate
0  John Doe 2016-01-03 2016-01-03
1      Brad        NaT 2016-04-11
2      John 2016-01-18 2016-01-18
3     Frank 2016-04-05 2016-04-05
4     Claud 2016-02-27 2016-02-27
5  John Doe        NaT 2016-05-31
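As a possible alternative to the mask, the same result can probably be reached with `Series.fillna`, which takes fixedDate where it is present and falls back to the existing lastDate otherwise. This is only a sketch on a shortened copy of the sample data, not a replacement for the approach above:

```python
import pandas as pd

# Shortened copy of the sample data, for illustration only.
df = pd.DataFrame({
    "lastDate": ['2016-3-27', '2016-4-11', '2016-3-27'],
    "fixedDate": ['2016-1-3', '', '2016-1-18'],
})

# Convert to datetime first so that '' becomes NaT.
for col in ['fixedDate', 'lastDate']:
    df[col] = pd.to_datetime(df[col])

# Use fixedDate where it exists; keep lastDate where fixedDate is NaT.
df['lastDate'] = df['fixedDate'].fillna(df['lastDate'])
print(df)
```

As with the mask, this relies on the columns having been converted with pd.to_datetime beforehand.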
2 Comments

When I applied the mask to my real dataset, every lastDate whose fixedDate was null became null as well. That didn't happen with the sample df. Any idea why this is happening?
I had written that converting the date strings to actual dates was not strictly necessary, but I now realize that isn't true. pd.notnull(['']) equals np.array([True]), so the mask is True wherever fixedDate is an empty string. That causes df.loc[mask, 'lastDate'] = df['fixedDate'] to overwrite lastDate even when fixedDate is an empty string. That might explain the behavior you are seeing, assuming you did not convert the date strings to datetime64 values using pd.to_datetime.
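The pitfall described in this comment can be reproduced with a small sketch. The names `raw`, `naive_mask`, `safe_mask`, and `patched` are hypothetical, and the "safe" mask is just one possible guard for string columns, assuming empty strings are the only placeholder for missing values:

```python
import pandas as pd

# Hypothetical two-row frame with the dates left as plain strings.
raw = pd.DataFrame({
    "lastDate": ['2016-3-27', '2016-4-11'],
    "fixedDate": ['2016-1-3', ''],
})

# An empty string is not null, so this mask is True for every row.
naive_mask = pd.notnull(raw['fixedDate'])
print(naive_mask.tolist())  # [True, True]

# One possible guard: also require the string to be non-empty.
safe_mask = raw['fixedDate'].notna() & raw['fixedDate'].ne('')

patched = raw.copy()
patched.loc[safe_mask, 'lastDate'] = patched.loc[safe_mask, 'fixedDate']
print(patched)
```

Converting with pd.to_datetime first, as in the answer, avoids the need for the extra check.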
