How can I merge multiple date columns in a Pandas DataFrame into one column?

Question

I found similar questions but they did not solve my problem.

I have the following Pandas DataFrame. The date columns can be either str or datetime; I can convert them afterwards:

    id_of_station   measurement1    measurement2    measurement3    measurement4    measurement5
0   7               NaN             NaN             NaN             NaN             NaN
1   21              2021-04-09      2021-04-09      2021-04-09      2021-04-09      NaN
2   28              2021-04-09      2021-04-09      2021-04-09      2021-04-09      NaN
3   31              2021-04-09      2021-04-09      2021-04-09      2021-04-09      2021-04-09 
4   42              2021-04-09      NaN             NaN             2021-04-09      NaN
... ...             ...             ...             ...             ...             ...
489 9546            NaN             NaN             2021-04-09      2021-04-09      NaN

I want to merge the date columns into one new column. If there is no date for a given ID, as for id_of_station 7, the output should be NaN.

The output should look similar to this:

    id_of_station   last_measurement    
0   7               NaN             
1   21              2021-04-09  
2   28              2021-04-09 
3   31              2021-04-09
4   42              2021-04-09
... ...             ...        
489 9546            2021-04-09

Anurag Dabas · Accepted Answer · 2021-04-10 10:31:20Z

2

Use the melt() method:

resultdf = df.melt(id_vars='id_of_station', value_name='last_measurement').drop(columns=['variable'])

Or use the unstack() method:

resultdf = df.set_index('id_of_station').unstack().droplevel(0).to_frame().rename(columns={0: 'last_measurement'}).reset_index()

Now if you print resultdf you will get your desired output:

    id_of_station   last_measurement    
0   7               NaN             
1   21              2021-04-09  
2   28              2021-04-09 
3   31              2021-04-09
4   42              2021-04-09
... ...             ...        
489 9546            2021-04-09

edited Apr 10, 2021 at 10:31

answered Apr 10, 2021 at 9:43


4 Comments

bennimueller Over a year ago

These methods generate outputs with 2450 instead of 489 rows because of the 5 measurement columns. Can I group the rows together after melting or unstacking them? I can't just take the first 489 rows and drop the rest, because a date that is not in the first measurement column would then be lost.

Anurag Dabas Over a year ago

Try resultdf = resultdf.drop_duplicates() and let me know whether it works.

bennimueller Over a year ago

Now there are 894 rows and some ids are missing.

Anurag Dabas Over a year ago

Yes, because those were duplicate values.
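The row counts in this thread follow from how melt() works: it returns one row per station and measurement column, and drop_duplicates() only removes exact repeats, so a station with both a date and a NaN still keeps two rows. One possible way to get back to one row per station is to group the melted frame by id_of_station and take the last non-null value. The sketch below assumes string dates and uses small made-up sample data; the names long_df and the sample values are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "id_of_station": [7, 21, 42],
    "measurement1": [np.nan, "2021-04-08", "2021-04-09"],
    "measurement2": [np.nan, "2021-04-09", np.nan],
    "measurement3": [np.nan, np.nan, "2021-04-10"],
})

# melt() yields one row per (station, measurement column) pair.
long_df = df.melt(id_vars="id_of_station", value_name="last_measurement")
print(len(long_df))  # 3 stations * 3 measurement columns

# GroupBy.last() takes the last non-null value in each group and leaves
# a missing value when the group has none, so no station is dropped.
resultdf = (
    long_df.groupby("id_of_station", sort=False)["last_measurement"]
    .last()
    .reset_index()
)
print(resultdf)
```

Because melt() stacks the measurement columns in their original order, "last" here means the rightmost non-null column for each station. If the latest date is wanted instead, converting the values with pd.to_datetime and using max() in place of last() would be one alternative.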

Ynjxsjmh · Accepted Answer · 2021-04-10 10:07:47Z

1

You can use apply() on rows.

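import numpy as np
import pandas as pd

def merge(row):
    # Keep the last non-null value in the row, or NaN if every value is missing.
    elems = row.dropna().tolist()
    return elems[-1] if elems else np.nan

# Apply row-wise to the measurement columns and attach the result to the id column.
df_ = pd.concat([df.iloc[:, :1], df.iloc[:, 1:].apply(merge, axis=1).rename('last_measurement')], axis=1)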

# print(df_)

   id_of_station  last_measurement
0              7         NaN
1             21  2021-04-09
2             28  2021-04-09
3             31  2021-04-09
4             42  2021-04-09
5           9546  2021-04-09
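A row-wise apply() can be slow on larger frames. As a sketch of a vectorized alternative with the same "last non-null value per row" logic, the measurement columns can be forward-filled across each row and the rightmost column taken. This assumes the values parse cleanly with pd.to_datetime; the sample data and the name dates are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the frame in the question.
df = pd.DataFrame({
    "id_of_station": [7, 21, 42],
    "measurement1": [np.nan, "2021-04-08", "2021-04-09"],
    "measurement2": [np.nan, "2021-04-09", np.nan],
})

# Parse every measurement column to datetime; missing values become NaT.
dates = df.drop(columns="id_of_station").apply(pd.to_datetime)

# Forward-fill across each row, then take the rightmost column:
# that is the last non-null value in the row (NaT if there is none).
df_ = df[["id_of_station"]].assign(
    last_measurement=dates.ffill(axis=1).iloc[:, -1]
)
print(df_)
```

With datetime columns the missing entry shows up as NaT rather than NaN.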

answered Apr 10, 2021 at 10:07

