0

I found similar questions but they did not solve my problem.

I have this Pandas DataFrame. The dtypes of the columns can either be str or dt, I can change this afterwards:

    id_of_station   measurement1    measurement2    measurement3    measurement4    measurement5
0   7               NaN             NaN             NaN             NaN             NaN
1   21              2021-04-09      2021-04-09      2021-04-09      2021-04-09      NaN
2   28              2021-04-09      2021-04-09      2021-04-09      2021-04-09      NaN
3   31              2021-04-09      2021-04-09      2021-04-09      2021-04-09      2021-04-09 
4   42              2021-04-09      NaN             NaN             2021-04-09      NaN
... ...             ...             ...             ...             ...             ...
489 9546            NaN             NaN             2021-04-09      2021-04-09      NaN

What I want is to merge the date columns together to one new column. If there is no date for the specific ID like in id_of_station 7 the Output should be NaN.

So the Output should look similar to this:

    id_of_station   last_measurement    
0   7               NaN             
1   21              2021-04-09  
2   28              2021-04-09 
3   31              2021-04-09
4   42              2021-04-09
... ...             ...        
489 9546            2021-04-09

2 Answers 2

2

make use of melt() method:

resultdf=df.melt(id_vars='id_of_station',value_name='last_measurement').drop(columns=['variable'])

OR

you can also do this by unstack() method

resultdf=df.set_index('id_of_station').unstack().droplevel(0).to_frame().rename(columns={0:'last_measurement'}).reset_index()

Now if you print resultdf you will get your desired output:

    id_of_station   last_measurement    
0   7               NaN             
1   21              2021-04-09  
2   28              2021-04-09 
3   31              2021-04-09
4   42              2021-04-09
... ...             ...        
489 9546            2021-04-09
Sign up to request clarification or add additional context in comments.

4 Comments

These methods are generating outputs with 2450 instead of 489 rows because of the 5 measurement columns. Can I group these together after melting, unstacking them? I cant just take the first 489 and drop the rest because the date is actually missing if its not in the first measurement column.
try: resultdf=resultdf.drop_duplicates() let me know if it works or not
Now there are 894 rows and some ids are missing.
Yes becuase that's were duplicates values
1

You can use apply() on rows.

def merge(row):
    elems = row.dropna().tolist()
    return elems[-1] if elems else np.nan

df_ = pd.concat([df.iloc[:, :1], df.iloc[:, 1:].apply(merge, axis=1).rename('last_measurement')], axis=1,)
# print(df_)

   id_of_station  last_measurement
0              7         NaN
1             21  2021-04-09
2             28  2021-04-09
3             31  2021-04-09
4             42  2021-04-09
5           9546  2021-04-09

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.