I concatenated two DataFrames. Before concatenation the date columns had dtype datetime, but afterwards they became object, and when I export to Excel the format changes completely.
Here are the two DataFrames:
df_last_month:
| project number | status | Project Naming | CF | VPC | CO | MA |
|---|---|---|---|---|---|---|
| A | Planned | DH | 2021-01-26 | 2021-03-16 | 2021-11-16 | 2023-10-10 |
| B | frozen | DH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
| C | Planned | DH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
| D | Planned | HH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
df_current_month:
| project number | status | Project Naming | CF | VPC | CO | MA |
|---|---|---|---|---|---|---|
| A | Planned | DH | 2021-01-10 | 2021-03-16 | 2021-09-16 | 2023-10-10 |
| B | frozen | DH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
| E | completed | DH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
| F | completed | HH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
| H | completed | HH | 2017-12-01 | 2018-12-18 | 2019-07-26 | 2022-02-18 |
I concatenated df_last_month and df_current_month and then flagged each project as new, deleted, modified or same. Here is the code:
df_last_month = df_last_month.set_index('project number')
df_current_month = df_current_month.set_index('project number')
df3 = pd.concat([df_last_month,df_current_month],sort=False)
df3a = df3.stack().groupby(level=[0,1]).unique().unstack(1).copy()
df3a.loc[~df3a.index.isin(df_last_month.index),'update_project'] = 'new'
df3a.loc[~df3a.index.isin(df_current_month.index),'update_project'] ='deleted'
idx = df3.stack().groupby(level=[0,1]).nunique()
df3a.loc[idx.mask(idx<=1).dropna().index.get_level_values(0),
         'update_project'] = 'modified'
df3a['update_project'] = df3a['update_project'].fillna('same')
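The dtype change can be reproduced with a minimal sketch like the one below. It assumes the date columns start out as a datetime dtype and uses hypothetical two-row samples with only the `status` and `CF` columns. It suggests that `concat` keeps the datetime dtype, and that the switch to object happens at `stack()`, which puts text and dates into a single Series, after which `unique()` stores an array in every cell.

```python
import pandas as pd

# Hypothetical two-row samples shaped like the tables above
last = pd.DataFrame({
    "project number": ["A", "B"],
    "status": ["Planned", "frozen"],
    "CF": pd.to_datetime(["2021-01-26", "2017-12-01"]),
}).set_index("project number")
cur = pd.DataFrame({
    "project number": ["A", "B"],
    "status": ["Planned", "frozen"],
    "CF": pd.to_datetime(["2021-01-10", "2017-12-01"]),
}).set_index("project number")

both = pd.concat([last, cur], sort=False)
print(both["CF"].dtype)      # still a datetime dtype after concat

stacked = both.stack()       # text and dates now share one Series
print(stacked.dtype)         # object

merged = stacked.groupby(level=[0, 1]).unique().unstack(1)
print(merged["CF"].dtype)    # object: each cell holds an array of values
print(merged.loc["A", "CF"]) # two Timestamps, one per month
```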
Here is the input:
What I am trying to do: the columns CF, CO, MA and VPC contain values in two formats:
- the first: [2021-01-26 00:00:00]
- the second: [2021-01-26 00:00:00,2021-01-10 00:00:00]
I want to remove the time.
When I export to Excel I want to keep the same format, i.e. [2021-01-26] or [2021-01-26,2021-01-10], but this is what I currently get in Excel:
Here is my code:
import pandas as pd
import numpy as np
from datetime import datetime, date
# Classify date column by format type
df['format'] = 1
df.loc[df['CF'].astype(str).str.contains(','), 'format'] = 2
df['new_date'] = pd.to_datetime(df['CF'])
# Convert to datetime with two different format settings
df.loc[df.format == 1, 'new_date'] = pd.to_datetime(df.loc[df.format == 1, 'CF'], format = '%Y-%d-%m %H:%M:%S').dt.strftime('%Y-%m-%d')
df.loc[df.format == 2, 'new_date'] = pd.to_datetime(df.loc[df.format == 2, 'CF'], format = '%m/%d/%Y %H:%M:%S,%m/%d/%Y %H:%M:%S').dt.strftime('%Y-%m-%d,%m/%d/%Y')
print(df)
Any suggestions? Thanks for your help.
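One possible direction, sketched under assumptions rather than tested on the real data: since each cell is an array of Timestamps and not a string, the values could be formatted cell by cell instead of being parsed with `pd.to_datetime`. The helper `format_dates`, the stand-in frame and the file name `projects.xlsx` are hypothetical, and the sketch assumes no date cell is missing.

```python
import numpy as np
import pandas as pd

def format_dates(cell):
    # Hypothetical helper: one cell (array of Timestamps) -> "[2021-01-26,2021-01-10]"
    values = np.atleast_1d(cell)
    return "[" + ",".join(pd.Timestamp(v).strftime("%Y-%m-%d") for v in values) + "]"

# Small stand-in for the CF column of df3a:
# object cells, each holding an array of Timestamps
raw = pd.Series(
    pd.to_datetime(["2021-01-26", "2021-01-10", "2017-12-01", "2017-12-01"]),
    index=["A", "A", "B", "B"],
).astype(object)
df3a = raw.groupby(level=0).unique().to_frame("CF")

date_cols = ["CF"]  # the real frame would list CF, VPC, CO and MA
for col in date_cols:
    df3a[col] = df3a[col].map(format_dates)

print(df3a)
# df3a.to_excel("projects.xlsx")  # hypothetical file name; cells are plain text now
```

Because the cells become plain strings, Excel would show them as text and not as dates, which seems to match the requested [2021-01-26,2021-01-10] layout.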


