I'm using pandas to load data from Excel, and the resulting DataFrame contains both strings and dates. The columns containing strings are of dtype "object", while the date columns are of dtype "datetime64[ns]". At some point in my code I need to convert one column from datetime to string for writing back to Excel. pandas will not let me do that the way that seems most obvious to me, and that the documentation seems to recommend: using .loc to select the column and assigning to it the same column converted to strings.
I have found ways to circumvent the problem and get pandas to do what I need, but either this is a bug or I do not understand some underlying mechanism that could come back to bite me in the long run, hence my question.
The code to reproduce this is as follows. It occurs in both pandas 2.0.0 and 2.0.1, and the version might be what causes the problem. My actual DataFrame has many more columns than the single one shown here:
import pandas as pd
not_yet_datetime_df = pd.DataFrame([["2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06"]]).T
datetime_df = not_yet_datetime_df.astype("datetime64[ns]")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].dt.strftime("%d.%m.%Y")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].astype("object") # neither of these two will work for me
print(datetime_df.dtypes) # will return "datetime64[ns]" for this single column
There are multiple ways to circumvent this that work for me, including simply replacing the first .loc assignment (the strftime line) with
datetime_df[0] = datetime_df.loc[:, 0].dt.strftime("%d.%m.%Y") (omitting the .loc left of the equals sign). I can also at least get the column to "object" dtype with datetime_df = datetime_df.astype({0: "object"}). However, I don't quite understand why the first solution in particular works, and what I misunderstood about .loc or about datetimes in general.
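For reference, here is a minimal sketch of the two workarounds described above. It builds the datetime column with pd.to_datetime instead of the astype call from my reproduction, which is an assumption made only to keep the example short. The names as_text_df and as_object_df are hypothetical and not part of my real code.

```python
import pandas as pd

# Stand-in for the frame in the question: a single datetime column labelled 0.
# (Assumption: built with pd.to_datetime rather than astype("datetime64[ns]").)
datetime_df = pd.DataFrame({0: pd.to_datetime(["2023-01-06"] * 5)})

# Workaround 1: plain [] assignment replaces the whole column, so the column
# takes on the dtype of the strings returned by strftime.
as_text_df = datetime_df.copy()
as_text_df[0] = as_text_df.loc[:, 0].dt.strftime("%d.%m.%Y")

# Workaround 2: astype with a per-column mapping returns a new frame in which
# column 0 has "object" dtype (the values are still Timestamp objects).
as_object_df = datetime_df.astype({0: "object"})

print(as_text_df.dtypes)
print(as_object_df.dtypes)
```

Neither variant assigns through .loc[:, 0] on the left-hand side, which is the only form that leaves the column as datetime for me.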
I read a bit into the pandas 2.0.0 change on returning views vs. copies but to my (limited) understanding this should not be affected by any of the 2.0.0 changes.
Could anyone help me understand what's happening here under the hood? I like using .loc over assigning just with []-brackets and I feel like it's not as intuitive as I had hoped.
Why use loc here in the first place? If you want to convert a column to datetime, why not simply use df["dt"] = pd.to_datetime(df["date-as-string"])? If you want to convert to string, why not use df["back-to-string"] = df["dt"].dt.strftime("%Y-%m-%d")?
Use loc if you want to modify part of a Series/column for instance, or a part of a df that is not an individual column.
Indexing accessors (.loc, at, iloc etc.) are still to be preferred wherever possible; this stems from the introductory note to their indexing section I linked to above and is mentioned in a few places in the rest of the guide. In any case, the two methods should be equal, which they are not. My question was mostly about the why, and whether that is even intended behaviour. The existence of other working methods does not really answer that.