2

I'm using pandas to load data from Excel with the resulting DataFrame containing both strings and dates. The columns containing strings are of dtype "object" while the date-columns are of dtype "datetime64[ns]". At some point in my code I need to convert one column from datetime to string for writing back to Excel, but pandas will not let me do that if I try it the way that would be most obvious to me and seems to be the recommended way of doing it according to documentation: using .loc to get the column to be changed and assign them with the same column converted to strings.

I have found ways to circumvent the problem and get pandas to do what I need, but either this is a bug or I do not understand some underlying mechanic which could come back to bite me in the longer run, hence my question.

The code to reproduce this (occurs in both pandas 2.0.0 and 2.0.1, this might cause the problem) is as follows (in the actual DataFrame I am using there are many more than a single column):

import pandas as pd

not_yet_datetime_df = pd.DataFrame([["2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06"]]).T
datetime_df = not_yet_datetime_df.astype("datetime64[ns]")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].dt.strftime("%d.%m.%Y")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].astype("object")  # neither of these two will work for me
print(datetime_df.dtypes)  # will return "datetime64[ns]" for this single column

There are multiple ways to circumvent this that work for me, including simply replacing line 5 with datetime_df[0] = datetime_df.loc[:, 0].dt.strftime("%d.%m.%Y") (omitting the .loc left of the equals sign) and I at least can get the column to being "object" dtype with datetime_df = datetime_df.astype({0:"object"}), but I don't quite understand why especially the first solution works and what I misunderstood about .loc - or about datetimes in general.

I read a bit into the pandas 2.0.0 change on returning views vs. copies but to my (limited) understanding this should not be affected by any of the 2.0.0 changes.

Could anyone help me understand what's happening here under the hood? I like using .loc over assigning just with []-brackets and I feel like it's not as intuitive as I had hoped.

4
  • I don't see why you need to use loc here in the first place - If you want to convert a column to datetime, why not simply use df["dt"] = pd.to_datetime(df["date-as-string"])? If you want to convert to string, why not use df["back-to-string"] = df["dt"].dt.strftime("%Y-%m-%d")? Commented Apr 27, 2023 at 18:12
  • @FObersteiner that's exactly what I am doing as a workaround for the issue. According to pandas documentation, the recommended way of doing it is using .loc though. Also there is no reason for brackets alone to work but .loc behaving differently, at least none that I would understand, which is the point of my question. Commented Apr 28, 2023 at 6:52
  • I don't think this is a work-around; to the contrary it's the pandas-normal way to create or change whole Series / df columns. You use loc if you want to modify part of a Series/column for instance, or a part of a df that is not an individual column. Commented Apr 28, 2023 at 8:04
  • As I understand the documentation of pandas, using brackets is the most intuitive way when coming from python in general and numpy specifically while the specialized methods (loc, at, iloc etc) are still to be preferred wherever possible; this stems from the introductory note to their indexing section I linked to above and is mentioned at a few places in the rest of the guide. In any case, the two methods should be equal, which they are not. My questions was mostly about the why and if that is even intended behaviour. Other working methods existing does not really answer that. Commented Apr 28, 2023 at 9:16

1 Answer 1

1

See also Does .loc[:, ['A','B']] assignment allow to change the dtype of the columns? - .loc tries to cast back to the original type. There's a note in the docs on this, that might be hard to find:

When trying to convert a subset of columns to a specified type using astype() and loc(), upcasting occurs.

Anyways, that's why it keeps being datetime64[ns]. For demonstration, if you change the datetime format to something that pandas' parser doesn't accept, "upcasting" fails and the dtype is kept:

import pandas as pd

not_yet_datetime_df = pd.DataFrame([["2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06", "2023-01-06"]]).T
datetime_df = not_yet_datetime_df.astype("datetime64[ns]")
datetime_df.loc[:, 0] = datetime_df.loc[:, 0].dt.strftime("%d. asdf %m. xx %Y")
print(datetime_df.dtypes) 
# 0    object
# dtype: object

In general, I would still argue that in this case, the most basic [ ] gets you safely where you want to. If you create a new column, you're neither slicing nor selecting nor indexing. Same goes for if you replace a column (e.g. string with datetime dtype Series; ok you're selecting something here...). So I don't see the point of using loc here in the first place.

Sign up to request clarification or add additional context in comments.

2 Comments

I generally never use [ ] unless assigning new columns because it sometimes behaves differently than .loc and i don't want to worry about that potentially subtle behaviour; also .loc gives more control over what I am doing in general and allows slicing on both axes while the brackets alone do not. That is why I had the impression I could just always use .loc and be done with it; turns out I can not. Thanks alot for the answer.
@Archernar Actually, I was surprised by this behavior of loc. I would have expected it to work as you did. Also, don't get me wrong, loc is great for selecting and modifying specific parts of dataframes / series. It's just that I find it kind of bulky (and hard to read) when it comes to adding or replacing whole columns. I used polars lately, and I think the methods there are much more concise. So maybe I'm just biased ;-)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.