Changing the values and/or dtypes of existing columns with df.loc[:, col] = ... behaves differently in Pandas 1.x and 2.x.

For example, on column e in the example below:

  • Pandas 1.x: Assigning the result of pd.to_datetime to the column parses the dates and changes the column's dtype to datetime64[ns]
  • Pandas 2.x: Assigning the result of pd.to_datetime to the column parses the dates, but the column's dtype stays object

What change from Pandas 1.x to 2.x explains this behavior?

Example code

import pandas as pd

# Creates example DataFrame
df = pd.DataFrame({
    'a': ['1', '2'],
    'b': ['1.0', '2.0'],
    'c': ['True', 'False'],
    'd': ['2024-03-07', '2024-03-06'],
    'e': ['07/03/2024', '06/03/2024'],
    'f': ['aa', 'bb'],
})

# Changes dtypes of existing columns
df.loc[:, 'a'] = df.a.astype('int')
df.loc[:, 'b'] = df.b.astype('float')
df.loc[:, 'c'] = df.c.astype('bool')

# Parses and changes dates dtypes
df.loc[:, 'd'] = pd.to_datetime(df.d)
df.loc[:, 'e'] = pd.to_datetime(df.e, format='%d/%m/%Y')

# Changes values of existing columns
df.loc[:, 'f'] = df.f + 'cc'

# Creates new column
df.loc[:, 'g'] = [1, 2]

Results in Pandas 1.5.2

In [2]: df
Out[2]: 
   a    b     c          d          e     f  g
0  1  1.0  True 2024-03-07 2024-03-07  aacc  1
1  2  2.0  True 2024-03-06 2024-03-06  bbcc  2

In [3]: df.dtypes
Out[3]: 
a             int64
b           float64
c              bool
d    datetime64[ns]
e    datetime64[ns]
f            object
g             int64
dtype: object

Results in Pandas 2.1.4

In [2]: df
Out[2]: 
   a    b     c                    d                    e     f  g
0  1  1.0  True  2024-03-07 00:00:00  2024-03-07 00:00:00  aacc  1
1  2  2.0  True  2024-03-06 00:00:00  2024-03-06 00:00:00  bbcc  2

In [3]: df.dtypes
Out[3]: 
a    object
b    object
c    object
d    object
e    object
f    object
g     int64
dtype: object

Answer

From What’s new in 2.0.0 (April 3, 2023):

Changed behavior in setting values with df.loc[:, foo] = bar or df.iloc[:, foo] = bar, these now always attempt to set values inplace before falling back to casting (GH 45333).

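So in Pandas 2+, whenever you set values with df.loc[:, col] = ..., it first tries to write them into the existing column in place. If that succeeds, no new column is created and the existing column's dtype is preserved. In the example, columns a to f start out as object, and an object column can hold integers, floats, booleans and Timestamps alike, so the in-place set always succeeds and the dtype stays object. Column g does not exist beforehand, so it is created as a new column and gets int64.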

Compare this with df[foo] = bar: this replaces the column with a new one whose dtype is inferred from the values being set. The same happens with df['d'] = pd.to_datetime(df.d), i.e., even in Pandas 2+, the result is a column with a dtype of datetime64[ns].
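A minimal sketch of the two assignment styles side by side, using columns d and e from the question. The explicit dtype=object and the variable name replaced_is_datetime are assumptions added for illustration, and the dtype printed for d depends on the installed Pandas version, as described above.

```python
import pandas as pd

# Start from object columns, as in the question (dtype=object is set explicitly here)
df = pd.DataFrame({
    'd': ['2024-03-07', '2024-03-06'],
    'e': ['07/03/2024', '06/03/2024'],
}, dtype=object)

# .loc[:, col] tries to write into the existing column first;
# in Pandas 2+ the object dtype of 'd' is expected to be kept
df.loc[:, 'd'] = pd.to_datetime(df['d'])

# df[col] = ... replaces the column, so the dtype follows the new values
df['e'] = pd.to_datetime(df['e'], format='%d/%m/%Y')

# Check the replaced column without depending on the datetime resolution
replaced_is_datetime = pd.api.types.is_datetime64_any_dtype(df['e'])

print(df.dtypes)
print(replaced_is_datetime)
```

For the other columns, df['a'] = df['a'].astype('int') follows the same pattern, or several columns can be converted at once with df = df.astype({'a': 'int', 'b': 'float'}).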
