Description
Grouping by a Series and then mutating that Series can give different results depending on whether a view on the Series' data exists.
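import pandas as pd

# Case 1: no other view on ser's data exists.
ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
gb = df.groupby(ser)
ser.iloc[0] = 100  # mutate the grouper after creating the groupby
print(gb.sum())
#      a  b
# 1    3  6
# 2    2  5
# 100  1  4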

# Case 2: ser2 is a view on ser's data.
ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
ser2 = ser[:]
gb = df.groupby(ser)
ser.iloc[0] = 100  # mutate the grouper after creating the groupby
print(gb.sum())
#    a   b
# 1  4  10
# 2  2   5

This only happens for certain code paths in groupby; for example, using
ser = pd.Series(pd.Categorical([1, 2, 1], categories=[1, 2, 100]))
gives the latter behavior. We should be taking a shallow copy of any grouping Series when we create the DataFrameGroupBy instance.
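
Until groupby copies the grouper itself, a caller can get the same isolation by passing a copy explicitly. The sketch below is a caller-side workaround, not the fix proposed above. It uses a deep copy (`ser.copy()`) rather than the suggested shallow copy, so it does not depend on how pandas tracks views.

```python
import pandas as pd

ser = pd.Series([1, 2, 1])
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Group by an independent copy so later writes to `ser`
# cannot reach the grouper held by the groupby object.
gb = df.groupby(ser.copy())
ser.iloc[0] = 100

result = gb.sum()
print(result)
```

With the copy in place, the result contains only the groups 1 and 2, matching the second example regardless of whether another view on `ser` exists.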
Hat-tip to @jorisvandenbossche for constructing the example.