I noticed that groupby().apply() produces different results for two DataFrames whose contents are identical, except that one has duplicate index values.
Here is a minimal reproducible example:
import pandas as pd
df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'value': [1, 2, 1, 2, 2]
}, index=[0, 1, 1, 2, 3])  # note the duplicate index: 1 appears twice
result = df.groupby('group').apply(lambda g: g)
print(result)
Output (the group label becomes level 0 of the index, the original index becomes level 1):
         group  value
group
A     0      A      1
      1      A      2
B     1      B      1
      2      B      2
      3      B      2
But when I reset the index so it becomes unique:
df2 = df.reset_index(drop=True)
result2 = df2.groupby('group').apply(lambda g: g)
print(result2)
I get a different structure (especially inside the B group).
Why does the presence of duplicate index values change how groupby().apply() constructs the returned index? What is the correct way to preserve the original rows and avoid unexpected index nesting when applying functions?
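One workaround, sketched below under the assumption that you simply want the original rows back with their original index: pass group_keys=False so pandas does not prepend the group label as an extra index level. Selecting the non-grouping columns explicitly also sidesteps the deprecation warning about apply operating on the grouping column.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B', 'B'],
    'value': [1, 2, 1, 2, 2],
}, index=[0, 1, 1, 2, 3])  # duplicate index: 1 appears twice

# group_keys=False: do not add the group label as an outer index level,
# so the result keeps the original (possibly duplicated) index.
result = df.groupby('group', group_keys=False)[['value']].apply(lambda g: g)
print(result.index.tolist())  # [0, 1, 1, 2, 3]
```

Because the groups are already contiguous and in sorted order here, the concatenated result comes back in the original row order; with interleaved groups the rows would be regrouped even though the index labels are preserved.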
The groupby column becomes the level-0 index and the original index becomes the level-1 index.

.apply(lambda g: g) doesn't do anything interesting, obviously, so were you trying to do something more useful when you noticed this behaviour?

DeprecationWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.

Are you getting that too? If so, you should address it explicitly, e.g. add include_groups=False as it says.