Here I have a dataframe like the following:

Variable    Groups
1           [0-10]
1           [0-10]
2           [0-10]
2           [0-10]
3           [0-10]
3           [10-20]
4           [10-20]
4           [10-20]
5           [10-20]
5           [10-20]

I'd like to keep only unique values in the Variable column, but without losing duplicates that fall in different Groups. For example:

Variable    Groups
1           [0-10]
2           [0-10]
3           [0-10]
3           [10-20]
4           [10-20]
5           [10-20]

Note there is still a duplicate 3 because there was one in each group. I've tried

df_unique = df['Groups'].groupby(df['Variable']).unique().apply(pd.Series)

but this is just returning a complete mess. Not sure what to do, help appreciated.

3 Answers

You can use SeriesGroupBy.unique() together with .explode() and .reset_index(), as follows:

df.groupby('Variable')['Groups'].unique().explode().reset_index()

Another solution is to use GroupBy.first(), as follows:

df.groupby(['Variable', 'Groups'], as_index=False).first()

Result:

   Variable   Groups
0         1   [0-10]
1         2   [0-10]
2         3   [0-10]
3         3  [10-20]
4         4  [10-20]
5         5  [10-20]
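A self-contained sketch reproducing the question's data and both approaches (column names `Variable` and `Groups` taken from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "Variable": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Groups": ["[0-10]"] * 5 + ["[10-20]"] * 5,
})

# Approach 1: unique Groups per Variable, then flatten back to long form
out1 = df.groupby("Variable")["Groups"].unique().explode().reset_index()

# Approach 2: collapse to one row per (Variable, Groups) pair
out2 = df.groupby(["Variable", "Groups"], as_index=False).first()
```

Both give six rows: the duplicate 3 survives because it appears once in each group.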


Here is another option (using the question's column names):

df.groupby(['Variable', df['Groups'].explode()]).head(1)

2 Comments

You don't need to use .explode() here. It has no effect.
I believe you can't group by lists, so explode converts them to object dtype.
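As the comments note, `.explode()` only matters when a column holds lists; with the plain string labels from the question, grouping directly and keeping the first row of each pair works as well (a sketch, not the answerer's exact code):

```python
import pandas as pd

df = pd.DataFrame({
    "Variable": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Groups": ["[0-10]"] * 5 + ["[10-20]"] * 5,
})

# Keep the first row of each (Variable, Groups) pair; since Groups holds
# plain strings here, no .explode() is needed before grouping.
out = df.groupby(["Variable", "Groups"]).head(1)
```

Note that `head(1)` preserves the original index, unlike the `reset_index()` variants above.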
You need to write an expression that combines the two columns and apply the uniqueness check to the combination.
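This answer doesn't spell out the expression; one way to read it (an assumption on my part) is to de-duplicate on both columns at once with `drop_duplicates`:

```python
import pandas as pd

df = pd.DataFrame({
    "Variable": [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    "Groups": ["[0-10]"] * 5 + ["[10-20]"] * 5,
})

# A row counts as a duplicate only if BOTH Variable and Groups match,
# so the 3 in [0-10] and the 3 in [10-20] are both kept.
out = df.drop_duplicates(subset=["Variable", "Groups"]).reset_index(drop=True)
```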
