Polars: return dataframe with all unique values of N columns

Question

I have a dataframe that has many rows per combination of the 'PROGRAM', 'VERSION', and 'RELEASE_DATE' columns. I want to get a dataframe with all of the combinations of just those three columns.

Would this be a job for groupby or distinct?

jqurious · Accepted Answer · 2024-10-05 20:44:55Z

5

Since you are not aggregating anything, use unique rather than a group-by:

df.select('PROGRAM','VERSION','RELEASE_DATE').unique()

edited Oct 5, 2024 at 20:44

jqurious


answered Mar 7, 2022 at 19:47

user18263465


3 Comments

rchitect-of-info Over a year ago

Can the select version posted above be iterated on?

user18263465 Over a year ago

for prog, vers, rel in df.select(['PROGRAM','VERSION','RELEASE_DATE']).distinct().rows(): ...

magomar Over a year ago

distinct() has been deprecated; use unique() instead.

Hericks · Accepted Answer · 2024-05-05 11:37:54Z

5

pl.DataFrame.unique has a subset parameter to specify the columns to consider when identifying duplicate rows.

df.unique(subset=["PROGRAM", "VERSION", "RELEASE_DATE"])

answered May 5, 2024 at 11:37

Hericks
