Rust Polars dataframe aggregate top values from list in dataframe and join back to original dataframe

Question

I'm trying to populate a new column in a dataframe with the result of a group_by and aggregation. For example, in a measurements dataframe, each reader has a list of animal sights per day. The row structure looks like this:

reader: Amazonas1A
sights:
- name: dartfrog
  day: 2
- name: piranha
  day: 3
- name: dartfrog
  day: 4

I'd like to calculate, for each reader, the top 2 animals seen and their totals. For example, the added column in the measurements dataframe would contain the calculated data:

reader: Amazonas1A
sights: [...] # Same as above
top_sights: # New calculated field
- name: dartfrog
  total: 2
- name: piranha
  total: 1

I have been looking at explode() and unnest(), but can't quite figure out how to apply something like rank/top/limit/having to the result.

What have you tried that you feel got closest to what you want? And what was wrong with that try?
– Jmb, Commented Dec 17, 2024 at 12:48
What is the schema of your measurements dataframe? From the way you gave your examples, it's not clear if you have nested structs or what.
– Dean MacGregor, Commented Dec 26, 2024 at 16:16

sebosp · Accepted Answer · 2024-12-27 19:42:17Z

@cmdlineuser answered on the Polars Discord with this expression (written against the Python API):

pl.col("sights").list.eval(pl.element().struct.field("name").value_counts(sort=True)).list.head(2)
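To make the intent of that expression concrete, here is a minimal plain-Python sketch of the per-row computation it describes: count the names in a reader's sights list, sort by count in descending order, and keep the first two. It uses only the standard library and is not Polars code. The helper name `top_sights` and the output key `total` are assumptions taken from the question's desired output (the Polars expression itself would name its count field differently), and tie-breaking here follows first-seen order, which may not match what Polars does.

```python
from collections import Counter


def top_sights(sights, n=2):
    """Return the n most frequent animal names with their totals.

    Hypothetical helper: mirrors the idea of value_counts(sort=True)
    followed by head(n), applied to one row's list of sights.
    """
    # Count how many times each animal name appears in this row's list.
    counts = Counter(sight["name"] for sight in sights)
    # most_common(n) sorts by count, descending, and keeps the first n.
    return [{"name": name, "total": total} for name, total in counts.most_common(n)]


# One row of the measurements data, as described in the question.
row = {
    "reader": "Amazonas1A",
    "sights": [
        {"name": "dartfrog", "day": 2},
        {"name": "piranha", "day": 3},
        {"name": "dartfrog", "day": 4},
    ],
}

# Add the calculated field next to the original list.
row["top_sights"] = top_sights(row["sights"])
print(row["top_sights"])
# [{'name': 'dartfrog', 'total': 2}, {'name': 'piranha', 'total': 1}]
```

Because the expression works on each row's list in place, there is no need to explode, group, and join back to the original dataframe.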
