I'm trying to find a way to populate a new column in a dataframe with the result of a group_by and aggregation. For example, in a measurements dataframe, each reader has a list of animal sightings, each with the day it was seen. A row looks like:

reader: Amazonas1A
sights:
- name: dartfrog
  day: 2
- name: piranha
  day: 3
- name: dartfrog
  day: 4

I'd like to calculate, for each reader, the top 2 animals seen and their totals. For example, the added column in the measurements dataframe would contain the calculated data:

reader: Amazonas1A
sights: [...] # Same as above
top_sights: # New calculated field
- name: dartfrog
  total: 2
- name: piranha
  total: 1

I was looking at explode() and unnest(), but can't quite figure out how to apply something like rank/top/limit/having.

  • What have you tried that you feel got closest to what you want? And what was wrong with that try? Commented Dec 17, 2024 at 12:48
  • What is the schema of your measurements dataframe? From the way you gave your examples, it's not clear if you have nested structs or what. Commented Dec 26, 2024 at 16:16

1 Answer

@cmdlineuser answered this on the Polars Discord:

pl.col("sights").list.eval(pl.element().struct.field("name").value_counts(sort=True)).list.head(2)