I'm trying to find a way to add a new column to a dataframe, populated with the result of a group_by and aggregation.
For example, in a measurements dataframe, each row has a reader column and a sights column holding a list of animal sightings by day. The row structure looks like:
reader: Amazonas1A
sights:
  - name: dartfrog
    day: 2
  - name: piranha
    day: 3
  - name: dartfrog
    day: 4
I'd like to calculate, for each reader, the top 2 most-sighted animals and their totals.
For example, the added column in the measurements dataframe would contain the calculated data:
reader: Amazonas1A
sights: [...] # Same as above
top_sights: # New calculated field
  - name: dartfrog
    total: 2
  - name: piranha
    total: 1
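To pin down the expected result independently of any dataframe library, here is a minimal plain-Python sketch of the calculation for a single row. The `top_sights` helper and its `n` parameter are hypothetical names used only for illustration, and breaking ties by first appearance is an assumption on my part, not a requirement.

```python
from collections import Counter

# One row of the measurements dataframe, written as a plain dict.
row = {
    "reader": "Amazonas1A",
    "sights": [
        {"name": "dartfrog", "day": 2},
        {"name": "piranha", "day": 3},
        {"name": "dartfrog", "day": 4},
    ],
}


def top_sights(sights, n=2):
    """Hypothetical helper: the n most frequently sighted animals with their counts."""
    counts = Counter(s["name"] for s in sights)
    # most_common sorts by count, descending; equal counts keep first-seen order.
    return [{"name": name, "total": total} for name, total in counts.most_common(n)]


row["top_sights"] = top_sights(row["sights"])
print(row["top_sights"])
# [{'name': 'dartfrog', 'total': 2}, {'name': 'piranha', 'total': 1}]
```

What I'm missing is the dataframe equivalent of this, applied per reader.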
I've been experimenting with explode() and unnest(), but can't quite figure out how to apply something like rank/top/limit/having to get this result.