0

I have a dynamic list:

[{'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'}, 

{'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'}, 

{'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left' }, 

{'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'}]

I want to remove the duplicates based on zone_name and location. There are 3 entries with the zone_name 'Zone 1 Left', and I want to remove the older ones. I have already sorted the list by end_date, so the latest entry comes first. Now I need to remove the duplicate entries based on zone_name and location.

This is what I have tried:

final_zone = []
res_list = []
for i in sortedArray:
     if i["location"] not in final_zone:
          sch.append(i)
          final_zone.append(i["location"])

What change do I need to make to remove the duplicates based on zone_name and location?

That is, Zone 1 Left has 3 entries, and I need the latest one.

1
  • Latest one. I have sorted that by end_date Commented Jun 17, 2021 at 8:44

5 Answers

1

For a general approach with an unsorted list:

from itertools import groupby
from operator import itemgetter

# sorting and grouping functions
f_sort = itemgetter("location", "zone_name", "end_date")  # sort key; used with reverse=True so the latest comes first in each group
f_group = itemgetter("location", "zone_name")  # grouping key for the sorted list

result = [
    next(g) for _, g in  # only take latest of each group
    groupby(sorted(array, key=f_sort, reverse=True), key=f_group)
]

The utilities used here (itertools.groupby, operator.itemgetter, and the built-in sorted) are covered in the Python standard library documentation, and all of them are handy in a lot of use cases.
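If sorting the whole list is not wanted, the same result can probably be reached in a single pass. The sketch below is an alternative, not part of the answer above. It assumes every end_date is a zero-padded 'YYYY-MM-DD HH:MM:SS' string, so that comparing the strings matches chronological order; latest_per_group and records are hypothetical names introduced here.

```python
def latest_per_group(records, keys=("location", "zone_name"), date_key="end_date"):
    """Return the newest record for each combination of `keys`."""
    latest = {}
    for rec in records:
        group = tuple(rec[k] for k in keys)
        # Assumption: dates are zero-padded 'YYYY-MM-DD HH:MM:SS' strings,
        # so plain string comparison orders them chronologically.
        if group not in latest or rec[date_key] > latest[group][date_key]:
            latest[group] = rec
    return list(latest.values())


# Deliberately unsorted sample built from the question's data
records = [
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
]
result = latest_per_group(records)
```

The result lists groups in the order they were first seen, not by date, so sort it afterwards if the order matters.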

0
clean_list = []

for elem in lst:
    # check whether an element with the same zone name and location
    # is already present in the clean list
    yet_present = any(el['zone_name'] == elem['zone_name']
                      and el['location'] == elem['location']
                      for el in clean_list)
    if not yet_present:
        clean_list.append(elem)

OUTPUT:

[{'dashboard': 'AG',
  'end_date': '2021-06-17 13:13:43',
  'location': 'EC & pH Reading',
  'zone_name': 'Zone 1 Left'},
 {'dashboard': 'AG',
  'end_date': '2021-06-17 12:40:06',
  'location': 'Harvest',
  'zone_name': 'Zone 2 Left'},
 {'dashboard': 'AG',
  'end_date': '2021-06-16 15:52:52',
  'location': 'Harvest',
  'zone_name': 'Zone 1 Left'}]

4 Comments

You have saved my day. Thanks
If my answer is useful, please upvote and/or accept it.
I need 15 reputation to upvote. I don't have that yet, so I can't.
You can accept it (instructions: meta.stackexchange.com/a/5235/645001)
0

Create a list result and, for each dictionary in the data list, check whether its zone_name is already present in result. If it is, skip the item; otherwise append it. Note that this deduplicates on zone_name only, not on location.

result = []
for item in data:
    if item['zone_name'] in (x['zone_name'] for x in result):
        continue
    result.append(item)

OUTPUT:

[{'dashboard': 'AG',
  'end_date': '2021-06-17 13:13:43',
  'location': 'EC & pH Reading',
  'zone_name': 'Zone 1 Left'},
 {'dashboard': 'AG',
  'end_date': '2021-06-17 12:40:06',
  'location': 'Harvest',
  'zone_name': 'Zone 2 Left'}]

0

You can just loop through the list and memorize the indices you want to keep.

keepers = {}
for i, item in enumerate(sorted_array):
    # Key on both fields; setdefault keeps the first index seen per key,
    # which is the latest entry because the list is sorted latest-first
    keepers.setdefault((item['zone_name'], item['location']), i)

final_array = []
for i in keepers.values():
    final_array.append(sorted_array[i])

As a bonus, you get all (zone_name, location) pairs in keepers.keys().

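But your approach might actually also work. Just change sch.append(i) to res_list.append(i) and test the (zone_name, location) pair rather than location alone. Because the list is already sorted latest-first, the first entry seen for each pair is the one to keep, so the iteration order does not need to be reversed.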
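A sketch of the question's loop with sch.append(i) replaced by res_list.append(i) and the check keyed on both fields. It assumes sortedArray is sorted latest-first, as the question states, so the first entry seen per pair is the newest; the seen set is a name introduced here for illustration.

```python
sortedArray = [
    {'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
    {'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'},
]

seen = set()   # (zone_name, location) pairs already kept
res_list = []
for i in sortedArray:
    key = (i["zone_name"], i["location"])
    if key not in seen:
        # First occurrence of this pair; with latest-first input it is the newest
        seen.add(key)
        res_list.append(i)
```

Using a set for the membership test keeps each lookup cheap even when the list grows, whereas scanning a list of already-kept entries costs more with every item added.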

0

The other answers work, but I want to add a solution using pandas.

You can create a DataFrame from your list of dictionaries:

import pandas as pd
d = [{'dashboard': 'AG', 'end_date': '2021-06-17 13:13:43', 'location': 'EC & pH Reading', 'zone_name': 'Zone 1 Left'}, {'dashboard': 'AG', 'end_date': '2021-06-17 12:40:06', 'location': 'Harvest', 'zone_name': 'Zone 2 Left'}, 

{'dashboard': 'AG', 'end_date': '2021-06-16 15:52:52', 'location': 'Harvest', 'zone_name': 'Zone 1 Left' }, 

{'dashboard': 'AG', 'end_date': '2021-06-16 15:45:51', 'location': 'Harvest', 'zone_name': 'Zone 1 Left'}]
df = pd.DataFrame(d)

This is what df looks like:

  dashboard             end_date         location    zone_name
0        AG  2021-06-17 13:13:43  EC & pH Reading  Zone 1 Left
1        AG  2021-06-17 12:40:06          Harvest  Zone 2 Left
2        AG  2021-06-16 15:52:52          Harvest  Zone 1 Left
3        AG  2021-06-16 15:45:51          Harvest  Zone 1 Left

It is sort of like a table in Excel.

Now with one line, you can do exactly what you want:

df.sort_values("end_date").drop_duplicates(["location", "zone_name"], keep="last")

output:

  dashboard             end_date         location    zone_name
2        AG  2021-06-16 15:52:52          Harvest  Zone 1 Left
1        AG  2021-06-17 12:40:06          Harvest  Zone 2 Left
0        AG  2021-06-17 13:13:43  EC & pH Reading  Zone 1 Left
