I'm processing a number of 'leads', each of which is a class instance, and POSTing them via an API to a third-party platform. The script works, but it is slow and inefficient. I'm looking for ideas on how to speed it up.
ADMINS = admins.get_admins()
lead_list_ids = get_lead_list_ids(TAG)  # returns dict of admin.slug / list id pairs

processed = []
for admin in tqdm(ADMINS):
    lead_list_id = lead_list_ids[admin.slug]
    for lead in tqdm(hunter_results):
        if lead.account.owner.email.split('@')[0] == admin.slug:  # splitting email to get user "initials", which match admin.slug
            processed.append(lead)
            create_lead(lead, lead_list_id)
    # creates a slice modifying the existing list... might be taking more time than it saves..
    hunter_results[:] = [lead for lead in hunter_results if lead not in processed]

print(f'\nSuccess! {len(processed)} leads created.')
This currently runs very slowly. I originally wrote it without the 'processed' list, which caused the script to iterate over the full 'hunter_results' list (3000+ items) again and again for every single admin (user). That seemed inefficient, so I decided to track the processed leads by appending them to the 'processed' list and then filtering the original list down after each admin. To my (somewhat) surprise, this takes even longer, as the slice/list comprehension is hella slow at filtering down the list.
I assume this is because the list comprehension essentially adds another loop that has to run (and the `lead not in processed` check scans the whole 'processed' list for every lead), but I am struggling to come up with a more efficient way to do this. I don't want to remove values from the original list while iterating over it, for obvious reasons, but doing the removal as a separate pass seems even worse. Any ideas?
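For reference, here is a rough sketch of the kind of restructuring I have in mind (using the same names as in the code above: hunter_results, ADMINS, lead_list_ids, create_lead; the defaultdict grouping is the new part). It indexes the leads by slug in one pass, so each admin only touches its own leads and nothing ever has to be filtered out of hunter_results:

from collections import defaultdict

# Group leads by the email-derived slug in a single pass over hunter_results,
# so no lead is compared against every admin (sketch only, not my current code).
leads_by_slug = defaultdict(list)
for lead in hunter_results:
    slug = lead.account.owner.email.split('@')[0]
    leads_by_slug[slug].append(lead)

created = 0
for admin in tqdm(ADMINS):
    lead_list_id = lead_list_ids[admin.slug]
    # Only this admin's leads are visited; nothing needs to be removed from hunter_results.
    for lead in leads_by_slug.get(admin.slug, []):
        create_lead(lead, lead_list_id)
        created += 1

print(f'\nSuccess! {created} leads created.')

Would something along these lines be the right direction, or is there a better pattern for this?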