More efficient way to slice list / remove processed data in Python for loops

Question

I'm processing a number of 'leads', all of which are class instances, and POSTing them via an API to a third-party platform. The script works, but it is slow and inefficient. I'm looking for ideas on how to speed it up.

ADMINS = admins.get_admins()
lead_list_ids = get_lead_list_ids(TAG) # returns dict of admin.slug / list id pairs
processed = []
for admin in tqdm(ADMINS):
    lead_list_id = lead_list_ids[admin.slug]
    for lead in tqdm(hunter_results):
        if lead.account.owner.email.split('@')[0] == admin.slug: # splitting email to get user "initials" which is same as admin.slug
            processed.append(lead)
            create_lead(lead, lead_list_id)
    # creates a slice modifying the existing array... might be taking more time than it saves
    hunter_results[:] = [lead for lead in hunter_results if lead not in processed]
print(f'\nSuccess! {len(hunter_results)} leads created.')

This currently runs very slowly. I originally wrote it without the 'processed' array, which caused the script to iterate over the 'hunter_results' array (3000+ items) again and again for every single Admin (user). This seemed inefficient, so I decided to remove the processed leads by appending them to the 'processed' list and then filtering the original array down. Somewhat to my surprise, this takes even longer, as the slice/list comprehension is very slow at filtering down the list.

I assume this is because the list comprehension essentially created another loop that needs to run, but I am struggling to come up with a more efficient way to do this. I do not want to remove the values from the original array during iteration for obvious reasons, but it seems doing this as a separate process is even worse. Any ideas?

How slow is slow? Are you sure the reshaping is the bottleneck and not making calls to the API?
– JonSG, Commented Feb 10, 2023 at 15:25
Yes, re: the API. I actually refactored it so it only makes API calls in get_lead_list_ids() and create_lead() -- create_lead makes a POST request each time, but I have a progress bar to monitor and it flies through those. Before I added the list comprehension slice, it would jump right to the next Admin and iterate through the leads again right away...with the list comprehension, it hangs there for a good few minutes before starting to process the next Admin, so it seems pretty clear it's slowing down on the list slicing...but could def be wrong. Can post a vid/GIF of it in action if it helps
– Magic-Wike, Commented Feb 10, 2023 at 15:31
I suggest you read xyproblem.info and tell us what your X is, i.e., what you're really trying to do.
– Kelly Bundy, Commented Feb 10, 2023 at 15:32

JonSG · Accepted Answer · 2023-02-10 15:38:56Z

1

I think a strategy that iterates over leads is what you want. Does this do what you seek to do? I don't think there would be a need to mutate hunter_results then as we just work through them one at a time looking for admins.

admin_slugs = set(admin.slug for admin in admins.get_admins())
lead_list_ids = get_lead_list_ids(TAG)

for lead in hunter_results:
    admin_slug = lead.account.owner.email.split('@')[0]

    if admin_slug not in admin_slugs:
        continue

    create_lead(lead, lead_list_ids[admin_slug])
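
For anyone who wants to try the single-pass idea without the real API, here is a minimal sketch. The `Owner`, `Account`, and `Lead` dataclasses, the `route_leads` helper, and the sample data are all hypothetical stand-ins, and the fake `create_lead` just records its arguments instead of POSTing. It also assumes every admin slug appears as a key in `lead_list_ids` (the question says that dict holds admin.slug / list id pairs), so the dict lookup doubles as the "is this an admin?" check.

```python
from dataclasses import dataclass


@dataclass
class Owner:
    email: str


@dataclass
class Account:
    owner: Owner


@dataclass
class Lead:
    name: str
    account: Account


def route_leads(leads, lead_list_ids, create_lead):
    """Send each lead to its owner's list in one pass; return how many were sent."""
    created = 0
    for lead in leads:
        # Text before the '@' is the owner's slug (the "initials" in the question).
        slug = lead.account.owner.email.partition("@")[0]
        list_id = lead_list_ids.get(slug)
        if list_id is None:
            continue  # owner is not a known admin, skip this lead
        create_lead(lead, list_id)
        created += 1
    return created


# Stand-in data and a fake create_lead that records calls instead of POSTing.
leads = [
    Lead("a", Account(Owner("jd@example.com"))),
    Lead("b", Account(Owner("mw@example.com"))),
    Lead("c", Account(Owner("zz@example.com"))),  # no matching admin
]
lead_list_ids = {"jd": 101, "mw": 202}
calls = []
created = route_leads(
    leads, lead_list_ids, lambda lead, list_id: calls.append((lead.name, list_id))
)
print(f"{created} leads created")
```

Counting inside the loop also gives an accurate number for the final success message, since nothing is removed from the original list.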

1 Comment

Magic-Wike Over a year ago

Winner! This is the direction I knew I needed to go, but didn't think to declare the lead_list_id instance in the function call, allowing me to skip the nested loop altogether + I don't need to create any new arrays or objects. Thanks for the assist!

JHD-2164 · Answer · 2023-02-10 15:48:04Z

0

You could try something like this:

ADMINS = admins.get_admins()
lead_list_ids = get_lead_list_ids(TAG) # returns dict of admin.slug / list id pairs
admin_dict = {}

for admin in tqdm(ADMINS):
    admin_dict[admin.slug] = 1

for lead in tqdm(hunter_results):
    initials = lead.account.owner.email.split('@')[0]
    if initials in admin_dict:
        create_lead(lead, lead_list_ids[initials])

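This should be the same logic as your code, but with lower time complexity: one pass over the admins plus one pass over the leads, instead of a pass over the leads for every admin. And you are correct that the list comprehension creates another for loop; on top of that, each `lead not in processed` test is itself a scan of the processed list.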
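
If you ever do need to filter a list against already-processed items, the cost of the membership test is what matters. Below is a small sketch with plain integers standing in for leads; the function names are made up for illustration, and the set version assumes the items are hashable.

```python
def filter_with_list(items, processed):
    # `x not in processed` scans the list on every test, so the total work
    # grows roughly with len(items) * len(processed).
    return [x for x in items if x not in processed]


def filter_with_set(items, processed):
    # Build the set once; each membership test is then a hash lookup on average.
    processed_set = set(processed)
    return [x for x in items if x not in processed_set]


items = list(range(3000))
processed = list(range(0, 3000, 2))  # pretend every other item was handled

slow = filter_with_list(items, processed)
fast = filter_with_set(items, processed)
print(slow == fast, len(fast))
```

Both functions return the same list; only the amount of work per membership test differs. In this particular question the filtering step can be dropped entirely, so this only applies when a filter is genuinely needed.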

1 Comment

Magic-Wike Over a year ago

Thank you! This solution will work and is very similar to the reply from JonSG. I ended up using that as it's a touch simpler and more intuitive to read, but this answer is correct as well and -- I believe -- equally efficient. Appreciate you taking the time.
