
I have code that returns the titles of a list of URLs. Since I have to wait for each loaded URL to update before its title is returned, I'm wondering if there's a way to load more than one URL at a time and return both titles at once.

This is the code:

from pyvirtualdisplay import Display
from time import sleep
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options

display = Display(visible=0, size=(800, 600))
display.start()

# read the URLs to visit, one per line
with open("urls.txt", "r") as urlsFile:
    urls = urlsFile.readlines()

driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)

for url in urls:
    try:
        driver.get(url)
        sleep(0.8)
        print(driver.title)
    except TimeoutException as e:
        print("Timeout")

If I try to do this:

driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver2 = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')

for url in urls:
    try:
        driver.get(url)
        driver2.get(url)
        sleep(0.8)
        print(driver.title)
        print(driver2.title)
    except TimeoutException as e:
        print("Timeout")

The URL that driver2 gets is the same one that driver gets. Is it possible to have driver2 get the next URL in line, so that two pages load in parallel without losing time?

1 Answer

from multiprocessing.pool import Pool

from selenium import webdriver


# read URLs into list `urls`, stripping trailing newlines
with open("urls.txt", "r") as urlsFile:
    urls = [line.strip() for line in urlsFile]


# a function to process a single URL
def my_url_function(url):
    # each process uses its own driver
    driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
    try:
        driver.get(url)
        print("Got {}".format(url))
        return driver.title
    finally:
        driver.quit()


if __name__ == "__main__":
    # a multiprocessing pool with 2 worker processes
    pool = Pool(processes=2)
    map_results_list = pool.map(my_url_function, urls)
    print(map_results_list)

This example uses Python's multiprocessing module to process two URLs at the same time, although you can change the number of processes when you set up the pool, of course.

The pool.map() function takes a function and a list, iterates over the list, sends each item to the function, and runs each function call in its own process.
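
As a minimal sketch of those semantics, using a toy square() function that is not part of the original code:

from multiprocessing.pool import Pool


def square(n):
    # runs in a worker process; map() collects the return values in input order
    return n * n


if __name__ == "__main__":
    pool = Pool(processes=2)
    print(pool.map(square, [1, 2, 3, 4]))  # prints [1, 4, 9, 16]
    pool.close()
    pool.join()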

Change the my_url_function() function to do what you actually want, but don't share resources between multiprocessing functions: have each function create its own driver, and anything else it might need. Some things can be shared across concurrent functions, but it's safest to share nothing at all.
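
If launching a fresh Firefox for every single URL proves too slow, one variation (a sketch, not part of the original answer) is to create one driver per worker process with the pool's initializer and reuse it across URLs; each process still owns its driver exclusively:

from multiprocessing.pool import Pool

from selenium import webdriver

driver = None  # set per worker process by init_worker()


def init_worker():
    # runs once in each worker process when the pool starts
    global driver
    driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')


def fetch_title(url):
    # reuses this worker's own driver instead of launching a new browser
    driver.get(url)
    return driver.title


if __name__ == "__main__":
    with open("urls.txt", "r") as f:
        urls = [line.strip() for line in f]
    pool = Pool(processes=2, initializer=init_worker)
    print(pool.map(fetch_title, urls))
    # note: the per-worker drivers are not quit explicitly here; in real
    # code, add a cleanup step so the Firefox processes don't linger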


1 Comment

When I do this, the wait for the title to be generated with JavaScript doesn't work, so the titles are always "Loading..." Also, there's a very long wait before the pool repeats.
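
One way to address the "Loading..." problem, sketched here as an assumption about how the page behaves, is to replace the fixed sleep with an explicit wait on the title:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

url = "https://example.com"  # placeholder; substitute one of your URLs

driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.get(url)
try:
    # wait up to 10 seconds for JavaScript to replace the placeholder title;
    # the exact "Loading..." text is taken from the comment above
    WebDriverWait(driver, 10).until(lambda d: d.title != "Loading...")
    print(driver.title)
except TimeoutException:
    print("Timeout")
finally:
    driver.quit()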
