
I have code that returns the titles of a list of URLs. Since I have to wait for each loaded URL to update before its title is returned, I'm wondering if there's a way to load more than one URL at a time and return both titles at once.

This is the code:

from pyvirtualdisplay import Display
from time import sleep
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.firefox.options import Options

display = Display(visible=0, size=(800, 600))
display.start()

# read the URLs to visit, one per line
with open("urls.txt", "r") as urlsFile:
    urls = urlsFile.readlines()

driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.set_page_load_timeout(60)

for url in urls:
    try:
        driver.get(url)
        sleep(0.8)
        print(driver.title)
    except TimeoutException as e:
        print("Timeout")

If I try to do this:

driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver2 = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')

for url in urls:
    try:
        driver.get(url)
        driver2.get(url)
        sleep(0.8)
        print(driver.title)
        print(driver2.title)
    except TimeoutException as e:
        print("Timeout")

The URL that driver2 gets is the same one that driver gets. Is it possible to have driver2 get the next URL in line, so that two pages load in parallel without losing time?

1 Answer

from multiprocessing.pool import Pool

from selenium import webdriver


# read URLs into list `urls`, stripping trailing newlines
with open("urls.txt", "r") as urlsFile:
    urls = [line.strip() for line in urlsFile]


# a function to process a single URL
def my_url_function(url):
    # each process uses its own driver
    driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
    try:
        driver.get(url)
        print("Got {}".format(url))
        return driver.title
    finally:
        driver.quit()


if __name__ == "__main__":
    # a multiprocessing pool with 2 worker processes
    pool = Pool(processes=2)
    map_results_list = pool.map(my_url_function, urls)
    print(map_results_list)

This example uses Python's multiprocessing module to process two URLs at the same time, although you can change the number of processes when you set up the pool, of course.

The pool.map() function takes a function and a list, iterates over the list, sends each item to the function, and runs each function call in its own process.
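
As a minimal sketch of those semantics, using a toy square() function that is not part of the original code:

from multiprocessing.pool import Pool


def square(n):
    # runs in a worker process; map() collects the return values in input order
    return n * n


if __name__ == "__main__":
    pool = Pool(processes=2)
    print(pool.map(square, [1, 2, 3, 4]))  # prints [1, 4, 9, 16]
    pool.close()
    pool.join()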

Change the my_url_function() function to do what you actually want, but don't share resources between multiprocessing functions: have each function create its own driver, and anything else it might need. Some things can be shared across concurrent functions, but it's safest to share nothing at all.
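
If launching a fresh Firefox for every single URL proves too slow, one variation (a sketch, not part of the original answer) is to create one driver per worker process with the pool's initializer and reuse it across URLs; each process still owns its driver exclusively:

from multiprocessing.pool import Pool

from selenium import webdriver

driver = None  # set per worker process by init_worker()


def init_worker():
    # runs once in each worker process when the pool starts
    global driver
    driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')


def fetch_title(url):
    # reuses this worker's own driver instead of launching a new browser
    driver.get(url)
    return driver.title


if __name__ == "__main__":
    with open("urls.txt", "r") as f:
        urls = [line.strip() for line in f]
    pool = Pool(processes=2, initializer=init_worker)
    print(pool.map(fetch_title, urls))
    # note: the per-worker drivers are not quit explicitly here; in real
    # code, add a cleanup step so the Firefox processes don't linger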


1 Comment

When I do this, the wait for the title to be generated with JavaScript doesn't work, so the titles are always "Loading..." Also, there's a very long wait before the pool repeats.
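
One way to address the "Loading..." problem, sketched here as an assumption about how the page behaves, is to replace the fixed sleep with an explicit wait on the title:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.common.exceptions import TimeoutException

url = "https://example.com"  # placeholder; substitute one of your URLs

driver = webdriver.Firefox(executable_path='/usr/local/lib/geckodriver/geckodriver')
driver.get(url)
try:
    # wait up to 10 seconds for JavaScript to replace the placeholder title;
    # the exact "Loading..." text is taken from the comment above
    WebDriverWait(driver, 10).until(lambda d: d.title != "Loading...")
    print(driver.title)
except TimeoutException:
    print("Timeout")
finally:
    driver.quit()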
