
I'm using Selenium with Python 2.7 to retrieve the contents of a search box on a webpage. The search box dynamically retrieves the results and displays them in the box itself.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
import re
from time import sleep

driver = webdriver.Firefox()
driver.get(url)

df = pd.read_csv("read.csv")

def crawl(isin):
    searchkey = driver.find_element_by_name("searchkey")
    searchkey.clear()
    searchkey.send_keys(isin)
    sleep(11)

    search_result = driver.find_element_by_class_name("ac_results")
    names = re.match(r"^.*(?=(\())", search_result.text).group().encode("utf-8")
    product_id = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    return pd.Series([product_id, names])

df[["insref", "name"]] = df["ISIN"].apply(crawl)

print df

The relevant part of the code is in def crawl(isin):

  • The program enters the search term into the search box (the box is badly named searchkey).
  • It then calls sleep() and waits for the content to show up in the search box's dropdown field, ac_results.
  • It then extracts two values, product_id and names, with regex.

Instead of calling sleep(), I would like for it to wait for the content in the WebElement ac_results to load.

Since it will continuously use the search box to get new data by entering new search terms from a list, one could perhaps use Regex to identify when there is new content in ac_results that is not identical to the previous content.

Is there a method for this? It is important to note that the content in the search box is dynamically loaded, so the function would have to recognise that something has changed in the WebElement.

3 Answers


You need to apply the Explicit Wait concept. E.g. wait for an element to become visible:

wait = WebDriverWait(driver, 10)
wait.until(EC.visibility_of_element_located((By.CLASS_NAME, 'searchbox')))

Here, it waits up to 10 seconds, checking the visibility of the element every 500 ms, and raises a TimeoutException if the element never becomes visible.
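Conceptually, the polling behind wait.until(...) can be pictured as the loop below. This is a simplified sketch, not Selenium's actual implementation: wait_until and WaitTimeout are hypothetical names, and the real WebDriverWait also passes the driver to the condition and ignores NoSuchElementException while polling.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for Selenium's TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Raises WaitTimeout if nothing truthy comes back within `timeout` seconds.
    """
    deadline = time.time() + timeout
    while True:
        value = condition()
        if value:
            # Hand the truthy result back to the caller, as until() does.
            return value
        if time.time() >= deadline:
            raise WaitTimeout("condition not met after %.1f s" % timeout)
        time.sleep(poll)
```

The point of the pattern is that the call returns as soon as the condition holds, so a fast response costs far less than a fixed sleep(11).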

There is a set of built-in Expected Conditions to wait for, and it is also easy to write your own custom Expected Condition.


FYI, here is how we approached it after brainstorming it in the chat. We've introduced a custom Expected Condition that would wait for the element text to change. It helped us to identify when the new search results appear:

import re

import pandas as pd
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

class text_to_change(object):
    """Expected condition that holds once the element's text differs from `text`."""
    def __init__(self, locator, text):
        self.locator = locator
        self.text = text

    def __call__(self, driver):
        # Use the public find_element() call instead of the private _find_element helper
        actual_text = driver.find_element(*self.locator).text
        return actual_text != self.text

#Load URL
driver = webdriver.Firefox()
driver.get(url)

#Load DataFrame of terms to search for
df = pd.read_csv("searchkey.csv")

#Crawling function    
def crawl(searchkey):
    try: 
        text_before = driver.find_element_by_class_name("ac_results").text 
    except NoSuchElementException: 
        text_before = ""

    searchbox = driver.find_element_by_name("searchbox")
    searchbox.clear()
    searchbox.send_keys(searchkey)
    print "\nSearching for %s ..." % searchkey

    WebDriverWait(driver, 10).until(
        text_to_change((By.CLASS_NAME, "ac_results"), text_before)
    )

    search_result = driver.find_element_by_class_name("ac_results")
    if search_result.text != "none":
        names = re.match(r"^.*(?=(\())", search_result.text).group().encode("utf-8")
    else:
        # "none" contains no "(", so the name pattern cannot match
        names = None
    insrefs = re.findall(r"((?<=\()[0-9]*)", search_result.text)
    return pd.Series([insrefs, names])

#Run crawl    
df[["Insref", "Name"]] = df["ISIN"].apply(crawl)

#Print DataFrame    
print df
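To see what the two regexes pull out of the dropdown text, here is a small standalone sketch that needs no browser. parse_result is a hypothetical helper, and the sample strings follow the "Name ABC123 (01234)" format described in the comments. It differs from the code above in two labelled ways: it uses [0-9]+ instead of [0-9]* so an empty pair of parentheses yields no ID, and it strips the trailing space from the name.

```python
import re


def parse_result(text):
    """Split dropdown text such as 'Name ABC123 (01234)' into a name and IDs."""
    # Greedy match from the start of the first line up to its last "(".
    match = re.match(r"^.*(?=\()", text)
    name = match.group().strip() if match else None
    # Every run of digits that directly follows an opening parenthesis.
    insrefs = re.findall(r"(?<=\()[0-9]+", text)
    return name, insrefs
```

Because "." does not match a newline, the name comes from the first line only, while findall collects the IDs from every line of the dropdown.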

13 Comments

It's not quite that easy, because the searchbox element loads instantly when the page opens. It's only after I enter the searchkey into the element that the text content takes up to 8-9 s to load. It is that content that I would like to wait for.
@Winterflags yeah, I've just provided an example and a hint :) text_to_be_present_in_element is probably a good candidate in your case if you know which text to wait for. If not, then you would need a custom expected condition.
Thank you very much, I followed your custom expected condition for regex as seen here: stackoverflow.com/questions/28240342/…. I managed to get it waiting for the first reply that matches a pattern, but once that pattern is present, it continues to run the loop, searching for all searchkeys without giving the results time to load. Do you have any ideas for how to make it wait for new content that matches the same pattern but is different?
The pattern looks like this: "Name ABC123 (01234)", "Something 123DEF (432134)", "Somethingsomething 123 GHI (07451)". What is constant is that there is text followed by a series of digits of variable length in parentheses at the end.
Solved this thanks to alecxe in chat! Super helpful. The custom expected condition above will prove useful for Selenium users waiting for dynamic text content to appear in a WebElement.

I suggest using one of the Expected Conditions below with WebDriverWait.

WebDriverWait(driver, 10).until(
    text_to_be_present_in_element((By.CLASS_NAME, "searchbox"), r"((?<=\()[0-9]*)")
)

or

WebDriverWait(driver, 10).until(
    text_to_be_present_in_element_value((By.CLASS_NAME, "searchbox"), r"((?<=\()[0-9]*)")
)
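One caveat: the built-in text_to_be_present_in_element does a plain substring test, so a regex passed to it is looked for literally rather than matched as a pattern. A minimal sketch of a regex-aware condition could look like the following. text_to_match is a hypothetical name, not part of Selenium, and the "ac_results" class name in the usage note is taken from the question.

```python
import re


class text_to_match(object):
    """Hypothetical expected condition: the element's text matches a regex."""

    def __init__(self, locator, pattern):
        self.locator = locator
        self.regex = re.compile(pattern)

    def __call__(self, driver):
        # find_element(by, value) is the generic WebDriver lookup call.
        text = driver.find_element(*self.locator).text
        match = self.regex.search(text)
        # Return the matched text (truthy) so until() can hand it back.
        return match.group() if match else False
```

It would be used as WebDriverWait(driver, 10).until(text_to_match((By.CLASS_NAME, "ac_results"), r"\([0-9]+\)")). Note that it succeeds as soon as any matching text is present, including the previous search's result, so on its own it does not detect that the content has changed.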

3 Comments

If I'm not mistaken that will indeed wait for the very first text reply in the search box to be loaded. But if the program inserts a new searchterm right afterwards, it will recognize the pattern from the first result and not wait for the second result to load. See my explanation under "What the code does now" in OP.
WebDriverWait, which we are using, is an example of an explicit wait, which means we need to set the wait before every element lookup. That's why we use an implicit wait at the start, which sets a wait for every element lookup.
I believe the best option here is to use sleep or to write a function that waits for jQuery calls to complete.
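A rough sketch of such a function, assuming the page issues its requests through jQuery (nothing in the question confirms this). jquery_idle is a hypothetical name; it relies on jQuery's internal jQuery.active counter of in-flight AJAX requests and on the driver's execute_script method.

```python
def jquery_idle(driver):
    """Hypothetical condition: True once jQuery reports no pending AJAX calls."""
    # jQuery.active counts in-flight jQuery AJAX requests; the typeof guard
    # keeps the script from raising on pages that do not load jQuery.
    script = "return typeof jQuery !== 'undefined' && jQuery.active === 0;"
    return bool(driver.execute_script(script))
```

Since until() calls its argument with the driver, this can be passed directly: WebDriverWait(driver, 10).until(jquery_idle). Be aware that it may return too early if the page delays the request after a keystroke, because the counter is still zero until the request actually starts.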

Create a class for the wait condition:

class SubmitChanged(object):
    def __init__(self, element):
        self.element = element

    def __call__(self, driver):
        # here we check if this is new instance of element
        new_element = driver.find_element_by_xpath('<your xpath>')
        return new_element != self.element

Then call it in your program:

wait = WebDriverWait(<driver object>, 3)
wait.until(SubmitChanged(<web element>))

More info at https://selenium-python.readthedocs.io/waits.html
