I am using Python and Selenium to navigate a website.

On a page, I am trying to work my way through a series of 5 dropdown boxes. The options in each dropdown box are generated dynamically, based on what is selected from the previous dropdown(s).

I am stuck on the third dropdown, in which the user must choose a state. When loaded, the inspected HTML looks like:

<select name="state" class="pulldown"  id="state" onchange="[javablob]">
<option value="">Select a State</option>
<option value='AK_N'>               AK</option>
<option value='AL_N'>               AL</option>
<option value='AR_Y'>               AR</option>

...and so on.

My code thus far is:

waitforstate = WebDriverWait(driver,30).until(EC.presence_of_element_located((By.ID,"state")))
driver.implicitly_wait(10)  #added because the ID is found but the states aren't loaded yet
state = Select(driver.find_element_by_id('state'))

But choosing the state I want doesn't work:

state.select_by_visible_text("TN")

gives...

Message: Given xpath expression ".//option[normalize-space(.) = "TN"]" is invalid: 
WrongDocumentError: Node cannot be used in a document other than the 
one in which it was created

Doing this:

state.select_by_value("TN_Y")

gives...

Message: Given css selector expression "option[value ="TN_Y"]" is invalid: 
TypeError: can't access dead object

There is no index to select a state from.

When I try to show what options ARE loaded:

all_options = state.options
for option in all_options:
    print("Value is: %s" % option.get_attribute("value"))

...nothing prints, not even the default option. But it appears I can select and unselect the default option, using this:

state.select_by_visible_text("Select a State")
print "Select a state selected"
state._unsetSelected
print "Now it's unselected"

...which runs without errors.

I used Firefox's Selenium IDE to navigate the page, to see how it was handled, and it was able to select it with id=state, label=TN.

What am I missing?

1 Answer

I find PhantomJS or other WebKit-based libraries more useful when scraping JavaScript-rendered pages. Their ability to fully reproduce web-browser interaction makes scrapers more predictable and easier to implement.

I personally like to use Selenium and PhantomJS together for scraping purposes.

phantomjs: https://realpython.com/blog/python/headless-selenium-testing-with-python-and-phantomjs/

python-qt4: https://impythonist.wordpress.com/2015/01/06/ultimate-guide-for-scraping-javascript-rendered-web-pages/

3 Comments

Thank you. I don't see how these tools add functionality in this situation: my problem is not with the scraping per se, but with navigating to the pages to be scraped, which requires completing the dropdown forms.
I suspect the dropdown list of states is created with JavaScript. As I said, JavaScript-rendered elements have to be read differently with Selenium. Check this post out: stackoverflow.com/a/22739613/5808505
Thank you. That method generates the same HTML, in this particular instance, as the method I was using. As it turns out, my error was not waiting long enough for the full JavaScript-generated option list to load. driver.implicitly_wait(10) did not wait adequately, but sleep(5) did.
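
A note on why the two waits behaved differently: driver.implicitly_wait(10) does not pause the script. It only sets how long element lookups keep retrying, and the <select> itself was already present, so the lookup returned at once. A fixed sleep(5) works, but it wastes time when the list loads quickly and can still fail when it loads slowly. As a minimal sketch of an alternative, assuming the page fills in the <option> elements some time after the <select> appears, the loop below polls until the wanted value is present. The helper names option_values and wait_for_option are hypothetical, and the loop is hand-rolled rather than Selenium's own wait. The string "css selector" is the value behind By.CSS_SELECTOR, so the block needs no Selenium import.

```python
import time


def option_values(driver, select_id):
    """Return the value attribute of every <option> currently under the <select>."""
    # "css selector" is the string constant behind selenium's By.CSS_SELECTOR.
    options = driver.find_elements("css selector", "#%s option" % select_id)
    return [opt.get_attribute("value") for opt in options]


def wait_for_option(driver, select_id, value, timeout=30.0, poll=0.5):
    """Poll until an <option> with the given value exists; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if value in option_values(driver, select_id):
                return True
        except Exception:
            # Simplification: elements can be replaced while the page rebuilds
            # the list, so any lookup error is treated as "not ready yet".
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "option %r never appeared in #%s" % (value, select_id)
            )
        time.sleep(poll)
```

With a real driver this might be called as wait_for_option(driver, "state", "TN_Y") before building Select(driver.find_element_by_id('state')), so that the Select wraps the element as it exists after the options have loaded. The same check can also be handed to Selenium's own explicit wait, for example WebDriverWait(driver, 30).until(lambda d: "TN_Y" in option_values(d, "state")).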
