
I'm trying to get the start times of the available time slots on this webpage (the boxes below the calendar):

https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/

I've read other related questions and written this code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import presence_of_element_located
from selenium.webdriver.firefox.options import Options
from bs4 import BeautifulSoup

url = 'https://magicescape.it/le-stanze/lo-studio-di-harry-houdini/'
wait_time = 10
options = Options()
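options.add_argument("-headless")  # run Firefox without a window; the old options.headless setter is deprecated in Selenium 4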

driver = webdriver.Firefox(options=options)
driver.get(url)
driver.switch_to.frame(0)

wait = WebDriverWait(driver, wait_time)
first_result = wait.until(presence_of_element_located((By.ID, "sb_main")))

soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)

driver.quit()

After switching to the iframe containing the time slots, printing soup gives me this:

<script id="time_slots_view" type="text/html"><div class="slots-view{{#ifCond (getThemeOption 'timeline_modern_display') '==' 'as_table'}} as-table{{/ifCond}}">
    <div class="timeline-wrapper">
        <div class="tab-pd">
            <div class="container-caption">
                {{_t 'available_services_on_this_day'}}
            </div>

            {{#if error_message}}
                <div class="alert alert-danger alert-dismissible" role="alert">
                    {{error_message}}
                </div>
            {{/if}}

            {{>emptyTimePart is_empty=is_empty is_loaded=is_loaded}}

            <div id="sb_time_slots_container"></div>
            {{> bookingTimeLegendPart legend="only_available" time_diff=0}}
        </div>
    </div>
</div></script>
<script id="time_slot_view" type="text/html"><div class="slot">
    <a class="sb-cell free {{#ifPluginActive 'slots_count'}}{{#if available_slots}}has-available-slot{{/if}}{{/ifPluginActive}}" href="#{{bookingStepUrl time=time date=date}}">
        {{formatDateTime datetime 'time' time_diff}}

        {{#ifCond (getThemeOption 'timeline_show_end_time') '==' 1}}
            -<span class="end-time">
                &nbsp;{{formatDateTime end_datetime 'time' time_diff}}
            </span>
        {{/ifCond}}

        {{#ifPluginActive 'slots_count'}}
            {{#if available_slots}}
                <span class="slot--available-slot">
                    {{available_slots}}
                    {{#ifConfigParam 'slots_count_show_total' '==' true}} / {{total_slots}} {{/ifConfigParam}}
                </span>
            {{/if}}
        {{/ifPluginActive}}
    </a>
</div></script>

whereas right-clicking a slot and choosing Inspect Element in the browser shows this:

<div class="slots-view">
  <div class="timeline-wrapper">
    <div class="tab-pd">
      <div class="container-caption">
        Orari d'inizio disponibili
      </div>
      <div id="sb_time_slots_container">
        <div class="slot">
          <a class="sb-cell free " href="#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/">
            23:00
          </a>
        </div>
      </div>
      <div class="time-legend">
        <div class="available">
          <div class="circle">
          </div>
          - Disponibile
        </div>
      </div>
    </div>
  </div>
</div>

How can I get the start time of the available slots (23:00 in this example) using Selenium?
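The two outputs differ because the page builds the slot list in the browser: the `{{...}}` markup (which looks like Handlebars) lives inside `<script type="text/html">` templates, and an HTML parser treats a script's contents as raw text, not as elements. A minimal standard-library sketch of that behaviour, using trimmed excerpts of both outputs; `IdCollector` and `element_ids` are hypothetical helper names:

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Record the id attribute of every element the parser actually sees as a tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id":
                self.ids.append(value)


def element_ids(html):
    parser = IdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


# Trimmed from the printed soup: the container exists only as text inside a template.
TEMPLATE = (
    '<script id="time_slots_view" type="text/html">'
    '<div class="slots-view"><div id="sb_time_slots_container"></div></div>'
    "</script>"
)

# Trimmed from the Inspect Element view: the container is a real element.
RENDERED = '<div class="slots-view"><div id="sb_time_slots_container"></div></div>'

print(element_ids(TEMPLATE))  # ['time_slots_view']
print(element_ids(RENDERED))  # ['sb_time_slots_container']
```

So reading the slots requires the DOM after the page's JavaScript has filled the templates in, from the frame that actually hosts them.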

1 Answer

To get the desired result you need to:

  1. Correctly identify the iframe you want to switch to, and switch to it. You were trying to switch to frame[0] but needed frame[1]. The following code avoids relying on indexes and locates the frame with XPath instead.
  2. Get the elements containing the times. Again this uses XPath, here to select all child divs of the element with id=sb_time_slots_container.
  3. Iterate over these child divs and read the text of the <a> nested inside each one.

For steps 1 and 2 you should also use wait.until, so that the content has time to load before you look for it.

...
driver.get(url)
wait = WebDriverWait(driver, wait_time)

# Wait until the iframe exists then switch to it
iframe_element = wait.until(presence_of_element_located((By.XPATH, '//*[@id="prenota"]//iframe')))
driver.switch_to.frame(iframe_element)

# Wait until the times exist, then get a list of them.
# (Selenium 4.3+ removed find_elements_by_xpath and friends; use find_elements(By..., ...).)
wait.until(presence_of_element_located((By.XPATH, '//*[@id="sb_time_slots_container"]/div')))
all_time_elems = driver.find_elements(By.XPATH, '//*[@id="sb_time_slots_container"]/div')

# Iterate over each slot div and print the text of its <a> link
for elem in all_time_elems:
    print(elem.find_element(By.TAG_NAME, "a").text)

driver.quit()
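An alternative for the last step is to grab `driver.page_source` once, after the waits and while still switched into the frame (it should then reflect the rendered DOM), and parse it offline. The sketch below uses only the standard library's `html.parser`. It assumes the rendered markup keeps the `sb-cell free` classes and the `#book/.../time/HH:MM:SS/` link shape seen in the question; `SlotLinkParser`, `parse_slot_href` and `free_slots` are hypothetical names. Besides the displayed text, it decodes the link's key/value path segments, which also carry the date and full time:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class SlotLinkParser(HTMLParser):
    """Collect (text, href) for every <a class="sb-cell free ..."> slot link."""

    def __init__(self):
        super().__init__()
        self.slots = []
        self._href = None  # href of the slot link we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "sb-cell" in classes and "free" in classes:
            self._href = attrs.get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse the indentation whitespace around the displayed time.
            text = " ".join("".join(self._text).split())
            self.slots.append((text, self._href))
            self._href = None


def parse_slot_href(href):
    """Decode '#book/location/4/.../time/23:00:00/' into a dict.

    Assumes the fragment is 'book' followed by alternating key/value segments.
    Works for the raw '#book/...' value and for an absolute URL ending in it.
    """
    parts = [p for p in urlsplit(href).fragment.split("/") if p]
    if not parts or parts[0] != "book":
        raise ValueError(f"unexpected slot link: {href!r}")
    return dict(zip(parts[1::2], parts[2::2]))


def free_slots(page_source):
    parser = SlotLinkParser()
    parser.feed(page_source)
    parser.close()
    return [(text, parse_slot_href(href)) for text, href in parser.slots]


# Trimmed copy of the rendered markup from the question.
RENDERED = """
<div id="sb_time_slots_container">
  <div class="slot">
    <a class="sb-cell free " href="#book/location/4/service/6/count/1/provider/6/date/2020-03-09/time/23:00:00/">
      23:00
    </a>
  </div>
</div>
"""

for text, params in free_slots(RENDERED):
    print(text, params["date"], params["time"])  # 23:00 2020-03-09 23:00:00
```

Because it collects every matching link rather than one per div, this approach also copes with more than one <a> inside a slot div.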

6 Comments

And this gets around the fact that it's all pulled in as scripts?
Yes, the script is injecting HTML into the rendered page. The wait.until calls ensure that the content has loaded before you try to read it. You could wrap them in a try/except for selenium.common.exceptions.TimeoutException to handle cases where the content isn't found within the specified time (10 seconds in the OP's code).
Thanks. I assume this isn't possible just using requests?
Thanks, that worked! What if there was more than a single <a> inside a <div> instead?
@Alistair, no, the waits are crucial. If you're only getting a single element, wait.until can return it directly. If you're getting a list of elements, you'll need to call wait.until and then find_elements(...). The code above shows both: getting the frame (single) vs. getting the time divs (multiple).
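The comments above describe `wait.until` as something that keeps checking until the content appears, returns what it found, and raises `TimeoutException` otherwise. Below is a simplified stand-in for that polling pattern, not Selenium's implementation (the real `WebDriverWait` polls every 0.5 s by default and ignores `NoSuchElementException` while polling). `wait_until` and `WaitTimeout` are hypothetical names, and `slots_loaded` simulates content that only appears on the third check:

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for selenium.common.exceptions.TimeoutException."""


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() until it returns a truthy value, then return that value.

    Raises WaitTimeout if that does not happen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated page: the slot list only "appears" on the third check.
checks = {"count": 0}


def slots_loaded():
    checks["count"] += 1
    return ["23:00"] if checks["count"] >= 3 else []


try:
    print(wait_until(slots_loaded, timeout=2, poll=0.01))  # ['23:00']
except WaitTimeout:
    print("No slots appeared in time")
```

With Selenium itself the shape is the same: put the `wait.until(...)` call in a `try` block and catch `TimeoutException`, imported from `selenium.common.exceptions`.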