Please do not close this question; it is not a duplicate. I need to click the button using Python Requests, not Selenium (as in the linked question).

I am trying to scrape the Reverso Context translation examples page, but I can get only the first 20 examples. To get the full list of results, I need to click the "Display more examples" button repeatedly for as long as it exists on the page. This is easy in a web browser, but how can I do it with the Python Requests library?

I looked at the button's HTML, but it has no onclick attribute that would point me to the JavaScript attached to it, and I don't understand what request I need to send:

<button id="load-more-examples" class="button load-more " data-default-size="14px">Display more examples</button>

And here is my Python code:

from bs4 import BeautifulSoup
import requests
import re


with requests.Session() as session:  # Create a Session
    # Log in
    login_url = 'https://account.reverso.net/login/context.reverso.net/it?utm_source=contextweb&utm_medium=usertopmenu&utm_campaign=login'
    session.post(login_url, "[email protected]&Password=sample",
           headers={"User-Agent": "Mozilla/5.0", "content-type": "application/x-www-form-urlencoded"})

    # Get the HTML
    html_text = session.get("https://context.reverso.net/translation/russian-english/cat", headers={"User-Agent": "Mozilla/5.0"}).content

    # And scrape it
    for word_pair in BeautifulSoup(html_text, "html.parser").find_all("div", id=re.compile("^OPENSUBTITLES")):
        print(word_pair.find("div", class_="src ltr").text.strip(), "=", word_pair.find("div", class_="trg ltr").text.strip())

Note: you need to log in; otherwise the page shows only the first 10 examples and does not show the button. You may use this real authentication data:
E-mail: [email protected]
Password: sample

  • 1
    Does this answer your question? invoking onclick event with beautifulsoup python Commented Feb 22, 2020 at 14:30
  • 1
    you can't do that with requests stackoverflow.com/a/37167063/8619959 Commented Feb 22, 2020 at 14:37
  • 1
    "It can simply be done using a web browser, but how can I do it with Python Requests library?" You can't; Requests does not execute JavaScript or anything like that. Commented Feb 22, 2020 at 16:39
  • 1
    Does this answer your question? "Clicking" button with requests Commented Feb 22, 2020 at 16:40
  • 2
    "in some cases (like this one, for example), it is possible to explore the browser's behavior in more detail: get the requests it sends and try to do the same thing using Python requests" Of course, but I wouldn't call that simulating a button press. Commented Feb 22, 2020 at 20:01

1 Answer

Here is a solution that gets all the example sentences using requests and removes all the HTML tags from them using BeautifulSoup:

from bs4 import BeautifulSoup
import requests
import json


# Headers copied from the browser request. Content-Length is left out because
# requests computes it from the body, and the cmd escape carets ("^") that
# "Copy as cURL (cmd)" inserted into the Referer have been removed.
headers = {
    "Connection": "keep-alive",
    "Accept": "application/json, text/javascript, */*; q=0.01",
    "X-Requested-With": "XMLHttpRequest",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "Content-Type": "application/json; charset=UTF-8",
    "Origin": "https://context.reverso.net",
    "Sec-Fetch-Site": "same-origin",
    "Sec-Fetch-Mode": "cors",
    "Referer": "https://context.reverso.net/%D0%BF%D0%B5%D1%80%D0%B5%D0%B2%D0%BE%D0%B4/%D0%B0%D0%BD%D0%B3%D0%BB%D0%B8%D0%B9%D1%81%D0%BA%D0%B8%D0%B9-%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9/cat",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7",
}

data = {
    "source_text": "cat",
    "target_text": "",
    "source_lang": "en",
    "target_lang": "ru",
    "npage": 1,
    "mode": 0
}

url = "https://context.reverso.net/bst-query-service"

# The first response reports how many pages of examples exist
npages = requests.post(url, headers=headers, data=json.dumps(data)).json()["npages"]
for npage in range(1, npages + 1):
    data["npage"] = npage
    page = requests.post(url, headers=headers, data=json.dumps(data)).json()["list"]
    for word in page:
        # s_text and t_text contain HTML markup; an explicit parser avoids the bs4 warning
        print(BeautifulSoup(word["s_text"], "html.parser").text, "=", BeautifulSoup(word["t_text"], "html.parser").text)
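
The paging loop above can also be wrapped in a generator that reuses the first response instead of requesting page 1 twice. This is only a sketch: iter_examples, post_json and fake_post_json are hypothetical names, and the fake function stands in for the real service (assuming it returns "npages" and "list" as in the code above) so the example runs offline.

```python
import json


def iter_examples(post_json, query):
    """Yield every example dict across all result pages.

    post_json is a hypothetical callable: it takes the JSON request body
    as a string and returns the decoded response as a dict.
    """
    payload = dict(query, npage=1)
    first = post_json(json.dumps(payload))
    yield from first["list"]  # reuse page 1 instead of fetching it again
    for npage in range(2, first["npages"] + 1):
        payload["npage"] = npage
        yield from post_json(json.dumps(payload))["list"]


def fake_post_json(body):
    """Offline stand-in for the service: three pages, one example each."""
    npage = json.loads(body)["npage"]
    return {"npages": 3, "list": [{"s_text": f"cat {npage}", "t_text": f"кот {npage}"}]}


query = {"source_text": "cat", "target_text": "", "source_lang": "en", "target_lang": "ru", "mode": 0}
examples = list(iter_examples(fake_post_json, query))
print(len(examples))
```

With the real service, post_json could be something like lambda body: requests.post(url, headers=headers, data=body).json(), using the URL and headers from the code above.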

First, I got the request from Google Chrome DevTools:

  1. Pressed the F12 key to open DevTools and selected the Network tab
  2. Clicked the "Display more examples" button
  3. Found the last request ("bst-query-service")
  4. Right-clicked it and selected Copy > Copy as cURL (cmd)

Then I opened this online tool, pasted the copied cURL command into the text box on the left, and copied the output on the right (use the Ctrl-C hotkey for this, otherwise it may not work).
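
What such a converter does can be approximated by hand. The sketch below is not the online tool's implementation: parse_curl is a hypothetical helper, the sample command is shortened, and it assumes the bash flavour of "Copy as cURL" (single-quoted arguments), not the cmd flavour used above. It only handles the URL, headers, the data options, and flags that take no argument.

```python
import json
import shlex


def parse_curl(command):
    """Split a simple curl command into (url, headers, body)."""
    tokens = shlex.split(command)
    url, headers, body = None, {}, None
    i = 1  # skip the leading "curl"
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header"):
            # "Name: value" -> headers["Name"] = "value"
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
            i += 2
        elif token in ("-d", "--data", "--data-raw", "--data-binary"):
            body = tokens[i + 1]
            i += 2
        elif token.startswith("-"):
            i += 1  # ignore argument-less flags such as --compressed
        else:
            url = token
            i += 1
    return url, headers, body


# Shortened, made-up sample of what DevTools copies
example = (
    "curl 'https://context.reverso.net/bst-query-service' "
    "-H 'Content-Type: application/json; charset=UTF-8' "
    "-H 'X-Requested-With: XMLHttpRequest' "
    "--data-binary '{\"source_text\":\"cat\",\"npage\":1}' --compressed"
)
url, headers, body = parse_curl(example)
payload = json.loads(body)  # the body is JSON, so it can become a Python dict
print(url, headers, payload)
```

The three parts map directly onto the arguments of requests.post: the URL, headers=headers, and data=body.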

After that, I pasted it into the IDE and:

  1. Removed the cookies dict; it is not necessary here
  2. Important: rewrote the data string as a Python dictionary and wrapped it with json.dumps(data); otherwise, the response contained an empty list of words.
  3. Added a script that gets the number of times to fetch the words ("pages") and runs a for loop that fetches the words that many times and prints them without HTML tags (using BeautifulSoup)
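
Why step 2 matters: when a dict is passed as data=, requests form-encodes it, while a string is sent as is. The sketch below reproduces both bodies with the standard library only (urlencode stands in for the form encoding requests applies), so the difference is visible without sending anything. That the service ignores the form-encoded variant is an inference from the empty result described above.

```python
import json
from urllib.parse import urlencode

data = {
    "source_text": "cat",
    "target_text": "",
    "source_lang": "en",
    "target_lang": "ru",
    "npage": 1,
    "mode": 0,
}

# Body sent for data=data: a form-encoded string, despite the JSON Content-Type header
form_body = urlencode(data)

# Body sent for data=json.dumps(data): the JSON text the header promises
json_body = json.dumps(data)

print(form_body)
print(json_body)
```

requests also accepts json=data, which serializes the dict itself and sets the Content-Type header to application/json.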

UPD:
For those who came to this question to learn how to work with Reverso Context (rather than to simulate a button-click request on another website), a Python wrapper for the Reverso API has been released: Reverso-API. It does the same thing as the code above, but much more simply:

from reverso_api.context import ReversoContextAPI


api = ReversoContextAPI("cat", "", "en", "ru")
for source, target in api.get_examples_pair_by_pair():
    # highlight_example was not defined in this snippet, so print the plain text
    print(source.text, "==", target.text)