How to scrape a webpage with dynamic script using BeautifulSoup?

Question

I am looking for a better way to scrape the latest exchange rate from https://www.remitly.com/us/en/india

With the code below I get 16 <script> elements. I could loop through each one and check whether it contains the exchange rate, but is there a better way?

The issue is that I cannot narrow the match with additional attributes in soup.find_all(), and each returned element is very large.

# get current exchange rate

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://www.remitly.com/us/en/india')
soup = bs.BeautifulSoup(source,'lxml')

#js_test = soup.findAll('td', class_='f1smo2ix')
cost = soup.find_all('script')

print(cost)
print(len(cost))

Andrej Kesely · Accepted Answer · 2020-06-05 07:34:18Z

2

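A solution with BeautifulSoup: select the <sup> element that contains ₹, then use .find_next_sibling(string=True) to get the text node that follows it, which holds the rate:

import requests
from bs4 import BeautifulSoup

url = 'https://www.remitly.com/us/en/india'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# ':-soup-contains' and 'string=' are the current spellings of the deprecated ':contains' and 'text='
print( soup.select_one('sup:-soup-contains("₹")').find_next_sibling(string=True) )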

Prints:

75.55


Doruk Eren Aktaş · Accepted Answer · 2020-06-05 05:13:33Z

1

I think the best way to do this is with XPath. A query like //sup[text() = '₹'] locates the <sup> elements whose text content is ₹. Once you have located one, read the text of its parent. Here is a working sample for your situation:

import urllib.request
from lxml import etree

response = urllib.request.urlopen('https://www.remitly.com/us/en/india')
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)

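# Locate the <sup>₹</sup> element and step up to its parent
rate_tree = tree.xpath("//sup[text() = '₹']")[0].getparent()
# Remove <sup> but keep its tail text (the rate), which becomes the parent's text
etree.strip_elements(rate_tree, 'sup', with_tail=False)
rate = rate_tree.text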

print(rate)
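
Both answers above rely on the same fact: the rate is not inside the <sup> element but is the text that immediately follows it (its "tail" in ElementTree and lxml terms). The following is a minimal offline sketch of that idea. It uses the standard library's xml.etree.ElementTree rather than lxml, and the SAMPLE markup is a made-up, well-formed stand-in, not the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the part of the page that shows the rate.
SAMPLE = "<div><span class='rate'><sup>₹</sup>75.55</span></div>"

root = ET.fromstring(SAMPLE)

# ElementTree supports a limited XPath subset; [.='₹'] matches on the element's text.
sup = root.find(".//sup[.='₹']")

# The number is not inside <sup>; it is the text that follows it (the "tail").
rate = sup.tail.strip()
print(rate)
```

Real-world HTML is rarely well-formed XML, which is why the answer above parses the page with lxml's HTMLParser instead.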


Rajesh · Accepted Answer · 2020-06-06 02:24:10Z

I ended up scraping <script> __REMITLY_LANDING_PAGE_CONTEXT__ = { JSON OBJECT HERE } </script>

The JSON object provides several additional details that are easy to access. Below is the code:

# get current exchange rate

import bs4 as bs
import urllib.request
import re
import json

url = 'https://www.remitly.com/us/en/india'

source = urllib.request.urlopen(url)
soup = bs.BeautifulSoup(source,'lxml')

# Find the <script> whose text assigns the landing-page context object
script = soup.find('script', string=re.compile('__REMITLY_LANDING_PAGE_CONTEXT__'))

# str.strip() removes a set of characters, not a prefix, so split on the first '=' instead
nextsc = script.string.split('=', 1)[1].strip().rstrip(';')

json_obj = json.loads(nextsc)

economy = json_obj['context']['forex']['current']['economy']['everyday']
print(f"Economy rate 1 USD is {economy} INR.")

express = json_obj['context']['forex']['current']['express']['everyday']
print(f"Express rate 1 USD is {express} INR.")

special = json_obj['context']['forex']['current']['express']['effective']
print(f"Special rate for first time senders 1 USD is {special} INR.")
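
The extraction step can also be tried offline, without a network request. This is only a sketch using the standard library: the extract_context helper and the sample page are hypothetical, the key layout mirrors the code above, and the values are made up. It assumes the script body has the form "NAME = {...}" with an optional trailing semicolon.

```python
import json
import re

MARKER = "__REMITLY_LANDING_PAGE_CONTEXT__"

# Hypothetical sample data; the key layout follows the answer above,
# but the values are invented for illustration.
sample_context = {
    "context": {
        "forex": {
            "current": {
                "economy": {"everyday": "75.55"},
                "express": {"everyday": "75.05", "effective": "76.10"},
            }
        }
    }
}

# Stand-in for the downloaded page: one unrelated script plus the one we want.
SAMPLE_HTML = (
    "<html><body>"
    "<script>var unrelated = 1;</script>"
    "<script>" + MARKER + " = " + json.dumps(sample_context) + ";</script>"
    "</body></html>"
)


def extract_context(html, marker=MARKER):
    """Return the JSON object assigned to `marker` inside a <script> tag."""
    # Capture everything between "marker =" and the next closing </script>.
    pattern = re.compile(re.escape(marker) + r"\s*=\s*(.*?)</script>", re.DOTALL)
    match = pattern.search(html)
    if match is None:
        raise ValueError(f"{marker} not found in page")
    # Drop surrounding whitespace and an optional trailing semicolon.
    payload = match.group(1).strip().rstrip(";")
    return json.loads(payload)


context = extract_context(SAMPLE_HTML)
current = context["context"]["forex"]["current"]
print(current["economy"]["everyday"])
```

Matching on the marker name picks out the right script directly, so there is no need to loop over every <script> element on the page.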

Thanks to @andrej-kesely and @dorukerenaktas for their answers, which made me think more about this topic.
