I am looking for a better way to scrape the latest exchange rate from https://www.remitly.com/us/en/india.

The code below returns 16 <script> elements. I could loop over each one and check whether it contains the exchange rate, but is there a better way?

The problem is that I cannot narrow the results by passing additional attributes to soup.find_all(), and each element in the returned list is very large.

# get current exchange rate

import bs4 as bs
import urllib.request

source = urllib.request.urlopen('https://www.remitly.com/us/en/india')
soup = bs.BeautifulSoup(source,'lxml')

#js_test = soup.findAll('td', class_='f1smo2ix')
cost = soup.find_all('script')

print(cost)
print(len(cost))

3 Answers

2

With BeautifulSoup, you can select the <sup> element that contains the ₹ symbol and then call .find_next_sibling(text=True) to get the rate, which is the text node that follows it:

import requests
from bs4 import BeautifulSoup

url = 'https://www.remitly.com/us/en/india'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

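# ':-soup-contains' replaces the deprecated ':contains' (soupsieve 2.1+; older versions need ':contains')
print(soup.select_one('sup:-soup-contains("₹")').find_next_sibling(text=True))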

Prints:

75.55
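
To try the sibling lookup without hitting the live site, here is a minimal offline sketch. The HTML fragment and the helper name rate_after_symbol are made up for illustration; the real page's markup may differ and can change at any time.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the currency symbol sits in a <sup>, and the rate
# is the bare text node that follows it inside the same parent.
SAMPLE_HTML = '<div class="rate"><sup>₹</sup>75.55</div>'


def rate_after_symbol(html, symbol="₹"):
    """Return the text node right after the <sup> holding `symbol`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Match a <sup> whose only content is exactly the symbol.
    sup = soup.find("sup", string=symbol)
    if sup is None:
        return None
    # string=True restricts the sibling search to text nodes.
    text_node = sup.find_next_sibling(string=True)
    return text_node.strip() if text_node else None


print(rate_after_symbol(SAMPLE_HTML))
```

Returning None when the <sup> is missing avoids the AttributeError that a chained call would raise if the page layout changes.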

1

I think the best way to achieve this is with XPath. Use a query like //sup[text() = '₹'] to locate <sup> elements whose text content is ₹. Once you have located the element, get the text from its parent. Here is a working sample for your situation:

import urllib.request
from lxml import etree

response = urllib.request.urlopen('https://www.remitly.com/us/en/india')
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)

rate_tree = tree.xpath("//sup[text() = '₹']")[0].getparent()
etree.strip_elements(rate_tree, 'sup', with_tail=False)
rate = rate_tree.text

print(rate)
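
A possible variation that leaves the tree unmodified: in lxml, the text that follows an element is stored in that element's .tail attribute, so the rate can be read from the matched <sup> directly. This is only a sketch; the HTML fragment and the function name rate_from_tail are hypothetical, and it assumes the rate is the text immediately after the symbol.

```python
from lxml import etree

# Hypothetical fragment standing in for the downloaded page.
SAMPLE_HTML = '<html><body><div class="rate"><sup>₹</sup>75.55</div></body></html>'


def rate_from_tail(html, symbol="₹"):
    """Return the text following the <sup> that holds `symbol`, or None."""
    root = etree.HTML(html)
    # $symbol is an XPath variable; lxml fills it from the keyword argument.
    matches = root.xpath("//sup[text() = $symbol]", symbol=symbol)
    if not matches or matches[0].tail is None:
        return None
    # .tail is the text between this element's end tag and the next node.
    return matches[0].tail.strip()


print(rate_from_tail(SAMPLE_HTML))
```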

0

I ended up scraping the <script> element that assigns a JSON object to __REMITLY_LANDING_PAGE_CONTEXT__:

<script> __REMITLY_LANDING_PAGE_CONTEXT__ = { ...JSON object here... } </script>

The JSON object provides several additional details that are easy to access. Below is the code:

# get current exchange rate

import bs4 as bs
import urllib.request
import re
import json

url = 'https://www.remitly.com/us/en/india'

source = urllib.request.urlopen(url)
soup = bs.BeautifulSoup(source,'lxml')

# find the <script> whose text assigns the landing-page context
script = soup.find('script', string=re.compile('__REMITLY_LANDING_PAGE_CONTEXT__'))

# keep everything after the first '='; str.strip() removes a set of
# characters rather than a prefix, so split on the assignment instead
nextsc = script.string.split('=', 1)[1].strip().rstrip(';')

json_obj = json.loads(nextsc)

# all three rates live under context -> forex -> current
current = json_obj['context']['forex']['current']

economy = current['economy']['everyday']
print(f"Economy rate 1 USD is {economy} INR.")

express = current['express']['everyday']
print(f"Express rate 1 USD is {express} INR.")

special = current['express']['effective']
print(f"Special rate for first time senders 1 USD is {special} INR.")
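
The extraction step can also be sketched with the standard library alone, which makes it easy to test offline. The example below locates the assignment with a regular expression and lets json.JSONDecoder.raw_decode parse just the object, so trailing characters such as a semicolon are ignored. The sample HTML, its placeholder numbers, and the helper name extract_assigned_json are assumptions for illustration; only the key path mirrors the code above.

```python
import json
import re

# Hypothetical page fragment; the numbers are placeholders, not real rates.
SAMPLE_HTML = """
<html><body>
<script>
__REMITLY_LANDING_PAGE_CONTEXT__ = {"context": {"forex": {"current": {
    "economy": {"everyday": "75.55"},
    "express": {"everyday": "75.10", "effective": "76.00"}
}}}};
</script>
</body></html>
"""


def extract_assigned_json(html, var_name):
    """Parse the JSON value assigned to `var_name` in `html`, or return None."""
    # Match "<var_name> =" plus surrounding whitespace; the JSON starts right after.
    match = re.search(re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (here: ";" and the closing tags).
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


data = extract_assigned_json(SAMPLE_HTML, "__REMITLY_LANDING_PAGE_CONTEXT__")
current = data["context"]["forex"]["current"]
print(f"Economy rate 1 USD is {current['economy']['everyday']} INR.")
```

If the assigned value is not valid JSON, raw_decode raises json.JSONDecodeError, so a caller scraping a live page would probably want to catch that.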

Thanks to @andrej-kesely and @dorukerenaktas for their answers, which got me thinking more about this problem.
