I want to work with this page in Python: http://www.sothebys.com/en/search-results.html?keyword=degas%27
This is my code:
from bs4 import BeautifulSoup
import requests
page = requests.get('http://www.sothebys.com/en/search-results.html?keyword=degas%27')
soup = BeautifulSoup(page.content, "lxml")
print(soup)
I'm getting the following output:
<html><head>
<title>Invalid URL</title>
</head><body>
<h1>Invalid URL</h1>
The requested URL "[no URL]", is invalid.<p>
Reference #9.8f4f1502.1494363829.5fae0e0e
</p></body></html>
I can open the page with my browser from the same machine and don't get any error message. When I use the same code with another URL the correct HTML content is fetched:
from bs4 import BeautifulSoup
import requests
page = requests.get('http://www.christies.com/lotfinder/searchresults.aspx?&searchtype=p&action=search&searchFrom=header&lid=1&entry=degas')
soup = BeautifulSoup(page.content, "lxml")
print(soup)
I also tested other URLs (reddit, google, ecommerce sites) and didn't encounter any issues. So the same code works with one URL but not with another. Where is the problem?
Use soup = BeautifulSoup(page.text, "lxml") in place of what you have. BeautifulSoup expects a string. page.content gives a byte array.
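To make the distinction between page.content and page.text concrete, here is a minimal sketch that needs no network access. The variables content and text are stand-ins for the two attributes of a requests response, the sample body is shortened from the error page quoted in the question, and the "utf-8" encoding is an assumption (requests chooses the encoding itself when it builds page.text).

```python
# Stand-in for page.content: the raw bytes of a response body.
# The sample is shortened from the error page quoted in the question.
content = (
    b"<html><head>\n"
    b"<title>Invalid URL</title>\n"
    b"</head><body>\n"
    b"<h1>Invalid URL</h1>\n"
    b"</body></html>"
)

# Stand-in for page.text: the same body decoded to a str.
# Assumption: the body is UTF-8; requests infers the encoding on its own.
text = content.decode("utf-8")

print(type(content).__name__)  # bytes
print(type(text).__name__)     # str
print("Invalid URL" in text)   # True
```

Printing page.status_code and page.headers next to the body is a quick way to see what the server actually returned for each URL.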