This is my first post on SO, so please be easy on me.
I am writing a program in Python that needs to scrape data from a website. In this program, I am trying to change user agents after a certain number of failed attempts. The user agents (~10,000 of them) are stored in the UserAgents.txt file, UTF-8 encoded.
I am getting the following error:
ValueError: Invalid header value b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\n'
I realize that the 'b' in front of the string means that it is byte encoded. The steps that I have followed include:
- userAgent = userAgent.encode("UTF-8").decode("UTF-8")
- userAgent = str(userAgent)
- userAgent = userAgentFile.readlines()[0]
- userAgentFile = open("UserAgents.txt", "r", encoding="UTF-8")
- I have also tried defining the user agent within the definition for the headers.
userAgentFile = open("UserAgents.txt", "r")
userAgent = userAgentFile.readline()
userAgentFile.close();
headerList = {
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
"Accept-Encoding": "gzip, deflate",
"Accept-Language": "en-US,en;q=0.9",
"Upgrade-Insecure-Requests": "1",
"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.6 Safari/525.13",
"X-Amzn-Trace-Id": "Root=1-61be9723-4cd53f9228b4db340a348137"
}
headerList["User-Agent"] = str(userAgent)
#Submit a request to our website, and with our "special" headings.
r = requests.get(f"https://www.reddit.com/r/BreadStapledToTrees/", headers=headerList)
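One detail worth noting about the error: the rejected value ends in \n. Both readline() and readlines() keep the line terminator, and neither str() nor an encode/decode round trip removes it, so the newline (rather than the b prefix) may be what is being rejected. Below is a minimal sketch under that assumption; read_first_user_agent and the temporary demo file are hypothetical stand-ins for the real UserAgents.txt, not part of the original program.

```python
import os
import tempfile


def read_first_user_agent(path):
    """Return the first line of the file with surrounding whitespace removed."""
    with open(path, "r", encoding="UTF-8") as user_agent_file:
        # readline() keeps the trailing "\n"; strip() removes it
        return user_agent_file.readline().strip()


# Demo with a throwaway file standing in for UserAgents.txt (assumption)
demo_path = os.path.join(tempfile.mkdtemp(), "UserAgents.txt")
with open(demo_path, "w", encoding="UTF-8") as demo_file:
    demo_file.write("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4)\n")
    demo_file.write("Mozilla/5.0 (Windows NT 10.0)\n")

user_agent = read_first_user_agent(demo_path)
header_list = {"User-Agent": user_agent}
print(repr(header_list["User-Agent"]))
```

If the newline is indeed the cause, the only change needed in the code above would be calling .strip() on the value read from the file before putting it into headerList.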
Any help would be appreciated!
--Also, I am not actually scraping from r/BreadStapledToTrees...
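For context, the rotation I am aiming for (switching user agents after a certain number of failed attempts) could be sketched roughly as follows. This is only an illustration: load_user_agents, fetch_with_rotation, the max_failures and max_attempts limits, and the fake fetch function are all hypothetical names and values, and the real request is replaced by a stub so the example runs without network access.

```python
from itertools import cycle


def load_user_agents(lines):
    """Strip line endings and skip blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_with_rotation(fetch, user_agents, max_failures=3, max_attempts=20):
    """Call fetch(user_agent) until it returns True, moving to the next
    user agent after max_failures consecutive failures."""
    agents = cycle(user_agents)
    current = next(agents)
    failures = 0
    for _ in range(max_attempts):
        if fetch(current):
            return current
        failures += 1
        if failures >= max_failures:
            # Too many failures with this user agent: rotate to the next one
            current = next(agents)
            failures = 0
    return None  # gave up after max_attempts calls


# Stub standing in for a real request; only "agent-b" ever succeeds
calls = []


def fake_fetch(user_agent):
    calls.append(user_agent)
    return user_agent == "agent-b"


agents = load_user_agents(["agent-a\n", "\n", "agent-b\n"])
winner = fetch_with_rotation(fake_fetch, agents)
print(winner, calls)
```

In the real program, the stub would presumably be replaced by a function that calls requests.get with the given user agent in the headers and returns whether the response was acceptable (for example, response.ok).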
headerList. Please provide a few lines from your UserAgents.txt file. Why do you need to change user agent?