This is my first post on SO, so please go easy on me.
I am writing a program in Python that needs to scrape data from a website. In this program, I am trying to change user agents after a certain number of failed attempts. The list of ~10000 user agents is stored in UserAgents.txt, UTF-8 encoded.
When I send a request with a user agent read from that file, I get the following error:

ValueError: Invalid header value b'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36\n'

I realize that the 'b' in front of the string means that it is a bytes object. The things I have tried so far, none of which fixed the error:

  1. userAgent = userAgent.encode("UTF-8").decode("UTF-8")
  2. userAgent = str(userAgent)
  3. userAgent = userAgentFile.readlines()[0]
  4. userAgentFile = open("UserAgents.txt", "r", encoding="UTF-8")
  5. I have also tried defining the user agent within the definition for the headers.

Here is my code:

userAgentFile = open("UserAgents.txt", "r")
userAgent = userAgentFile.readline()
userAgentFile.close()

headerList = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", 
    "Accept-Encoding": "gzip, deflate", 
    "Accept-Language": "en-US,en;q=0.9", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.2.149.6 Safari/525.13", 
    "X-Amzn-Trace-Id": "Root=1-61be9723-4cd53f9228b4db340a348137"
}

headerList["User-Agent"] = str(userAgent)
# Submit a request to the website with our "special" headers.
r = requests.get("https://www.reddit.com/r/BreadStapledToTrees/", headers=headerList)

Any help would be appreciated!

--Also, I am not actually scraping from r/BreadStapledToTrees...

  • The program runs OK with your original headerList. Please provide a few lines from your UserAgents.txt file. Why do you need to change user agent? Commented Dec 20, 2021 at 4:47
  • I assume the problem is that newline character at the end of the value - you need to strip that off after reading the line from the file. Commented Dec 20, 2021 at 5:41

1 Answer

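The error message shows a newline character (\n) at the end of the user agent string: readline() keeps the line ending, and str() does not remove it. Header values may not contain a newline, so strip it before passing the value to requests by changing line 14 of your code (headerList["User-Agent"] = str(userAgent)) to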

headerList["User-Agent"] = userAgent.strip()
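
Since the question mentions switching user agents after a number of failed attempts, the same idea can be applied to the whole file at once. The following is only a sketch under assumptions: the helper names load_user_agents and build_headers are made up for illustration, the file is assumed to hold one user agent per line, and the header dict is trimmed down from the one in the question.

```python
import itertools
from pathlib import Path


def load_user_agents(path):
    """Read one user agent per line, dropping line endings and blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    # strip() removes the trailing newline (and any stray carriage return or
    # spaces) that would otherwise make the header value invalid.
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_headers(user_agent):
    """Return a header dict using the given, already stripped, user agent."""
    return {
        "Accept-Language": "en-US,en;q=0.9",
        "User-Agent": user_agent,
    }


# Hypothetical usage, kept as comments so the block runs without the file
# or a network connection:
#
#   import requests
#   agents = itertools.cycle(load_user_agents("UserAgents.txt"))
#   headers = build_headers(next(agents))   # call next(agents) again after
#   r = requests.get(url, headers=headers)  # too many failed attempts
```

itertools.cycle simply starts over at the first user agent once the list is exhausted; whether to rotate in order or pick one at random is a design choice the question does not settle.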