
I'm trying to get text from this site. It is just a simple, plain site with only text. When I run the code below, the only thing it prints out is a newline. I should mention that the website's content/text is dynamic, so it changes every few minutes. My requests module version is 2.27.1. I'm using Python 3.9 on Windows.

What could be the problem?

import requests

url='https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}

content=requests.get(url, headers=headers)
print(content.text)

This is an example of how the website should look: [website screenshot]

  • If you were checking the error codes, you'd see that you're getting a "403 Forbidden". What you need to do is provide a fake User-Agent string. Commented Jan 25, 2022 at 20:54
  • @TimRoberts I haven't gotten any errors. Can you provide example code showing what the fake User-Agent should look like? Commented Jan 25, 2022 at 20:59
  • It's not the User-Agent. Anon Coward has the right answer below. All you need is the Accept-Encoding header. With that, I can fetch using wget just fine. And you didn't "see" any error because you weren't LOOKING for errors: requests doesn't raise an exception for HTTP errors. You're expected to check content.status_code (or call content.raise_for_status()). Commented Jan 26, 2022 at 0:36
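Since requests does not raise on HTTP error statuses by itself, here is a minimal sketch of checking the status before trusting `.text`. The helper names `diagnose` and `fetch_live_data` are hypothetical; `status_code`, `raise_for_status()` and the `timeout` argument are standard requests API, and the 10-second timeout is an arbitrary choice.

```python
from typing import Optional


def diagnose(status_code: int, text: str) -> Optional[str]:
    """Return a short hint when a response is not usable, or None when it looks fine."""
    if status_code == 403:
        return "403 Forbidden: the server rejected the request (try different headers)"
    if status_code >= 400:
        return f"HTTP error {status_code}"
    if not text.strip():
        return "HTTP success, but the body is empty or whitespace only"
    return None


def fetch_live_data(url: str, headers: dict) -> str:
    """Fetch url and raise instead of silently returning an error page's body."""
    import requests  # third-party dependency: pip install requests

    response = requests.get(url, headers=headers, timeout=10)
    hint = diagnose(response.status_code, response.text)
    if hint is not None:
        print(hint)
    # requests never raises for 4xx/5xx on its own; this call raises requests.HTTPError.
    response.raise_for_status()
    return response.text
```

With the original headers, a 403 would be reported instead of silently printing a blank line.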

2 Answers


That particular server appears to be gating responses not on the User-Agent, but on the Accept-Encoding settings. You can get a normal response with:

import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
    "Accept-Encoding": "gzip, deflate, br",
}
content = requests.get(url, headers=headers)
print(content.text)

Depending on how the server responds over time, you might need to install the brotli package to allow requests to decompress content compressed with it.
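A minimal sketch of detecting the brotli case, assuming requests sits on urllib3, which decodes `br` only when the `brotli` or `brotlicffi` package is importable. Without one of them, a `Content-Encoding: br` body is passed through still compressed, so `.text` is garbage. The helper names below are hypothetical.

```python
import importlib.util
from typing import Optional


def brotli_available() -> bool:
    """True if a Brotli decoder urllib3 knows how to use (brotli or brotlicffi) is importable."""
    return any(importlib.util.find_spec(name) is not None for name in ("brotli", "brotlicffi"))


def body_still_compressed(content_encoding: Optional[str], have_brotli: bool) -> bool:
    """True if the server answered with Brotli but nothing can decode it."""
    if not content_encoding:
        return False
    codings = [c.strip().lower() for c in content_encoding.split(",")]
    return "br" in codings and not have_brotli


def fetch_with_br(url: str) -> str:
    import requests  # third-party dependency: pip install requests

    # Header value taken from the answer above.
    headers = {"Accept-Encoding": "gzip, deflate, br"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    if body_still_compressed(response.headers.get("Content-Encoding"), brotli_available()):
        raise RuntimeError("Server sent Brotli-compressed data; run: pip install brotli")
    return response.text
```

Installing the decoder is just `pip install brotli`; no code change is needed, since urllib3 picks it up automatically.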


1 Comment

Most of the time I get a normal response, but sometimes I just get a 403 response. How do I use brotli to fix this?
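Installing brotli lets requests decode `br`-compressed bodies; it is not obvious that it would stop a 403 when the headers are already set explicitly. If the 403s are intermittent (rate limiting is only a guess here, not confirmed in the thread), one common workaround is retrying with exponential backoff. A minimal sketch; `backoff_delays`, `get_with_retry` and `requests_fetch` are hypothetical names, and the retry count and delays are arbitrary.

```python
import time
from typing import Callable, List, Tuple


def backoff_delays(retries: int, base: float = 2.0) -> List[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ..."""
    return [base * (2 ** i) for i in range(retries)]


def get_with_retry(
    fetch: Callable[[], Tuple[int, str]],
    retries: int = 3,
    base: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call fetch() and retry with growing pauses while the server answers 403."""
    status, text = fetch()
    for delay in backoff_delays(retries, base):
        if status != 403:
            break
        sleep(delay)
        status, text = fetch()
    return status, text


def requests_fetch(url: str, headers: dict) -> Callable[[], Tuple[int, str]]:
    """Adapt requests.get to the (status_code, text) shape get_with_retry expects."""
    def fetch() -> Tuple[int, str]:
        import requests  # third-party dependency: pip install requests

        response = requests.get(url, headers=headers, timeout=10)
        return response.status_code, response.text

    return fetch
```

Usage would be `status, text = get_with_retry(requests_fetch(url, headers))`. Passing `sleep` in as a parameter keeps the retry logic independent of real waiting.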

You just need to add a User-Agent header, like below.

import requests

url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"

headers = {
    'User-Agent': 'PostmanRuntime/7.29.0',
    'Accept': '*/*',
    'Cache-Control': 'no-cache',
    'Host': 'www.spaceweatherlive.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
response = requests.get(url, headers=headers)
print(response.text)
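To settle which of these headers the server actually cares about (User-Agent or Accept-Encoding), one option is to drop them one at a time and compare status codes. Note that requests fills in its own defaults for `User-Agent`, `Accept`, `Accept-Encoding` and `Connection` when they are omitted, and `Host` is derived from the URL, so "dropping" a header really means falling back to the default. `leave_one_out` and `probe` are hypothetical helper names.

```python
from typing import Dict, Iterator, Tuple

# Headers copied from the Postman-based answer above.
POSTMAN_HEADERS: Dict[str, str] = {
    "User-Agent": "PostmanRuntime/7.29.0",
    "Accept": "*/*",
    "Cache-Control": "no-cache",
    "Host": "www.spaceweatherlive.com",
    "Accept-Encoding": "gzip, deflate, br",
    "Connection": "keep-alive",
}


def leave_one_out(headers: Dict[str, str]) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (dropped_name, headers_without_it) for every header in turn."""
    for name in headers:
        yield name, {k: v for k, v in headers.items() if k != name}


def probe(url: str, headers: Dict[str, str]) -> Dict[str, int]:
    """Map each dropped header to the status code the server returns without it."""
    import requests  # third-party dependency: pip install requests

    results = {}
    for dropped, subset in leave_one_out(headers):
        results[dropped] = requests.get(url, headers=subset, timeout=10).status_code
    return results
```

A 403 appearing only when `Accept-Encoding` is dropped would support the first answer's explanation.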

7 Comments

I still get only a newline printed.
Strange, I get the output. Can you check whether you are getting a 403 when you print(content)?
Yes, I get <Response [403]>
@DinoGržinić can you please update the question with the User-Agent part, so I can run that as well and see what's causing this?
@shivankgtml I updated it now. For the User-Agent value, I just googled "my user agent".
