Can't get data from site using requests in Python

Question

I'm trying to get text from this site. It is a simple, plain site containing only text. When I run the code below, the only thing it prints is a newline. I should add that the site's content is dynamic and changes every few minutes. My requests version is 2.27.1, and I'm using Python 3.9 on Windows.

What could be the problem?

import requests

url='https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.99 Safari/537.36',
}

content=requests.get(url, headers=headers)
print(content.text)

This is an example of what the website should look like.

If you were checking the error codes, you'd see that you're getting a "403 Forbidden". What you need to do is provide a fake User-Agent string.
– Tim Roberts, Commented Jan 25, 2022 at 20:54
@TimRoberts I haven't gotten any errors. Can you provide example code showing what the fake User-Agent should look like?
– Helix, Commented Jan 25, 2022 at 20:59
It's not the User-Agent. Anon Coward has the right answer below. All you need is the Accept-Encoding header. With that, I can fetch using wget just fine. And you didn't "see" any error because you weren't LOOKING for errors. requests doesn't raise an exception for HTTP errors. You're expected to check content.status_code.
– Tim Roberts, Commented Jan 26, 2022 at 0:36
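
The point about HTTP errors can be illustrated with a minimal sketch. The helper names `summarize` and `fetch_text` are hypothetical and not part of the thread. The sketch assumes only the standard requests behaviour: `status_code`, `reason` and `ok` are attributes of the response, and `raise_for_status()` raises `requests.HTTPError` for 4xx and 5xx replies.

```python
import requests


def summarize(response: requests.Response) -> str:
    """Return a one-line description of an HTTP response (hypothetical helper)."""
    # response.ok is False for 4xx and 5xx status codes.
    if response.ok:
        return f"OK ({response.status_code}), {len(response.text)} characters"
    return f"HTTP error {response.status_code}: {response.reason}"


def fetch_text(url: str, headers: dict) -> str:
    """Fetch a URL and raise requests.HTTPError on a 4xx/5xx reply."""
    response = requests.get(url, headers=headers, timeout=10)
    # raise_for_status() turns a 403 into an exception instead of
    # letting the script carry on with an unexpected body.
    response.raise_for_status()
    return response.text
```

Calling `print(summarize(requests.get(url, headers=headers)))` with the question's `url` and `headers` would report the 403 instead of printing what looks like an empty page.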

Anon Coward · Accepted Answer · 2022-01-25 22:25:33Z

1

That particular server appears to be gating responses not on the User-Agent, but on the Accept-Encoding settings. You can get a normal response with:

import requests
url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"
headers = {
    "Accept-Encoding": "gzip, deflate, br",
}
content = requests.get(url, headers=headers)
print(content.text)

Depending on how the server responds over time, you might need to install the brotli package to allow requests to decompress content compressed with it.
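
As a rough sketch of the Brotli caveat, the check below reports whether a Brotli decoder is importable and whether a given `Content-Encoding` value could be decoded locally. The function names are hypothetical. The module names `brotli` and `brotlicffi` are an assumption about what the underlying urllib3 library looks for, so verify them against your installed versions.

```python
import importlib.util
from typing import Optional


def brotli_available() -> bool:
    """Report whether a Brotli decoder module can be imported."""
    # Assumption: urllib3 (used by requests) picks up one of these modules.
    return any(
        importlib.util.find_spec(name) is not None
        for name in ("brotli", "brotlicffi")
    )


def can_decode(content_encoding: Optional[str]) -> bool:
    """Return False only for a Brotli-compressed body with no decoder installed."""
    if content_encoding is None:
        return True
    encodings = [part.strip().lower() for part in content_encoding.split(",")]
    return "br" not in encodings or brotli_available()
```

After a request, `can_decode(content.headers.get("Content-Encoding"))` returning `False` would suggest running `pip install brotli` before trusting `content.text`.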

answered Jan 25, 2022 at 22:25

Anon Coward


1 Comment

Helix Over a year ago

Most of the time I get a normal response, but sometimes I get a 403 response. How do I use brotli to fix this?

shivankgtm · Accepted Answer · 2022-01-25 22:10:39Z

0

You just need to add a User-Agent header, like below.

import requests

url = "https://www.spaceweatherlive.com/includes/live-data.php?object=solar_flare&lang=EN"

headers = {
    'User-Agent': 'PostmanRuntime/7.29.0',
    'Accept': '*/*',
    'Cache-Control': 'no-cache',
    'Host': 'www.spaceweatherlive.com',
    'Accept-Encoding': 'gzip, deflate, br',
    'Connection': 'keep-alive',
}
response = requests.get(url, headers=headers)
print(response.text)

edited Jan 25, 2022 at 22:10

answered Jan 25, 2022 at 21:05

shivankgtm


7 Comments

Helix Over a year ago

I still get only newline printed

shivankgtm Over a year ago

Strange, I get the output when I run it. Can you check whether you get a 403 when you print(content)?

Helix Over a year ago

Yes, I get <Response [403]>

shivankgtm Over a year ago

@DinoGržinić can you please update the question with your code, including the User-Agent part, so I can run it too and see what's causing this?

Helix Over a year ago

@shivankgtm I updated it now. For the User-Agent value I just googled "my user agent".
