
Please do not mark it as a trivial duplicate of: Python requests giving error: IndexError: list index out of range

I am encountering the said error when trying to send a GET to:

http://astat.bugly.qq.com

The exact line in my code that makes the request is the following; maybe I am overlooking something:

r = requests.get("http://"+url, stream=True, timeout=3)

where astat.bugly.qq.com is passed as url

I am reading the URL out of a CSV file, but I have checked that it reads the correct URL.

I can even recreate it multiple times in the Python REPL; I hope you can too.

I don't know what could be causing this.
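
Here is a minimal snippet that reproduces it for me (a simplified sketch of my script; in reality the URL comes out of the CSV file):

import requests

url = "astat.bugly.qq.com"  # in my real script this is read from the CSV

try:
    r = requests.get("http://" + url, stream=True, timeout=3)
    print(r.status_code)
except IndexError as exc:
    # On my Python 2.7 setup the call dies here with
    # "IndexError: list index out of range" instead of a requests exception.
    print(repr(exc))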

Comments

  • Can't replicate your issue - what version of requests do you have installed? Commented Aug 27, 2017 at 14:05
  • I can replicate it with Python 2.7.13, requests 2.18.3, on Win10. Commented Aug 27, 2017 at 14:18
  • ... and also with requests 2.18.4. Commented Aug 27, 2017 at 14:19
  • Seems to be an issue with urllib3 in the Python 2 version when malformed headers are being returned. It is a known bug: github.com/shazow/urllib3/issues/950 (see the sketch just after these comments). Commented Aug 27, 2017 at 14:43
  • Note: the response from this webserver is not a valid HTTP response (no status line). It should have, at a minimum, HTTP/1.0 200 OK\n\nlb monitor page, and one would also expect a Content-Type: text/plain header. Fix the response from the webserver to be valid HTTP or you will not be able to rely on any HTTP library (it might work today but may break tomorrow). Commented Aug 27, 2017 at 14:54
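
To check whether the problem really sits in urllib3 rather than in requests itself, one can hit the URL with urllib3 directly (a quick probe, not verified against every version; on an affected Python 2 setup it should fail with the same IndexError):

import urllib3

http = urllib3.PoolManager()
# Same request as above, but bypassing requests entirely.
r = http.request('GET', 'http://astat.bugly.qq.com', timeout=3.0)
print(r.status)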

1 Answer


The URL in question serves a plain-text document, which is sent without any headers.

The requests library seems to insist on there being at least one header (it's doing headers[-1] when you get the index error).

I'm not sure if you can do anything in requests to work around it, but you could use urllib:

import urllib  # Python 2's urllib; in Python 3 this moved to urllib.request
txt = urllib.urlopen('http://astat.bugly.qq.com').read()
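
On Python 3 the same approach would be spelled with urllib.request (this is only the equivalent call; I have not verified how it copes with this particular server's malformed headers):

import urllib.request

# Python 3 counterpart of the Python 2 snippet above
txt = urllib.request.urlopen('http://astat.bugly.qq.com').read().decode('utf-8', 'replace')
print(txt)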

update:

my goal is to read certain headers from a website, how would you suggest I do that considering the situation ?

The server sends the following:

~$ telnet astat.bugly.qq.com 80
Trying 103.7.30.121...
Connected to astat-bugly-tgw4l.mig.tencent-cloud.net.
Escape character is '^]'.
GET / HTTP/1.1
HTTP/1.1 200 OK
                Date: Thu, 30 Aug 2012 12:01:27 GMT
                Server: Apache/2.2.17 (Unix) PHP/5.3.6
                Cache-Control: max-age=1800
                Keep-Alive: timeout=5, max=98
                Connection: Keep-Alive
                Transfer-Encoding: chunked
                Content-Type: text/plain; charset=UTF-8
                Copyright @ 1998 - 2012 Tencent. All Rights Reserved.
                Content-Length: 17

lb monitor page
Connection closed by foreign host.

Short of using telnet (or something similarly low level), I don't think you'll have much luck getting at those "headers" (which should of course not be indented, or have a random copyright string embedded).
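
If you really do need those pseudo-headers programmatically, the "something similarly low level" would be a raw socket. A rough sketch (fetch_raw is just an illustrative helper, not a robust HTTP client):

import socket

def fetch_raw(host, port=80, timeout=3):
    # Open a plain TCP connection, send a bare GET, and return whatever bytes come back.
    sock = socket.create_connection((host, port), timeout=timeout)
    try:
        sock.sendall(b"GET / HTTP/1.1\r\nHost: " + host.encode('ascii') + b"\r\nConnection: close\r\n\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    finally:
        sock.close()
    return b"".join(chunks)

raw = fetch_raw('astat.bugly.qq.com').decode('utf-8', 'replace')
# Keep only the lines that look like "Name: value" headers.
for line in raw.splitlines():
    if ':' in line:
        print(line.strip())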


5 Comments

It's the underlying urllib3 library - known issue: github.com/shazow/urllib3/issues/950
The response is also being sent without a standard HTTP status line, so it is not valid HTTP. So while urllib might work, you can't rely on it when you don't have a valid HTTP response.
@AChampion I agree, if you control the server, then fixing it is the obviously correct thing to do.
my goal is to read certain headers from a website, how would you suggest I do that considering the situation ?
See my update; I think I solved it by using Python 3 instead of my default Python 2.7 installation. I will update if any issues arise.
