
I want to crawl a site, but Cloudflare was getting in the way. I was able to get the server's real IP, so Cloudflare won't bother me.

How can I utilize this in the requests library?

For example, I want to go directly to www.example.com/foo.php, but requests resolves the hostname to an IP on the Cloudflare network instead of the one I want it to use. How can I make it use the IP I want?

I could send a request to the real IP with the Host header set to www.example.com, but that just gives me the home page. How can I visit other pages on the site?


You will have to set a custom Host header with the value example.com, something like:

requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})

should do the trick. To verify it, run nc -l -p 80 (requires netcat), then run the command above. The netcat window will show the request as it was received:

GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
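If netcat isn't available, a few lines of standard-library Python can play the same role. This is a minimal sketch, not part of requests; the helper names start_listener, capture_request and capture_in_background are made up for illustration. It listens on a local port, reads one request head, answers with a tiny 200 response so the client doesn't hang, and returns the raw text so you can check which Host header was actually sent.

```python
import socket
import threading


def start_listener(port=0):
    """Listen on 127.0.0.1; port 0 lets the OS pick a free port. Returns (socket, port)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", port))
    server_sock.listen(1)
    return server_sock, server_sock.getsockname()[1]


def capture_request(server_sock):
    """Accept one connection and return the raw HTTP request head as text."""
    conn, _ = server_sock.accept()
    with conn:
        data = b""
        # read until the blank line that terminates the request headers
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        # minimal response so the client's call returns instead of waiting
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
    return data.decode("latin-1")


def capture_in_background(server_sock):
    """Run capture_request in a thread; the result lands in result['raw']."""
    result = {}

    def run():
        result["raw"] = capture_request(server_sock)

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread, result
```

Usage with requests would look like: server_sock, port = start_listener(), then thread, result = capture_in_background(server_sock), then requests.get(f"http://127.0.0.1:{port}/foo.php", headers={"Host": "example.com"}), then thread.join() and print(result["raw"]).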

Comments

Works only for HTTP. If you do that with HTTPS, you'll get an error that the hostname doesn't match the certificate.
@tymoteusz-paul You can disable SSL certificate verification in Requests; that should give you access to the server, but it opens you up to man-in-the-middle attacks: stackoverflow.com/questions/15445981/…
It looks like somebody has created a utility that lets requests specify a Host header for SSL connections: toolbelt.readthedocs.io/en/latest/…

Answer for HTTPS/SNI support: Use the HostHeaderSSLAdapter in the requests_toolbelt module:

The solution above works fine with virtual hosts for unencrypted HTTP connections. For HTTPS you also need to send SNI (Server Name Indication) in the TLS handshake, because some servers present a different SSL certificate depending on the name passed via SNI. Also, Python's ssl library by default doesn't look at the Host: header when matching the server's certificate at connection time.

HostHeaderSSLAdapter provides a straightforward way to add a transport adapter to requests that handles this for you.

Example

import requests

from requests_toolbelt.adapters import host_header_ssl

# Create a new requests session
s = requests.Session()

# Mount the adapter for https URLs
s.mount('https://', host_header_ssl.HostHeaderSSLAdapter())

# Send your request
s.get("https://198.51.100.50", headers={"Host": "example.org"})
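To make the SNI point concrete, here is what any solution has to achieve at the TLS level: the TCP connection goes to the IP, while the hostname is used for SNI, for certificate verification and for the Host header. This standard-library sketch is not how requests or requests_toolbelt are implemented; get_via_ip and build_request are illustrative names. It relies on ssl.SSLContext.wrap_socket and its server_hostname argument, which sets SNI and is the name the default context checks the certificate against.

```python
import socket
import ssl


def build_request(path, hostname):
    """Build a minimal HTTP/1.1 GET request with an explicit Host header."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")


def get_via_ip(ip, hostname, path="/", port=443, timeout=10.0):
    """Connect to ip:port but use hostname for SNI, certificate checks and Host."""
    context = ssl.create_default_context()  # verifies the chain and the hostname
    with socket.create_connection((ip, port), timeout=timeout) as raw_sock:
        # server_hostname goes into the ClientHello as SNI and is the name the
        # certificate must match; the IP is only used to route the TCP connection
        with context.wrap_socket(raw_sock, server_hostname=hostname) as tls_sock:
            tls_sock.sendall(build_request(path, hostname))
            chunks = []
            while True:
                chunk = tls_sock.recv(4096)
                if not chunk:
                    break
                chunks.append(chunk)
    return b"".join(chunks)  # raw response: status line, headers and body
```

For example, get_via_ip("198.51.100.50", "example.org", "/foo.php") would fetch the raw response bytes. It does no HTTP parsing, which is exactly the work a requests adapter saves you.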

Comments

I'm not sure what happened in the interim but as of Nov 2020, this appears to have no effect at all.
@AntonOfTheWoods: This now works for me on the master branch with PR #293; the current PyPI release, 1.0.0, does not include it.

I think the best way to send HTTPS requests to a specific IP is to add a custom resolver that binds the domain name to the IP you want to hit. This way, both SNI and the Host header are set correctly, and certificate verification succeeds just as it would in a web browser.

Otherwise, you will run into issues such as InsecureRequestWarning and SSLCertVerificationError, and SNI is always missing from the Client Hello, even if you try different combinations of headers and verify arguments, for example:

requests.get('https://1.2.3.4/foo.php', headers={"host": "example.com"}, verify=True)

In addition, I tried:

- requests_toolbelt
- pip install requests[security]
- forcediphttpsadapter

None of the solutions mentioned here that use requests with TLS give SNI support: none of them set SNI when hitting https://IP directly.

import socket

import requests

# mock /etc/hosts: maps (hostname, port) to a list of getaddrinfo() results
# guard it with a lock when multithreading (or use multiprocessing) if an
# endpoint is frequently rebound to different IPs
etc_hosts = {}


# decorate Python's built-in resolver
def custom_resolver(builtin_resolver):
    def wrapper(*args, **kwargs):
        try:
            # urllib3 calls getaddrinfo(host, port, family, type), so args[:2] is (host, port)
            return etc_hosts[args[:2]]
        except KeyError:
            # fall back to builtin_resolver for endpoints not in etc_hosts
            return builtin_resolver(*args, **kwargs)

    return wrapper


# monkey patching
socket.getaddrinfo = custom_resolver(socket.getaddrinfo)


def _bind_ip(domain_name, port, ip):
    '''
    resolve (domain_name, port) to a given ip
    '''
    key = (domain_name, port)
    # (family, type, proto, canonname, sockaddr)
    value = (socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, '', (ip, port))
    etc_hosts[key] = [value]


# the name must match the hostname in the URL exactly (www.example.com, not example.com)
_bind_ip('www.example.com', 443, '1.2.3.4')
# this sends requests to 1.2.3.4, with SNI and Host still set to www.example.com
response = requests.get('https://www.example.com/foo.php', verify=True)
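The monkey patch of socket.getaddrinfo above stays active for the whole process. If you only need the override around a few requests, a context manager can install it and put the original resolver back afterwards. This is a sketch; pinned_host is a hypothetical helper name, and while it is active the patch is still process-wide, so it is not safe if other threads resolve names at the same time.

```python
import contextlib
import socket


@contextlib.contextmanager
def pinned_host(hostname, port, ip):
    """Temporarily make socket.getaddrinfo resolve (hostname, port) to ip."""
    original = socket.getaddrinfo
    # same shape as a real getaddrinfo() entry: (family, type, proto, canonname, sockaddr)
    pinned = [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", (ip, port))]

    def patched(host, port_arg, *args, **kwargs):
        if (host, port_arg) == (hostname, port):
            return pinned
        # any other name goes through the real resolver
        return original(host, port_arg, *args, **kwargs)

    socket.getaddrinfo = patched
    try:
        yield
    finally:
        # always restore the real resolver, even if the body raised
        socket.getaddrinfo = original
```

Usage: with pinned_host("www.example.com", 443, "1.2.3.4"): response = requests.get("https://www.example.com/foo.php"). Keep in mind that a requests.Session may reuse a pooled connection to that IP after the block exits.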


You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:

requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})

The URL 'patching' can be done with the urlparse module (urllib.parse in Python 3):

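try:
    import urlparse  # Python 2
except ImportError:
    from urllib import parse as urlparse  # Python 3
import requests

# url is the page to fetch, ipaddress the server's real IP
parsed = urlparse.urlparse(url)
hostname = parsed.hostname
parsed = parsed._replace(netloc=ipaddress)
ip_url = parsed.geturl()
response = requests.get(ip_url, headers={'Host': hostname})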

Demo against Stack Overflow:

>>> import urlparse
>>> import socket
>>> url = 'http://stackoverflow.com/help/privileges'
>>> parsed = urlparse.urlparse(url)
>>> hostname = parsed.hostname
>>> hostname
'stackoverflow.com'
>>> ipaddress = socket.gethostbyname(hostname)
>>> ipaddress
'198.252.206.16'
>>> parsed = parsed._replace(netloc=ipaddress)
>>> ip_url = parsed.geturl()
>>> ip_url
'http://198.252.206.16/help/privileges'
>>> response = requests.get(ip_url, headers={'Host': hostname})
>>> response
<Response [200]>

In this case I looked up the IP address dynamically; in your case you would use the server's real IP instead.
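The netloc replacement above drops any explicit port, and the Host header should then carry that port too. A small Python 3 sketch of the same rewrite with urllib.parse, where to_ip_url is an illustrative name and not a requests API, keeps the port and brackets IPv6 addresses:

```python
from urllib.parse import urlsplit, urlunsplit


def to_ip_url(url, ip):
    """Return (ip_url, host_header): url with its host swapped for ip,
    plus the Host header value the server expects."""
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError(f"no hostname in {url!r}")
    port = parts.port  # None when the URL has no explicit port
    # IPv6 literals must be bracketed inside a netloc
    ip_host = f"[{ip}]" if ":" in ip else ip
    # note: this also drops any user:password@ part of the original netloc
    netloc = f"{ip_host}:{port}" if port is not None else ip_host
    host_header = f"{host}:{port}" if port is not None else host
    return urlunsplit(parts._replace(netloc=netloc)), host_header
```

With the values from the demo, to_ip_url("http://stackoverflow.com/help/privileges", "198.252.206.16") gives ("http://198.252.206.16/help/privileges", "stackoverflow.com"), which you would pass as requests.get(ip_url, headers={"Host": host_header}). The same function works for any path on the site, which is what a crawler needs.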

Comments

Works great. Unfortunately there seems to be a bug: when you use POST with a data generator (for chunked encoding), you end up with two 'Host' headers, the original AND the new one. :(
@jlh: if you have a simple reproducible case I can take a look if that can be fixed.
@jlh: looks like a bug; a different low-level path through the HTTP library is taken and it is not told to skip the host header.
Does it work with https?
@NathanB: not when the certificate specifies the hostname needs to match, which is just about every certificate today. You can use the requests_toolbelt HostHeaderSSLAdapter() class to work around that.
