How to verify that a URL exists and does not redirect?

Question

How can I verify that a page URL exists and does not redirect to a "not found" page?
Example:

import socket
try:
    socket.gethostbyname('www.google.com/imghp')
except socket.gaierror as ex:
    print "Not existe"

It always prints "Not existe".

What are you trying to accomplish? socket.gethostbyname doesn't take a URL. You probably want to make an HTTP request, which is a totally different API.
– John Zwinck, Commented Mar 8, 2015 at 12:32
gethostbyname() can be used with a host name, not with an (incomplete) URL. Try gethostbyname('www.google.com').
– Klaus D., Commented Mar 8, 2015 at 12:33

Community · Accepted Answer · 2020-06-20 09:12:55Z

4

you're using the wrong tool for the task!

From the manual:

socket.gethostbyname(hostname)

Translate a host name to IPv4 address format. The IPv4 address is returned as a string, such as '100.50.200.5'. If the host name is an IPv4 address itself it is returned unchanged. See gethostbyname_ex() for a more complete interface. gethostbyname() does not support IPv6 name resolution, and getaddrinfo() should be used instead for IPv4/v6 dual stack support.

That tool resolves a host name to its IP address, so it can only tell you whether a domain exists:

>>> import socket
>>> try:
...     print(socket.gethostbyname('www.google.com'))
... except socket.gaierror as ex:
...     print("Does not exist")
... 
216.58.211.132
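
If you still want a DNS lookup as a first step, the host name has to be extracted from the URL before it is resolved. The following is a minimal Python 3 sketch; hostname_of and host_resolves are hypothetical helper names, not part of the socket module, and a successful lookup only shows that the domain resolves, not that the page exists.

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the host part of a URL, or None if it has none."""
    # urlsplit only recognises the host when the URL contains '//',
    # so prefix bare strings such as 'www.google.com/imghp'.
    if '//' not in url:
        url = '//' + url
    return urlsplit(url).hostname


def host_resolves(url):
    """Return True if the host named in the URL can be resolved via DNS."""
    host = hostname_of(url)
    if host is None:
        return False
    try:
        # getaddrinfo handles both IPv4 and IPv6, unlike gethostbyname
        socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False
    return True
```

For example, hostname_of('www.google.com/imghp') yields 'www.google.com', which is the value gethostbyname() expected in the question.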

What you may want is to actually connect to the site and check whether there's a page:

>>> import requests
>>> response = requests.head('http://www.google.com/imghp')
>>> if response.status_code == 404:
...    print("Does not exist")
... else:
...    print("Exists")
...
Exists

The .head() method from python-requests only requests the headers for the page from the web server, not the page itself, so it's very lightweight in terms of network usage.

Spoiler alert: if you try to read the contents of the page using response.content, you'll get nothing; for that you need to use the .get() method.
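
A 404 is only one of the statuses a server can return, so it may help to sort the code into a few coarse groups before deciding what "exists" means. This is a rough sketch, not part of python-requests; describe_status and its labels are made up for illustration.

```python
from http import HTTPStatus


def describe_status(code):
    """Map an HTTP status code to a coarse verdict about the page."""
    if 200 <= code < 300:
        return "exists"
    if 300 <= code < 400:
        # The server points somewhere else; the target still has to be checked
        return "redirects"
    if code in (HTTPStatus.NOT_FOUND, HTTPStatus.GONE):
        return "missing"
    if 400 <= code < 500:
        # e.g. 403: the page may exist but is not accessible to us
        return "client error"
    if 500 <= code < 600:
        # e.g. 500 or 503: the server failed, which says nothing about the page
        return "server error"
    return "unknown"
```

With python-requests this could be called as describe_status(response.status_code).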

update #1

The site you're checking against is broken, i.e. it does not follow Internet standards. Instead of returning a 404, it returns a 302 that redirects to a "page does not exist" page, which is served with a status code of 200:

>>> response = requests.head('http://qamarsoft.com/does_not_exists', allow_redirects=True)
>>> response.status_code
200

To sort that out, request the page without following redirects and check whether the redirection URL contains 404:

>>> response = requests.head('http://qamarsoft.com/does_not_exists')
>>> response.headers['location']
'http://qamarsoft.com/404'

So the test would become:

>>> response = requests.head('http://qamarsoft.com/does_not_exists')
>>> if '404' in response.headers.get('location', ''):
...     print('Does not exist')
... else:
...     print('Exists')
...
Does not exist
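
The status check and the redirect check can be combined into one helper that works on the status code and the Location header. This is only a heuristic sketch: page_exists, REDIRECT_CODES and NOT_FOUND_HINTS are hypothetical names, and the hint list is a guess that would need tuning for each site.

```python
from urllib.parse import urlsplit

# Status codes that carry a Location header pointing to another URL
REDIRECT_CODES = {301, 302, 303, 307, 308}
# Assumed substrings that suggest a "not found" page; adjust per site
NOT_FOUND_HINTS = ('404', 'notfound', 'not-found', 'not_found')


def page_exists(status_code, location=None):
    """Guess whether a page exists from a non-followed HEAD response."""
    if status_code == 404:
        return False
    if status_code in REDIRECT_CODES and location:
        # Only inspect the path, so a host name containing '404' is ignored
        path = urlsplit(location).path.lower()
        if any(hint in path for hint in NOT_FOUND_HINTS):
            return False
    return True
```

With python-requests it could be fed from a HEAD request made without following redirects, for instance page_exists(response.status_code, response.headers.get('location')).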

update #2

For the second URL, you can try it out yourself in the Python console:

>>> import requests
>>> response = requests.head('http://www.***********.ma/does_not_Exists')
>>> if response.status_code == 404:
...    print("Does not exist")
... else:
...    print("Exists")
...
Does not exists
>>> response = requests.head('http://www.***********.ma/annonceur/a/3550/n.php')
>>> if response.status_code == 404:
...    print("Does not exist")
... else:
...    print("Exists")
...
Exists

Nota Bene

You might need to install the requests package first:

pip install requests

or, if you're modern and use Python 3:

pip3 install requests

edited Jun 20, 2020 at 9:12

answered Mar 8, 2015 at 12:34

zmo

8 Comments

m3asmi Over a year ago

Try using your code with the page qamarsoft.com/HttpUrlDontExist: it will return Exists!!!

zmo Over a year ago

Because it returns a 302 code to get to a 404 page, which in turn gives a 200 status code. The given site is broken, not my code! :-)

m3asmi Over a year ago

Thanks, brother. I'm using your first post with a test on the 200 response.

zmo Over a year ago

Well, don't miss all the other error statuses, like 500, 503, 403, etc. As for the second site you gave, you wanted to check the existence of a page on a site that does not respect Internet standards. All in all, you might have a valid page that returns a 302 redirect followed by a 200 OK, while the other one you gave returns the same sequence but truly is a 404.

m3asmi Over a year ago

Can you just delete the second test URL, please?

Kostas Drk · Accepted Answer · 2015-03-08 13:01:24Z

0

It's true that gethostbyname() will not do what you want. Consider using urllib2 (a Python 2 module; in Python 3 its functionality lives in urllib.request and urllib.error). In your case the following could do what you want:

import urllib2

# The data variable can be used to send POST data; None makes this a GET request
data = None
# Add as many header fields as you wish here
headers = {"User-agent": "Blahblah", "Cookie": "yourcookievalues"}
url = "http://www.google.com/imghp"
request = urllib2.Request(url, data, headers)
try:
    response = urllib2.urlopen(request)
    # urlopen follows redirects, so a different final URL means we were redirected
    if response.geturl() != url:
        print("The page at: " + url + " redirected to: " + response.geturl())
except urllib2.HTTPError as err:
    # Catch 404s etc.
    print("Failed with code: " + str(err.code))

Hope this helps you out!

edited Mar 8, 2015 at 13:01

answered Mar 8, 2015 at 12:41

Kostas Drk

4 Comments

m3asmi Over a year ago

File "<stdin>", line 2, in <module> NameError: name 'request' is not defined

zmo Over a year ago

Anyway, python-requests does the same thing, but in fewer lines!

Kostas Drk Over a year ago

@zmo well yeah, but if you take out my comments, blank lines and the variable declaration and just pass them directly to the function, like you did, then it's pretty much the same.

zmo Over a year ago

your code is equivalent to: print("Exists" if requests.head(URL, allow_redirects=True).status_code != 404 else "Not Exists"), no need for the try block and the redirect check. There's a good reason why python-requests rocks :-)
