How can I verify that a page URL exists and does not redirect to a "not found" page?
Example:
import socket
try:
    socket.gethostbyname('www.google.com/imghp')
except socket.gaierror as ex:
    print("Does not exist")
It always prints "Does not exist".

From the manual:
socket.gethostbyname(hostname)
Translate a host name to IPv4 address format. The IPv4 address is returned as a string, such as '100.50.200.5'. If the host name is an IPv4 address itself it is returned unchanged. See gethostbyname_ex() for a more complete interface. gethostbyname() does not support IPv6 name resolution, and getaddrinfo() should be used instead for IPv4/v6 dual stack support.
That function checks whether a domain name resolves and returns its IP address. It takes a host name, not a URL with a path:
>>> import socket
>>> try:
...     print(socket.gethostbyname('www.google.com'))
... except socket.gaierror as ex:
...     print("Does not exist")
...
216.58.211.132
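If the starting point is a full URL rather than a bare host name, the host part has to be extracted before it can be resolved. Here is a minimal sketch of that step using only the standard library; the helper names hostname_of and resolves are made up for this illustration, and resolves still only tells you that the domain exists, not that the page does.

```python
import socket
from urllib.parse import urlsplit


def hostname_of(url):
    """Return the host part of a URL, or None if there is none."""
    # urlsplit only recognises the host when a scheme or a leading "//"
    # is present, so add "//" for inputs like 'www.google.com/imghp'.
    if "://" not in url:
        url = "//" + url
    return urlsplit(url).hostname


def resolves(url):
    """Return True if the host of `url` resolves to an IPv4 address."""
    host = hostname_of(url)
    if not host:
        return False
    try:
        socket.gethostbyname(host)  # performs a DNS lookup
        return True
    except socket.gaierror:
        return False
```

For example, hostname_of('www.google.com/imghp') gives 'www.google.com', which is a value gethostbyname() can actually work with.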
What you probably want is to actually connect to the site and check whether the page is there:
>>> import requests
>>> response = requests.head('http://www.google.com/imghp')
>>> if response.status_code == 404:
...     print("Does not exist")
... else:
...     print("Exists")
...
Exists
The .head() method from python-requests only gets the information about the page from the webserver, but not the page itself, so it's very lightweight in terms of network usage.
Spoiler alert: if you try to read the contents of the page using response.content, you'll get nothing. For that you need to use the .get() method.
The site you're checking against is broken, i.e. it does not follow Internet standards. Instead of returning a 404, it returns a 302 that redirects to a "page does not exist" page, which is itself served with a status code of 200:
>>> response = requests.head('http://qamarsoft.com/does_not_exists', allow_redirects=True)
>>> response.status_code
200
To sort that out, request the page without following the redirect and check whether the redirect target (the Location header) contains 404:
>>> response = requests.head('http://qamarsoft.com/does_not_exists')
>>> response.headers['location']
'http://qamarsoft.com/404'
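So the test would become (using .get() so that a response without a Location header does not raise a KeyError):
>>> response = requests.head('http://qamarsoft.com/does_not_exists')
>>> if '404' in response.headers.get('location', ''):
...     print('Does not exist')
... else:
...     print('Exists')
...
Does not exist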
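The two checks discussed so far (a real 404 status, and a redirect whose target contains 404) can be combined into one helper. The following is only a sketch under assumptions: the name looks_missing is invented here, it expects a response obtained without following redirects (the default for requests.head), and the "404 in the Location header" rule is a heuristic for this particular site, not a general standard. The SimpleNamespace objects are offline stand-ins for real responses so the logic can be tried without network access.

```python
from types import SimpleNamespace  # only needed for the offline stand-ins


def looks_missing(response):
    """Guess whether a non-followed HEAD response means 'page not found'."""
    # A well-behaved server answers a missing page with 404 directly.
    if response.status_code == 404:
        return True
    # Heuristic for "soft 404" sites: a 3xx redirect whose target URL
    # contains "404".
    if 300 <= response.status_code < 400:
        return "404" in response.headers.get("location", "")
    return False


# Offline stand-ins mimicking the status_code and headers attributes.
soft_404 = SimpleNamespace(status_code=302,
                           headers={"location": "http://qamarsoft.com/404"})
hard_404 = SimpleNamespace(status_code=404, headers={})
found = SimpleNamespace(status_code=200, headers={})

for name, resp in [("soft_404", soft_404), ("hard_404", hard_404),
                   ("found", found)]:
    print(name, "missing" if looks_missing(resp) else "exists")
```

With python-requests installed, the call would look like looks_missing(requests.head(url)).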
For the second URL, you can try it out yourself in the Python console:
>>> import requests
>>> response = requests.head('http://www.***********.ma/does_not_Exists')
>>> if response.status_code == 404:
...     print("Does not exist")
... else:
...     print("Exists")
...
Does not exist
>>> response = requests.head('http://www.***********.ma/annonceur/a/3550/n.php')
>>> if response.status_code == 404:
...     print("Does not exist")
... else:
...     print("Exists")
...
Exists
You may need to install the requests package first:
pip install requests
Or, if you're modern and use Python 3:
pip3 install requests
302 code to get to a 404 page, which in turn gives a 200 status code. The given site is broken, not my code! :-)
500, 503, 403 etc…
As for the second site you gave, you wanted to check the existence of a page on a site that does not respect Internet standards. All in all, you might have a valid page that does a 302 redirect followed by a 200 OK, while the other one you gave does the same but truly is a 404.

It's true that gethostbyname() will not do what you want. Consider using urllib2. In your case the following could do what you want:
import urllib2

# The data variable can be used to send POST data
data = None
# Add as many header fields as you wish here
headers = {"User-agent": "Blahblah", "Cookie": "yourcookievalues"}
url = "http://www.google.com/imghp"
request = urllib2.Request(url, data, headers)
try:
    response = urllib2.urlopen(request)
    # Check for redirection here
    if response.geturl() != url:
        print("The page at: " + url + " redirected to: " + response.geturl())
except urllib2.HTTPError as err:
    # Catch 404s etc.
    print("Failed with code: " + str(err.code))
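urllib2 exists only in Python 2; in Python 3 the same functionality lives in urllib.request and urllib.error. Below is a rough Python 3 sketch of the same idea. The function name check_url and its opener parameter are assumptions added for this illustration (opener lets you swap in a fake for testing and defaults to the real urlopen), and the returned messages simply mirror the ones printed above.

```python
import urllib.error
import urllib.request


def check_url(url, headers=None, opener=urllib.request.urlopen):
    """Open `url` and describe what happened as a short message."""
    # data=None means a plain GET; pass bytes to send POST data instead.
    request = urllib.request.Request(url, data=None, headers=headers or {})
    try:
        with opener(request) as response:
            final_url = response.geturl()
    except urllib.error.HTTPError as err:
        # Raised for 404, 403, 500 and other error statuses.
        return "Failed with code: %d" % err.code
    except urllib.error.URLError as err:
        # Raised for DNS failures, refused connections, etc.
        return "Failed: %s" % err.reason
    # urlopen follows redirects, so compare the final URL with the original.
    if final_url != url:
        return "The page at: %s redirected to: %s" % (url, final_url)
    return "OK"
```

A call such as check_url("http://www.google.com/imghp", {"User-agent": "Blahblah"}) would perform a real request. Note that a site which redirects missing pages to a 200 "not found" page will show up here as a redirect, not as a failure.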
Hope this helps you out!
print("Exists" if requests.head(URL, allow_redirects=True).status_code != 404 else "Not Exists"), no need for the try block and the redirect check. There's a good reason why python-requests rocks :-)
socket.gethostbyname doesn't take a URL. Probably you want to make an HTTP request, which is a totally different API.
gethostbyname() can be used with a host name, not with an (incomplete) URL. Try gethostbyname('www.google.com')