0

I want a script that takes a file name and checks whether it is a file. A file name ends with .txt, .exe, etc. Is there a library or module in Python that includes ALL file formats? If there isn't, how can I verify that a given input (like hey.txt, what.exe, etc.) is a file? P.S. I'm checking files on a website, not operating system files (like "https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif"). Thanks to all the helpers :)

5
  • 1
    "A file ends with .txt, .exe etc". That doesn't sound right to me. I see files with names like README all the time. They have no extension at all, but are still files. Commented Aug 18, 2015 at 16:14
  • 1
    os.path.isfile(input)? Or am I not understanding you correctly? Commented Aug 18, 2015 at 16:15
  • 1
    You can't know all the file formats. You should either try to find the file on the system or, if you want the file extension, match the characters after the last dot. Commented Aug 18, 2015 at 16:15
  • look at my edit please Commented Aug 18, 2015 at 16:26
  • You don't need to check extensions against a library of all known extensions, you're just trying to determine whether something is a file. That's called an XY Problem. Commented Aug 18, 2015 at 17:14

4 Answers

2

There is no such library because there is an unlimited number of file formats. I can create my own .something, and so can you, and the file will still be a proper file.

Instead, you have to use os.path.isfile().


As @zero323 pointed out, and according to your edit, you should use the mimetypes library.

Then use mimetypes.guess_type(), which returns a (type, encoding) tuple; the type is None if the file type cannot be guessed.

See the full list of MIME types here.
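A minimal Python 3 sketch of the mimetypes approach follows. The helper name guess_mime_type is hypothetical, and the guess is based only on the extension in the name or URL, so it says nothing about whether the file actually exists.

```python
import mimetypes


def guess_mime_type(name_or_url):
    """Return the MIME type guessed from the extension, or None if unknown.

    guess_mime_type is a hypothetical helper name, not part of mimetypes.
    """
    # guess_type() returns a (type, encoding) tuple; only the type is needed here.
    mime_type, _encoding = mimetypes.guess_type(name_or_url)
    return mime_type


for name in [
    "hey.txt",
    "README",
    "https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif",
]:
    print(name, "->", guess_mime_type(name))
```

A name without a recognised extension, such as README, yields None, which matches the first comment on the question: a missing extension does not mean the name is not a file.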


1 Comment

Well, I am not saying it is useful here, but for common types there is always mimetypes :)
2

If the files are located on a web server, you can use the Content-Type header to get the type of the file.

import urllib.request  # Python 3 replacement for urllib2

urls = ['https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif',
        'https://www.magshimim.net/images/magshimim_logo.png']

for url in urls:
    response = urllib.request.urlopen(url)
    print(url)
    print(response.headers.get('Content-Type'))    # Content type
    print(response.headers.get('Content-Length'))  # Size in bytes
    print()

The output should be:

https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif
image/gif
1325

https://www.magshimim.net/images/magshimim_logo.png
image/png
8314
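The same headers can usually be read without downloading the file body by sending a HEAD request. This is a minimal Python 3 sketch under some assumptions: the function name fetch_content_info and its opener parameter are hypothetical (the parameter only exists so the function can be tried without network access), and some servers may reject or mishandle HEAD requests.

```python
import urllib.request


def fetch_content_info(url, opener=urllib.request.urlopen):
    """Return (content_type, content_length) as reported by the server.

    fetch_content_info and opener are hypothetical names for this sketch;
    opener defaults to urllib.request.urlopen.
    """
    # A HEAD request asks for the headers only, not the body.
    request = urllib.request.Request(url, method="HEAD")
    with opener(request) as response:
        raw_type = response.headers.get("Content-Type")
        length = response.headers.get("Content-Length")
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    content_type = raw_type.split(";")[0].strip().lower() if raw_type else None
    return content_type, length
```

Calling fetch_content_info(url) on one of the URLs above would return the content type and length as strings, or None for a header the server did not send.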

1 Comment

Wow, I just remembered that OP mentioned web server and returned to add the same to my post. And, obviously, saw that you already posted this solution! Excellent!
0

The best approach would be to use a regular expression, since your script is checking whether the given object is a file or not. If you want to check whether a particular file exists, then os.path.isfile(path) would be the better choice. If you are comfortable with regular expressions, try to write one yourself; otherwise let me know and I will write it for you. Your feedback would be highly appreciated. Thank you.
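For illustration, such a regular expression might look like the Python 3 sketch below. The pattern, the limit of 1 to 10 alphanumeric characters, and the helper name extension_of are all assumptions made for this example; the answer itself does not give a pattern.

```python
import re
from urllib.parse import urlsplit

# Assumed rule: an extension is 1-10 letters or digits after the final dot,
# and the dot must be preceded by a character other than "/" or ".".
EXTENSION_RE = re.compile(r"[^/.]\.([A-Za-z0-9]{1,10})$")


def extension_of(url_or_name):
    """Return the lower-cased extension without the dot, or None."""
    # Only the path is examined, so dots in the host name or query are ignored.
    path = urlsplit(url_or_name).path
    match = EXTENSION_RE.search(path)
    return match.group(1).lower() if match else None
```

With this rule, names such as README or .bashrc give None, so the pattern only answers "does it have an extension", not "is it a file".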


0

I suggest:

import os.path  # To be sure "." is the extension separator, you can use a specific path module (ntpath, posixpath, ...) instead

inputname = "hey.txt"  # example input

filename, ext = os.path.splitext(inputname)
# splitext() ignores leading dots, so filename is never empty when ext is set:
#   'something.txt' -> ('something', '.txt'), a name with an extension
#   '.bashrc'       -> ('.bashrc', ''), a dotfile without an extension
#   'README'        -> ('README', ''), no extension at all
if ext:
    print("File", filename, "with extension", ext)
elif filename.startswith("."):
    print("File", filename, "with no extension!")
else:
    print(filename, "is not a file by 'must have an extension' rule!")

You can also achieve the check with something like:

c = inputname.count(".")
if c != 0 and not inputname.endswith(".") and not (inputname.startswith(".") and c == 1):
    print(inputname, "is a file because it has an extension!")
else:
    print(inputname, "is not a file, no extension!")
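One caveat when the input is a full URL rather than a bare name: the dots in the host name (www.magshimim.net) would be counted as well, so it may help to isolate the last path segment first. A minimal Python 3 sketch, where split_url_filename is a hypothetical helper name:

```python
import posixpath
from urllib.parse import urlsplit


def split_url_filename(url):
    """Return (filename, ext) for the last path segment of a URL.

    split_url_filename is a hypothetical helper name for this sketch.
    """
    path = urlsplit(url).path           # drops scheme, host, query and fragment
    basename = posixpath.basename(path)  # URL paths always use "/" as separator
    return posixpath.splitext(basename)


print(split_url_filename(
    "https://www.magshimim.net/App_Themes/En/images/powered_by_priza_heb.gif"
))
```

Either of the extension checks above can then be applied to the returned name instead of the whole URL.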

If you really have to check for an existing format, then, yes, use mimetypes.

Or Google around: I saw a pretty extensive list (as a library) of all formats for PHP somewhere. Take it and convert it to Python. A few find-and-replace operations should do it.

