3

I've made a program that opens files and searches for a word

I want it to only work on TEXT Files

Does C provide a way to check whether a file is BINARY? If it is, I want to exit the program before any operations take place.

Thanks

9
  • Depends what your definition of binary is. Check that every byte in the file passes isalnum(), or just whether any are > 127? Commented Oct 20, 2017 at 13:34
  • 1
    All files are binary. MS has the distinction due to line endings Commented Oct 20, 2017 at 13:34
  • 1
    @EdHeal Space is printable. Commented Oct 20, 2017 at 13:37
  • 1
    @EdHeal I meant according to isprint(). Commented Oct 20, 2017 at 13:40
  • 2
    'printf("is binary\n");' done. Commented Oct 20, 2017 at 13:47

4 Answers

4

No, there isn't, because it's impossible to tell for sure. If you expect a specific encoding, you can check yourself whether the file contents are valid in this encoding, e.g. if you expect ASCII, all bytes must be <= 0x7f. If you expect UTF-8, it's a bit more complicated, see a description of it.

In any case, there's no guarantee that a "binary" file would not by accident look like a valid file in any given text encoding. In fact, the term "binary file" doesn't make too much sense, as all files contain binary data.
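The ASCII check described above can be sketched roughly as follows. This is only a minimal illustration, assuming "text" means 7-bit ASCII; the helper names all_bytes_ascii and file_is_ascii are made up for this example and are not part of the C library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if every byte in buf[0..len) is <= 0x7f,
   otherwise 0. An empty buffer counts as ASCII. */
int all_bytes_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] > 0x7f)
            return 0;
    }
    return 1;
}

/* Hypothetical helper: returns 1 if the whole file is ASCII, 0 if a byte
   above 0x7f is found, and -1 if the file cannot be opened or read.
   The file is opened in binary mode so no bytes are translated. */
int file_is_ascii(const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    unsigned char buf[4096];
    size_t n;
    int result = 1;

    /* Read in chunks and stop at the first non-ASCII byte. */
    while (result == 1 && (n = fread(buf, 1, sizeof buf, f)) > 0)
        result = all_bytes_ascii(buf, n);

    if (ferror(f))
        result = -1;
    fclose(f);
    return result;
}
```

Note that a result of 1 only means the bytes are valid ASCII. As said above, a "binary" file may still pass this check by accident.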


1 Comment

Right now this seems to work, I don't expect to be working on special symbols. I'm trying to produce a program that works like GREP on linux. Thanks!
3

If we assume that by text you mean ASCII and not UTF-8, you can do this by reading each character and using isascii(), iscntrl() and isspace() to check if it is a valid character:

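#include <ctype.h>
#include <stdio.h>

/* Prints whether the file looks like plain ASCII text. */
void is_text(char *filename) {
    FILE *f = fopen(filename, "r");
    if (!f) {
        perror("fopen failed");
        return;
    }
    int c;
    /* Parenthesize the assignment so c holds the byte read, not the result of the comparison. */
    while ((c = fgetc(f)) != EOF) {
        /* Reject non-ASCII bytes and control characters other than whitespace. */
        if ((!isascii(c) || iscntrl(c)) && !isspace(c)) {
            printf("is binary\n");
            fclose(f);
            return;
        }
    }
    printf("is text\n");
    fclose(f);
}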

If the file contains UTF-8 characters, it becomes more complicated as you have to look at multiple bytes at once and see if they are valid UTF-8 byte sequences. There's also the question of which Unicode code points are considered text.
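To give an idea of what looking at multiple bytes at once involves, here is a rough sketch of a UTF-8 well-formedness check on a buffer. The function name is_valid_utf8 is made up for this example. It only checks that the byte sequences are structurally valid (no stray continuation bytes, truncated sequences, overlong encodings, surrogates, or values above U+10FFFF); it does not decide which code points count as text.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if buf[0..len) is well-formed UTF-8,
   otherwise 0. An empty buffer is considered valid. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t extra;       /* number of continuation bytes expected */
        unsigned long cp;   /* code point being decoded */
        unsigned long min;  /* smallest code point allowed for this length */

        if (b <= 0x7f)               { extra = 0; cp = b;        min = 0;       }
        else if ((b & 0xe0) == 0xc0) { extra = 1; cp = b & 0x1f; min = 0x80;    }
        else if ((b & 0xf0) == 0xe0) { extra = 2; cp = b & 0x0f; min = 0x800;   }
        else if ((b & 0xf8) == 0xf0) { extra = 3; cp = b & 0x07; min = 0x10000; }
        else
            return 0;       /* stray continuation byte or invalid lead byte */

        if (extra > len - i - 1)
            return 0;       /* sequence is cut off at the end of the buffer */

        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = buf[i + k];
            if ((c & 0xc0) != 0x80)
                return 0;   /* not a continuation byte (10xxxxxx) */
            cp = (cp << 6) | (c & 0x3f);
        }

        if (cp < min)
            return 0;       /* overlong encoding */
        if (cp > 0x10ffff)
            return 0;       /* beyond the Unicode range */
        if (cp >= 0xd800 && cp <= 0xdfff)
            return 0;       /* UTF-16 surrogates are not valid in UTF-8 */

        i += extra + 1;
    }
    return 1;
}
```

When reading a file in chunks, a multi-byte sequence can be split across two reads, so this sketch assumes the whole content (or at least whole sequences) is in the buffer.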

2 Comments

I am trying to emulate how GREP works on Linux. I don't think it handles Unicode characters, so I don't think this will be a problem. Thanks for your help!
There is a typo in the answer: instead of c=fgetc(c) it must be c=fgetc(f).
1

It's not the file per se that is binary or text; it is only a matter of how you interpret the content of the file when opening it. You may open a file containing solely text in binary mode, thereby preventing a \r\n from being translated to a \n only. And you may open a file containing raw data, for example a bitmap, in text mode, thereby probably corrupting the content on platforms such as Windows, because a 0x0D 0x0A gets converted to a 0x0A only.

So you cannot check the file per se, but you may open the file in binary mode and see if the content contains anything which you do not interpret as text.
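As a rough sketch of that idea: open the file with "rb" and inspect the first chunk for bytes you would not treat as text. The example below assumes that a NUL byte is such a byte, which is a common heuristic but still only a guess; the names has_nul_byte and looks_binary and the 8192-byte sample size are made up for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if buf[0..len) contains a NUL byte, else 0. */
int has_nul_byte(const unsigned char *buf, size_t len)
{
    return memchr(buf, 0, len) != NULL;
}

/* Hypothetical helper: returns 1 if the first chunk of the file contains
   a NUL byte, 0 if it does not, and -1 if the file cannot be opened or read.
   Only the beginning of the file is sampled (assumed size: 8192 bytes). */
int looks_binary(const char *path)
{
    unsigned char buf[8192];

    /* "rb" disables any line-ending translation, so we see the raw bytes. */
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t n = fread(buf, 1, sizeof buf, f);
    int err = ferror(f);
    fclose(f);

    if (err)
        return -1;
    return has_nul_byte(buf, n);
}
```

A caller could then skip the file, for example with if (looks_binary(path) != 0) return; before searching it. Files in encodings such as UTF-16 contain NUL bytes as well, so this heuristic would treat them as binary.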


0

perhaps: system("file path/filename");
