Hey guys, I've seen a lot of options based on fread (which requires a file, or writing to memory), but I am trying to invalidate an input based on a string that has already been accepted (unknown format). I have something like this:

        if (FALSE !== str_getcsv($this->_contents, "\n"))
        {
            foreach (preg_split("/\n/", $this->_contents) AS $line)
            {
                $data[] = explode(',', $line);
            }

            print_r($data); die;
            $this->_format = 'csv';
            $this->_contents = $this->trimContents($data);

            return true;
        }

This works fine on a real CSV or a CSV-filled variable, but when I try to pass it garbage to invalidate, something like https://www.gravatar.com/avatar/625a713bbbbdac8bea64bb8c2a9be0a4, which is garbage (since it's a PNG), it believes it's CSV anyway and keeps chugging along until the program chokes. How can I fix this? I have not seen any CSV validators that are not at least several classes deep; is there a simple three or four line to (in)validate?

  • First, define what "valid" means. Then, write code to look for it. Commented Nov 1, 2013 at 22:48
  • You want to validate or invalidate? The former means to check whether something is valid, the latter means to make something invalid. Commented Nov 1, 2013 at 22:49
  • "is there a simple three or four line to (in)validate?" Nope. CSV is so loosely structured (and it has no telltale signs like header bytes) that there is technically no way to tell whether a file is CSV or not. Commented Nov 1, 2013 at 22:50
  • Do you know what structure the CSV should have, i.e. how many columns per line? You could parse the first line and see how many columns it creates. Commented Nov 1, 2013 at 22:52

2 Answers

1

is there a simple three or four line to (in)validate?

Nope. CSV is so loosely defined (it has no telltale signs like header bytes, and there isn't even a universally followed convention for which character separates columns) that there is technically no way to tell whether a file is CSV or not. Even your PNG could technically be a gigantic one-column CSV with some esoteric field and line separator.
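To make that concrete, here is a small sketch of why the check in the question never fails: str_getcsv hands back an array for whatever string it is given, so `FALSE !== str_getcsv(...)` is true even for binary data. The `$garbage` bytes below are just the start of a PNG file, used here as an illustrative stand-in for the downloaded avatar.

```php
<?php
// Illustrative input: the PNG signature plus the start of the IHDR chunk.
$garbage = "\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR";

// Same call shape as in the question, with "\n" passed as the delimiter.
// Enclosure and escape are passed explicitly to avoid relying on defaults.
$parsed = str_getcsv($garbage, "\n", '"', '\\');

// The result is an array, so comparing it against FALSE tells us nothing.
var_dump(is_array($parsed));
var_dump(false !== $parsed);
```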

For validation, look at what purpose you are using the CSV files for and what input you are expecting. Are the files going to contain address data, separated into, say, 10 columns? Then look at the first line of the file, and see whether enough columns exist, and whether they contain alphanumeric data. Are you looking for a CSV file full of numbers? Then parse the first line, and look for the kinds of values you need. And so on...
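As a rough sketch of the first-line check described above: the function name, the expected column count, and the "printable text only" rule are all assumptions made for illustration, and you would swap in whatever your data actually requires.

```php
<?php
// Hypothetical helper (not from the question): checks only the first line.
function firstLineLooksValid(string $contents, int $expectedColumns): bool
{
    // Isolate the first line, tolerating \r\n, \r and \n line endings.
    $firstLine = preg_split('/\r\n|\r|\n/', $contents, 2)[0];
    if (trim($firstLine) === '') {
        return false; // nothing to inspect
    }

    $fields = str_getcsv($firstLine, ',', '"', '\\');
    if (count($fields) !== $expectedColumns) {
        return false; // wrong number of columns
    }

    foreach ($fields as $field) {
        // Assumed rule: letters, digits, punctuation, symbols and spaces only.
        // With the /u flag, invalid UTF-8 makes preg_match return false,
        // so raw binary data is rejected here as well.
        if (preg_match('/^[\p{L}\p{N}\p{P}\p{S}\p{Zs}]*$/u', (string) $field) !== 1) {
            return false;
        }
    }

    return true;
}
```

In the question's class this could replace the `FALSE !== str_getcsv(...)` condition, for example `if (firstLineLooksValid($this->_contents, 10))` for the ten-column address case.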

1

If you have an idea of the kinds of CSVs likely to make it to your system, you could apply some heuristics -- at the risk of not accepting valid CSVs. For instance, you could look at line length, consistency of line length, special characters, etc...

If all you are doing is checking for the presence of commas and newlines, then any sufficiently large, random file will likely have those and thus pass such a CSV test.
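One possible shape for such heuristics, as a sketch only: the function name, the NUL-byte and UTF-8 checks, and the minimum of two columns are assumptions, and as noted above they will reject some files that are legitimately CSV (for example Latin-1 encoded data, single-column files, or quoted fields that contain line breaks).

```php
<?php
// Hypothetical heuristic (not from the question): stricter than checking
// for commas and newlines, but still only a guess about the input.
function looksLikeCsv(string $contents, string $delimiter = ',', int $minColumns = 2): bool
{
    // Binary formats such as PNG almost always contain NUL bytes.
    if ($contents === '' || strpos($contents, "\0") !== false) {
        return false;
    }
    // Assumption: the CSV is UTF-8 text; an empty /u pattern fails on invalid UTF-8.
    if (preg_match('//u', $contents) !== 1) {
        return false;
    }

    $expected = null;
    foreach (preg_split('/\r\n|\r|\n/', trim($contents)) as $line) {
        if ($line === '') {
            continue; // ignore blank lines
        }
        $count = count(str_getcsv($line, $delimiter, '"', '\\'));
        if ($expected === null) {
            $expected = $count;   // first row sets the column count
        } elseif ($count !== $expected) {
            return false;         // inconsistent row width
        }
    }

    return $expected !== null && $expected >= $minColumns;
}
```

The consistency check is what does most of the work here: random data may contain commas and newlines, but it is much less likely to produce the same number of fields on every line.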
