I'm building a class for filtering data, and I've been compiling various recommendations.

This is primarily to keep faulty user input out of the database, and also to add a layer of defense against types of injection attacks that haven't been thought of yet. (NOTE: this does NOT replace the need to use prepared statements with any data submitted to a database.)

As much as possible, I do not want to return an error; I want to "make the data work". The assumption is that someone accidentally typed a ; or ' etc. in an input field where it can't be accepted, or left thousands separators (,) in a number where they shouldn't be. In those cases, just take the character out and continue.

I wanted to put this out there for others to critique and use. I know there are other questions about this type of thing, but I haven't seen any with a combined recommendation for various types.

My question is: what would you do differently? Would you be concerned about users entering a number like "47387.284.02"? If so, how could I eliminate the second dot (decimal point, period) and everything after it, while still allowing numbers like ".75" and "10.20"?

// Use for numbers - integers, floats
function filterNumbers($data) {
    $data = trim(htmlentities(strip_tags($data)));
    $data = preg_replace('/[^.0-9]/', "", $data); // only numeric values allowed (and decimal point)
    $data = filter_var($data, FILTER_SANITIZE_NUMBER_FLOAT,FILTER_FLAG_ALLOW_FRACTION);
    $data = mysqli_real_escape_string($GLOBALS['con2'], $data);
    return $data;
}

// Use for short strings - alphanumeric only - usernames, varieties, etc.
function filterExtreme($data) {
    $data = trim(htmlentities(strip_tags($data)));
    $data = preg_replace('/[^ ._A-Za-z0-9]/', "", $data);
    $data = mysqli_real_escape_string($GLOBALS['con'], $data);
    return $data;
}

// Use for email addresses
function filterEmail($data) {
    $data = filter_var($data, FILTER_SANITIZE_EMAIL);
    $data = mysqli_real_escape_string($GLOBALS['con'], $data);
    return $data;
}

// Use for comments where some special characters may be desired
function filterComment($data) {
    $data = trim(htmlentities(strip_tags($data)));
    $data = filter_var($data, FILTER_SANITIZE_STRING,FILTER_FLAG_ENCODE_HIGH);
    $data = mysqli_real_escape_string($GLOBALS['con'], $data);
    return $data;
}

Note: $con is the mysqli connection to the MySQL database.

  • Honestly I would not try to "make it work" with respect to numerics, because users will always come up with new ways of entering invalid data, and then you'll have to explain why your system handles some types of invalid data but not their latest try at invalid data. It shouldn't be too much to ask for someone to enter a valid number, for goodness' sake. Commented Jan 31, 2019 at 16:39
  • Also if you do use parameterized statements, you should NOT use mysqli_real_escape_string(), because you'll end up with data stored in your database including literal backslash escape characters. Commented Jan 31, 2019 at 16:40
  • Good Point @BillKarwin. If I wanted to detect if there were more than one decimal point (.) in the string in PHP and error out, any ideas? (or should I just leave that to MySQL, which won't allow it as a value?) Commented Jan 31, 2019 at 18:13
  • MySQL will allow it as a parameter value. In a numeric context, MySQL converts a string using its leading numeric part, so it will just ignore the characters from the second decimal point to the end, unless you run MySQL in strict mode, in which case it will return an error. Commented Jan 31, 2019 at 18:27
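
The check asked about in the comments above (detecting more than one decimal point) can be sketched in PHP roughly as follows. This is only an illustration: `isPlainDecimal()` and `leadingDecimal()` are hypothetical helper names, not part of the original post, and the pattern assumes unsigned numbers with "." as the decimal separator. The second helper keeps only the leading numeric part, which is similar in spirit to the non-strict MySQL behavior described above.

```php
<?php
/**
 * Returns true when $input is an unsigned decimal such as "10.20", ".75" or "42".
 * Hypothetical helper; assumes "." is the decimal separator and no sign is allowed.
 */
function isPlainDecimal(string $input): bool
{
    // Digits with an optional fractional part, or a bare fraction like ".75".
    // \A and \z anchor the whole string, so a second "." makes the match fail.
    return preg_match('/\A(?:\d+(?:\.\d*)?|\.\d+)\z/', $input) === 1;
}

/**
 * Returns the leading numeric part of $input, dropping a second decimal
 * point and everything after it, or null when there is no leading number.
 */
function leadingDecimal(string $input): ?string
{
    if (preg_match('/\A(?:\d+(?:\.\d*)?|\.\d+)/', $input, $matches) === 1) {
        return $matches[0];
    }
    return null;
}
```

With these helpers, `isPlainDecimal('47387.284.02')` is false, so the request can be rejected, while `leadingDecimal('47387.284.02')` yields `'47387.284'` if silently truncating is really what is wanted.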

1 Answer

You are correct to be concerned with proper data entry, but I agree with Bill Karwin's comments above (I can't make a comment due to my rep). There's no way to know the user's intent, so "making it work" might make things worse for them in the end. If they entered a ';' then it's possible they got something else wrong too.

Preventing unwanted character entry should be handled via the interface as much as possible, and your back-end code should simply test for compliance. If it fails, return an error code and let the front-end deal with it.
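
A minimal sketch of "test for compliance and return an error code" might look like the following. The function name `validateFields()`, the field names `username` and `amount`, and the error codes are all assumptions made for illustration; the character sets simply mirror the ones used in the question's filter functions. The input is never modified, only checked.

```php
<?php
/**
 * Checks submitted fields against allow-list patterns without modifying them.
 * Returns a map of field name => error code; an empty array means the input complies.
 */
function validateFields(array $input): array
{
    // Hypothetical field names; patterns mirror the character sets in the question.
    $rules = [
        'username' => '/\A[ ._A-Za-z0-9]+\z/',
        'amount'   => '/\A(?:\d+(?:\.\d*)?|\.\d+)\z/',
    ];

    $errors = [];
    foreach ($rules as $field => $pattern) {
        $value = $input[$field] ?? null;
        if (!is_string($value) || $value === '') {
            $errors[$field] = 'missing';
        } elseif (preg_match($pattern, $value) !== 1) {
            $errors[$field] = 'invalid';
        }
    }
    return $errors;
}
```

The caller can send the returned array back to the front end (for example as JSON) and only proceed to the prepared statement when it is empty.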

Stripping commas from numbers or trimming whitespace probably isn't too bad, but that's also something I'd try to handle on the front end whenever possible. You just have to deal with it as needed.
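
If that light normalization is done on the back end, it could be as small as the sketch below. `normalizeNumberInput()` is a hypothetical name, and the code assumes "," is only ever used as a thousands separator (it would be wrong for locales that use a decimal comma). The result should still be validated afterwards rather than trusted.

```php
<?php
/**
 * Removes surrounding whitespace and thousands separators from a numeric string.
 * Assumes "," is a thousands separator, never a decimal comma.
 */
function normalizeNumberInput(string $raw): string
{
    return str_replace(',', '', trim($raw));
}
```

Note that this does not make invalid input valid: a value such as "47,387.284.02" still contains two decimal points after normalization and should be rejected by the validation step.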

2 Comments

I agree @Amesh, but I don't like to handle data verification only on the front end either. I like to handle it on the front end (HTML5/JavaScript), at the PHP level, and again with properly set SQL columns, so that bad data has only a slim chance of getting into the database.
If you have full control over the front and back end, then you have more freedom to handle the data like you're describing. In my case, I restrict data input on the front end to whatever characters are allowed (via regex in JavaScript), and on the back end, I make sure it's "legal" before committing. You know your audience better than anyone. My projects are internal and proprietary, so I know my users quite well, but if you're public-facing, things could be much different and you have to do whatever makes the most sense for your project. At least you're paying attention to it!
