0

I want to use a regex to validate that the user has entered a comma-separated list of words only. Will this work, or is there a better way?

 $match = "#^([^0-9 A-z])(,\s|$))+$#";

This is not for parsing, as I will use explode for that. It is merely to validate that the user has correctly understood that values should be separated with commas.

3
  • What should the separated values be? Currently, as far as I can see, you disallow any of 0-9A-z before the first comma, for example. Commented Jul 26, 2011 at 9:37
  • Please try it, you'll see it's not right. Commented Jul 26, 2011 at 9:38
  • I was trying to disallow 0-9 but allow a-Z, because, as I said it should be a list of words only, no spaces, just a list of single words. Commented Jul 26, 2011 at 9:46

4 Answers

5

I don't know what the separate values should look like, but perhaps this is something that can help you:

$value = '[0-9 A-Z]+';
$match = "~^$value(,$value)*$~i";

When you know what your values should look like you can change $value.


Update

Taking this comment from the question:

I was trying to disallow 0-9 but allow a-Z, because, as I said it should be a list of words only, no spaces, just a list of single words.

Just change $value

$value = '[A-Z]+';

Note that I use the i modifier in $match, so lowercase letters are matched too.
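As a minimal sketch, the two pieces above can be wrapped in a helper. The function name isWordList is hypothetical and not part of the original answer; the only other change is a non-capturing group and an escaped dollar sign inside the double-quoted string.

```php
<?php
// Hypothetical helper built from the $value / $match idea above.
function isWordList(string $input): bool
{
    $value = '[A-Z]+';                    // one word: letters only
    $match = "~^$value(?:,$value)*\$~i";  // words joined by single commas, case-insensitive
    return preg_match($match, $input) === 1;
}

var_dump(isWordList('foo,Bar,baz'));  // bool(true)
var_dump(isWordList('foo, bar'));     // bool(false): the space is not allowed
var_dump(isWordList('foo,,bar'));     // bool(false): empty value
```

To allow optional whitespace around the commas, the separator in $match would have to change; $value itself can stay as it is.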


However, "real" CSV allows a bit more than this; for example, you can put quotes around every value. I would recommend that you parse the line and then test every single value:

$values = str_getcsv($line);
$valid = true;
foreach ($values as $value) {
  $valid = $valid && isValid($value);
}

str_getcsv()
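A self-contained version of the loop above might look like the following. This is only a sketch: isWordValue and isValidCsvLine are hypothetical names, and the rule "a value is one word of ASCII letters, surrounding whitespace ignored" is an assumption taken from the comment quoted earlier.

```php
<?php
// Hypothetical per-value check: one word of ASCII letters, surrounding whitespace ignored.
function isWordValue(?string $value): bool
{
    return $value !== null && preg_match('/^[A-Za-z]+$/', trim($value)) === 1;
}

// Parse the line as CSV, then validate every single value.
function isValidCsvLine(string $line): bool
{
    // Delimiter, enclosure and escape are passed explicitly.
    $values = str_getcsv($line, ',', '"', '\\');
    foreach ($values as $value) {
        if (!isWordValue($value)) {
            return false;
        }
    }
    return true;
}

var_dump(isValidCsvLine('"foo","bar"'));  // bool(true): quotes are removed by the parser
var_dump(isValidCsvLine('foo,,bar'));     // bool(false): empty value
```

Stopping at the first invalid value gives the same result as accumulating $valid across the whole loop.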


5 Comments

Hi, I have added to the question to state that I want to validate that the user has inputted a csv, if str_getcsv would return false if this was not the case, it would be perfect.
Maybe you want to add what, in your opinion, counts as an invalid CSV. Your description ([0-9 A-z]) covers only a small subset of what is possible. Also, semantic correctness matters more than syntactic correctness: you should usually also check the number of values per line, and so on.
@KingCrunch " , " passes this test
@LAS_VEGAS: Yes, because according to $value it is valid (it results in array(' ',' ')). As mentioned, I can't really tell from the question when a value should be treated as "valid", so I decided to factor out the value definition so that everybody can change $value as they like. It's also generally valid for CSV (independent of this regex; see the comments under k102's question).
@KingCrunch This works fine with comma-separated lists without spaces, but not with spaces. Also, I can't have blank values.
2

Your $match gives an error. This

$str = 'sdfbdf,wefwef,323r,dfvdfv';
$match = "/[\S\,]+\S/";
preg_match($match,$str,$m);

can work, but why don't you use explode?

3 Comments

Accepts $str = ',323r,,,dfvdfv';
@KingCrunch, I see. But we still don't know whether this string is correct for the OP. In a CSV file, an empty part of the string may have some meaning. Anyway, I insist on using explode, or str_getcsv as you said.
Yeah, sorry, I "ignored" that they can be empty.
1

I think this is what you're looking for:

'#\G(?:[A-Za-z]+(?:\s*,\s*|$))+$#'

\G anchors the match either to the beginning of the string or the position where the previous match ended. That ensures that the regex doesn't skip over any invalid characters as it tries to match each word. For example, given this string:

'foo,&&bar'

It will report failure because it can't start the second match immediately after the comma.
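The behaviour can be checked with a small sketch like this one. The wrapper name matchesWordList is hypothetical; the pattern is the one from this answer, unchanged.

```php
<?php
// Hypothetical wrapper around the \G-anchored pattern from this answer.
function matchesWordList(string $input): bool
{
    $pattern = '#\G(?:[A-Za-z]+(?:\s*,\s*|$))+$#';
    return preg_match($pattern, $input) === 1;
}

var_dump(matchesWordList('foo , bar,baz'));  // bool(true): whitespace around commas is allowed
var_dump(matchesWordList('foo,&&bar'));      // bool(false): no word starts right after the comma
var_dump(matchesWordList('foo,'));           // bool(true): a trailing comma still passes
```

The last case is worth noting: the final $ is satisfied right after the separator, so a trailing comma is accepted. If that is not wanted, the pattern needs a further check.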

Also, notice the character class: [A-Za-z]. You used [A-z] in your regex and [a-Z] in a comment, both of which are invalid (for this purpose, anyway). They may have been mere typos, but watch out for them nonetheless. Those typos could end up causing subtle and/or serious bugs.
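To illustrate why those two classes are a problem, here is a short sketch. The variable names are only for illustration; the @ operator suppresses the compile warning that PHP emits for the third pattern.

```php
<?php
// [A-z] spans ASCII 0x41-0x7A, which also includes [ \ ] ^ _ and the backtick.
$loose  = preg_match('/^[A-z]+$/', 'foo_bar');     // 1: the underscore slips through
$strict = preg_match('/^[A-Za-z]+$/', 'foo_bar');  // 0: letters only

// [a-Z] is a reversed range ('a' sorts after 'Z'), so the pattern does not compile.
$broken = @preg_match('/^[a-Z]+$/', 'foo');        // false

var_dump($loose, $strict, $broken);
```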

EDIT: \G isn't universally supported, so check before using it in any other regex flavor.


0

I suggest, if you want to test regular expressions, to use kiki.

About your regex: it won't work because of unmatched parentheses. Some more things to take into account:

  • You currently disallow numbers, a space and letters A-z. I think [0-9a-zA-Z ] was what you had in mind, although that still filters out letters like ö.
  • You are currently forcing a space after each comma
  • The dollar sign as an end anchor won't work unless it's the last character of the regex (excluding the #)
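A quick way to see the first problem, plus one possible rewrite that takes the points above into account, is sketched below. The $fixed pattern is an assumption about what was intended (ASCII letters only, optional whitespace after a comma), not something stated in the question.

```php
<?php
// The pattern from the question: the extra ")" makes it fail to compile.
$original = '#^([^0-9 A-z])(,\s|$))+$#';
$compiled = @preg_match($original, 'foo, bar');  // false (compile error, warning suppressed)

// One possible rewrite, assuming ASCII words and optional whitespace after each comma.
$fixed = '#^[A-Za-z]+(?:,\s*[A-Za-z]+)*$#';

var_dump($compiled);                       // bool(false)
var_dump(preg_match($fixed, 'foo,bar'));   // int(1)
var_dump(preg_match($fixed, 'foo, bar'));  // int(1)
var_dump(preg_match($fixed, 'foo,1'));     // int(0)
```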

Besides, what's the point of validating whether or not this is a comma-separated list?
