I am looking for a way to match every possible special character in a string. I have a list of cities of the world, and many of the names contain special and accented characters. So I am looking for a regular expression that returns TRUE for any kind of special character. All the ones I found match only some of them, but I need one that covers every possible special character, including spaces at the beginning of the string. Is this possible?

This is the one I found, but it does not match all the different characters I may encounter in the name of a city:

preg_match('/[#$%^&*()+=\-\[\]\';,.\/{}|":<>?~\\\\]/', $string);
  • Define "special". What is "special" to you? Have you thought of doing it the other way around: defining a list of characters you deem "non-special" and checking whether anything other than those is in the string? Commented Sep 17, 2013 at 13:25
  • How about everything that is not "A-Za-z"? Commented Sep 17, 2013 at 13:26
  • How about using just \W? Commented Sep 17, 2013 at 13:27
  • It's better to define what should not be matched, because there are many special characters plus accented ones. Commented Sep 17, 2013 at 13:27

4 Answers

1

You're going to need UTF-8 mode, enabled with the u modifier ("#pattern#u"): https://www.php.net/manual/en/reference.pcre.pattern.modifiers.php

Then you can use the Unicode escape sequences: https://www.php.net/manual/en/regexp.reference.unicode.php

So that preg_match("#\p{L}*#u", "København", $match) will match.
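
A small sketch of the idea, in case it helps. The helper name isAllLetters is made up for this example, and it assumes the source file and the input are UTF-8. Note that \p{L}* on its own also succeeds on an empty match, so a check that should mean something probably wants + and anchors:

```php
<?php
// Hypothetical helper (not from the answer): true only when $name is made up
// entirely of Unicode letters.
function isAllLetters(string $name): bool
{
    // u     : treat pattern and subject as UTF-8
    // \p{L} : any Unicode letter; + requires at least one
    // ^ \z  : anchor the match to the whole string
    return preg_match('/^\p{L}+\z/u', $name) === 1;
}

var_dump(isAllLetters('København'));      // bool(true)
var_dump(isAllLetters('København 2'));    // bool(false): space and digit are not letters
var_dump(preg_match('/\p{L}*/u', '123')); // int(1): * also accepts an empty match
```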

0

Use Unicode properties:

\pL stands for any letter

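To match a city name, I'd do the following (I assume that - and space are valid characters). The hyphen goes last in the character class so that it is taken literally rather than as a range operator:

preg_match('/\s*[\pL\s-]/u', $string);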
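
As written, the pattern above is not anchored, so it succeeds as soon as one letter, space or hyphen appears anywhere in the string. If the goal is to validate the whole name, something along these lines might work. This is only a sketch: the function name looksLikeCityName is invented here, the "letters, whitespace and hyphens only" rule is the assumption stated above (real city names can also contain apostrophes or periods), and it assumes accented letters are stored in precomposed form.

```php
<?php
// Hypothetical validator: optional leading whitespace, then a letter,
// then any mix of letters, whitespace and hyphens up to the end.
function looksLikeCityName(string $name): bool
{
    // \pL is shorthand for \p{L}; the hyphen is placed last in the class
    // so it cannot be read as a range.
    return preg_match('/^\s*\pL[\pL\s-]*\z/u', $name) === 1;
}

var_dump(looksLikeCityName('Aix-en-Provence')); // bool(true)
var_dump(looksLikeCityName(' São Paulo'));      // bool(true): leading space allowed
var_dump(looksLikeCityName('New York 2'));      // bool(false): digit
```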

0

You can just invert your pattern: to match everything that is not in "a-z", "A-Z", "0-9", "-", "_" or ".", you would use

preg_match('/[^-_a-z0-9.]/iu', $string);

The ^ at the start of the character class negates it.
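
To see what this treats as "special", here is a rough sketch that collects the offending characters instead of only returning true or false. The function name findSpecialChars is made up for the example. Note that with this whitelist, accented letters and spaces count as special too:

```php
<?php
// Hypothetical helper: returns each distinct character of $s that is NOT
// in the whitelist (ASCII letters, digits, "-", "_", ".").
function findSpecialChars(string $s): array
{
    // The u modifier makes the class match whole UTF-8 characters, not bytes.
    preg_match_all('/[^-_a-z0-9.]/iu', $s, $m);
    return array_values(array_unique($m[0]));
}

print_r(findSpecialChars(' Malmö (Skåne)')); // the space, ö, (, å and )
print_r(findSpecialChars('new-york_1.0'));   // empty array: nothing special
```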

0

I had the same problem when I wanted to split names into parts that also contained special characters.

For example, if you want to split a bunch of names of the form:

<lastname>,<forename(s)> <initial(s)> <suffix(es)>

where forenames and suffixes are separated by whitespace,
and each initial is followed by a . with a maximum of 6 initials,

you could use

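// The whole name matches as one "delimiter"; PREG_SPLIT_DELIM_CAPTURE returns its capture groups.
$nameparts = preg_split("/(\w*),((?:\w+[\s\-]*)*)((?:\w\.){1,6})(?:\s*)(.*)/u", $displayname, -1, PREG_SPLIT_DELIM_CAPTURE);
// The first and last elements are always empty, so drop them.
array_splice($nameparts, 5, 1);
array_splice($nameparts, 0, 1);
print_r($nameparts);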

Input:
Powers,Björn B.A. van der
Output:
Array ( [0] => Powers [1] => Björn [2] => B.A. [3] => van der )

Tip: the regular expression looks like it came from outer space, but regex101.com comes to the rescue!
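
As an alternative sketch (a different technique from the preg_split call above, not a replacement for it): preg_match with the same groups returns the captures directly, so there are no empty first and last elements to remove. The function name splitDisplayName and the array keys are invented for this example, and the ^ and $ anchors are an addition.

```php
<?php
// Hypothetical helper: splits "<lastname>,<forenames> <initials> <suffix>"
// into named parts, or returns null when the input does not fit that shape.
function splitDisplayName(string $displayname): ?array
{
    // With the u modifier PHP lets \w match accented letters such as ö.
    $pattern = '/^(\w*),((?:\w+[\s\-]*)*)((?:\w\.){1,6})\s*(.*)$/u';
    if (preg_match($pattern, $displayname, $m) !== 1) {
        return null;
    }
    return [
        'lastname'  => $m[1],
        'forenames' => trim($m[2]), // drop the separator left after the last forename
        'initials'  => $m[3],
        'suffix'    => $m[4],
    ];
}

print_r(splitDisplayName('Powers,Björn B.A. van der'));
```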
