I am trying to adapt a PHP application to handle non-Latin scripts (specifically Japanese, Simplified Chinese and Arabic). The app's data validation routines make frequent use of regular expressions to check input, but I am not sure how to make the \w character type match letters in other languages without installing additional locales on the system (which I cannot rely on).
Previous developers who worked on the app simply added the needed characters to the regexes as the number of supported languages grew (you frequently see "[\wÀÁÂÃÄÅÆÇÈÉ... etc" in the code), but I can't realistically do this for all the scripts I need to support now.
Does anybody out there have some advice on how to tackle this?
There is ctype_alnum, but it depends on the current locale, and what you're really asking is "what is an alphanumeric character in any locale"...
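One locale-independent option, as far as I know, is to use PCRE's Unicode property escapes together with the u modifier, so the character class is defined by Unicode categories rather than by the system locale. Below is a minimal sketch, not a drop-in fix: the function name is_unicode_alnum is made up for illustration, it assumes the input is UTF-8 and that PHP's PCRE was built with Unicode property support (the bundled one normally is), and it uses PHP 7+ type declarations.

```php
<?php
// Hypothetical helper (not part of the original app): returns true when
// $input consists only of letters, combining marks and digits in any script.
function is_unicode_alnum(string $input): bool
{
    // \p{L} = letters, \p{M} = combining marks (e.g. Arabic vowel signs),
    // \p{N} = numbers. The u modifier makes PCRE treat both the pattern
    // and the subject as UTF-8. \z anchors at the very end of the string.
    // preg_match() returns false on invalid UTF-8, which also yields false here.
    return preg_match('/^[\p{L}\p{M}\p{N}]+\z/u', $input) === 1;
}

var_dump(is_unicode_alnum('日本語'));   // Japanese
var_dump(is_unicode_alnum('简体中文')); // Simplified Chinese
var_dump(is_unicode_alnum('العربية'));  // Arabic
var_dump(is_unicode_alnum('abc-123'));  // contains a hyphen, so it is rejected
```

In existing patterns, a class like [\wÀÁÂ...] could probably be replaced by something like [\p{L}\p{M}\p{N}_] with the u modifier added, but check each regex individually, since \w also matches the underscore and the u modifier changes how the whole pattern is interpreted.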