I have a regex for detecting Cyrillic first, middle, and last names:

([А-Я][а-я]+\s+[А-Я][а-я]+[.|\s|][А-Я][а-я]+[.|\s|])

Using:

preg_match_all('/([А-Я][а-я]+(\\s|.|[ ])[А-Я][а-я]+(\\s|.|[ ])[А-Я][а-я]+)/','it\'s a test string with a name like Васильців Василь Васильович and Петро Петрович Петренко смисми ВВ Аммм Мммм Аааааа',$ar);

The results:

Array
(
    [0] => Array
        (
            [0] => �асил�
            [1] => �асил�
            [2] => �асильови�
            [3] => енко
            [4] => мисми
            [5] => �ааааа
        )

    [1] => Array
        (
            [0] => �асил�
            [1] => �асил�
            [2] => �асильови�
            [3] => енко
            [4] => мисми
            [5] => �ааааа
        )

    [2] => Array
        (
            [0] => �
            [1] => �
            [2] => �
            [3] => �
            [4] => �
            [5] => �
        )

    [3] => Array
        (
            [0] => �
            [1] => �
            [2] => �
            [3] => �
            [4] => �
            [5] => �
        )

)

It works fine at https://regex101.com/r/xA6vX0/1 but not in PHP, where it matches the wrong parts of the text. Can you explain what's wrong, or point me to a better online service?

  • Check if adding u helps: '/([А-Я][а-я]+(\\s|.|[ ])[А-Я][а-я]+(\\s|.|[ ])[А-Я][а-я]+)/u' Commented Apr 23, 2015 at 21:44
  • Have you tried with unicode escaping \uXXXX? regular-expressions.info/unicode.html Commented Apr 23, 2015 at 21:46

1 Answer

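Without the u modifier, PCRE treats the pattern and the subject as byte strings, so the character classes match single bytes of the two-byte UTF-8 Cyrillic letters. I have just tested on PHP 5.5.18, and with the u modifier the pattern works well: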

preg_match_all('/([А-ЯЁ][ёа-я]+(?:[\\s.][ЁА-Я][ёа-я]+){2})/u','it\'s a test string with a name like Васильців Василь Васильович and Петро Петрович Петренко смисми ВВ Аммм Мммм Аааааа',$ar);
print_r($ar);

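I also merged the whitespace alternatives and the period into a single character class, [\s.] (the period in the original was unescaped, so it matched any character), and shortened the pattern itself with a {2} quantifier.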

Output:

Array
(
    [0] => Array
        (
            [0] => Петро Петрович Петренко
            [1] => Аммм Мммм Аааааа
        )

    [1] => Array
        (
            [0] => Петро Петрович Петренко
            [1] => Аммм Мммм Аааааа
        )

)
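
To see why the u modifier matters, here is a minimal sketch that runs the same small pattern with and without it. The variable names are illustrative and not from the original post, and the sketch assumes the source file is saved as UTF-8.

```php
<?php
// Illustrative subject; any UTF-8 Cyrillic text behaves the same way.
$subject = 'Петро Петрович Петренко';

// Without /u PCRE works on bytes: [А-Я] and [а-я] become classes of
// single bytes, so a match can start and end inside a two-byte letter.
preg_match('/[А-Я][а-я]+/', $subject, $bytes);

// With /u the pattern and subject are read as UTF-8 code points.
preg_match('/[А-Я][а-я]+/u', $subject, $chars);

// The byte-mode match is the second byte of "П", the whole "е",
// and the first byte of "т", which is not valid UTF-8 on its own.
echo bin2hex($bytes[0]), "\n"; // 9fd0b5d1
echo $chars[0], "\n";          // Петро
```

The stray half-letters in the first match are what show up as � in the question's output.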

1 Comment

There is one more catch: if you also want to match the letter ё, you need to add it to the character class as a separate character, because it lies outside the а-я range (likewise Ё and А-Я). I forgot about it at first and have added it now.
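
As a possible alternative to listing ё and similar letters by hand, the sketch below uses Unicode properties instead of explicit ranges. The explicit ranges also exclude Cyrillic letters such as the Ukrainian і in Васильців, which would explain why that name is absent from the output above. This is not the answer's pattern: the names $upper, $lower, $word and the sample subject are assumptions made for illustration.

```php
<?php
// Sketch only: $upper, $lower, $word and $subject are illustrative names.
// \p{Lu} / \p{Ll} match any uppercase / lowercase letter; the lookahead
// (?=\p{Cyrillic}) restricts them to the Cyrillic script. Both need /u.
$upper = '(?=\p{Cyrillic})\p{Lu}';
$lower = '(?=\p{Cyrillic})\p{Ll}';
$word  = $upper . '(?:' . $lower . ')+';

// Three capitalized Cyrillic words separated by whitespace or a period.
$pattern = '/' . $word . '(?:[\s.]' . $word . '){2}/u';

$subject = 'names like Васильців Василь Васильович and Пётр Петрович Петренко, not John Ronald Tolkien';
preg_match_all($pattern, $subject, $m);

// Expected matches: both Cyrillic names, but not the Latin one.
print_r($m[0]);
```

The trade-off is that the property-based version accepts every Cyrillic letter, including ones that a strictly Russian name would never contain.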
