I'm trying to validate a username. My code:

import re
regex = r'^[\w.@+-]+\Z'
result = re.match(regex,'名字')

In Python 2.7, it returns None.

In Python 3.7, it returns a match object for '名字'.

1 Answer

It's because of the different definitions for \w in Python 2.7 versus Python 3.7.

In Python 2.7, we have:

When the LOCALE and UNICODE flags are not specified, matches any alphanumeric character and the underscore; this is equivalent to the set [a-zA-Z0-9_].

(emphasis and hyperlink and formatting added)

However, in Python 3.7, we have:

For Unicode (str) patterns: Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the ASCII flag is used, only [a-zA-Z0-9_] is matched.

(emphasis and formatting added)
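
To see the Python 3 behaviour described in that quote, here is a minimal sketch (Python 3 only, since re.ASCII does not exist in Python 2.7). It reuses the pattern from the question; the variable names are just for illustration:

```python
import re

pattern = r'^[\w.@+-]+\Z'

# Default for str patterns in Python 3: \w matches Unicode word characters.
unicode_match = re.match(pattern, '名字')

# With re.ASCII, \w is limited to [a-zA-Z0-9_], as in Python 2.7 without flags.
ascii_match = re.match(pattern, '名字', re.ASCII)

print(unicode_match is not None)  # the default match succeeds
print(ascii_match is None)        # the ASCII-only match fails
```

So re.ASCII is the flag to reach for if you actually want the stricter Python 2.7 result under Python 3.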

So, if you want it to work in both versions, you can do something like this:

# -*- coding: utf-8 -*-
import re
regex = re.compile(r'^[\w.@+-]+\Z', re.UNICODE)
match = regex.match(u'名字')

if match:
    print(match.group(0))
else:
    print("not matched!")

output:
名字

This works in both Python 2.7 and Python 3.7.

Note the differences:

  • I added # -*- coding: utf-8 -*- at the top of the script, because without it, in Python 2.7, we'll get an error saying

    Non-ASCII character '\xe5' on line 3, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details

  • Instead of using result = re.match(pattern, string), I used regex = re.compile(pattern, flags) and match = regex.match(string) to pass the flag. (re.match(pattern, string, flags) accepts flags as well; compiling just lets the pattern be reused.)

  • I used the re.UNICODE flag, because without it, in Python 2.7, \w only matches [a-zA-Z0-9_].

  • I used u'名字' instead of '名字', because in Python 2.7 a plain '...' literal is a byte string. The u prefix makes it a unicode string, so that \w is tested against the actual characters rather than their encoded bytes.
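
Since the goal is username validation, the points above could be wrapped up roughly like this. It is only a sketch: USERNAME_RE and is_valid_username are names I made up, and the accepted character set is simply the one from the question.

```python
# -*- coding: utf-8 -*-
import re

# Compile once with re.UNICODE so \w behaves the same in Python 2.7 and 3.x.
USERNAME_RE = re.compile(r'^[\w.@+-]+\Z', re.UNICODE)


def is_valid_username(name):
    """Return True if name contains only word characters and . @ + -"""
    return USERNAME_RE.match(name) is not None


print(is_valid_username(u'名字'))      # accepted: Unicode word characters
print(is_valid_username(u'bad name'))  # rejected: contains a space
```

Pass unicode strings (u'...') when running under Python 2.7, for the same reason as in the last bullet above.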

Also, while answering your question, I found out that print("not matched!") works in Python 2.7 as well. That makes sense: with a single argument, Python 2 treats the parentheses as ordinary grouping around the expression given to the print statement. I didn't know that, so that was fun.

2 Comments

Another cross-Py2Py3-compatible print trick: to print just a blank line, replace print with print('').
@PaulMcG Right on