The module I'm asking about is Matthew Barnett's regex: https://pypi.org/project/regex/

On the project description page, the differences in behavior between V0 and V1 are stated as follows (note the bullets about case-folding):

Old vs new behaviour

In order to be compatible with the re module, this module has 2 behaviours:

  • Version 0 behaviour (old behaviour, compatible with the re module):

    Please note that the re module’s behaviour may change over time, and I’ll endeavour to match that behaviour in version 0.

    • Indicated by the VERSION0 or V0 flag, or (?V0) in the pattern.
    • Case-insensitive matches in Unicode use simple case-folding by default.
  • Version 1 behaviour (new behaviour, possibly different from the re module):

    • Indicated by the VERSION1 or V1 flag, or (?V1) in the pattern.
    • Case-insensitive matches in Unicode use full case-folding by default.

If no version is specified, the regex module will default to regex.DEFAULT_VERSION.

I tried a few examples myself but couldn't find one where the two versions behave differently:

Python 3.6.7 (default, Oct 22 2018, 11:32:17)
[GCC 8.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import regex
>>> r = regex.compile("(?V0i)и")
>>> r
regex.Regex('(?V0i)и', flags=regex.I | regex.V0)
>>> r.search("И")
<regex.Match object; span=(0, 1), match='И'>
>>> regex.search("(?V0i)é", "É")
<regex.Match object; span=(0, 1), match='É'>
>>> regex.search("(?V0i)é", "E")
>>> regex.search("(?V1i)é", "E")

What is the difference between simple case-folding and full case-folding? Alternatively, can you provide an example where a case-insensitive regex matches something in V1 but not in V0?

  • Not tested, but it probably follows this table. Full case folding may replace a few special characters with two characters; simple case folding doesn't. Examples of such characters are the capital and small Latin sharp s. Commented Feb 9, 2019 at 6:15
  • @MichaelButscher Great, it works. You can get a green tick if you write it as an answer. Commented Feb 9, 2019 at 6:19

1 Answer

It follows the Unicode case folding table (CaseFolding.txt). Excerpt:

# The entries in this file are in the following machine-readable format:
#
# <code>; <status>; <mapping>; # <name>
#
# The status field is:
# C: common case folding, common mappings shared by both simple and full mappings.
# F: full case folding, mappings that cause strings to grow in length. Multiple characters are separated by spaces.
# S: simple case folding, mappings to single characters where different from F.

[...]

# Usage:
#  A. To do a simple case folding, use the mappings with status C + S.
#  B. To do a full case folding, use the mappings with status C + F.
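
The C + S versus C + F rule can be illustrated with a minimal sketch. The `FOLDING` table and the `fold` helper below are hypothetical names made up for this illustration, and the table holds only a handful of entries hand-copied from CaseFolding.txt (the sharp-s entries plus capital S), so this is not how the regex module is implemented.

```python
# Hand-copied subset of CaseFolding.txt, keyed by (code point, status).
# Only a few entries are included; the real table is much larger.
FOLDING = {
    (0x0053, "C"): "\u0073",        # LATIN CAPITAL LETTER S
    (0x00DF, "F"): "\u0073\u0073",  # LATIN SMALL LETTER SHARP S
    (0x1E9E, "F"): "\u0073\u0073",  # LATIN CAPITAL LETTER SHARP S
    (0x1E9E, "S"): "\u00DF",        # LATIN CAPITAL LETTER SHARP S
}


def fold(text, full):
    """Fold text with C + F mappings if full is true, otherwise C + S."""
    special = "F" if full else "S"
    out = []
    for ch in text:
        cp = ord(ch)
        # Try the F (or S) entry first, then the common C entry,
        # and keep the character unchanged if neither exists.
        out.append(FOLDING.get((cp, special), FOLDING.get((cp, "C"), ch)))
    return "".join(out)


sharp_s = "\u00DF"  # ß

# Simple folding leaves ß alone, so it cannot equal folded "SS".
print(fold(sharp_s, full=False) == fold("SS", full=False))  # False
# Full folding turns ß into "ss", which equals folded "SS".
print(fold(sharp_s, full=True) == fold("SS", full=True))    # True
```

For comparison, Python's built-in `str.casefold()` performs full case folding, so `"ß".casefold()` gives `"ss"`.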

The two foldings differ only for a few special characters. Examples are the small and capital Latin sharp s:

00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S

[...]

1E9E; F; 0073 0073; # LATIN CAPITAL LETTER SHARP S
1E9E; S; 00DF; # LATIN CAPITAL LETTER SHARP S
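
To answer the second half of the question, here is a short sketch of a pattern that should match in V1 but not in V0. It assumes the third-party regex package is installed and that the defaults are as the project description states (simple case-folding in V0, full case-folding in V1). Because full case-folding maps ß to ss, the case-insensitive pattern `strasse` can match `straße` only in V1.

```python
import regex  # third-party package: pip install regex

text = "stra\N{LATIN SMALL LETTER SHARP S}e"  # "straße"

# V0: simple case-folding by default, so "ss" in the pattern cannot match "ß".
v0 = regex.match(r"(?iV0)strasse", text)
# V1: full case-folding by default, so "ß" folds to "ss" and the match succeeds.
v1 = regex.match(r"(?iV1)strasse", text)

print(v0)         # None
print(v1.span())  # (0, 6)
```

The examples in the question (и/И and é/É) behave the same in both versions because those characters have only common (C) mappings, which simple and full folding share.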