
Unicode property escapes such as \p{Script=Latin} (which can also be written as \p{sc=Latin}) and \p{Uppercase} are available.

But there is currently no way to select the intersection of multiple sets, as in /^(?[ \p{Script=Latin} & \p{Uppercase} ])/ in Perl ≥5.18, or with a hypothetical syntax like \p{Script=Latin,Uppercase}.

So the task is to find a workaround.

Example input:

const input = [
'License: GPL!',
'License: WÐFPL!',
'License: None!',
]

Example output: ['GPL', 'WÐFPL']

The answer could use a regexp that looks like this, for example (hypothetical syntax): /^License:\s*(?<abbr>\p{Script=Latin,Uppercase}+)!$/u

2 Answers


There's no ideal workaround, except when you want the intersection of predefined character classes. In that case, all you have to do is use a negated character class that contains the negated properties:

/^License:\s*([^\P{Script=Latin}\P{Uppercase}]+)!/u


It is simple set logic:

A ∩ B = ∁∁(A ∩ B)    // complementation is an involution
A ∩ B = ∁(∁A ∪ ∁B)   // De Morgan's law: the complement of an intersection
                     // is the union of the complements
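
As a sanity check of that identity, here is a minimal sketch that compares the negated class against testing both properties separately, for every code point in the Basic Multilingual Plane, and then runs the regex on the question's input. The names `intersection`, `latin`, `upper`, `mismatches`, and `result` are illustrative only, and limiting the loop to the BMP is an arbitrary choice to keep it short.

```javascript
// Negated-class form from the answer: ∁(∁Latin ∪ ∁Uppercase)
const intersection = /^[^\P{Script=Latin}\P{Uppercase}]$/u
const latin = /^\p{Script=Latin}$/u
const upper = /^\p{Uppercase}$/u

// Count code points where the two formulations disagree
// (BMP only; lone surrogates are skipped).
let mismatches = 0
for (let cp = 0; cp <= 0xffff; cp++) {
  if (cp >= 0xd800 && cp <= 0xdfff) continue
  const ch = String.fromCodePoint(cp)
  const both = latin.test(ch) && upper.test(ch)
  if (intersection.test(ch) !== both) mismatches++
}
console.log(mismatches) // 0

// Apply the answer's regex to the example input from the question.
const input = ['License: GPL!', 'License: WÐFPL!', 'License: None!']
const regexp = /^License:\s*([^\P{Script=Latin}\P{Uppercase}]+)!/u
const result = input.map(str => str.match(regexp)?.[1]).filter(Boolean)
console.log(result) // ['GPL', 'WÐFPL']
```

A character passes the negated class only if it is in neither ∁Latin nor ∁Uppercase, which is the same as being in both Latin and Uppercase.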

const input = [
'License: GPL!',
'License: WÐFPL!',
'License: None!',
]
const regexp = /^License:\s*(?<abbr>(?:(?![ƗØ])(?=\p{Uppercase})\p{sc=Latin})+)!$/u
console.log(input.map(str => str.match(regexp)?.groups?.abbr).filter(Boolean))

Explanation:

^
License:
\s*
(?<abbr>   // named capture groups
    (?:
        // A negative look-ahead assertion.
        // Exclusion of Ɨ and Ø was not required by the question;
        // this line is here to provide more examples.
        (?![ƗØ])

        // A look-ahead assertion (looks into the future,
        // and then always goes back to the former position)
        (?=\p{Uppercase})

        \p{sc=Latin}
    )+
)
!
$
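
The look-ahead pattern above extends to any number of properties. The following is only a sketch: `intersectionOf` is a hypothetical helper (not a built-in), and it assumes every argument is a valid property expression for \p{...}. All properties except the last become look-aheads, and the last one consumes the character.

```javascript
// Hypothetical helper: returns the source of a one-character pattern that
// matches the intersection of the given Unicode properties.
function intersectionOf(...props) {
  if (props.length === 0) throw new Error('at least one property is required')
  // Every property but the last is checked with a look-ahead...
  const lookaheads = props.slice(0, -1).map(p => `(?=\\p{${p}})`).join('')
  // ...and the last one actually consumes the character.
  return `(?:${lookaheads}\\p{${props[props.length - 1]}})`
}

const latinUpper = intersectionOf('Uppercase', 'sc=Latin')
console.log(latinUpper) // (?:(?=\p{Uppercase})\p{sc=Latin})

const licenseRegexp = new RegExp(`^License:\\s*(?<abbr>${latinUpper}+)!$`, 'u')
const lines = ['License: GPL!', 'License: WÐFPL!', 'License: None!']
const abbrs = lines.map(str => str.match(licenseRegexp)?.groups?.abbr).filter(Boolean)
console.log(abbrs) // ['GPL', 'WÐFPL']
```

Building the pattern as a string keeps the sketch short; the property names are interpolated without validation, so they should come from trusted code rather than user input.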
