I'm playing with the following code in Swift to build an appropriate regex for an application:

let regExp = "-(\\([0-9.a-z()+-×÷√^₁₀²³/]+\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"

let testString = "-(hsjshdf)   -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"

let regularExpression = try! NSRegularExpression(pattern: regExp, options: [])

let matchesArray = regularExpression.matches(in: testString, options: [], range: NSRange(location: 0, length: testString.characters.count))

for match in matchesArray {
    for i in 0..<match.numberOfRanges {
        let range = match.rangeAt(i)
        let r = testString.index(testString.startIndex, offsetBy: range.location) ..< testString.index(testString.startIndex, offsetBy: range.location + range.length)
        print(testString.substring(with: r))
    }
}

The result I get is as follows:

-(hsjshdf)
(hsjshdf)
-hsghsgsgs
hsghsgsgs
-(k)
(k)
-(1/64)
(1/64)
-dhsg62
dhsg62
-(p)
(p)

However, I want the regex to match and also capture the substring inside the "()" as its own group, so that I get the following output:

-(hsjshdf)
(hsjshdf)
hsjshdf
-hsghsgsgs
hsghsgsgs
-(k)
(k)
k
-(1/64)
(1/64)
1/64
-dhsg62
dhsg62
-(p)
(p)
p

I tried the following modification to the original regex. It worked for the substring "-(hsjshdf)" but crashed with a runtime error (fatal error: cannot increment beyond endIndex) when printing the matches for the substring "-hsghsgsgs":

let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"

I'm not familiar with NSRegularExpression. Am I using the wrong regex? Do I need to set a special option?

Thanks for your help. Kindest regards.

/TB

  • Please show the code that generates the result you list. The code you have shown does not generate that result. Commented Apr 12, 2017 at 10:21
  • Done. Sorry for the inconvenience, I forgot to include the loops that traverse through the ranges and subranges since I assumed the problem is not in those loops. Commented Apr 12, 2017 at 12:32

1 Answer

In fact, the problem resides in the loops.

Your regex, let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)", has two pairs of capturing parentheses, and the inner pair may not capture any part of the string.

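One thing you should know is that NSRegularExpression returns NSRange(location: NSNotFound, length: 0) for missing captures. In the current implementation, NSNotFound has the same value as Int.max, which is far greater than any valid offset into an actual String. Offsetting testString.startIndex by that location therefore runs past endIndex, which is the crash you are seeing.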
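To make the missing-capture behaviour concrete, here is a minimal sketch. The helper name captureGroups and the simplified [a-z] character class are assumptions made for illustration and are not part of the question's code. It also uses range(at:), the Swift 4 and later spelling of the rangeAt(_:) call shown in this post.

```swift
import Foundation

// Hypothetical helper (not from the original post): returns every capture
// group of the first match, using nil for groups that did not participate.
func captureGroups(pattern: String, in text: String) -> [String?] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []),
          let match = regex.firstMatch(
              in: text, options: [],
              range: NSRange(location: 0, length: text.utf16.count))
    else { return [] }

    let nsText = NSString(string: text)
    return (0..<match.numberOfRanges).map { (i: Int) -> String? in
        let range = match.range(at: i)
        // A group that took no part in the match reports NSNotFound.
        return range.location == NSNotFound ? nil : nsText.substring(with: range)
    }
}

// Simplified pattern (an assumption): group 2 exists only in the first alternative.
let simplifiedPattern = "-(\\(([a-z]+)\\)|[a-z]+)"

// Parenthesised input: all three groups take part in the match.
print(captureGroups(pattern: simplifiedPattern, in: "-(k)"))
// Bare input: the second alternative matches, so group 2 comes back as nil.
print(captureGroups(pattern: simplifiedPattern, in: "-abc"))
```

Returning nil instead of skipping the group keeps the array indices aligned with the group numbers, which may be convenient if later code needs to know which group matched.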

You just need to check whether the location of each range is NSNotFound before using it:

let regExp = "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)"

let testString = "-(hsjshdf)   -hsghsgsgs -(k) -(1/64) -dhsg62 -(p)"

let regularExpression = try! NSRegularExpression(pattern: regExp, options: [])

//###(1) Use `.utf16.count`, not `.characters.count`.
let matchesArray = regularExpression.matches(in: testString, options: [], range: NSRange(location: 0, length: testString.utf16.count))

for match in matchesArray {
    for i in 0..<match.numberOfRanges {
        let range = match.rangeAt(i)
        if range.location == NSNotFound {continue} //###(2) Skip missing captures.
        //###(3) Your way of creating `r` does not work for non-BMP characters.
        print((testString as NSString).substring(with: range))
    }
}

(My comments (1) and (3) are not critical for your input testString, but you should also know that NSRegularExpression works with NSString, which is represented internally in a UTF-16 based format. The location and length of an NSRange are therefore a UTF-16 offset and a UTF-16 count, not counts of Characters.)
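The difference between the two counts can be seen with a string that contains a non-BMP character. The following is only a sketch under stated assumptions: the helper firstMatchString and the sample text are hypothetical, and it relies on the NSRange(_:in:) and Range(_:in:) conversions that Foundation provides from Swift 4 onward, so it does not apply as-is to the Swift 3 code in this post.

```swift
import Foundation

// Hypothetical helper (not from the original post): returns the text of the
// first match, converting between NSRange and Range<String.Index> so that
// UTF-16 offsets are never mixed with Character offsets.
func firstMatchString(pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
        return nil
    }
    // Builds an NSRange measured in UTF-16 code units for the whole string.
    let whole = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole),
          let range = Range(match.range, in: text)  // back to String indices
    else { return nil }
    return String(text[range])
}

// Sample input (an assumption): the emoji is outside the BMP.
let emojiText = "😀-(k)"
print(emojiText.count)        // number of Characters
print(emojiText.utf16.count)  // number of UTF-16 code units (the emoji takes two)
print(firstMatchString(pattern: "\\(k\\)", in: emojiText) ?? "no match")
```

Had the search range been built from the Character count here, it would have been one unit too short and the closing parenthesis would have fallen outside it.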

2 Comments

Thank you very much OOPer, your corrections and comments were illuminating. I still don't understand why matching substrings like "-hsghsgsgs" with the alternative [0-9.a-z()+-×÷√^₁₀²³/]+ of the regex results in the NSNotFound situation, given that this alternative doesn't contain inner capturing parentheses as the other one does. Could you please explain or provide a link to a text?
You should check how your pattern "-(\\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\)|[0-9.a-z()+-×÷√^₁₀²³/]+)" matches a substring like "-hsghsgsgs". After the -, the outer capture contains two alternative subpatterns: \\(([0-9.a-z()+-×÷√^₁₀²³/]+)\\) and [0-9.a-z()+-×÷√^₁₀²³/]+. The string hsghsgsgs is not enclosed in parentheses, so the second alternative matches and the first does not. The inner capture group still exists in the pattern, but it belongs to the alternative that did not match, so its range is reported as NSNotFound.
