(*PRUNE), (*SKIP), (*COMMIT) and (*THEN) all work the same way in the following sense:
- they are triggered when the subpattern after them fails (i.e. when the engine backtracks onto the verb).
- they are ignored when the subpattern after them succeeds.
- they forbid backtracking into the subpattern before them (but not within the subpattern after them).
They differ only in what happens next.
With (*PRUNE), the pattern is tried again at the next position in the string (if the whole pattern was tried at position n, the next attempt starts at position n+1 in the subject string). Note that this is no different from the normal behavior when a pattern fails, except that it avoids the backtracking steps in the subpattern before the verb. In other words, this backtracking control verb is useful to fail faster.
Consider the two patterns (in free-spacing mode) with the subject string aaa bbb:
With the first one, the first branch is tried and, since there is obviously no letter c in the string, it fails (at the end of the string). But the second branch isn't tested immediately: the first branch first has to test all possibilities via the backtracking mechanism (which doesn't change the result here) to make sure there is no letter c before the end of the string. Only after that is the second branch tried (and it fails too). Then the whole pattern is tried at the next position in the subject string, and this game continues until the pattern is tested at the position of the first letter b, where the second branch succeeds. Tedious, isn't it?
With the second pattern, the scenario is the same, except that all the backtracking steps in the first branch are avoided (when this branch fails, the pattern is immediately tried at the next position in the subject string), and the second branch is tested only when the letter a isn't found (the first branch then fails before the verb, which allows the second branch to be tested).
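The (*PRUNE) scenario can be reproduced with a toy backtracking matcher. The sketch below is a minimal continuation-passing matcher in Python (patterns are nested tuples; this is not how PCRE works internally). Since the two patterns are not reproduced in this text, it assumes the hypothetical stand-ins `a+ .* c | b+` and `a+ .* (*PRUNE) c | b+`, which fit the description. It counts calls to its hypothetical `match_at` helper as a rough measure of work: both patterns should find `bbb`, but the pruned one should get there with far fewer steps.

```python
# A toy continuation-passing backtracking matcher, written only to illustrate (*PRUNE).
# Patterns are nested tuples; this is NOT how PCRE is implemented.

class Prune(Exception):
    """Raised when the matcher backtracks onto (*PRUNE)."""

steps = 0  # number of match_at calls: a rough measure of the work done


def match_at(node, s, i, k):
    """Match `node` at s[i:], then call the continuation k(end).

    Returns the end index of the overall match, or None on failure.
    """
    global steps
    steps += 1
    kind = node[0]
    if kind == "char":  # one character taken from the string node[1]
        return k(i + 1) if i < len(s) and s[i] in node[1] else None
    if kind == "any":  # '.', any character except a newline
        return k(i + 1) if i < len(s) and s[i] != "\n" else None
    if kind == "seq":  # concatenation of the nodes in node[1]
        def step(n, j):
            if n == len(node[1]):
                return k(j)
            return match_at(node[1][n], s, j, lambda j2: step(n + 1, j2))
        return step(0, i)
    if kind == "alt":  # alternation: branches are tried left to right
        for branch in node[1]:
            end = match_at(branch, s, i, k)
            if end is not None:
                return end
        return None
    if kind == "rep":  # greedy repetition of node[1], at least node[2] times
        sub, low = node[1], node[2]
        def rep(count, j):
            # Greedy: try one more iteration first, then fall back to the rest of the pattern.
            end = match_at(sub, s, j, lambda j2: rep(count + 1, j2) if j2 > j else None)
            if end is not None:
                return end
            return k(j) if count >= low else None
        return rep(0, i)
    if kind == "prune":  # (*PRUNE)
        end = k(i)
        if end is None:  # the engine backtracks onto the verb
            raise Prune()
        return end
    raise ValueError(f"unknown node type: {kind!r}")


def search(node, s):
    """Try every start position; return (matched text or None, steps used)."""
    global steps
    steps = 0
    for start in range(len(s) + 1):
        try:
            end = match_at(node, s, start, lambda j: j)
        except Prune:
            continue  # give up at this start position and move on to start + 1
        if end is not None:
            return s[start:end], steps
    return None, steps


A_PLUS = ("rep", ("char", "a"), 1)
DOT_STAR = ("rep", ("any",), 0)
B_PLUS = ("rep", ("char", "b"), 1)

# Hypothetical stand-ins:  a+ .* c | b+   and   a+ .* (*PRUNE) c | b+
plain = ("alt", [("seq", [A_PLUS, DOT_STAR, ("char", "c")]), B_PLUS])
pruned = ("alt", [("seq", [A_PLUS, DOT_STAR, ("prune",), ("char", "c")]), B_PLUS])

print(search(plain, "aaa bbb"))   # ('bbb', ...) after many backtracking steps
print(search(pruned, "aaa bbb"))  # ('bbb', ...) with far fewer steps
```

Placing the verb after `.*` is what removes all backtracking in the first branch here; a verb placed earlier would still let `.*` backtrack.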
With (*SKIP), the pattern is tried again, but starting at the position reached by the subpattern before the verb (the position in the subject where (*SKIP) was encountered) instead of the next position. This is useful when you want to skip useless (or problematic) positions in the string and advance faster.
(*SKIP:name) does the same thing, except that the next attempt starts at the position of the marker "name" (set by (*MARK:name)) instead of the position of (*SKIP:name). The marker must already have been met on the current matching path, which means it has to stand before the (*SKIP:name) verb in the pattern; if no such marker is found, (*SKIP:name) is ignored.
Note that (*SKIP:name) is the only verb whose name has a special relationship with a marker. For all the other verbs, :name is only a more or less useful shortcut for (*MARK:name)(*VERB).
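The difference between a normal failure, (*SKIP) and (*SKIP:name) is easiest to see by recording which start positions get tried. The sketch below uses a toy continuation-passing matcher in Python (not PCRE's implementation) and three hypothetical patterns on aaa bbb: `a+ \s b x | b+`, the same with (*SKIP) before x, and a variant with (*MARK:M) after a+ and (*SKIP:M) before x. When a skip target would not move forward, the sketch falls back to start + 1; treat that detail as an assumption about the engine.

```python
class Skip(Exception):
    """Raised when the matcher backtracks onto (*SKIP) or (*SKIP:name)."""

    def __init__(self, pos):
        super().__init__(pos)
        self.pos = pos  # subject position where the next attempt should start


def match_at(node, s, i, k, marks):
    """Match `node` at s[i:], then call k(end); `marks` maps mark names to positions."""
    kind = node[0]
    if kind == "char":  # one character taken from the string node[1]
        return k(i + 1) if i < len(s) and s[i] in node[1] else None
    if kind == "seq":  # concatenation of the nodes in node[1]
        def step(n, j):
            if n == len(node[1]):
                return k(j)
            return match_at(node[1][n], s, j, lambda j2: step(n + 1, j2), marks)
        return step(0, i)
    if kind == "alt":  # alternation: branches are tried left to right
        for branch in node[1]:
            end = match_at(branch, s, i, k, marks)
            if end is not None:
                return end
        return None
    if kind == "rep":  # greedy repetition of node[1], at least node[2] times
        sub, low = node[1], node[2]
        def rep(count, j):
            end = match_at(sub, s, j, lambda j2: rep(count + 1, j2) if j2 > j else None, marks)
            if end is not None:
                return end
            return k(j) if count >= low else None
        return rep(0, i)
    if kind == "mark":  # (*MARK:name): remember where this point of the path was reached
        name, old = node[1], marks.get(node[1])
        marks[name] = i
        end = k(i)
        if end is None:  # backtracking past the mark: restore the previous value
            if old is None:
                del marks[name]
            else:
                marks[name] = old
        return end
    if kind == "skip":  # (*SKIP) if node[1] is None, otherwise (*SKIP:name)
        end = k(i)
        if end is None:
            if node[1] is None:
                raise Skip(i)  # retry where the verb itself was reached
            if node[1] in marks:
                raise Skip(marks[node[1]])  # retry at the named mark
            # no such mark on the current path: the verb is simply ignored
        return end
    raise ValueError(f"unknown node type: {kind!r}")


def search(node, s):
    """Return (matched text or None, list of start positions that were tried)."""
    tried, start = [], 0
    while start <= len(s):
        tried.append(start)
        try:
            end = match_at(node, s, start, lambda j: j, {})
        except Skip as e:
            # Assumption: a skip that would not move forward falls back to start + 1.
            start = max(e.pos, start + 1)
            continue
        if end is not None:
            return s[start:end], tried
        start += 1
    return None, tried


A_PLUS = ("rep", ("char", "a"), 1)
B_PLUS = ("rep", ("char", "b"), 1)
SPACE, B, X = ("char", " \t\n"), ("char", "b"), ("char", "x")

# Hypothetical patterns, in free-spacing notation:
plain = ("alt", [("seq", [A_PLUS, SPACE, B, X]), B_PLUS])                  # a+ \s b x | b+
skip = ("alt", [("seq", [A_PLUS, SPACE, B, ("skip", None), X]), B_PLUS])  # a+ \s b (*SKIP) x | b+
skip_name = ("alt", [("seq", [A_PLUS, ("mark", "M"), SPACE, B,
                              ("skip", "M"), X]), B_PLUS])                 # a+ (*MARK:M) \s b (*SKIP:M) x | b+

print(search(plain, "aaa bbb"))      # ('bbb', [0, 1, 2, 3, 4])
print(search(skip, "aaa bbb"))       # ('bb', [0, 5])
print(search(skip_name, "aaa bbb"))  # ('bbb', [0, 3, 4])
```

With plain (*SKIP) the retry jumps past the first b, so only bb is found; (*SKIP:M) retries at the marked position (the space) instead and still finds bbb.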
With (*COMMIT), the pattern is not tried again at all: the whole search fails as soon as the engine backtracks onto the verb.
Consider these two patterns (in free-spacing mode) with the subject string aaa bbb:
With the first pattern, the result is the whole string aaa bbb: the first branch fails, then the second branch is tested and succeeds.
With the second pattern, there is no match at all, because (*COMMIT) is encountered in the first branch, which then fails (after the verb), so the search is aborted for good. The second branch is never tested.
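A corresponding sketch for (*COMMIT), again with a toy continuation-passing matcher in Python and hypothetical patterns `a+ c | .+` and `a+ (*COMMIT) c | .+`: once the verb is backtracked onto, the search stops, so neither the second branch nor any later start position is tried. Real engines also apply start-of-match optimizations that this toy ignores, and PCRE's documentation notes that these can change how (*COMMIT) behaves.

```python
class Commit(Exception):
    """Raised when the matcher backtracks onto (*COMMIT)."""


def match_at(node, s, i, k):
    """Match `node` at s[i:], then call k(end); return the overall end index or None."""
    kind = node[0]
    if kind == "char":  # one character taken from the string node[1]
        return k(i + 1) if i < len(s) and s[i] in node[1] else None
    if kind == "any":  # '.', any character except a newline
        return k(i + 1) if i < len(s) and s[i] != "\n" else None
    if kind == "seq":  # concatenation of the nodes in node[1]
        def step(n, j):
            if n == len(node[1]):
                return k(j)
            return match_at(node[1][n], s, j, lambda j2: step(n + 1, j2))
        return step(0, i)
    if kind == "alt":  # alternation: branches are tried left to right
        for branch in node[1]:
            end = match_at(branch, s, i, k)
            if end is not None:
                return end
        return None
    if kind == "rep":  # greedy repetition of node[1], at least node[2] times
        sub, low = node[1], node[2]
        def rep(count, j):
            end = match_at(sub, s, j, lambda j2: rep(count + 1, j2) if j2 > j else None)
            if end is not None:
                return end
            return k(j) if count >= low else None
        return rep(0, i)
    if kind == "commit":  # (*COMMIT)
        end = k(i)
        if end is None:  # the engine backtracks onto the verb
            raise Commit()
        return end
    raise ValueError(f"unknown node type: {kind!r}")


def search(node, s):
    """Try every start position; return the matched text or None."""
    for start in range(len(s) + 1):
        try:
            end = match_at(node, s, start, lambda j: j)
        except Commit:
            return None  # the whole search is abandoned: no other branch, no other start
        if end is not None:
            return s[start:end]
    return None


A_PLUS = ("rep", ("char", "a"), 1)
DOT_PLUS = ("rep", ("any",), 1)
C = ("char", "c")

# Hypothetical patterns:  a+ c | .+   and   a+ (*COMMIT) c | .+
plain = ("alt", [("seq", [A_PLUS, C]), DOT_PLUS])
committed = ("alt", [("seq", [A_PLUS, ("commit",), C]), DOT_PLUS])

print(search(plain, "aaa bbb"))      # 'aaa bbb'
print(search(committed, "aaa bbb"))  # None
```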
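(*THEN) is a little different in the sense that it is only useful inside an alternation: when it is triggered, the engine moves on to the next branch of the innermost enclosing alternation (outside an alternation, it behaves like (*PRUNE)).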
Consider these two patterns (in free-spacing mode) with the subject string aaa bbb:
With the first pattern, the result is aaa b.
What happens: the first branch .* \b b is tried. Because the quantifier is greedy, .* reaches the end of the string (it covers aa bbb), and the word boundary \b succeeds (between the last b and the end of the string), but there is no b left to match (only the end of the string).
The backtracking mechanism starts, and .* gives back characters one by one until the subpattern \b b succeeds:
- First backtracking step: .* gives back the last b and covers aa bb, but the word boundary \b fails (between the second and third b).
- Second backtracking step: .* gives back the second b and covers aa b; the word boundary fails (between the first and second b).
- Third backtracking step: .* gives back the first b and covers aa followed by the space; the word boundary succeeds (between the space and the first b), and so does the literal b.
The pattern succeeds. Note that the second branch of the alternation is never tested.
With the second pattern, the result is aaa bb.
What happens: when the first branch .* (*THEN) \b b is tried and fails at the end of the string, (*THEN) forbids backtracking locally, i.e. only for this branch of the alternation in this group. The first branch is abandoned, and the second one is tested. Note that the second branch succeeds even though it needs backtracking of its own: .* has to give back two characters before b matches the second b and \B succeeds (between the second and third b).
Note that (*THEN) acts only locally, not on the whole pattern. In this example, its effect stays confined to the non-capturing group (the innermost group containing the (*THEN) verb). Of course, local and global are the same for a pattern like ^ a .* (*THEN) \b b | ^ a .* b \B, where the alternation is at the top level.
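The walkthrough can be checked with a toy continuation-passing matcher in Python (not PCRE's implementation). The sketch assumes the grouped patterns have the shape `a (?: .* \b b | .* b \B )` and `a (?: .* (*THEN) \b b | .* b \B )`, which is consistent with the ungrouped version given above. To model locality, each attempt at an alternation branch gets a token, and a (*THEN) failure is caught only by the alternation that owns that token; outside any alternation it is treated like (*PRUNE).

```python
class Then(Exception):
    """Raised when the matcher backtracks onto (*THEN)."""

    def __init__(self, owner):
        super().__init__()
        self.owner = owner  # token of the innermost alternation branch containing the verb


def is_word(s, i):
    return 0 <= i < len(s) and (s[i].isalnum() or s[i] == "_")


def match_at(node, s, i, k, owner):
    """Match `node` at s[i:], then call k(end); `owner` is the current branch token."""
    kind = node[0]
    if kind == "char":  # one character taken from the string node[1]
        return k(i + 1) if i < len(s) and s[i] in node[1] else None
    if kind == "any":  # '.', any character except a newline
        return k(i + 1) if i < len(s) and s[i] != "\n" else None
    if kind == "wb":  # \b
        return k(i) if is_word(s, i - 1) != is_word(s, i) else None
    if kind == "nwb":  # \B
        return k(i) if is_word(s, i - 1) == is_word(s, i) else None
    if kind == "seq":  # concatenation of the nodes in node[1]
        def step(n, j):
            if n == len(node[1]):
                return k(j)
            return match_at(node[1][n], s, j, lambda j2: step(n + 1, j2), owner)
        return step(0, i)
    if kind == "alt":  # alternation: branches are tried left to right
        for branch in node[1]:
            token = object()  # identifies this attempt at this branch
            try:
                end = match_at(branch, s, i, k, token)
            except Then as e:
                if e.owner is not token:
                    raise  # the verb belongs to an enclosing alternation
                continue  # (*THEN) in this branch: go straight to the next branch
            if end is not None:
                return end
        return None
    if kind == "rep":  # greedy repetition of node[1], at least node[2] times
        sub, low = node[1], node[2]
        def rep(count, j):
            end = match_at(sub, s, j, lambda j2: rep(count + 1, j2) if j2 > j else None, owner)
            if end is not None:
                return end
            return k(j) if count >= low else None
        return rep(0, i)
    if kind == "then":  # (*THEN)
        end = k(i)
        if end is None:
            raise Then(owner)
        return end
    raise ValueError(f"unknown node type: {kind!r}")


def search(node, s):
    """Try every start position; return the matched text or None."""
    for start in range(len(s) + 1):
        try:
            end = match_at(node, s, start, lambda j: j, None)
        except Then:
            continue  # (*THEN) outside any alternation acts like (*PRUNE)
        if end is not None:
            return s[start:end]
    return None


A, B = ("char", "a"), ("char", "b")
DOT_STAR = ("rep", ("any",), 0)
WB, NWB, THEN = ("wb",), ("nwb",), ("then",)

# Assumed shapes:  a (?: .* \b b | .* b \B )  and  a (?: .* (*THEN) \b b | .* b \B )
first = ("seq", [A, ("alt", [("seq", [DOT_STAR, WB, B]), ("seq", [DOT_STAR, B, NWB])])])
second = ("seq", [A, ("alt", [("seq", [DOT_STAR, THEN, WB, B]), ("seq", [DOT_STAR, B, NWB])])])

print(search(first, "aaa bbb"))   # 'aaa b'
print(search(second, "aaa bbb"))  # 'aaa bb'
```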
About (*MARK:name), or the shorter syntax (*:name): it is used to name a position reached by the pattern in the string. If a pattern succeeds via a path that passes through a marker, the name of the last marker encountered is stored in the "object" of the match result. (The nature of this "object" depends on the implementation.)
- It is useful for debugging purposes, for example to find out which branch of a pattern produced a successful match.
- It can also be used in conjunction with (*SKIP) to define the position where the pattern is retried (this position may differ from the position where (*SKIP) is encountered):
bla bla (*MARK:RetryHere) bla bla (*SKIP:RetryHere) blu
This pattern is retried and succeeds on the second (and last) "blabla" followed by "blu" in the subject string blablablablablu.
(*THEN:name) is only a shorter way to write (*MARK:name)(*THEN) (not very useful): if the subpattern fails, the next branch is tried (as explained before); if the subpattern succeeds, the marker is stored somewhere in the result object (whatever it looks like, depending on the implementation).
- The same goes for (*PRUNE:name): it is only a shortcut for (*MARK:name)(*PRUNE).
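The debugging use of markers can be sketched the same way: the toy Python matcher below (not PCRE's implementation) keeps the names of the marks on the current path, drops them again when it backtracks past them, and returns the last one when a match succeeds, which reveals the branch that matched. The pattern `a+ (*MARK:A) x | b+ (*MARK:B)` is hypothetical; in a real PCRE2 program the name would be read from the match data with pcre2_get_mark().

```python
def match_at(node, s, i, k, path):
    """Match `node` at s[i:], then call k(end); `path` lists the marks passed so far."""
    kind = node[0]
    if kind == "char":  # one character taken from the string node[1]
        return k(i + 1) if i < len(s) and s[i] in node[1] else None
    if kind == "seq":  # concatenation of the nodes in node[1]
        def step(n, j):
            if n == len(node[1]):
                return k(j)
            return match_at(node[1][n], s, j, lambda j2: step(n + 1, j2), path)
        return step(0, i)
    if kind == "alt":  # alternation: branches are tried left to right
        for branch in node[1]:
            end = match_at(branch, s, i, k, path)
            if end is not None:
                return end
        return None
    if kind == "rep":  # greedy repetition of node[1], at least node[2] times
        sub, low = node[1], node[2]
        def rep(count, j):
            end = match_at(sub, s, j, lambda j2: rep(count + 1, j2) if j2 > j else None, path)
            if end is not None:
                return end
            return k(j) if count >= low else None
        return rep(0, i)
    if kind == "mark":  # (*MARK:name)
        path.append(node[1])
        end = k(i)
        if end is None:  # backtracking past the mark removes it from the path
            path.pop()
        return end
    raise ValueError(f"unknown node type: {kind!r}")


def search(node, s):
    """Return (matched text, last mark name on the successful path) or (None, None)."""
    for start in range(len(s) + 1):
        path = []
        end = match_at(node, s, start, lambda j: j, path)
        if end is not None:
            return s[start:end], (path[-1] if path else None)
    return None, None


A_PLUS = ("rep", ("char", "a"), 1)
B_PLUS = ("rep", ("char", "b"), 1)
X = ("char", "x")

# Hypothetical pattern:  a+ (*MARK:A) x | b+ (*MARK:B)
pattern = ("alt", [("seq", [A_PLUS, ("mark", "A"), X]), ("seq", [B_PLUS, ("mark", "B")])])

print(search(pattern, "aaa bbb"))  # ('bbb', 'B'): the second branch matched
print(search(pattern, "aaax"))     # ('aaax', 'A'): the first branch matched
```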