
Raku has an interesting and exciting recursive-regex notation: <~~>.

So in the REPL, we can do this:

[0] > 'hellohelloworldworld' ~~ m/ helloworld /;
「helloworld」
[1] > 'hellohelloworldworld' ~~ m/ hello <~~>? world /;
「hellohelloworldworld」
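Outside the REPL, the same idea can be tried as a small standalone script. This is a minimal sketch in the style of the Raku docs on recursive regexes, not code from the original question. The variable name $balanced is purely illustrative. <~~> re-enters the whole enclosing regex at the current position, so each '(' has to be closed by a matching ')' one level down:

```raku
# <~~> calls the entire surrounding regex again at the current position.
my $balanced = / '(' <~~>* ')' /;

say '(((())))' ~~ $balanced;   # 「(((())))」
say '(()())'   ~~ $balanced;   # 「(()())」
say '(()'      ~~ $balanced;   # 「()」  (unanchored, so the balanced tail is found)

# Same pattern as the REPL session: each level wraps one more hello...world pair.
say 'hellohelloworldworld' ~~ / hello <~~>? world /;   # 「hellohelloworldworld」
```

Because the regex is unanchored, an unbalanced string still matches its longest balanced part starting from the leftmost position where a match exists.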

Working directly from the Raku docs on Recursive Regexes, we can capture and count various levels of nesting:

~$ raku -pe '#acts like cat here' nest_test.txt
not nested

previous blank
nestA{1}
nestB{nestA{1}2}
nestC{nestB{nestA{1}2}3}
~$ raku -ne 'my $cnt = 0; say m:g/  \{  [  <( <-[{}]>*  )> | <( <-[{}]>* <~~>*? <-[{}]>* )>  ] \} {++$cnt} /, "\t  $cnt -levels nested";'  nest_test.txt
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
(「1」)     1 -levels nested
(「nestA{1}2」)     2 -levels nested
(「nestB{nestA{1}2}3」)     3 -levels nested

(Above, change say to put to print only the captured string.)

But I recently ran into an issue while trying to solve a Unix & Linux question: how do you limit the recursion? Say we only want to capture below nestB. Is there any way to do this using the <~~> recursive-regex syntax?

~$ raku -ne 'my $cnt = 0; say m:g/ nestB  \{  [  <( <-[{}]>*  )> | <( <-[{}]>* <~~>*? <-[{}]>* )>  ] \} {++$cnt} /, "\t  $cnt -levels nested";'  nest_test.txt
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested

NOTE: Above, I've tried to force some sort of 'frugal' recursive behavior by using <~~>*?. In fact, <~~> (the standard recursive notation), <~~>?, <~~>*, and <~~>*? all give identical results here (rakudo-moar-2024.09-01).

What is the correct Raku recursive regex syntax?


1 Answer


Using Recursive Regexes in Raku: how to limit recursion-levels?

Increment a dynamic variable inside a <?{ ...}> conditional. For example:

my $*cnt;
say 'a' x 100 ~~ / <?{++$*cnt <= 5}> a <~~>? /; # 「aaaaa」
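
Following the comment below that an ordinary lexical works just as well, here is a minimal sketch that wraps the same technique in a sub. The sub name limited-a and its $limit parameter are hypothetical, for illustration only. The counter is only ever incremented (it counts entries into the assertion, not the current depth), so it is declared fresh inside the sub to make each call start from zero:

```raku
# Hypothetical helper: match at most $limit consecutive "a"s via <~~> recursion.
sub limited-a(Str $s, Int $limit) {
    my $cnt = 0;    # ordinary lexical, recreated on every call
    # Each (re)entry bumps $cnt; once it exceeds $limit the assertion fails,
    # so the optional <~~>? matches nothing and the recursion stops there.
    $s ~~ / <?{ ++$cnt <= $limit }> a <~~>? /
}

say limited-a('a' x 100, 5);   # 「aaaaa」
say limited-a('a' x 100, 3);   # 「aaa」
```

Note that a counter shared across several matches (like my $*cnt above) keeps its value, so a second match against it would start already "used up" unless it is reset first.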

4 Comments

FWIW, in this context it wouldn't need to be a dynamic variable. It could just as well be an ordinary lexical my $cnt, as long as it's "visible" from the regex.
@ElizabethMattijsen Yes, comment upvoted! Even though I think that in some sense you're wrong. I will explain what I mean in a comment below, but first I want to clarify why I wrote this answer as I did. The question in this SO's title was (too) broad, but simple and clear. In contrast, the question's body is narrow, complex, and, at least for me, very confusing. I ultimately concluded that a simple generic answer, responding to one interpretation of the general question in the title, at least stood a chance of being useful to other readers regardless of any other consideration.
Understood: that's why I said "in this context" :-)
Hi, yes, I've been trying to grapple with the answer given by @raiph, which 1) satisfies the title of the question I posted, 2) is thoroughly useful, and 3) works like a charm. So I will be accepting it. Note, however, that in some sense the correct answer to the question's body, "How to get a recursive regex to ignore a leading token?", hasn't been addressed. So maybe it's time for me to post another question? Thanks again, Raiph and Liz!
