
I have a string that covers several lines. I need to extract the text between two strings. For example:

Start Here Some example
text covering a few
lines. End Here

I need to extract the following string: Start Here Some example text covering a few lines.

How do I go about this?


3 Answers


Use the /s regex modifier to treat the string as a single line:

/s Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

  $string =~ /(Start Here.*)End Here/s;
  print $1;

This will capture up to the last End Here, in case it appears more than once in your text.

If this is not what you want, then you can use:

  $string =~ /(Start Here.*?)End Here/s;
  print $1;

This will stop matching at the very first occurrence of End Here.
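
To make the difference concrete, here is a minimal sketch run against a made-up string in which End Here appears twice. The sample text and the variable names ($greedy, $lazy, $flat) are assumptions for illustration only. It also checks that each match succeeded before reading $1, since $1 is not reset by a failed match.

```perl
use strict;
use warnings;

# Hypothetical sample text: "End Here" appears twice.
my $string = "Intro\nStart Here Some example\ntext covering a few\n"
           . "lines. End Here\nmore text End Here\n";

my ($greedy, $lazy);

# Greedy: .* runs on to the LAST "End Here".
if ($string =~ /(Start Here.*)End Here/s) {
    $greedy = $1;
}

# Non-greedy: .*? stops at the FIRST "End Here".
if ($string =~ /(Start Here.*?)End Here/s) {
    $lazy = $1;
}

# Optional: collapse line breaks into single spaces and trim the end,
# in case the extracted text is wanted on one line.
(my $flat = $lazy) =~ s/\s+/ /g;
$flat =~ s/\s+$//;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "flat:   [$flat]\n";
```

The whitespace-collapsing step is only a guess at what the question wants; drop it if the original line breaks should be kept.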


4 Comments

You're also using the greedy match, so if the text reads "start blah blah end blah blah start blah blah end", it will capture both start/end sequences. If you use .*? instead, you'll limit yourself to one match at a time.

Doesn't work for me: echo -e "test1\ntest2" > test && perl -ne 'print $_ if /test1.*test2/s' test prints nothing.

@Hi-Angel That's a different, related question about using the -n flag, which is mostly answered in "Perl command line multi-line replace" (if it isn't, it should be asked as a new question).

@user202729 Thanks for clearing up the confusion. I just tried it and it works for me. The upshot, though: Perl's "treat all text as a single line" interface seems to have limited usefulness, because it either matches the whole text or a single match of a group. For example: echo -e "test1\ntest2\ntest3\ntest1\ntest2" > test && perl -0777 -ne 'print $1 if /(test1\ntest2)/' test gives "test1\ntest2" output just once. Using ^ and $ is also not possible. I'll see if I can find time to file a feature request for proper multiline support, similar to what Emacs regexes do.
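
For what it's worth, the two limitations raised in the last comment can usually be worked around with the /g and /m modifiers. The first one-liner above prints nothing because -n feeds the pattern one line at a time, so no single $_ ever contains both test1 and test2. A minimal sketch, assuming the file contents have already been slurped into a scalar (the variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical contents of the "test" file, as if read in one piece.
my $text = "test1\ntest2\ntest3\ntest1\ntest2\n";

# /g in list context returns every match of the capture group,
# not just the first one.
my @pairs = $text =~ /(test1\ntest2)/g;

# /m lets ^ and $ match at every line boundary inside the string.
my @lines_ending_in_2 = $text =~ /^(\w+2)$/mg;

print scalar(@pairs), " pair(s) found\n";
print scalar(@lines_ending_in_2), " line(s) ending in 2\n";
```

On the command line the same idea would look something like perl -0777 -ne 'print "$1\n" while /(test1\ntest2)/g' test, where -0777 makes -n read the whole file as a single record.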

  print $1 if /(Start Here.*?)End Here/s;
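
Note that this form has no =~, so it matches against $_. A small sketch of one way to get the text into $_, assuming it lives in a variable called $string (a hypothetical name, as is $found):

```perl
use strict;
use warnings;

# Hypothetical input, taken from the question's example.
my $string = "Start Here Some example\ntext covering a few\nlines. End Here\n";
my $found;

# "for" aliases $_ to $string, so the bare match below tests $string.
for ($string) {
    $found = $1 if /(Start Here.*?)End Here/s;
}

print "$found\n" if defined $found;
```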


Wouldn't the correct modifier to treat the string as a single line be (?s) rather than /s? I've been wrestling with a similar problem for quite a while now, and the RegExp Tester embedded in JMeter's View Results Tree listener shows that my Regular Expression Extractor with the regex

(?s)<FMSFlightPlan>(.*?)</FMSFlightPlan>

matches

<FMSFlightPlan>
C87D
AN NTEST/GL 
- FPN/FN/RP:DA:GCRR:AA:EIKN:F:SAMAR,N30540W014249.UN873. 
BAROK,N35580W010014..PESUL,N40529W008069..RELVA,N41512W008359.. 
SIVIR,N46000W008450..EMPER,N49000W009000..CON,N53545W008492 
</FMSFlightPlan>

while the regex

(?s)<FMSFlightPlan>(.*?)</FMSFlightPlan>

does not match. Other regex testers show the same result. However, when I try to execute the script, I get this BeanShell Assertion error:

Assertion failure message: org.apache.jorphan.util.JMeterException: Error invoking bsh method: eval Sourced file: inline evaluation of: ``import java.io.*; //write out the data results to a file outfile = "/Users/Dani . . . '' Token Parsing Error: Lexical error at line 12, column 380. Encountered: "\n" (10),

So something else is definitely wrong with mine. Anyway, just a suggestion.

1 Comment

The two regexes you mentioned are identical: "(?s)<FMSFlightPlan>(.*?)</FMSFlightPlan>"... Is there maybe a typo in the second one?
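
On the /s versus (?s) question: in Perl itself the two spellings should behave the same. (?s) is the inline form, which is handy when only the pattern text can be supplied, as appears to be the case in the JMeter extractor. A small sketch with a shortened, made-up flight plan (the variable names are illustrative only):

```perl
use strict;
use warnings;

# Shortened, hypothetical sample based on the flight plan above.
my $xml = "<FMSFlightPlan>\nC87D\nAN NTEST/GL\n</FMSFlightPlan>\n";

# Trailing /s modifier.
my ($with_flag) = $xml =~ m{<FMSFlightPlan>(.*?)</FMSFlightPlan>}s;

# Inline (?s) modifier, no trailing flag.
my ($inline) = $xml =~ m{(?s)<FMSFlightPlan>(.*?)</FMSFlightPlan>};

# Neither: "." cannot cross the newlines, so the match fails
# and the variable stays undefined.
my ($neither) = $xml =~ m{<FMSFlightPlan>(.*?)</FMSFlightPlan>};

print "same capture with /s and (?s)\n" if $with_flag eq $inline;
print "no match without either\n" unless defined $neither;
```

This says nothing about the BeanShell lexical error quoted above, which looks like a separate problem with the script text rather than with the regex.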
