I have the following data:

<h5> MY DATA </h5>
» Test data1
» Test data2
» Test data3

I want to match every '»' except the first one, but the various regexes I've tried do not work. Please advise a solution.

Thanks

  • stackoverflow.com/questions/2850092/skip-first-regex-match Commented Nov 10, 2013 at 10:13
  • Hi Pendo, thanks for your reply. But that is a workaround: it first finds the first match and its position, then deletes everything, then puts whatever was in the first spot back in. I was wondering whether a single regex could solve the problem. regexr.com?374h8 Commented Nov 10, 2013 at 10:20
  • There is another answer that is fully regex (the 2nd post on the page) Commented Nov 10, 2013 at 10:21
  • "But the various regex that i've tried do not work": well then, show us what you have tried and why it failed. Also, please be precise: what if there were another random line like foobar between » Test data1 and » Test data2? Would it be matched or not? Commented Nov 10, 2013 at 10:28
  • Hi HamZa, thanks for replying. No, the line will not be matched. Only the symbols, skipping the first. I tried (?:^»)(.*)(»), ?!^.?)(»), ...etc Commented Nov 10, 2013 at 10:44

2 Answers

But why do you want to match every » except the first one? You'll get much better responses if you tell us what you're trying to accomplish, not how you're trying to accomplish it.

As I understand it, you have a block of two or more lines that start with a certain character and you want to add a <br/> tag to the end of every line except the last one. When you describe it that way, the regex practically writes itself:

^        # beginning of line (in multiline mode)
(».+\R)  # `»` and the rest of the line, including the trailing newline (`\R`)
(?=»)    # lookahead checks that the next line begins with `»`, too

The line is captured in group #1, so we plug it back into the replacement string and add the <br/> tag:

$result = preg_replace('/^(».+\R)(?=»)/m', '$1<br/>', $subject);

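I'm not fluent in PHP, but it's possible you'll need to add the UTF8 modifier (/^(».+\R)(?=»)/mu) or use a hex escape for the » character (/^(\x{BB}.+\R)(?=\x{BB})/mu). Note that with UTF-8 input the hex escape also needs the u modifier; without it, \x{BB} matches the single byte 0xBB rather than the two-byte » character.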
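As a rough check, here is a minimal sketch that runs the replacement on the sample from the question. It assumes the input and the source file are both UTF-8 and uses the /mu variant; the sample string and variable names are only illustrative. Because the newline is inside group 1, the tag ends up right after the line break rather than before it, which makes no difference once the HTML is rendered.

```php
<?php
// Sample input copied from the question (assumed to be UTF-8).
$subject = "<h5> MY DATA </h5>\n» Test data1\n» Test data2\n» Test data3";

// m: ^ matches at the start of every line
// u: pattern and subject are treated as UTF-8
$result = preg_replace('/^(».+\R)(?=»)/mu', '$1<br/>', $subject);

// The heading line and the last » line are left alone; a <br/> is
// inserted after the newline of every » line that is followed by another.
echo $result, "\n";
```

If preg_replace returns null here, the subject is probably not valid UTF-8, in which case the u modifier has to go and the byte-level behaviour applies.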

2 Comments

Nice regex-fu but there is a little misunderstanding. He wants to match every » except the first one and not the last one :)
D'oh! The word "last" in my first sentence should have been "first". But the reasoning still applies. He's trying to add » to the beginning of all but the first line, which is the same as adding it to the end of all but the last line. And if you think about it in those terms, it becomes much easier to see your way to a solution, or so I believe.

You can try this:

$result = preg_replace('~[^>\s]\h*\R\K(?=»)~', '<br/>', $string);

details:

[^>\s]  # a character that is neither whitespace nor > (this skips the first line, which ends in >)
\h*     # zero or more horizontal whitespace characters (possible trailing spaces)
\R      # a newline sequence
\K      # discard everything matched so far from the reported match
(?=»)   # lookahead assertion: the next character must be »

The goal of the pattern is to match an empty string at the right position in the string, so that the replacement is inserted there without removing anything.
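To see the effect of \K, here is a small sketch that applies the same pattern to the sample from the question and counts the replacements. The sample string and the $count variable are only illustrative, and the source file is assumed to be saved as UTF-8 so that the literal » in the pattern has the same bytes as in the subject.

```php
<?php
// Sample input copied from the question.
$string = "<h5> MY DATA </h5>\n» Test data1\n» Test data2\n» Test data3";

// Everything before \K must be present for the match to succeed, but it is
// dropped from the match itself, so only an empty position just before
// each qualifying » is replaced.
$result = preg_replace('~[^>\s]\h*\R\K(?=»)~', '<br/>', $string, -1, $count);

echo $count, "\n";   // number of insertions made
echo $result, "\n";
```

The first » is skipped because the line before it ends in >, so only the second and third » get a tag in front of them.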

1 Comment

Thanks everyone. I tried a few solutions, including this one, but couldn't get them to work. I finally went with a workaround for the line-break issue that doesn't rely on the symbol and instead identifies the newlines: /(^[\rn]*|[\r\n]+)[\s\t]*[\r\n]+/
