
I'm trying to use a regex to select instances where a PHP short open tag such as <? is used, but not <?php. I've tried several iterations on regextester but have failed. Here's my latest attempt: <\?(?!<\?php)

Basically, this is what I want. It covers all the variations in my document:

<?php foo ?> //should not match
<? bar ?> //should match '<?'
<?=foobar ?> //should match '<?='
<?xml barbar ?> //should not match

I'm new to regex, so any help would be appreciated.

Edit: Because of problems with the answers posted so far, I'm adding one more condition to match:

<?php foo ?> //should not match
<? bar ?> //should match '<?'
<?bar ?> //should match '<?' (any character could follow the ?)
<?=foobar ?> //should match '<?='
<?xml barbar ?> //should not match

To summarize, I'm only trying to match <? or <?=, not the complete line they occur in.

Edit 2: Basically, the logic of the expression should be: match <? or <?=, but not if followed by 'php' or 'xml'.
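For reference, here is why the attempt in the question, <\?(?!<\?php), does not work: the lookahead is evaluated right after <? has already been consumed, so it asks whether the next text is a second <?php, which never happens here. A quick check with Python's standard re module (the sample lines are copied from the question with the //should match comments dropped) shows it matches <? on every line, including <?php and <?xml:

```python
import re

# The attempt from the question. The lookahead runs at the position AFTER
# "<?", so it looks for "<?php" following "<?", which is never there.
ATTEMPT = re.compile(r"<\?(?!<\?php)")

# Sample lines from the question, without the trailing comments.
SAMPLES = ["<?php foo ?>", "<? bar ?>", "<?bar ?>", "<?=foobar ?>", "<?xml barbar ?>"]

for line in SAMPLES:
    m = ATTEMPT.search(line)
    # Print the matched text, or None if nothing matched.
    print(repr(line), "->", repr(m.group(0)) if m else None)
```

Moving the lookahead so it checks for php (and xml) directly after <? is what the working answers below do.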

  • Use this: (<\?(?=\s))|(<\?=) Commented May 18, 2016 at 8:33
  • I'd suggest <\?[^\w\s]*+(?!(?:php|xml)\b) but not sure if there are any other exceptions. However, I think I found a more generic approach, please see my answer. Commented May 18, 2016 at 8:36
  • Please be as precise as you want; I've added all the possibilities in my document. Commented May 18, 2016 at 9:01
  • For the record, <?=foobar ?> and <?bar ?> would throw syntax errors and fail to execute if run by PHP, as a valid opening tag must be followed by some sort of whitespace (space, newline, tab, etc.). So if your goal is to find valid PHP code, those two would not actually be valid. Commented May 18, 2016 at 9:08
  • Actually, I received some code and I need to find these invalid tags so I can replace them with valid ones. That's why I want to skip <?php, because it's already valid. Commented May 18, 2016 at 9:21

4 Answers


You can use

<\?[^\w\s]+|<\?\B

See the regex demo

Pattern details:

  • <\?[^\w\s]+ - a literal <? sequence followed by one or more characters that are neither word characters nor whitespace
  • | - or
  • <\?\B - a literal <? sequence followed by a non-word boundary. Since ? is itself a non-word character, this means the next character, if there is one, must also be a non-word character.
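The two alternatives can be checked against the original sample lines with Python's re module. This is a small test harness, not part of the answer; the helper name first_match is hypothetical. It also shows that the pattern does not cover <?bar ?>, the case added in the question's first edit, because b is a word character:

```python
import re

# Answer 1's pattern: "<?" plus one or more non-word, non-space characters,
# OR "<?" followed by a non-word boundary.
PATTERN = re.compile(r"<\?[^\w\s]+|<\?\B")

# Sample lines from the question, mapped to the substring each should yield.
EXPECTED = {
    "<?php foo ?>": None,
    "<? bar ?>": "<?",
    "<?=foobar ?>": "<?=",
    "<?xml barbar ?>": None,
}


def first_match(text):
    """Return the first substring PATTERN matches in text, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


for line, expected in EXPECTED.items():
    print(f"{line!r}: got {first_match(line)!r}, expected {expected!r}")

# "b" is a word character, so after "<?" neither alternative applies.
print(first_match("<?bar ?>"))
```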

You can use a negative lookahead assertion:

<\?=?(?!php|xml)

(?!php|xml) makes the match fail if the text php or xml follows <? or <?=, thus rejecting <?php and <?xml.

RegEx Demo
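As a quick check outside the online demo, the pattern can be run over the sample lines from the question (including <?bar ?> from the first edit) with Python's re module. The matched text is only the tag itself (<? or <?=), not the rest of the line. The helper name first_match is hypothetical:

```python
import re

# "<?", an optional "=", then a negative lookahead rejecting "php" or "xml".
PATTERN = re.compile(r"<\?=?(?!php|xml)")

SAMPLES = ["<?php foo ?>", "<? bar ?>", "<?bar ?>", "<?=foobar ?>", "<?xml barbar ?>"]


def first_match(text):
    """Return the first substring PATTERN matches in text, or None."""
    m = PATTERN.search(text)
    return m.group(0) if m else None


results = [first_match(s) for s in SAMPLES]
print(results)  # [None, '<?', '<?', '<?=', None]
```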

3 Comments

But it should match <?=.
No, you're selecting the whole line with that code. I only need the <? and <?=. Put <\?=?(?!php)(?!xml) in your demo and see the difference.
It works in the demo as well. The demo input just had your comments after each string.

Use the regex below:

(<\?(?=\s).*)|(<\?=.*)

It will match both of these lines:

<? bar ?> 
<?=foobar ?> 

If you write only

(<\?(?=\s))|(<\?=)

then it will match only <? and <?=.
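The difference between the two variants in this answer can be seen with Python's re module: with .* the match runs to the end of the line, without it only the tag is matched. Because the first alternative requires whitespace after <? via (?=\s), neither variant matches <?bar ?>, the case added in the question's first edit. The helper name first_match is hypothetical:

```python
import re

# With ".*", each match extends to the end of the line.
WHOLE_LINE = re.compile(r"(<\?(?=\s).*)|(<\?=.*)")
# Without it, only the tag itself is matched.
TAG_ONLY = re.compile(r"(<\?(?=\s))|(<\?=)")


def first_match(pattern, text):
    """Return the first substring pattern matches in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


for line in ["<? bar ?>", "<?=foobar ?>", "<?bar ?>"]:
    print(repr(line), first_match(WHOLE_LINE, line), first_match(TAG_ONLY, line))
```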


I finally found it.

<\?=?(?!php)(?!xml)

This will match <? with an optional =, not followed by the string 'php' or 'xml'.

1 Comment

You don't need two negative lookahead assertions. Check my answer: it can be done as (?!php|xml).
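The comment's point can be checked directly: both lookaheads are tested at the same position, so "not php and not xml" accepts exactly what "not (php or xml)" accepts. A small comparison over the question's sample lines using Python's re module (first_match is a hypothetical helper):

```python
import re

TWO_LOOKAHEADS = re.compile(r"<\?=?(?!php)(?!xml)")
ONE_LOOKAHEAD = re.compile(r"<\?=?(?!php|xml)")

SAMPLES = ["<?php foo ?>", "<? bar ?>", "<?bar ?>", "<?=foobar ?>", "<?xml barbar ?>"]


def first_match(pattern, text):
    """Return the first substring pattern matches in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


# Print both results side by side; they agree on every line.
for line in SAMPLES:
    print(repr(line), first_match(TWO_LOOKAHEADS, line), first_match(ONE_LOOKAHEAD, line))
```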
