
I have content like foo == 'bar test baz' and test.asd = "buz foo". I need to match the "identifiers", the ones on the left that are not within double/single quotes. This is what I have now:

preg_replace_callback('#([a-zA-Z\\.]+)#', function($matches) {
    var_dump($matches);
}, $subject);

It now matches even those within strings. How would I write one that does not match the string ones?

Another example: foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'. So in essence, match a-zA-Z that are not inside strings.

  • why not explode(" =",$subject);? Commented Dec 8, 2011 at 8:40
  • @k102: It's not that simple. I can't make up every possible variation, but the subject can vary a lot in structure. For example: foo = 'bar' AND baz = 'foo'. Commented Dec 8, 2011 at 9:29

3 Answers

1
/^[^'"=]*/

would work on your examples. It matches any number of characters (starting at the start of the string) that are neither quotes nor equals signs.

/^[^'"=\s]*/

additionally avoids matching whitespace, which may or may not be what you need.
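As a quick check, here is a minimal sketch of the second pattern applied with preg_match. The sample subjects are the two expressions from the question, split apart for illustration; the variable names are assumptions.

```php
<?php
// Sample inputs: the two expressions from the question, one per entry.
$subjects = [
    "foo == 'bar test baz'",
    'test.asd = "buz foo"',
];

$identifiers = [];
foreach ($subjects as $subject) {
    // Anchored at the start: stops at the first quote, equals sign or whitespace.
    if (preg_match('/^[^\'"=\s]*/', $subject, $m)) {
        $identifiers[] = $m[0];
    }
}

echo implode(', ', $identifiers), PHP_EOL; // foo, test.asd
```

Because the pattern is anchored with ^, it only ever returns the leading identifier; in a subject such as foo = 'bar' AND baz = 'foo' it would find foo but not baz.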

Edit:

You're asking how to match letters (and possibly dots?) outside of quoted sections anywhere in the text. This is more complicated. A regex that can correctly identify whether it's currently outside of a quoted string (by making sure that the number of quotes, excluding escaped quotes and nested quotes, is even) looks like this as a PHP regex:

'/(?:
 (?=      # Assert even number of (relevant) single quotes, looking ahead:
  (?:
   (?:\\\\.|"(?:\\\\.|[^"\\\\])*"|[^\\\\\'"])*
   \'
   (?:\\\\.|"(?:\\\\.|[^"\'\\\\])*"|[^\\\\\'])*
   \'
  )*
  (?:\\\\.|"(?:\\\\.|[^"\\\\])*"|[^\\\\\'])*
  $
 )
 (?=      # Assert even number of (relevant) double quotes, looking ahead:
  (?:
   (?:\\\\.|\'(?:\\\\.|[^\'\\\\])*\'|[^\\\\\'"])*
   "
   (?:\\\\.|\'(?:\\\\.|[^\'"\\\\])*\'|[^\\\\"])*
   "
  )*
  (?:\\\\.|\'(?:\\\\.|[^\'\\\\])*\'|[^\\\\"])*
  $
 )
 ([A-Za-z.]+) # Match ASCII letters/dots
)+/x'

An explanation can be found here. But probably a regex isn't the right tool for this.
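If a regex is set aside, one possible alternative is a small hand-written scanner. The sketch below is not from the original answer: findIdentifiers is a hypothetical helper, and it assumes that a backslash escapes the next character inside a quoted string and that identifiers consist only of ASCII letters and dots.

```php
<?php
// Hypothetical helper: collects runs of ASCII letters/dots found outside quotes.
function findIdentifiers(string $subject): array
{
    $found = [];
    $current = '';
    $quote = null; // the quote character we are currently inside, or null
    $length = strlen($subject);

    for ($i = 0; $i < $length; $i++) {
        $ch = $subject[$i];

        if ($quote !== null) {
            if ($ch === '\\') {
                $i++;          // assumption: backslash escapes the next character
            } elseif ($ch === $quote) {
                $quote = null; // matching closing quote
            }
            continue;          // everything inside quotes is ignored
        }

        if (ctype_alpha($ch) || $ch === '.') {
            $current .= $ch;
            continue;
        }

        // Any other character ends the identifier collected so far.
        if ($current !== '') {
            $found[] = $current;
            $current = '';
        }
        if ($ch === '"' || $ch === "'") {
            $quote = $ch;
        }
    }

    if ($current !== '') {
        $found[] = $current;
    }
    return $found;
}

$subject = "foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'";
echo implode(' ', findIdentifiers($subject)), PHP_EOL; // foo AND bar OR fuz
```

Tracking the open quote character explicitly means a string such as "Don't" or "2\" by 4\"" is skipped as a whole, which is the case the lookahead regex has to work hard to cover.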


3 Comments

Do you have an idea that could work in the just-added example in the question?
@rFactor Your edited question is unclear. Please provide also the desired output.
This is a bit more difficult, especially because quotes can be escaped or contain other quotes like "2\" by 4\"", "Don't" and such.
1

You could also try this:

preg_match_all('/[\w.]+(?=(?:[^\'"]|[\'"][^\'"]*["\'])*$)/', $subject, $result, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($result[0]); $i++) {
    # Matched text = $result[0][$i];
}

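This matches all letters, digits, underscores and dots outside your quotes. You can extend the allowed characters by adding them to the [\w.] character class. Note that it assumes quotes are balanced and that a quoted string contains neither escaped quotes nor the other quote character.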
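Since the goal is replacement, the pattern above can also be passed to preg_replace_callback. This is only a sketch: the sample subject and the bracket-wrapping replacement are assumptions chosen to make the effect visible.

```php
<?php
$subject = "foo = 'foo' AND test.asd = \"buz foo\"";

// Same pattern as in the preg_match_all call. The lookahead succeeds only
// when the rest of the string consists of unquoted characters and complete
// quoted sections, i.e. when the match itself sits outside quotes.
$pattern = '/[\w.]+(?=(?:[^\'"]|[\'"][^\'"]*["\'])*$)/';

$result = preg_replace_callback($pattern, function (array $m): string {
    // Hypothetical replacement: wrap each match in square brackets.
    return '[' . $m[0] . ']';
}, $subject);

echo $result, PHP_EOL; // [foo] = 'foo' [AND] [test.asd] = "buz foo"
```

Only the first foo is wrapped; the one inside the single-quoted string is left untouched. Keywords such as AND are matched as well, so the callback would need to filter those if only identifiers should change.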

2 Comments

I'm doing a preg_replace_callback, because I need to replace those matches. If I just do a match, how do I know I'm replacing the right content (e.g., foo = 'foo' should match the first foo and replace it with my custom stuff, but it should not affect the one in the strings)?
@rFactor You could use the same regex with preg_replace_callback. Point is, with your edited question, it is unclear to me which characters you want to catch.
1

The trick I use here is to make the regex take a separate branch whenever it encounters a quoted string, so the whole string is consumed as one match, and then to ignore that branch in the callback.

$subject = <<<END
foo == 'bar test baz' and test.asd = "buz foo"
foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'
END;

$regexp = '/(?:["\'][^"\']+["\']|([a-zA-Z\\.]+\b))/';

preg_replace_callback($regexp, function($matches) {
    if( count($matches) >= 2 ) {
        print trim($matches[1]).' ';
    }
}, $subject);

// Output: 'foo and test.asd foo AND bar OR fuz '

The main part of the regexp is

(?: anything between quotes | any word consisting of a-zA-Z )
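To actually replace the identifiers rather than print them, the callback can return the quoted branch unchanged and rewrite only the captured words. The following is a sketch under that assumption; wrapping matches in braces is a made-up replacement for illustration.

```php
<?php
$subject = "foo == 5 AND bar != 'buz' OR fuz == 'foo bar fuz luz'";

// Same pattern as above: quoted strings are consumed by the first
// alternative, words of a-zA-Z and dots are captured by the second.
$regexp = '/(?:["\'][^"\']+["\']|([a-zA-Z\\.]+\b))/';

$result = preg_replace_callback($regexp, function (array $matches): string {
    if (count($matches) >= 2) {
        // Word branch: group 1 is set. Hypothetical replacement in braces.
        return '{' . $matches[1] . '}';
    }
    // Quoted-string branch: put the string back exactly as it was.
    return $matches[0];
}, $subject);

echo $result, PHP_EOL;
// {foo} == 5 {AND} {bar} != 'buz' {OR} {fuz} == 'foo bar fuz luz'
```

Returning $matches[0] for the quoted branch matters: a callback that returns nothing would replace every quoted string with an empty string.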

2 Comments

Sorry if I was too imprecise. I believe that does not exactly match the case I added in the question? Or maybe I don't fully understand the regex, but by the look of it, it assumes there are no characters besides a-z and quotes?
I've modified my answer to only match the a-zA-Z case. Ask if the regexp is unclear to you.
