Question (score 2)

I'm looking for a function, class, or collection of functions to help with pattern matching strings. My project requires a fair amount of pattern matching, and I'd like something easier to read and maintain than raw preg_replace (or regex, period).

I've provided a pseudo-example in the hope that it shows what I'm asking.

$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
pattern_match($subject, $pattern, 0);

would return "$2,500".

$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';
pattern_match($subject, $pattern, 1);

would return an array with the values: [$2,500], [$1,250], [$1,250]

The function I'm trying to write uses 'n' for digits, 'c' for lower-case letters and 'C' for upper-case letters; any non-alphanumeric character represents itself.
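To make the mapping concrete, here is a minimal sketch (not from the original post) that translates the pattern language one character at a time into a PCRE pattern. It assumes PHP 7+ and ASCII-only letters; the name mini_pattern_to_regex is hypothetical:

```php
<?php
// Hypothetical helper: converts the mini pattern language into a PCRE pattern.
//   n -> one ASCII digit, c -> one lower-case ASCII letter,
//   C -> one upper-case ASCII letter, anything else -> itself (escaped).
function mini_pattern_to_regex(string $pattern): string
{
    $map = ['n' => '[0-9]', 'c' => '[a-z]', 'C' => '[A-Z]'];
    $regex = '';
    $len = strlen($pattern);
    for ($i = 0; $i < $len; $i++) {
        $ch = $pattern[$i];
        // Placeholders become character classes; every other character is
        // escaped so symbols like '$', '.' or '/' lose their regex meaning.
        $regex .= $map[$ch] ?? preg_quote($ch, '/');
    }
    return '/' . $regex . '/';
}

echo mini_pattern_to_regex('$n,nnn'), "\n"; // prints /\$[0-9],[0-9][0-9][0-9]/
```

Escaping each literal individually means a pattern character such as '.' or '/' can never be misread as regex syntax.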

Comments
  • (3) It would probably be best to create a function that interprets that as a regular expression, then formats the returned array more like you want it. Other ways would be much, much slower. Commented Mar 31, 2011 at 18:58
  • (3) ...so a dumbed-down regex wrapper? Commented Mar 31, 2011 at 18:59
  • (2) Next thing you know, you'll want n{from=2,to=4}, ($|¥|€), etc. etc. and you'll be developing the 'Smarty' of regex matching. Are you sure you wanna go this route? Commented Mar 31, 2011 at 19:09
  • (2) Your line « $pattern = "$n,nnn"; » is not correct, since $n is not an available variable; it should be « $pattern = '$n,nnn'; » Commented Mar 31, 2011 at 19:52
  • @freeyedboy: I know you were mostly kidding, but I actually don't hate that idea nearly as much as I hate Smarty. I find regex unintuitive because I don't work with it regularly enough to develop an intimate, or even basic, understanding of it. A more readable and user-friendly format wouldn't be a bad thing, as long as it didn't come at the expense of efficiency and speed. Commented Mar 31, 2011 at 21:29

5 Answers

Answer (score 4)
<?php

// $match_all = false: returns a string with the first match (false if none)
// $match_all = true:  returns an array of strings with all matches

function pattern_match($subject, $pattern, $match_all = false)
{
  // Escape regex metacharacters (and the '|' delimiter) so they match literally
  $pattern = preg_quote($pattern, '|');

  // Expand the placeholder letters into character classes
  $ar_pattern_replaces = array(
    'n' => '[0-9]',
    'c' => '[a-z]',
    'C' => '[A-Z]',
  );

  $pattern = strtr($pattern, $ar_pattern_replaces);

  $pattern = "|".$pattern."|";

  if ($match_all)
  {
    preg_match_all($pattern, $subject, $matches);
  }
  else
  {
    preg_match($pattern, $subject, $matches);
  }

  // preg_match() leaves $matches empty when nothing matches
  return isset($matches[0]) ? $matches[0] : false;
}

$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';
$pattern = '$n,nnn';

$result = pattern_match($subject, $pattern, 0);
var_dump($result);

$result = pattern_match($subject, $pattern, 1);
var_dump($result);
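A short trace of what the function above builds for '$n,nnn' may help explain why preg_quote() has to run before strtr(). This is an illustrative sketch reusing the same calls, not part of the original answer:

```php
<?php
$pattern = '$n,nnn';

// 1. Escape regex metacharacters first ('$' becomes '\$'; ',' is not special).
$quoted = preg_quote($pattern, '|');

// 2. Then expand the placeholder letters into character classes.
$regex = '|' . strtr($quoted, ['n' => '[0-9]', 'c' => '[a-z]', 'C' => '[A-Z]']) . '|';

echo $quoted, "\n"; // prints \$n,nnn
echo $regex, "\n";  // prints |\$[0-9],[0-9][0-9][0-9]|

// Reversing the order would be wrong: preg_quote() would escape the '[', '-'
// and ']' that strtr() just inserted, turning the classes into literal text.
$wrong = preg_quote(strtr($pattern, ['n' => '[0-9]']), '|');
echo $wrong, "\n";  // prints \$\[0\-9\],\[0\-9\]\[0\-9\]\[0\-9\]
```

The escaping step only adds backslashes in front of punctuation, so it never introduces an 'n', 'c' or 'C' that strtr() could then misinterpret.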

1 Comment

+1, definitely an improvement over mine. I wasn't aware of preg_quote() until now!
Answer (score 1)

Here is a version that uses no regular expressions ('C' and 'c' recognize only ASCII letters). Enjoy:

function pattern_match($subject, $pattern, $result_as_array) {

    $pattern_len = strlen($pattern);
    if ($pattern_len==0) return false; // error: empty pattern

    // translate $subject into the symbols of the rule ('n', 'c' or 'C')
    $translate = '';
    $subject_len = strlen($subject);
    for ($i=0 ; $i<$subject_len ; $i++) {
        $x = $subject[$i];
        $ord = ord($x);
        if ( ($ord>=48) && ($ord<=57) ) { // between 0 and 9
            $translate .= 'n';
        } elseif ( ($ord>=65) && ($ord<=90) ) { // between A and Z
            $translate .= 'C';
        } elseif ( ($ord>=97) && ($ord<=122) ) { // between a and z
            $translate .= 'c';
        } else {
            $translate .= $x; // other characters are not translated
        }
    }

    // now search all positions in the translated string

    // single result mode
    if (!$result_as_array) {
        $p = strpos($translate, $pattern);
        if ($p===false) {
            return false;
        } else {
            return substr($subject, $p, $pattern_len);
        }
    }

    // array result mode
    $result = array();
    $p = 0;
    $n = 0;
    while ( ($p<$subject_len)  && (($p=strpos($translate,$pattern,$p))!==false) ) {
        $result[] = substr($subject, $p, $pattern_len);
        $p = $p + $pattern_len;
    }
    return $result;

}
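The translation loop above can likely be shortened with the three-argument form of strtr(), which replaces bytes one-for-one. A sketch under the same ASCII-only assumption; translate_subject is a hypothetical name:

```php
<?php
// Maps every ASCII digit to 'n', upper-case letter to 'C' and lower-case
// letter to 'c'; all other bytes are left untouched.
function translate_subject(string $subject): string
{
    $from = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $to   = str_repeat('n', 10) . str_repeat('C', 26) . str_repeat('c', 26);
    return strtr($subject, $from, $to);
}

$subject = '$2,500 + $550 on-time bonus';
$translated = translate_subject($subject);
echo $translated, "\n"; // prints $n,nnn + $nnn cc-cccc ccccc

// Translation preserves length, so a hit in $translated maps straight back to $subject.
$p = strpos($translated, '$n,nnn');
echo substr($subject, $p, strlen('$n,nnn')), "\n"; // prints $2,500
```

Since every byte maps to exactly one byte, offsets found in the translated string are valid in the original subject, which is what the answer's substr() calls rely on.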

1 Comment

Quite a good one! But why is there an if ($n>10) exit(); in the last while loop?
Answer (score 1)

Update: This is an incomplete answer that doesn't hold up against several test patterns. See @Frosty Z's answer for a better solution.

<?php
    function pattern_match($s, $p, $c=0) {
        $tokens = array(
            '$' => '\$',
            'n' => '\d{1}',
            'c' => '[a-z]{1}',
            'C' => '[A-Z]{1}'
        );
        $reg = '/' . str_replace(array_keys($tokens), array_values($tokens), $p) . '/';
        if ($c == 0) {
            preg_match($reg, $s, $matches);
        } else {
            preg_match_all($reg, $s, $matches);
        }
        return $matches[0];
    }

    $subject = "$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).";

    $pattern = '$n,nnn';
    print_r(pattern_match($subject, $pattern, 0));
    print_r(pattern_match($subject, $pattern, 1));

    $pattern = 'cc-cccc';
    print_r(pattern_match($subject, $pattern));
    print_r(pattern_match($subject, $pattern, 1));
?>

Output:

$2,500

Array
(
    [0] => $2,500
    [1] => $1,250
    [2] => $1,250
)

on-time

Array
(
    [0] => on-time
    [1] => on-time
)

Note: Make sure to use single-quotes for your $pattern when it contains $, or PHP will try to parse it as a $variable.
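A minimal illustration of the note above (plain PHP string semantics, nothing specific to this answer):

```php
<?php
// In single quotes, '$' is an ordinary character.
$single = '$n,nnn';

// In double quotes, "$n" would be parsed as the variable $n (undefined here);
// escaping the dollar sign keeps it literal.
$double = "\$n,nnn";

var_dump($single === $double); // bool(true)
```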

3 Comments

Sorry, but it doesn't work with patterns like cc/cc (error) or n.nnn (it returns '2.500', but also '2,500', '2#500', ...).
@Frosty Z: Yeah, I was in a rush to get an Answer up before I went out for lunch. @Gumbo: It felt like a good idea at the time?
No problem, sometimes I have some flaws in my "quick answers" too :-) Concerning Gumbo's comment, [A-Z]{1} is in fact equivalent to [A-Z], so adding {1} is unnecessary.
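The failure modes described in these comments can be reproduced with a small sketch: without escaping, the '.' in 'n.nnn' stays a regex wildcard (and an unescaped '/' would end the pattern early). Passing the pattern, together with the delimiter, through preg_quote() before expanding the placeholders avoids both. Illustrative only; the sample subject is made up:

```php
<?php
$subject = 'Prices: 2.500 or 2,500 or 2#500';

// Unescaped: the '.' left in the pattern matches any character.
$naive = '/' . str_replace('n', '\d', 'n.nnn') . '/';
preg_match_all($naive, $subject, $m1);
echo implode(' | ', $m1[0]), "\n"; // prints 2.500 | 2,500 | 2#500

// Escaped first with preg_quote(), then expanded: only a literal '.' matches.
$safe = '/' . str_replace('n', '\d', preg_quote('n.nnn', '/')) . '/';
preg_match_all($safe, $subject, $m2);
echo implode(' | ', $m2[0]), "\n"; // prints 2.500
```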
Answer (score 0)

The function you're looking for is preg_match_all, although you'll need to write regex patterns for your pattern matching.

3 Comments

It seems that the OP wants to create his own patterns.
He explicitly said he didn't want to use regular expressions.
Could write a wrapper that converts his simpler patterns to regex (e.g. str_replace('n', '[0-9]', $pattern);), but short of completely reinventing the regex wheel, this is what he's got to work with.
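For comparison, the question's '$n,nnn' written directly as a regex and passed to preg_match_all() might look like this sketch:

```php
<?php
$subject = '$2,500 + $550 on-time bonus, paid 50% upfront ($1,250), 50% on delivery ($1,250 + on-time bonus).';

// \$ = literal dollar sign, \d = one digit, \d{3} = exactly three digits.
$count = preg_match_all('/\$\d,\d{3}/', $subject, $matches);

echo $count, "\n";                     // prints 3
echo implode(', ', $matches[0]), "\n"; // prints $2,500, $1,250, $1,250
```

'$550' is skipped because no comma follows its first digit.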
Answer (score 0)

Sorry, but this is a problem for regex. I understand your objections, but there's just no other way as efficient or simple in this case. This is an extremely simple matching problem. You could write a custom wrapper as jnpcl demonstrated, but that would only involve more code and more potential pitfalls. Not to mention extra overhead.
