
I'm trying to run a simple replacement on some input data that could be described as follows:

  • take a regular expression
  • take an input data stream
  • on every match, replace the match through a callback

Unfortunately, preg_replace_callback() doesn't work as I'd expect. It gives me all the matches on the entire line, not individual matches. So I need to put the line together again after replacement, but I don't have the information to do that. Case in point:

<?php
echo replace("/^\d+,(.*),(.*),.*$/", "12,LOWERME,ANDME,ButNotMe")."\n";
echo replace("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN")."\n";


function replace($pattern, $data) {
    return preg_replace_callback(
        $pattern, 
        function($match) {
            return strtolower($match[0]);
        }, $data
    );
}

https://www.tehplayground.com/hE1ZBuJNtFiHbdHO

gives me 12,lowerme,andme,butnotme, but I want 12,lowerme,andme,ButNotMe.

I know using $match[0] is wrong. It's just to illustrate here. Inside the closure I need to run something like

foreach ($match as $m) { /* do something */ }

But as I said, I have no information about the position of the matches in the input string which makes it impossible to put the string together again.

I've dug through the PHP documentation and run several searches, but couldn't find a solution.


Clarifications:

I know that $match[1], $match[2]... etc contain the matches. But only a string, not a position. Imagine in my example the final string is also ANDME instead of ButNotMe - according to the regex, it should not be matched and the callback should not be applied to it. That's why I'm using regexes in the first place instead of string replacements.

Also, the reason I'm using capture groups this way is that I need the replacement process to be configurable. So I cannot hardcode something like "replace #1 and #2 but not #3". On a different input file, the positions might be different, or there might be more replacements needed, and only the regex used should change.

So if my input is "15,LOWER,ME,NotThis,AND,ME,AGAIN", I want to be able to just change the regex, not the code and get the desired result. Basically, both $pattern and $data are variable.

  • You'll have $match[1], which is the first set of () in the pattern, and $match[2], which is the second set of (), so print_r($match); to see. And the pattern is probably not right: ^\d+,([^,]+),([^,]+),.*$ or something. Commented May 23, 2019 at 13:20
  • I'm aware that I have $match[n]. But that includes only the string of the matches, not their position in the input data. I cannot use that to run a replacement because I could have the same string in a different place where it should not be replaced. Commented May 23, 2019 at 13:27
  • Is it always just the second and third value you want to change casing on? Commented May 23, 2019 at 13:34
  • see clarification above. Sorry for not including it immediately. Commented May 23, 2019 at 13:34
  • You don't specify HOW you know it needs to be modified if not by position in the pattern. Given 12,LOWERME,ANDME,ButNotMe why not lower ButNotMe??? What is the logic to not lower that one but lower the other 2??? Commented May 23, 2019 at 13:40

2 Answers


This uses preg_match() with PREG_OFFSET_CAPTURE to return each capture group together with the byte offset in the original string where it was found. It then calls substr_replace() for each capture group to replace only that part of the string, which rules out accidentally replacing similar text elsewhere that you do not want changed...

function lowerParts (string $input, string $regex ) {
    preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
    array_shift($matches);
    foreach ( $matches as $match )  {
        $input = substr_replace($input, strtolower($match[0]),
            $match[1], strlen($match[0]));
    }
    return $input;
}
echo lowerParts ("12,LOWERME,ANDME,ButNotMe", "/^\d+,(.*),(.*),.*$/");

gives...

12,lowerme,andme,ButNotMe

But also with

echo lowerParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,(.*),(.*),.*$/");

it gives

12,lowerme,andme,LOWERME
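
Both calls above behave because strtolower() returns a string of the same byte length as its ASCII input, so the offsets captured up front stay valid. As a rough illustration of what happens otherwise, the sketch below uses naiveParts(), a hypothetical copy of lowerParts() that accepts any callback, together with a made-up $wrap callback that adds two characters to each group.

```php
<?php
// Illustration only: naiveParts() is a hypothetical copy of lowerParts()
// that takes an arbitrary callback instead of hardcoding strtolower().
function naiveParts(string $input, string $regex, callable $process): string
{
    preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
    array_shift($matches); // drop the full match, keep the capture groups
    foreach ($matches as [$text, $pos]) {
        // $pos was measured against the ORIGINAL string, so it is stale
        // as soon as an earlier replacement has changed the length.
        $input = substr_replace($input, $process($text), $pos, strlen($text));
    }
    return $input;
}

// Made-up callback that makes every group two bytes longer.
$wrap = function (string $s): string {
    return '[' . $s . ']';
};

echo naiveParts('12,AB,CD,EF', '/^\d+,(.*),(.*),.*$/', $wrap), "\n";
// prints 12,[AB[CD]CD,EF rather than the intended 12,[AB],[CD],EF
```

The second group is written two bytes too early because the first replacement already shifted everything after it.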

Edit:

If a replacement can differ in length from the text it replaces, you need to chop the string up into parts and replace each one. The complication is that each change in length shifts the position of every later offset, so the function has to keep a running total of that shift. This version also takes the process you want to apply to the strings as a parameter (this example just passes "strtolower") ...

function processParts (string $input, string $regex, callable $process ) {
    preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
    array_shift($matches);
    $offset = 0;
    foreach ( $matches as $match )  {
        $replacement = $process($match[0]);
        $input = substr($input, 0, $match[1]+$offset)
                 .$replacement.
                 substr($input, $match[1]+$offset+strlen($match[0]));
        $offset += strlen($replacement) - strlen($match[0]);
    }
    return $input;
}
echo processParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,.*,(.*),(.*)$/", "strtolower");
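
One edge case may be worth guarding against: with PREG_OFFSET_CAPTURE, an optional group that did not take part in the match is reported as an empty string with offset -1, and a pattern that does not match at all leaves no groups to process. The following is a minimal sketch of a guarded variant, assuming the capture groups are not nested and appear in left-to-right order. The name processPartsSafe() and the $wrap callback are made up for illustration.

```php
<?php
// Hypothetical hardened variant of processParts(); the name and the
// skip-on-unmatched behaviour are assumptions, not part of the answer.
function processPartsSafe(string $input, string $regex, callable $process): string
{
    if (!preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE)) {
        return $input; // no match (or regex error): leave the input untouched
    }
    array_shift($matches); // drop the full match, keep the capture groups
    $offset = 0;           // running shift caused by earlier replacements
    foreach ($matches as [$text, $pos]) {
        if ($pos < 0) {
            continue; // group did not participate in the match
        }
        $replacement = $process($text);
        $input = substr_replace($input, $replacement, $pos + $offset, strlen($text));
        $offset += strlen($replacement) - strlen($text);
    }
    return $input;
}

// Made-up callback that changes the length of every group.
$wrap = function (string $s): string {
    return '<' . strtolower($s) . '>';
};

echo processPartsSafe('12,LOWERME,ANDME,LOWERME', '/^\d+,.*,(.*),(.*)$/', $wrap), "\n";
// prints 12,LOWERME,<andme>,<lowerme>
echo processPartsSafe('12,A,,B', '/^\d+,(A)(X)?,,(B)$/', $wrap), "\n";
// prints 12,<a>,,<b> because the unmatched (X)? group is skipped
```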

5 Comments

This works as long as the replacement has the same length as the original. Unfortunately in my use case that isn't always true. My apologies that this was not clear in the question.
Added a replacement, it just chops the string into parts, also allows you to pass a function to process the strings with.
Interesting approach. Might be faster than Nick's answer because it needs fewer regex computations.
@Tom The first example will work if you loop in reverse order (foreach (array_reverse($matches) ...) {}), because the replacement then won't affect the other offsets.
@NigelRen you should just update your answer with the array_reverse idea. Seems a reasonable trade...

This will work:

function replaceGroups(string $pattern, string $string, callable $callback)
{
    preg_match($pattern, $string, $matches, PREG_OFFSET_CAPTURE);
    array_shift($matches);

    foreach (array_reverse($matches) as $match) {
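        // PREG_OFFSET_CAPTURE reports byte offsets, so the length must be in bytes too: strlen(), not mb_strlen()
        $string = substr_replace($string, $callback($match[0]), $match[1], strlen($match[0]));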
    }

    return $string;
}

echo replaceGroups("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN", 'strtolower');
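
Because preg_match() stops at the first match, the functions in both answers handle a single match per call. If the input holds several records, one possible approach on PHP 7.4 or later is to pass PREG_OFFSET_CAPTURE to preg_replace_callback() through its flags argument and rebuild each match inside the callback. The sketch below assumes non-nested capture groups. The name replaceGroupsAll() and the sample data are made up for illustration.

```php
<?php
// Sketch only: replaceGroupsAll() is a hypothetical name. Requires PHP 7.4+,
// where preg_replace_callback() accepts a flags argument.
function replaceGroupsAll(string $pattern, string $subject, callable $callback): ?string
{
    // Returns null if the regex engine reports an error.
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($callback): string {
            [$whole, $start] = array_shift($m); // full match text and its byte offset
            // Work backwards so earlier offsets stay valid after each replacement.
            foreach (array_reverse($m) as [$text, $pos]) {
                if ($pos < 0) {
                    continue; // group did not participate in this match
                }
                // Rebase the group offset onto the start of the current match.
                $whole = substr_replace($whole, $callback($text), $pos - $start, strlen($text));
            }
            return $whole;
        },
        $subject,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );
}

$data = "12,LOWERME,ANDME,ButNotMe\n15,LOWER,ME,NotThis";
echo replaceGroupsAll('/^\d+,([^,\n]*),([^,\n]*),.*$/m', $data, 'strtolower'), "\n";
```

Only the pattern decides which parts are passed to the callback, so a different input layout needs a different regex but no code change.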

1 Comment

If you are going to post this as a separate answer, I would at least appreciate some form of attribution for where most of the code comes from. (Besides, an explanation consisting only of "This will work" offers very little to any future reader.)
