Wildcard replace in PHP

Question

I have no experience using regular expressions in PHP, so I usually write some convoluted function using a series of str_replace(), substr(), strpos(), strstr() etc. (you get the idea).

This time I want to do this correctly. I know I need to use a regex for this, but I am confused as to which functions to use (ereg or preg) and what exactly the syntax should be.

NOTE: I am NOT parsing HTML or XML, and sometimes I will be using delimiters other than angle-bracket tags (for example, | or ~ or [tag] or ::). I am looking for a generic way to do a wildcard replace between two known delimiters using regex; I am not building an HTML or XML parser.

What I need is a regex that replaces this:

<sometag>everything in here</sometag>

with this:

<sometag>new contents</sometag>

I have read the documentation online for a bit, but I am confused, and am hoping one of you regex experts can pop in a simple solution. I suspect I will pass the values to a function, something like this:

$new_text = swapText ( "<sometag>", $the_new_text_to_go_into_the_tag );

function swapText ( $in_tag_with_brackets_to_update, $in_new_text ) {
 // define tags
 $starting_tag  = $in_tag_with_brackets_to_update;
 $ending_tag    = str_replace( "<", "</", $in_tag_with_brackets_to_update );

 // not sure if this is the proper regex match string or not
 // and/or if any escaping needs to be done on the tags
 $find_string         = "{$starting_tag}.*{$ending_tag}";
 $replace_with_string = "{$starting_tag}{$in_new_text}{$ending_tag}";

 // after some regex, this function should return new version of <tag>data</tag>
}

Thanks.

Please use a parser: stackoverflow.com/questions/1732348/…
– BalusC, Commented Nov 29, 2009 at 17:03
thanks BalusC, but I am not trying to parse HTML, although I can see how my question may lead you to believe that.
– OneNerd, Commented Nov 29, 2009 at 17:08
Galen - I am simply looking for a way to replace an unknown block of text inside a known set of delimiters (I used tags as an example of one of the many things I will be using as delimiters). Perhaps I should have used a different example for delimiters.
– OneNerd, Commented Nov 29, 2009 at 17:12
Even if your tags are not real HTML tags, a parser would still be a better way to go if they always follow the HTML/XML format. You should be able to find/replace everything within sometag easily.
– DisgruntledGoat, Commented Nov 29, 2009 at 17:14

troelskn · Accepted Answer · 2009-11-30 09:39:43Z

You say that you are not going to parse XML and then go on to show an XML example. That's a bit confusing.

Now, the reason why you can't use regular expressions to parse XML is that they aren't contextual. There is therefore a whole class of problems that regular expressions can't be used for. This includes nested tags (whether they are XML or not), so keep that in mind.

That out of the way, you should be using preg, not ereg. ereg is a less widely used, slower and now deprecated flavour of regular expressions. Just forget about it.

In PCRE (Perl Compatible Regular Expressions), which is the language that preg uses, a . (dot) is a wildcard that matches any single character (except a newline). You can put a quantifier after a character or group to say how many times it may repeat. A quantifier can be an explicit range of numbers, such as {1,3} (meaning at least one, but up to three), or one of the shorthand symbols, such as + (short for {1,}, meaning at least one) or * (meaning any number, including zero). With this knowledge, you can match anything with .*.
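
As a quick illustration of the dot and the quantifiers described above, here is a small sketch using preg_match, which returns 1 when the pattern matches and 0 when it does not. The sample strings are made up for the demonstration, and the ^ and $ anchors are added only so that each pattern has to cover the whole string.

```php
<?php
// The dot matches any single character, but not a newline.
var_dump(preg_match('~^a.c$~', 'abc'));    // int(1)
var_dump(preg_match('~^a.c$~', "a\nc"));   // int(0)

// {1,3}: at least one "b", at most three.
var_dump(preg_match('~^ab{1,3}c$~', 'abbc'));    // int(1)
var_dump(preg_match('~^ab{1,3}c$~', 'abbbbc'));  // int(0)

// +: at least one "b".  *: any number of "b", including zero.
var_dump(preg_match('~^ab+c$~', 'ac'));    // int(0)
var_dump(preg_match('~^ab*c$~', 'ac'));    // int(1)
```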

By default, expressions will match the largest possible pattern (known as being greedy). You can change this with the ? modifier. Thus .*? will match anything, but take the shortest possible match. This can then be used to match any delimited value, as follows:

~<foo>.*?</foo>~

Note that I'm using ~ as the delimiter here to avoid having to escape / in the expression. The standard is to use / as delimiter, in which case the expression would have looked like this:

/<foo>.*?<\/foo>/
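
To see why the lazy form matters, here is a minimal sketch that runs the greedy and the lazy version of the pattern against a string containing two delimited values. The input string and the replacement text X are invented for the example.

```php
<?php
// Two delimited values on one line.
$input = '<foo>one</foo> and <foo>two</foo>';

// Greedy: .* runs to the LAST </foo>, so one match swallows both values.
$greedy = preg_replace('~<foo>.*</foo>~', '<foo>X</foo>', $input);

// Lazy: .*? stops at the FIRST </foo>, so each value is replaced separately.
$lazy = preg_replace('~<foo>.*?</foo>~', '<foo>X</foo>', $input);

echo $greedy, "\n"; // <foo>X</foo>
echo $lazy, "\n";   // <foo>X</foo> and <foo>X</foo>
```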

In general, the above is bad practice, since it's much better to match a negated character class than a dot, but to keep things simple, just ignore this until you have the basics under your skin. It'll work in most cases. One caveat: since the . doesn't match newlines, this won't work if the content contains a newline character. If you need that, you can do one of two things: either add the s modifier to the expression, which makes the . match newlines as well, or replace the . with a character class that includes newlines, for example [\s\S] (meaning a whitespace character or a non-whitespace character, which is the same as anything). This is how the expression would look then:

~<foo>.*?</foo>~s

Or:

~<foo>[\s\S]*?</foo>~
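
The effect of the two variants can be checked with a short sketch like the following. The two-line input is made up; the point is only that the plain pattern leaves it untouched while both newline-aware variants replace it.

```php
<?php
$input = "<foo>line one\nline two</foo>";

// Without the s modifier the dot stops at the newline, so nothing matches
// and preg_replace returns the input unchanged.
$plain = preg_replace('~<foo>.*?</foo>~', '<foo>X</foo>', $input);

// With the s modifier the dot also matches newlines.
$dotall = preg_replace('~<foo>.*?</foo>~s', '<foo>X</foo>', $input);

// A character class that covers every character needs no modifier.
$class = preg_replace('~<foo>[\s\S]*?</foo>~', '<foo>X</foo>', $input);

var_dump($plain === $input); // bool(true)
echo $dotall, "\n";          // <foo>X</foo>
echo $class, "\n";           // <foo>X</foo>
```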

To put all this to work, let's pass it to the preg_replace function:

echo preg_replace('~<foo>.*?</foo>~s', '<foo>Lorem Ipsum</foo>', $input);

If your tag-names are variable, you can build the expression up like you would with an SQL query. Just like SQL, you need to escape certain characters. Use preg_quote for that:

function swapText($tagname, $replacement_text, $input) {
  $tagname_escaped = preg_quote($tagname, '~');
  return preg_replace(
    '~<' . $tagname_escaped . '>.*?</' . $tagname_escaped . '>~s',
    '<' . $tagname . '>' . $replacement_text . '</' . $tagname . '>',
    $input);
}
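
Since the question mentions delimiters other than tags (|, ~, [tag], ::), the same idea can be sketched for an arbitrary pair of delimiters. The function name swapBetween and its parameters are hypothetical, not part of the answer above. It also swaps in preg_replace_callback for preg_replace, on the assumption that the replacement text should be inserted literally: with preg_replace, sequences such as $1 or \1 in the replacement are interpreted as references to captured groups.

```php
<?php
// Hypothetical helper: replaces whatever sits between $open and $close.
function swapBetween($open, $close, $replacement_text, $input) {
    // Escape both delimiters, including the ~ used as the pattern delimiter.
    $pattern = '~' . preg_quote($open, '~') . '.*?' . preg_quote($close, '~') . '~s';

    // The callback's return value is inserted as-is, so "$1" or "\1" inside
    // $replacement_text is not treated as a group reference.
    return preg_replace_callback(
        $pattern,
        function ($match) use ($open, $close, $replacement_text) {
            return $open . $replacement_text . $close;
        },
        $input
    );
}

echo swapBetween('[tag]', '[/tag]', 'new', 'a [tag]old[/tag] b'), "\n"; // a [tag]new[/tag] b
echo swapBetween('::', '::', 'costs $1', 'x ::old:: y'), "\n";          // x ::costs $1:: y
echo swapBetween('|', '|', 'new', "|multi\nline|"), "\n";               // |new|
```

Like the pattern it is built on, this sketch does not handle nested delimiters.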

Note that . matches anything except line breaks. Besides that, excellent answer!
thanks. I think it will do what I need, and based on your excellent explanations, I think I can re-purpose the swapText function to handle other kinds of delimiters I am using throughout my app. Thanks again!

ghostdog74 · Accepted Answer · 2009-11-30 00:22:22Z

3

@OP, there's no need to use a complicated regex or a parser if your task is very simple. Here is an example using just ordinary substring functions:

$mystr='<sometag>everything in here</sometag>';
$start=strpos($mystr,"<sometag>");
$end=strpos($mystr,"</sometag>");
print substr($mystr,0,$start+strlen("<sometag>") ) . "new value" . substr($mystr,$end);
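
The snippet above assumes both delimiters are present. A slightly more defensive sketch of the same substring approach might look like this; the function name replaceBetween and its parameters are hypothetical. strpos returns false when nothing is found, so the result has to be compared with === false rather than treated as a number. Only the first delimited block is replaced.

```php
<?php
// Hypothetical helper: same substring idea, with checks for missing delimiters.
function replaceBetween($haystack, $open, $close, $new_value) {
    $start = strpos($haystack, $open);
    if ($start === false) {
        return $haystack; // opening delimiter missing: nothing to do
    }
    $content_start = $start + strlen($open);

    // Look for the closing delimiter only after the opening one.
    $end = strpos($haystack, $close, $content_start);
    if ($end === false) {
        return $haystack; // closing delimiter missing: nothing to do
    }

    return substr($haystack, 0, $content_start) . $new_value . substr($haystack, $end);
}

// Newlines in the old content need no special handling here.
echo replaceBetween("<sometag>old\ntext</sometag>", '<sometag>', '</sometag>', 'new value'), "\n";
echo replaceBetween('no delimiters here', '<sometag>', '</sometag>', 'new value'), "\n";
```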

OneNerd Over a year ago

thanks - thought the regex would work, but yours worked better and also with newline characters which the regex didn't.

Yacoby · Accepted Answer · 2009-11-29 19:13:53Z

1

First, if it is HTML you are replacing, use something like Simple HTML DOM. If the format is exactly what you say (as in, <sometag> can't be <sometag >), then a regex may be OK to use.

Don't use the ereg-based functions, as they are deprecated; use the preg functions instead.

preg_replace('%(<sometag>)[^<]*(</sometag>)%i', '$1something else$2', $str);

EDIT
A slightly better version of the above, which now supports having a < in the text:

preg_replace('%(<sometag>).*?(</sometag>)%i', '$1something else$2', $str);

The $1 and $2 are the text matched by the two parenthesized groups. As these are constant, they could be replaced with the literal tags:

preg_replace('%<sometag>.*?</sometag>%i', '<sometag>something else</sometag>', $str);
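
The difference between the [^<]* and .*? versions, and the effect of the i modifier, can be seen in a short sketch. The sample strings are invented for the demonstration.

```php
<?php
$plain  = '<sometag>old text</sometag>';
$has_lt = '<sometag>a < b</sometag>';

$negated = '%(<sometag>)[^<]*(</sometag>)%i';
$lazy    = '%(<sometag>).*?(</sometag>)%i';

// [^<]* cannot cross a "<", so content containing "<" is left untouched.
echo preg_replace($negated, '$1new$2', $plain), "\n";   // <sometag>new</sometag>
echo preg_replace($negated, '$1new$2', $has_lt), "\n";  // <sometag>a < b</sometag>

// .*? accepts any character (except a newline, unless the s modifier is added).
echo preg_replace($lazy, '$1new$2', $has_lt), "\n";     // <sometag>new</sometag>

// The i modifier matches the tags case-insensitively; $1 and $2 keep the original case.
echo preg_replace($lazy, '$1new$2', '<SomeTag>x</SOMETAG>'), "\n"; // <SomeTag>new</SOMETAG>
```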

edited Nov 29, 2009 at 19:13

answered Nov 29, 2009 at 17:10

OneNerd Over a year ago

The ending piece should be </sometag>, not <sometag>. Do I need to escape a / with a \ (e.g. <\/sometag>)? Also, what does [^<] do? Is it looking for text that starts with a <? If so, that is not what I need. Thanks -

Yacoby Over a year ago

Fixed the end tag. [^<] matches all characters that are not '<'. Both examples fit your test data. If it's not what you want, you need to explain more clearly what you do want.

Matteo Riva Over a year ago

This does not work, the slash in </sometag> will be seen as pattern delimiter resulting in a parse error. Either escape it or (better) use different pattern delimiters.

Yacoby Over a year ago

Please clarify the reason for the -1

Bart Kiers Over a year ago

The -1 was not from me, but your solution will fail if there are line breaks between the opening and closing tag.

jorisw · Accepted Answer · 2011-06-09 12:27:44Z

I've written the following function to replace parts of a string by wildcard:

function wildcardReplace($String,$Search,$Filler,$Wildcard = '???'){

        // Split the search pattern into the literal text before and after the wildcard.
        list($startStr,$endStr) = explode($Wildcard,$Search);

        $start = strpos($String,$startStr);

        // Nothing to replace if the start string does not occur.
        if($start === false) return $String;

        // Make sure the end point is the first closest match after the start string.

        $endofstarter = $start + strlen($startStr);

        $startofender = strpos($String,$endStr,$endofstarter);

        // Nothing to replace if the end string does not follow the start string.
        if($startofender === false) return $String;

        $Result = substr($String,0,$endofstarter) . $Filler;

        // If there are any matches left in the remaining text, replace them as well.

        $RemainingString = substr($String,$startofender);

        return $Result . wildcardReplace($RemainingString,$Search,$Filler,$Wildcard);
}

Example use: $Output = wildcardReplace('<a href="http://www.youtube.com/watch?v=dQw4w9WgXcQ"><img src="rickroll.png" width="500"></a>','width="???"',350,'???');
