I've noticed strange preg_replace() behaviour when the replacement text starts with a digit: the result loses the first digit of the replacement and, as the output further down shows, the opening <a> tag as well. I'm seeing it in PHP 5.6.36 and PHP 7.0.30.

This code:

<?php

$items = array(
    '1234567890'   => '<a href="http://example.com/1234567890">1234567890</a>',
    '1234567890 A' => '<a href="http://example.com/123456789-a">1234567890 A</a>',
    'A 1234567890' => '<a href="http://example.com/a-1234567890">A 1234567890</a>',
    'Only Text'    => '<a href="http://example.com/only-text">Only Text</a>',
);

foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '$1' . $title . '$2';

    // Preserve for re-use.
    $_item = $item;

    // Doesn't work -- the titles starting with a number are wonky.
    $item = preg_replace( $search, $replace, $item );
    echo 'Broken: ' . $item . PHP_EOL;

    // Ugly hack to fix the issue.
    if ( is_numeric( substr( $title, 0, 1 ) ) ) {
        $title = ' ' . $title;
    }
    $replace = '$1' . $title . '$2';
    $_item = preg_replace( $search, $replace, $_item );
    echo 'Fixed:  ' . $_item . PHP_EOL;
}

produces this result:

Broken: 234567890</a>
Fixed:  <a href="http://example.com/1234567890"> 1234567890</a>
Broken: 234567890 A</a>
Fixed:  <a href="http://example.com/123456789-a"> 1234567890 A</a>
Broken: <a href="http://example.com/a-1234567890">A 1234567890</a>
Fixed:  <a href="http://example.com/a-1234567890">A 1234567890</a>
Broken: <a href="http://example.com/only-text">Only Text</a>
Fixed:  <a href="http://example.com/only-text">Only Text</a>

I've tested my regex online at https://regex101.com/, and as far as I can tell, it's written correctly. (It's not terribly complex, IMHO.)

Is this a PHP bug, or am I not completely grokking my regex?

  • Further examination makes me think that the substitution is the issue, i.e., '$1' . '1234...' . '$2' is being interpreted as $11234...$2. Commented Jul 4, 2018 at 16:05

2 Answers

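To avoid this, write the group references with braces: change $1 to ${1} and $2 to ${2}. The braces mark where the reference ends, so a digit that follows is treated as literal text.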

foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '${1}' . $title . '${2}';
    ...
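For reference, a minimal self-contained sketch of that change, using the question's pattern with one sample item. The sample markup ("old title") and the variable name $fixed are made up for illustration.

```php
<?php

$search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
$item   = '<a href="http://example.com/1234567890">old title</a>';
$title  = '1234567890';

// Single quotes keep PHP from interpolating ${1}; preg_replace() sees the
// braces and treats the digits of $title as plain text after group 1.
$fixed = preg_replace( $search, '${1}' . $title . '${2}', $item );

echo $fixed . PHP_EOL; // <a href="http://example.com/1234567890">1234567890</a>
```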

2 Comments

Much better than my ugly hack. Thanks!
...and to my embarrassment, that's literally covered in the first example on the documentation page.

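It appears that my $replace parameter ('$1' . $title . '$2') is to blame. Since $title starts with a digit, that digit ends up directly after the $1, so $replace becomes $11234...$2. PHP reads up to two digits after the $, so $11 is taken as a reference to group 11, which doesn't exist and expands to an empty string. That's why both the opening tag and the first digit disappear.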
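A small sketch of the effect in isolation, assuming a made-up pattern /(foo)/ with a single capture group (the pattern and the variable names are illustrative, not from my real code):

```php
<?php

// Only one capture group, so group 11 does not exist.
$pattern = '/(foo)/';

// '$11' is parsed as a reference to group 11, which is empty here.
$a = preg_replace( $pattern, '$11', 'foo' );

// '${1}1' is group 1 followed by a literal '1'.
$b = preg_replace( $pattern, '${1}1', 'foo' );

// At most two digits are consumed: '$112' is group 11 plus a literal '2'.
$c = preg_replace( $pattern, '$112', 'foo' );

var_dump( $a ); // string(0) ""
var_dump( $b ); // string(4) "foo1"
var_dump( $c ); // string(1) "2"
```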

Solution:

$replace = '$1%s$2';
.
.
.
echo sprintf( $item, $title );

...which has the advantage of not introducing spurious spaces into my page title links.
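One caveat with the sprintf() route: any literal % already in the markup (a URL-encoded href, say) would be read as a format specifier. A different technique that sidesteps both problems is preg_replace_callback(), whose return value is inserted as-is and is not scanned for $n references. This is only a sketch; the sample markup and title are made up.

```php
<?php

$search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
$item   = '<a href="http://example.com/only%20text">old title</a>';
$title  = '100% $1 Text'; // hypothetical title containing both % and $1

// The callback receives the match array; whatever it returns replaces the
// whole match literally, so neither '%' nor '$1' in $title is interpreted.
$result = preg_replace_callback(
    $search,
    function ( $m ) use ( $title ) {
        return $m[1] . $title . $m[2];
    },
    $item
);

echo $result . PHP_EOL; // <a href="http://example.com/only%20text">100% $1 Text</a>
```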
