preg_replace doesn't work as expected with numeric string data

Question

I've noticed strange preg_replace() behaviour when I'm dealing with strings that start with a numeric character: the replaced string loses the first digit of the title, and the opening <a> tag captured by the first group disappears as well. I'm seeing it in PHP 5.6.36 and PHP 7.0.30.

This code:

<?php

$items = array(
    '1234567890'   => '<a href="http://example.com/1234567890">1234567890</a>',
    '1234567890 A' => '<a href="http://example.com/123456789-a">1234567890 A</a>',
    'A 1234567890' => '<a href="http://example.com/a-1234567890">A 1234567890</a>',
    'Only Text'    => '<a href="http://example.com/only-text">Only Text</a>',
);

foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '$1' . $title . '$2';

    // Preserve for re-use.
    $_item = $item;

    // Doesn't work -- the titles starting with a number are wonky.
    $item = preg_replace( $search, $replace, $item );
    echo 'Broken: ' . $item . PHP_EOL;

    // Ugly hack to fix the issue.
    if ( is_numeric( substr( $title, 0, 1 ) ) ) {
        $title = ' ' . $title;
    }
    $replace = '$1' . $title . '$2';
    $_item = preg_replace( $search, $replace, $_item );
    echo 'Fixed:  ' . $_item . PHP_EOL;
}

produces this result:

Broken: 234567890</a>
Fixed:  <a href="http://example.com/1234567890"> 1234567890</a>
Broken: 234567890 A</a>
Fixed:  <a href="http://example.com/123456789-a"> 1234567890 A</a>
Broken: <a href="http://example.com/a-1234567890">A 1234567890</a>
Fixed:  <a href="http://example.com/a-1234567890">A 1234567890</a>
Broken: <a href="http://example.com/only-text">Only Text</a>
Fixed:  <a href="http://example.com/only-text">Only Text</a>

I've tested my regex online at https://regex101.com/, and as far as I can tell, it's written correctly. (It's not terribly complex, IMHO.)

Is this a PHP bug, or am I not completely grokking my regex?

Further examination makes me think that the substitution is the issue, i.e., '$1' . '1234...' . '$2' is being interpreted as $11234...$2.
– Pat J, commented Jul 4, 2018 at 16:05

Toto · Accepted Answer · 2018-07-04 18:21:50Z

2

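To avoid this behaviour, write ${1} instead of $1, and likewise ${2} instead of $2. The braces mark where the group reference ends, so any digit that follows is treated as literal text: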

foreach( $items as $title => $item ) {
    $search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
    $replace = '${1}' . $title . '${2}';
    ...
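
To see the difference side by side, here is a minimal sketch that runs both replacement strings against a single link. The sample link and the variable names $broken and $fixed are made up for illustration; the pattern is the one from the question.

```php
<?php
// Hypothetical sample data for illustration.
$search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
$item   = '<a href="http://example.com/1234567890">old title</a>';
$title  = '1234567890';

// '$1' . '1234567890' becomes '$11234567890'. PHP reads "$11" (at most two
// digits) as group 11, which does not exist and expands to an empty string.
$broken = preg_replace( $search, '$1' . $title . '$2', $item );

// Braces end the reference explicitly, so the digits that follow stay literal.
$fixed = preg_replace( $search, '${1}' . $title . '${2}', $item );

echo $broken . PHP_EOL; // 234567890</a>
echo $fixed . PHP_EOL;  // <a href="http://example.com/1234567890">1234567890</a>
```

Note that the replacement strings are single-quoted, so PHP itself does not try to interpolate ${1} as a variable.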


2 Comments

Pat J Over a year ago

Much better than my ugly hack. Thanks!

Pat J Over a year ago

...and to my embarrassment, that's literally covered in the first example on the documentation page.

Pat J · Accepted Answer · 2018-07-04 16:11:54Z

0

It appears that my $replace parameter ('$1' . $title . '$2') is to blame. Since $title starts with a digit, that digit is appended to the $1, so $replace ends up as $11234...$2, and preg_replace() reads $11 as a reference to group 11, which doesn't exist.

Solution:

$replace = '$1%s$2';
.
.
.
echo sprintf( $item, $title );

...which has the advantage of not introducing spurious spaces into my page title links.
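
For completeness, here is one way the sprintf() workaround might look end to end. This is only a sketch under assumptions: the sample link, the $escaped and $template variables, and the str_replace() escaping step are not part of the original answer. The escaping is there because sprintf() treats every "%" in its format string as the start of a conversion, which would trip over a URL-encoded href.

```php
<?php
// Hypothetical sample data; the href contains a literal "%" on purpose.
$search = '/(<a href="[^"]+">)[^<]+(<\/a>)/';
$item   = '<a href="http://example.com/a%20b">old title</a>';
$title  = '1234567890';

// Double any existing "%" so sprintf() treats it as literal text.
$escaped = str_replace( '%', '%%', $item );

// "%" is not a digit, so "$1" is read as group 1 and "%s" stays as a placeholder.
$template = preg_replace( $search, '$1%s$2', $escaped );

// sprintf() inserts the title, bypassing the regex replacement syntax entirely.
$result = sprintf( $template, $title );

echo $result . PHP_EOL; // <a href="http://example.com/a%20b">1234567890</a>
```

If the markup is known never to contain "%", the escaping step can be dropped; otherwise the ${1} form is the simpler fix.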

