
I have an array of strings. I am trying to extract the data in the parentheses, ( and ), from each string. The problem is that the middle value is not extracted from the first element, where there is nothing else in front of it.

This is the code snippet with an indication of the needed/captured values:

<?php

$data = [
    'aaa|45.85[u]52.22 - 43.75 - 36.5[d]25.75',
// #1^^^       #2^^^^^ #3^^^^^        #4^^^^^
    'bbb|238.4[u]345.45 - 24.1[d]13.85 - 56.4[d]56'
// #1^^^       #2^^^^^^        #3^^^^^        #4^^
];

$new = [];

foreach ($data as $element)
{
    preg_match("#^(.*?)\|[\w\[\.]+\]?(.*?) - [\w\[\.]+\]?(.*?) - [\w\[\.]+\]?(.*?)$#", $element, $match);
    
    $string = $match[1];
    $num1 = $match[2];
    $num2 = $match[3];
    $num3 = $match[4];

    $new[$string] = [
        'num1' => $num1,
        'num2' => $num2,
        'num3' => $num3,
    ];
}

print_r($new);

?>

The code above should give me this result:

$new = [
    'aaa' => [
        'num1' => '52.22',
        'num2' => '43.75',
        'num3' => '25.75',
    ],

    'bbb' => [
        'num1' => '345.45',
        'num2' => '13.85',
        'num3' => '56',
    ]
];

But it gives me this:

$new = [
    'aaa' => [
        'num1' => '52.22',
        'num2' => '',
        'num3' => '25.75',
    ],

    'bbb' => [
        'num1' => '345.45',
        'num2' => '13.85',
        'num3' => '56',
    ]
];
  • The middle regex is expecting "1 or more" [\w\[\.]+ characters before the bracket, in the first case, there isn't one so it's not matching. Changing + to * may help if that is in fact a valid match Commented Feb 2, 2022 at 3:41
  • @nice_dev the OP doesn't actually have parentheses in the input -- it is just how the OP is expressing the targeted substrings. Commented Feb 2, 2022 at 5:54

1 Answer

See this demonstration of how your second [\w\[\.]+ character class is over-matching because dots and digits are greedily matched AND your capture group allows a zero-width match. https://regex101.com/r/zq6czS/1
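To reproduce the over-matching locally, here is a minimal sketch that isolates one "value then delimiter" segment of your pattern. The variable names ($segment, $withPrefix, $withoutPrefix) are only for illustration and are not part of your original code.

```php
<?php
// One segment of the question's pattern, anchored for illustration.
$segment = '#^[\w\[\.]+\]?(.*?) - #';

// With a "float[letter]" prefix, the character class stops at "]",
// so the lazy capture group is left to collect the number.
preg_match($segment, '24.1[d]13.85 - ', $withPrefix);

// Without a prefix, the character class swallows the whole number
// and the capture group is satisfied with a zero-width match.
preg_match($segment, '43.75 - ', $withoutPrefix);

var_export($withPrefix[1]);    // '13.85'
echo "\n";
var_export($withoutPrefix[1]); // ''
```

The second call is the situation of the middle value in the 'aaa' string.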

With only two sample strings, it is very hard to confidently suggest a truly optimized pattern, but I recommend seeking ways to use greedy quantifiers (with negated character classes) instead of lazy ones for improved performance.

  1. Before the first pipe, collect all characters that are not a pipe -- ([^|]+).
  2. To capture the non-whitespace substring after optionally occurring "float then square-braced letter", again use a negated character class -- (?:[^\]]+\])?(\S+)

The advice in #2 simply repeats three times, delimited by "space hyphen space", of course.
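As a rough sketch of how the two steps fit together, the full pattern can be assembled from its parts. The variable names ($label, $value, $pattern, $m) are hypothetical and only serve this illustration; the resulting pattern string is the same one used in the code further down.

```php
<?php
// Step 1: everything before the first pipe.
$label = '([^|]+)';
// Step 2: an optional "float then square-braced letter" prefix,
// followed by the non-whitespace substring to capture.
$value = '(?:[^\]]+\])?(\S+)';

// Step 2 repeated three times, joined by "space hyphen space".
$pattern = '#^' . $label . '\|' . implode(' - ', array_fill(0, 3, $value)) . '$#';

preg_match($pattern, 'aaa|45.85[u]52.22 - 43.75 - 36.5[d]25.75', $m);
var_export($m);
```

For the middle value, the optional prefix group first tries to reach the "]" of the last segment, fails to find the delimiter afterwards, and backtracks to matching nothing, which leaves 43.75 for the capture group.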

Code: (Demo) (or with functionless assignments)

$data = [
    'aaa|45.85[u]52.22 - 43.75 - 36.5[d]25.75',
    'bbb|238.4[u]345.45 - 24.1[d]13.85 - 56.4[d]56'
];

$result = [];
foreach ($data as $element) {
    if (preg_match("#^([^|]+)\|(?:[^\]]+\])?(\S+) - (?:[^\]]+\])?(\S+) - (?:[^\]]+\])?(\S+)$#", $element, $matches)) {
        unset($matches[0]);
        $result[array_shift($matches)] = array_combine(['num1', 'num2', 'num3'], $matches);
    }
}
var_export($result);

Once you have your 5-element output matches array, remove the fullstring match ($matches[0]), then peel off the new first element and use it as the first level key, then the remaining elements can be added to the subarray.
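If you would rather not call array_shift() and array_combine(), the same result can be reached with plain assignments. The following is only a sketch of one way to do that, assuming PHP 7.1 or newer for array destructuring with a skipped element; the variable names $key, $num1, $num2 and $num3 are my own choice.

```php
<?php
$data = [
    'aaa|45.85[u]52.22 - 43.75 - 36.5[d]25.75',
    'bbb|238.4[u]345.45 - 24.1[d]13.85 - 56.4[d]56'
];

$pattern = '#^([^|]+)\|(?:[^\]]+\])?(\S+) - (?:[^\]]+\])?(\S+) - (?:[^\]]+\])?(\S+)$#';

$result = [];
foreach ($data as $element) {
    if (preg_match($pattern, $element, $matches)) {
        // Skip index 0 (the fullstring match) and name the four captures.
        [, $key, $num1, $num2, $num3] = $matches;
        $result[$key] = [
            'num1' => $num1,
            'num2' => $num2,
            'num3' => $num3,
        ];
    }
}
var_export($result);
```

This version leaves $matches untouched, at the cost of spelling out each key by hand.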
