I am trying to split a string into terms in PHP using preg_split. I need to extract normal words (\w), but also currency terms (including the currency symbol) and numeric terms (including commas and decimal points). Can anyone help me out? I cannot seem to create a valid regex for preg_split that achieves this. Thanks
3 Answers
Why not use preg_match_all() instead of preg_split()?
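// Sample input mixing numbers, currency, alphanumeric terms, and punctuation.
$str = '"1.545" "$143" "$13.43" "1.5b" "hello" "G9"'
. ' This is a test sentence, with some. 123. numbers'
. ' 456.78 and punctuation! signs.';
// Optional "$", digits, then an optional decimal part (a trailing "." is not matched).
$digitsPattern = '\$?\d+(\.\d+)?';
// Runs of letters and digits.
$wordsPattern = '[[:alnum:]]+';
// Try the number pattern first so that "$13.43" is kept as one term.
preg_match_all('/('.$digitsPattern.'|'.$wordsPattern.')/i', $str, $matches);
print_r($matches[0]);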
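The question also mentions commas inside numbers, which the pattern above does not cover. A minimal sketch of one way to extend it, assuming commas only appear as thousands separators followed by exactly three digits; the function name extract_terms is hypothetical and only used for illustration:

```php
<?php
// Hypothetical helper (not part of the original answer): extracts words,
// plain numbers, and dollar amounts from a string.
function extract_terms(string $text): array
{
    // Optional "$", digits with optional ",ddd" groups (assumed thousands
    // separators), then an optional decimal part. A trailing "." is not
    // consumed because the decimal part needs a digit after the point.
    $numberPattern = '\$?\d+(?:,\d{3})*(?:\.\d+)?';
    // Letters and digits only, so sentence punctuation is never included.
    $wordPattern = '[[:alnum:]]+';

    // The number pattern comes first so that amounts are not broken up
    // by the word pattern.
    preg_match_all('/' . $numberPattern . '|' . $wordPattern . '/', $text, $matches);
    return $matches[0];
}

print_r(extract_terms('This is a number: 43234. It cost $1,234.56 total.'));
```

With this input the terms should come out as This, is, a, number, 43234, It, cost, $1,234.56, total. Other currency symbols or number formats would need their own additions to the number pattern.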
2 Comments
dscer
That's quite close to what I need. Is it possible to adjust the regex to exclude the period, except in the middle of numbers? Example: "This is a number: 43234. These are some words." Your solution results in: Array ( [0] => This [1] => is [2] => a [3] => number [4] => 43234. [5] => These [6] => are [7] => some [8] => words. )
Maxime Pacary
I've updated my answer accordingly; could you test with the new regex?
Would splitting on whitespace solve your problem? "/\s+/"
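As a quick illustration of what this suggestion produces, here is a small sketch using the example sentence from the comments above. The variable names are only for illustration:

```php
<?php
// Example sentence taken from the comment thread.
$text = 'This is a number: 43234. These are some words.';

// Split on runs of whitespace; PREG_SPLIT_NO_EMPTY drops empty pieces.
$parts = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

// Punctuation stays attached to the neighbouring term,
// e.g. "number:", "43234." and "words.".
print_r($parts);
```

Splitting on whitespace alone never removes punctuation, so any cleanup of trailing periods or colons would have to happen in a separate step.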
1 Comment
dscer
Not quite, as I do not want to include punctuation except in numeric terms.
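If preg_split has to be used, one possible approach is to treat everything that is not part of a term as the delimiter. The sketch below assumes that "$" is the only currency symbol and that a period or comma belongs to a term only when it sits between two digits; the pattern is an illustration, not something proposed in this thread:

```php
<?php
$text = 'This is a number: 43234. It cost $1,234.56, not 99.5!';

// A delimiter is a run of:
//   - any character that is not a word character, "$", "." or ",", or
//   - a "." or "," that is not followed by a digit, or
//   - a "." or "," that is not preceded by a digit.
// A "." or "," between two digits is therefore kept inside the term.
$delimiter = '/(?:[^\w$.,]|[.,](?!\d)|(?<!\d)[.,])+/';

$terms = preg_split($delimiter, $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($terms);
```

For this input the terms should be This, is, a, number, 43234, It, cost, $1,234.56, not, 99.5. Note that \w also matches underscores, and that a stray "$" surrounded by spaces would come out as a term of its own.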
/.+/ or are these in the of a string you need filtered? I still don't understand what you are trying to split.