0

I am trying to split a string into terms in PHP using preg_split. I need to extract normal words (\w), but also currency amounts (including the currency symbol) and numeric terms (including commas and decimal points). Can anyone help me out? I cannot seem to create a valid regex for preg_split that achieves this. Thanks.

4
  • Can you give an example of things you want to capture? Commented Jan 6, 2012 at 23:19
  • I need to extract terms such as: "1.545", "$143", "$13.43", "1.5b", "hello", "G9". Thanks for the reply! Commented Jan 6, 2012 at 23:26
  • It looks like you are just trying to capture anything that shows up. You could do that easily with a dotall capture: it's just /.+/. Or are these part of a string that you need filtered? I still don't understand what you are trying to split. Commented Jan 6, 2012 at 23:31
  • No, that's not what I need. I need to extract the above words into an array, which is why I mentioned preg_split. I then intend to use the words one at a time in an inverted index. To make it clearer, I need the sentence "Big brown fox - $20.45" to result in the following array: "Big", "brown", "fox", "$20.45". Commented Jan 6, 2012 at 23:40

3 Answers

1

Why not use preg_match_all() instead of preg_split()?

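// Sample input: the terms from the question plus an ordinary sentence.
$str = '"1.545" "$143" "$13.43" "1.5b" "hello" "G9"'
  . ' This is a test sentence, with some. 123. numbers'
  . ' 456.78 and punctuation! signs.';

// Optional dollar sign, digits, then an optional decimal part.
// A trailing period is not matched because \.\d+ needs a digit after the dot.
$digitsPattern = '\$?\d+(?:\.\d+)?';
// Any run of letters and digits.
$wordsPattern = '[[:alnum:]]+';

// The numeric alternative comes first so "13.43" is not cut at the dot.
preg_match_all('/'.$digitsPattern.'|'.$wordsPattern.'/', $str, $matches);

// $matches[0] holds the full matches, one term per element.
print_r($matches[0]);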
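The question also mentions thousands separators and terms such as "1.5b", which the pattern above would split into "1.5" and "b". The following is only a sketch of one way to extend the same preg_match_all() idea. The function name tokenize_terms is hypothetical, and it assumes that "$" is the only currency symbol and that commas group exactly three digits.

```php
<?php
// Hypothetical helper (not from the original answer): extract words,
// currency amounts and numeric terms from a string.
function tokenize_terms(string $text): array
{
    // Assumption: optional "$", digits, optional ",ddd" groups, an optional
    // decimal part, then optional trailing letters (e.g. the "b" in "1.5b").
    $number = '\$?\d+(?:,\d{3})*(?:\.\d+)?[[:alpha:]]*';
    // Any other run of letters and digits, such as "hello" or "G9".
    $word = '[[:alnum:]]+';

    // The numeric alternative is tried first at each position.
    preg_match_all('/' . $number . '|' . $word . '/', $text, $matches);

    return $matches[0];
}

print_r(tokenize_terms('Big brown fox - $20.45'));
// Big, brown, fox, $20.45
print_r(tokenize_terms('This is a number: 43234. These are some words.'));
// This, is, a, number, 43234, These, are, some, words
```

Other currency symbols or locale-specific separators would need a different character class, so treat the pattern as a starting point.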

2 Comments

That's quite close to what I need. Is it possible to adjust the regex to exclude the period, except in the middle of numbers? Example: "This is a number: 43234. These are some words." Your solution results in: Array ( [0] => This [1] => is [2] => a [3] => number [4] => 43234. [5] => These [6] => are [7] => some [8] => words. )
I've updated my answer accordingly; could you test the new regex?
1

What about using preg_match_all() with the pattern [\S]+\b? It matches each run of non-whitespace characters that ends at a word boundary, so you get an array with the words in it.

For the input Big brown fox - $20.25:

$str = 'Big brown fox - $20.25';
preg_match_all('/[\S]+\b/', $str, $matches);

$matches[0] will contain:

Array
(
    [0] => Big
    [1] => brown
    [2] => fox
    [3] => $20.25
)
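One caveat with this pattern: the word boundary only trims punctuation at the end of a term, so punctuation at the start stays attached. A small sketch to illustrate (the helper name match_nonspace_runs is made up for this example):

```php
<?php
// Hypothetical wrapper around the \S+\b pattern from this answer.
function match_nonspace_runs(string $text): array
{
    // \S+ grabs a run of non-whitespace, then backtracks to the last
    // position in that run where a word boundary exists.
    preg_match_all('/\S+\b/', $text, $matches);

    return $matches[0];
}

print_r(match_nonspace_runs('Big brown fox - $20.25'));
// Big, brown, fox, $20.25 (the lone "-" has no word boundary, so it is skipped)
print_r(match_nonspace_runs('"hello" (test), ok.'));
// "hello, (test, ok (trailing punctuation dropped, leading punctuation kept)
```

That behaviour is what keeps the "$" in "$20.25", but it also keeps opening quotes and brackets.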

Comments

0

Would splitting on whitespace with "/\s+/" solve your problem?

1 Comment

Not quite, as I do not want to include punctuation except in numeric terms.
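If preg_split is still preferred, one possible workaround is to split on whitespace and then strip punctuation from both ends of each chunk. This is a rough sketch only: the function name split_and_trim and the list of characters to strip are assumptions, and a leading "-" on a negative number would be stripped as well.

```php
<?php
// Hypothetical helper: whitespace split followed by punctuation trimming.
function split_and_trim(string $text): array
{
    // PREG_SPLIT_NO_EMPTY drops empty chunks from leading/trailing whitespace.
    $chunks = preg_split('/\s+/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $terms = [];
    foreach ($chunks as $chunk) {
        // Assumed punctuation set; "$" is deliberately not in the list.
        $term = trim($chunk, ".,;:!?\"'()-");
        if ($term !== '') {
            $terms[] = $term; // chunks that were only punctuation are skipped
        }
    }

    return $terms;
}

print_r(split_and_trim('Big brown fox - $20.45'));
// Big, brown, fox, $20.45
```

Because only the ends of each chunk are trimmed, commas and decimal points inside a number such as 1,234.56 are left alone.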
