1

I'm trying to find all the unique whole words from a body of text. Currently this is what I am using but it doesn't seem to be working:

$textDump = "cat dog monkey cat snake horse";
$wholeWord = "/[\w]*/";
$uniqueWords = (preg_match($wholeWord, $textDump, $matches));

Any help would be appreciated. Thanks!

3
  • 1
    You want to use preg_match_all. The results are stored in the third argument, $matches. Commented Feb 11, 2013 at 17:54
  • You're not capturing anything. Try (\w*). Note: there is no need to use a character class ([]) for just a single "character". That's redundant. Commented Feb 11, 2013 at 17:55
  • possible duplicate of PHP preg_match to find multiple occurrences Commented Feb 11, 2013 at 17:56
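
As a minimal sketch of what the first comment suggests (not the commenter's own code): preg_match stops after the first match, whereas preg_match_all collects every match into $matches[0]. The pattern \w+ is swapped in for the original [\w]* on the assumption that empty matches are unwanted, and the names $allWords and $distinctWords are made up for illustration.

```php
<?php
// Sketch: collect every whole word with preg_match_all, then drop repeats.
$textDump = "cat dog monkey cat snake horse";

// \w+ (one or more word characters) avoids the empty matches that \w* allows.
preg_match_all('/\w+/', $textDump, $matches);

// $matches[0] holds every full-pattern match, in order of appearance.
$allWords = $matches[0];

// array_unique keeps the first occurrence of each word;
// array_values reindexes the result from 0.
$distinctWords = array_values(array_unique($allWords));

print_r($distinctWords);
```

Note that this yields the distinct words (cat, dog, monkey, snake, horse), which may or may not be what "unique" means in the question.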

4 Answers

6
array_unique(
    str_word_count($textDump,1)
);

1 Comment

Is this really what you wanted? It finds all distinct words, so "cat dog cat" becomes [cat, dog]. It does not find unique words, i.e. [dog] from "cat dog cat".
2

You can use str_word_count

$textDump = "cat dog monkey cat snake horse";
$uniqueWords = str_word_count($textDump, 1);

1 Comment

This just puts all of the words in an array: [cat, dog, monkey, cat, snake, horse]
1

Why not use explode() and array_unique() in this case?

$text = "cat dog monkey cat snake horse";

$foo = explode(" ", $text);
print_r(array_unique($foo)); 

3 Comments

It seems you have the same misunderstanding I had. The question asks for words that appear only once in the string, not for removing duplicates.
Wouldn't using explode cause issues with punctuation though, since 'hello,' and 'hello' would both register as unique?
I didn't get the exact question. :)
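
The punctuation concern raised above can be illustrated with a small sketch. The sample string and the variable names $bySpace and $byPattern are assumptions for illustration, and preg_split on \W+ is just one possible workaround, not something proposed in the answer.

```php
<?php
$text = "hello, world. hello again";

// Splitting on spaces keeps punctuation attached to the words,
// so "hello," and "hello" are treated as different entries.
$bySpace = array_values(array_unique(explode(" ", $text)));

// Splitting on runs of non-word characters strips the punctuation.
// PREG_SPLIT_NO_EMPTY drops empty strings produced by leading or
// trailing separators.
$byPattern = array_values(array_unique(
    preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY)
));

print_r($bySpace);   // four entries: "hello,", "world.", "hello", "again"
print_r($byPattern); // three entries: "hello", "world", "again"
```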
1

The answers given so far all assume that by "find all the unique whole words" you really meant "remove duplicates". Your question is not very clear on this, since your example doesn't specify the desired output, but I'll take you at your word and provide a solution for "find all the unique whole words".

This means, for the input:

"cat dog monkey cat snake horse"

You will get the output:

"dog monkey snake horse"

str_word_count is useful for this too, combined with array_count_values, which counts how often each value occurs:

$wordCount = array_count_values(str_word_count($textDump,1));

$wordCount is now:

array(5) {
  ["cat"]    => int(2)
  ["dog"]    => int(1)
  ["monkey"] => int(1)
  ["snake"]  => int(1)
  ["horse"]  => int(1)
}

Next, remove the words with a count higher than 1. Note that the actual words are the array keys, so we use array_keys to get them:

$uniqueWords = array_keys(
    array_filter(
        $wordCount,
        function($count) {
            return $count === 1;
        }
    )
);

$uniqueWords is now:

array(4) {
  [0] => string(3) "dog"
  [1] => string(6) "monkey"
  [2] => string(5) "snake"
  [3] => string(5) "horse"
}

Complete code:

$textDump = "cat dog monkey cat snake horse";
$wordCount = array_count_values(str_word_count($textDump,1));
$uniqueWords = array_keys(
    array_filter(
        $wordCount,
        function($count) {
            return $count === 1;
        }
    )
);
echo join(' ', $uniqueWords);
//dog monkey snake horse
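
If the same logic is needed in more than one place, it could be wrapped in a function. The following is only a sketch: the name wordsAppearingOnce is hypothetical, and lowercasing the input so that "Cat" and "cat" count as the same word is an added assumption that the question does not ask for.

```php
<?php
// Hypothetical helper; the name and the case-folding step are assumptions.
function wordsAppearingOnce(string $text): array
{
    // Lowercase first so "Cat" and "cat" are counted as the same word.
    $words = str_word_count(strtolower($text), 1);

    // Map each word to the number of times it occurs.
    $counts = array_count_values($words);

    // Keep only the entries whose count is exactly 1.
    $once = array_filter($counts, function ($count) {
        return $count === 1;
    });

    // The words themselves are the array keys.
    return array_keys($once);
}

echo join(' ', wordsAppearingOnce("Cat dog monkey cat snake horse"));
// dog monkey snake horse
```

Drop the strtolower call if the comparison should stay case-sensitive, as it is in the code above.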
