This is basically a matter of testing each string in your documents against a list of regular expressions and emitting the number of matches found. In mapReduce terms, you want your "mapper" function to possibly emit several values per document, using each "term" as the key, for every array element present in the document.
So you want a source array of regular expressions to process (likely just a word list), iterating over it and testing it against each array member of the document.
Basically something like this:
db.collection.mapReduce(
    // mapper: runs once per document, with `this` bound to that document
    function() {
        var list = ["the", "quick", "brown"]; // words you want to count
        this.projects.forEach(function(project) {
            project.log.forEach(function(log) {
                list.forEach(function(word) {
                    // "\\b" in the string becomes the \b word-boundary assertion
                    var res = log.subject.match(new RegExp("\\b" + word + "\\b", "ig"));
                    if (res != null)
                        emit(word, res.length); // number of matches for word in this subject
                });
            });
        });
    },
    // reducer: adds up all counts emitted for the same word
    function(key, values) {
        return Array.sum(values);
    },
    { "out": { "inline": 1 } }
)
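To check what the mapper and reducer produce without a running MongoDB server, the same logic can be simulated in plain JavaScript. This is a minimal sketch under assumptions: `simulateMapReduce` and the sample documents are hypothetical, `emit` is passed in as a parameter instead of being the shell's global function, and `Array.sum` (a mongo shell helper) is replaced with a plain `reduce`. Like MongoDB, it skips the reducer for keys that received only one emitted value.

```javascript
// Hypothetical helper: groups emitted values by key, then reduces each group.
function simulateMapReduce(docs, mapFn, reduceFn) {
  var groups = new Map(); // key -> array of emitted values
  function emit(key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  docs.forEach(function (doc) {
    mapFn.call(doc, emit); // `this` is the current document, as in MongoDB
  });
  var results = [];
  groups.forEach(function (values, key) {
    // MongoDB does not call reduce for a key with a single value
    var value = values.length === 1 ? values[0] : reduceFn(key, values);
    results.push({ _id: key, value: value });
  });
  return results;
}

// Same mapper as above, except emit arrives as a parameter.
function mapper(emit) {
  var list = ["the", "quick", "brown"];
  this.projects.forEach(function (project) {
    project.log.forEach(function (log) {
      list.forEach(function (word) {
        var res = log.subject.match(new RegExp("\\b" + word + "\\b", "ig"));
        if (res !== null) emit(word, res.length);
      });
    });
  });
}

// Same reducer, using Array.prototype.reduce instead of the shell's Array.sum.
function reducer(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Hypothetical sample documents.
var docs = [
  { projects: [{ log: [{ subject: "The quick fox" }, { subject: "the brown dog, THE end" }] }] },
  { projects: [{ log: [{ subject: "Then a quick brown hare" }] }] }
];

simulateMapReduce(docs, mapper, reducer).forEach(function (r) {
  console.log(r._id, r.value); // the 3, quick 2, brown 2 (one per line)
});
```

Note that "Then" does not count towards "the", because of the word boundaries.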
So the loop processes the array elements in the document and then tests each word against the text with a regular expression. The .match() method returns an array of the matches in the string, or null if none was found. Note the i and g options on the regex: they make the search case-insensitive and return every match rather than just the first. The m (multi-line) option only changes how ^ and $ anchor, so it is not needed for these word-boundary patterns even if your text includes line break characters.
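The matching step can be tried on its own. This sketch isolates it in a hypothetical `countWord` helper and shows the array-or-null result of `.match()`, the effect of dropping the g flag, and that a line break does not prevent \b matching.

```javascript
// Hypothetical helper: the matching step of the mapper, on its own.
function countWord(text, word) {
  // "g" returns every match instead of only the first; "i" ignores case
  var res = text.match(new RegExp("\\b" + word + "\\b", "ig"));
  return res === null ? 0 : res.length; // match() returns null when nothing matched
}

console.log("The cat saw the dog".match(/\bthe\b/ig)); // [ 'The', 'the' ]
console.log("A cat".match(/\bthe\b/ig));               // null
console.log("the the".match(/\bthe\b/i).length);       // 1 (no "g": first match only)
console.log(countWord("The cat\nthe dog", "the"));     // 2 (line break is no obstacle)
```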
If null is not returned, then we emit the current word as the "key" and the count as the length of the matched array.
The reducer then takes all output values from those emit calls in the mapper and simply adds up the emitted counts.
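MongoDB may call the reducer more than once for the same key, feeding an earlier result back in alongside newer values, so the reducer must return the same type it receives and produce the same total however the values are grouped. A plain sum satisfies that, which this sketch checks; `Array.sum` is swapped for `Array.prototype.reduce` so it runs outside the mongo shell.

```javascript
// Equivalent to Array.sum(values) in the mongo shell.
function reducer(key, values) {
  return values.reduce(function (total, n) { return total + n; }, 0);
}

var once = reducer("the", [1, 2, 3, 4]);
// Simulates a re-reduce: partial results are reduced again
var twice = reducer("the", [reducer("the", [1, 2]), reducer("the", [3, 4])]);
console.log(once, twice); // 10 10
```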
The result will be one document per "word/term" provided, holding the total count of its occurrences in the inspected field across the collection. For more fields, either add more logic to sum up the results, or simply keep "emitting" in the mapper and let the reducer do the work.
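Keeping the "emit in the mapper" approach, a second field just means a second source of text inside the same loops; the reducer is unchanged because it only sums counts per word. A minimal sketch, assuming a hypothetical second text field named `body` on each log entry and a hypothetical `mapLogs` helper that takes `emit` as a parameter so it can run outside MongoDB. It also skips entries where a field is missing, where calling `.match()` would otherwise throw.

```javascript
// Hypothetical: each log entry may have "subject" and "body" text fields.
var FIELDS = ["subject", "body"];

function mapLogs(doc, words, emit) {
  doc.projects.forEach(function (project) {
    project.log.forEach(function (log) {
      FIELDS.forEach(function (field) {
        var text = log[field];
        if (typeof text !== "string") return; // skip missing fields
        words.forEach(function (word) {
          var res = text.match(new RegExp("\\b" + word + "\\b", "ig"));
          if (res !== null) emit(word, res.length); // same key for every field
        });
      });
    });
  });
}

var emitted = [];
mapLogs(
  { projects: [{ log: [{ subject: "the quick fox", body: "The end of the story" }] }] },
  ["the", "quick"],
  function (key, value) { emitted.push([key, value]); }
);

// What the reducer would do: sum the counts per word.
var totals = {};
emitted.forEach(function (pair) { totals[pair[0]] = (totals[pair[0]] || 0) + pair[1]; });
console.log(totals); // { the: 3, quick: 1 }
```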
Note that "\\b" represents a word-boundary assertion wrapped around each term; the backslash is itself escaped as \\ because the expression is constructed from a string. You need these to discriminate "the" from "then", for example, by specifying where the word/term ends.
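The difference the word boundaries make, and why the backslash has to be doubled inside a string, can be seen in a few lines. In a JavaScript string literal "\b" is a backspace character, not the regex assertion, so a pattern built from it silently fails to match.

```javascript
var text = "Then the other one";

// Without boundaries, "the" also matches inside "Then" and "other"
console.log(text.match(/the/ig)); // [ 'The', 'the', 'the' ]

// "\\b" in a string is backslash + b, which RegExp reads as a word boundary
var withBoundary = new RegExp("\\b" + "the" + "\\b", "ig");
console.log(withBoundary.source);      // \bthe\b
console.log(text.match(withBoundary)); // [ 'the' ]

// "\b" in a string is a backspace character (U+0008), so this never matches
var broken = new RegExp("\b" + "the" + "\b", "i");
console.log(broken.test("the")); // false
```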
Also note that in regular expressions, characters like [ and ] are reserved, so if you were actually looking for strings like that, you would escape them as well (doubling the backslash inside a string, as with "\\b"), i.e.:
"\\[A\\]"
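But in that case, remove the word-boundary assertions, because \b next to a non-word character such as [ only matches when a word character sits on the other side:
new RegExp("\\[A\\]", "ig")
The escaped brackets already delimit a complete match on their own.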
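If the word list can contain literal strings like "[A]", one option is to escape every metacharacter and add \b only on the sides where the term begins or ends with a word character. This is a sketch of that idea, not part of the original answer: `escapeRegExp` and `termPattern` are hypothetical helper names, and the escaping character class is the common one used for this purpose.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: add \b only next to word characters, since \b beside
// a non-word character like "[" would demand a word character on the other side.
function termPattern(term) {
  var start = /^\w/.test(term) ? "\\b" : "";
  var end = /\w$/.test(term) ? "\\b" : "";
  return new RegExp(start + escapeRegExp(term) + end, "ig");
}

// Single backslashes vanish in a string literal, leaving a character class
console.log("\[A\]" === "[A]"); // true

console.log(termPattern("[A]").source);                   // \[A\]
console.log("[A], [a] and A".match(termPattern("[A]")));  // [ '[A]', '[a]' ]
console.log("the then".match(termPattern("the")));       // [ 'the' ]
```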
[A], [B] and [C] represent the words you want to look for, and you ultimately want to return the count of how many times each word appears across all documents. Correct? Have you at least done some basic research on mapReduce, and do you understand how mapper and reducer functions work? Is this always within the same "log" field within the "projects" array?