Consider 500 cities × 25,000 addresses for a total of 12,500,000 docs with this simple shape:
{ city: "Cn", addr: "Am" }
where 0 ≤ n ≤ 499 and 0 ≤ m ≤ 9 (yes, addresses repeat). No indexes.
A "no-effort" aggregation running in the mongo CLI on the same machine as the server (MacBook Pro, 2.7 GHz quad-core, 16 GB RAM, c. 2013) runs fast:
var nn = 0;
db.foo.aggregate().forEach(function(d) {
    nn++;
});
print("found " + nn);
found 12500000
50170 millis to fetch // timing logic not included for clarity.
The moment we add even one $match, however, things slow down significantly on a "time-per-city" basis:
nn = 0; // reset the counter from the previous run
db.foo.aggregate([
    {$match: {"city": "C0"}}
]).forEach(function(d) {
    nn++;
});
print("found " + nn);
found 25000
9489 millis to fetch
5.3 times faster, but only accounting for 1/500th of the cities! So if we were to put this in a loop and ask for cities C0 through C499, it stands to reason it would take roughly 500 × 9.4 seconds ≈ 4,700 seconds, or about 1 hour 18 minutes. It turns out that adding more fields to the $match, or (slightly) more complex pipeline logic in general, does not degrade performance nearly as much as the initial $match on a single field.
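That extrapolation is just arithmetic on the measured 9.4 s/city figure; a tiny sketch (the class and method names here are illustrative, not part of the benchmark code):

```java
// Back-of-envelope extrapolation for looping a single-city $match over all cities.
public class MatchExtrapolation {
    // cities * measured seconds-per-city, rounded to whole seconds
    public static long estimateSeconds(int cities, double secsPerCity) {
        return Math.round(cities * secsPerCity);
    }

    public static void main(String[] args) {
        long total = estimateSeconds(500, 9.4);
        // prints: 4700 s = 1 h 18 m
        System.out.println(total + " s = " + total / 3600 + " h " + (total % 3600) / 60 + " m");
    }
}
```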
A Java program on a single thread yields roughly the same performance:
// coll is a MongoCollection<Document>; i is the city index.
Document pquery = new Document("city", "C" + i);
MongoCursor<Document> cc = coll.find(pquery).iterator();
int j = 0;
while (cc.hasNext()) {
    Document src = cc.next();
    j++;
}
// takes about 9.6 seconds for 1 city.
// 16 cities takes 164 seconds so roughly linear scale.
A multithreaded version where each thread operates on a different range of cities scales nicely with the number of threads. 16 cities with 4 threads takes 45 seconds; 16 cities with 8 threads also takes nearly 45 seconds, which is no surprise given the co-located client and server and the thread capacity of the MacBook Pro. 100 cities with 4 threads takes 275 seconds, so 275 × 5 = 1,375 seconds for 500 cities, or about 23 minutes. The MacBook Pro is surprisingly capable; it would take a medium-sized (CPU-wise) AWS instance to bring this under 10 minutes.
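A minimal sketch of that thread-per-range layout, with the MongoDB fetch replaced by a placeholder so the partitioning logic runs standalone. `CityRangeWorkers`, `partition`, and `fetchCity` are hypothetical names; in the real program each worker would open its own `MongoCursor` exactly as in the single-threaded version above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CityRangeWorkers {
    // Split city indices 0..nCities-1 into contiguous [start, end) ranges,
    // one per thread (the last range may be shorter).
    static List<int[]> partition(int nCities, int nThreads) {
        List<int[]> ranges = new ArrayList<>();
        int chunk = (nCities + nThreads - 1) / nThreads;
        for (int start = 0; start < nCities; start += chunk) {
            ranges.add(new int[]{start, Math.min(start + chunk, nCities)});
        }
        return ranges;
    }

    // Placeholder for the per-city find(); returns the known per-city doc count.
    static long fetchCity(int i) {
        return 25000;
    }

    public static void main(String[] args) throws InterruptedException {
        int nCities = 500, nThreads = 4;
        AtomicLong total = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int[] r : partition(nCities, nThreads)) {
            pool.submit(() -> {
                for (int i = r[0]; i < r[1]; i++) total.addAndGet(fetchCity(i));
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("fetched " + total.get() + " docs"); // 500 * 25,000 = 12,500,000
    }
}
```

Contiguous ranges keep each thread's queries independent, so throughput scales with thread count until the client and server start competing for the same cores.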
In general, therefore, the rule of thumb becomes: if you need to drag the data over to the client, you can go no faster than a single $match per thread. Larger payloads will obviously degrade performance further.
For comparison, a $group by city to get the count (25,000 per city, obviously) took 24.3 seconds.