Thanks in advance for any help guys! So, I have two collections: A and B.

A is a collection of personal information:

{
  "_id": "3453hkj54h5k34j5hkjh",
  "location": "New York, U.S.",  
  "first-name": "Archer",  
  "last-name": "Vice",  
  "industry": "intelligence"
},
{
  "_id": "3453hkj5sdfdddjh",    
  "location": "London, UK",    
  "first-name": "Harry",    
  "last-name": "Potter",    
  "industry": "security"
},
{
  "_id": "345dfdf5sdfdddjh",
  "location": "D.C., US",
  "first-name": "Obama",  
  "last-name": "Barack",  
  "industry": "president"
}   

B is a collection of location information in the United States:

{
  "_id": "998sdfdsfhejf",  
  "city": "New York",    
  "zip": "10122",  
  "state": "NY",  
  "lat": 40.749,  
  "longt": -73.9885
},  
{
  "_id": "998sdfsdfdsfhejf",  
  "city": "D.C.",
  "zip": "20500",  
  "state": "DC",  
  "lat": 38.8951,  
  "longt": -77.0369
}  

I want to find out who lives in the US by comparing the location field in A against the city field in B. The city value from B should be a substring of the location value in A, as A often carries state or country information as well.

I already converted B to an array by:

var f = db.collection.find(), n = [];
for (var i = 0; i < f.length(); i++) n.push(f[i]['field']);

Now B is available as the array var n = ["D.C.", "New York"].

I know how to check whether something is in the array:

db.database.find({
    field: { $in: array }
});

To check for a substring, you do this:

db.database.find({A: /substring/ });

or

db.database.find({A: {$regex: 'substring'}});

The expected results are:

{
  "_id": "3453hkj54h5k34j5hkjh",    
  "location": "New York, U.S.",   
  "first-name": "Archer",  
  "last-name": "Vice",  
  "industry": "intelligence"
},
{
  "_id": "345dfdf5sdfdddjh",
  "location": "D.C., US",  
  "first-name": "Obama",  
  "last-name": "Barack",  
  "industry": "president"
}   

"D.C., US" contains substring "D.C." which is a value in the array n=["D.C.", "New York"].

I know I can do it through mapReduce, but it really seems like it should be a one-liner. I'm also learning how to join these two collections.

  • So is target now always an array? Commented Apr 21, 2015 at 15:15
  • No, it's just a field, sorry about not being clear. I will edit the post a bit. Commented Apr 21, 2015 at 15:25
  • It's not very clear what you have and what you want. Is B an array in a script, a field somewhere, or a collection? Also, why do you call your collections A and B and then query on a field named A in the test collection? Commented Apr 21, 2015 at 15:36
  • Can you show us a whole document? Commented Apr 21, 2015 at 15:47
  • B was a collection, but it was converted to an array by me. I only use it as a checklist. I will show you the complete document in case you are interested. Commented Apr 21, 2015 at 16:48

1 Answer

This is not super simple to do in a single statement, but it is possible. If your list of search terms is as short as you have stated in the question, you can do it in one line by joining the terms into a single regular expression like this:

 db.test.find({location: {$regex: new RegExp(n.join('|'))}})

That works only if the list is not too long; the query will be quite slow if the regexp gets too complex. If the list is very short, you could of course also write out the RegExp literally.

n is defined in the shell as you have it in the question. Here I used:

var n = ["D.C.", "New York"];

This will give the following result:

{ "_id" : "3453hkj54h5k34j5hkjh", "location" : "New York, U.S.", "first-name" : "Archer", "last-name" : "Vice", "industry" : "intelligence" }
{ "_id" : "345dfdf5sdfdddjh", "location" : "D.C., US", "first-name" : "Obama", "last-name" : "Barack", "industry" : "president" }
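
One caveat with joining the raw names: "D.C." contains dots, which a regular expression treats as "any character", so the joined pattern can match more than intended. Below is a minimal sketch of escaping each name before joining, written in plain JavaScript so it runs in the shell or in Node. escapeRegExp and buildCityPattern are hypothetical helper names introduced here, not part of MongoDB.

```javascript
// Hypothetical helper: backslash-escape regex metacharacters so each
// city name is matched literally.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: build one alternation pattern from a list of names.
function buildCityPattern(names) {
    return new RegExp(names.map(escapeRegExp).join('|'));
}

var n = ["D.C.", "New York"];
var pattern = buildCityPattern(n);

// In the mongo shell the pattern would be used as in the query above:
// db.test.find({location: {$regex: pattern}})
```

With the escaped pattern, "D.C., US" still matches, while a string such as "DxCy", which the unescaped pattern would accept, no longer does.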

EDIT

Here is an alternative for your join if your list is too long:

n.reduce(function (lst, d) {
    var res = db.test.find({location: {$regex: d}}).toArray();
    Array.prototype.push.apply(lst, res); 
    return lst;
}, []);

It loops over all entries in your list, finds the matching documents for each one, and collects all the results into a new list.
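
Note that a document whose location matches more than one entry in the list would be added once per match. Here is a rough sketch of de-duplicating by _id, written against a plain array so it can be run outside the shell. The people array and its short _id values are sample data for the sketch, and findByCity is a hypothetical stand-in for db.test.find({location: {$regex: d}}).toArray().

```javascript
// Sample documents standing in for the collection (assumption for the sketch).
var people = [
    {_id: "a1", location: "New York, U.S."},
    {_id: "a2", location: "London, UK"},
    {_id: "a3", location: "D.C., US"}
];

// Hypothetical stand-in for the per-city query; uses a plain substring test.
function findByCity(city) {
    return people.filter(function (p) {
        return p.location.indexOf(city) !== -1;
    });
}

// Same reduce loop as above, but skipping documents already collected.
function matchAll(cities) {
    var seen = {};
    return cities.reduce(function (lst, city) {
        findByCity(city).forEach(function (doc) {
            if (!seen[doc._id]) {
                seen[doc._id] = true;
                lst.push(doc);
            }
        });
        return lst;
    }, []);
}

// "New York" and "York" both match the same person, who is kept only once.
var result = matchAll(["D.C.", "New York", "York"]);
```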

If you want, you could insert the matches into a new collection instead, to avoid keeping them all in memory. You could also query collection B directly rather than first extracting its contents into a list, which should be better in terms of memory as well.

This will save the result to a collection named test_result (using collections A and B in the searches):

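db.B.find().forEach(function (d) {
    // Find every person whose location contains this city name.
    var matches = db.A.find({location: {$regex: d.city}}).toArray();
    // insert() can fail on an empty array, so skip cities with no matches.
    if (matches.length > 0) db.test_result.insert(matches);
});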
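
If the list is too long for one regular expression but one query per city is too slow, a possible middle ground (not part of the approach above, just a sketch) is to split the list into batches and build one pattern per batch. escapeRegExp and buildBatchedPatterns are hypothetical helper names, the batch size is an arbitrary choice, and "Boston" is a sample value added only to show a second batch.

```javascript
// Hypothetical helper: escape regex metacharacters in a city name.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: split the city list into batches and build one
// alternation pattern per batch.
function buildBatchedPatterns(cities, batchSize) {
    var patterns = [];
    for (var i = 0; i < cities.length; i += batchSize) {
        var batch = cities.slice(i, i + batchSize).map(escapeRegExp);
        patterns.push(new RegExp(batch.join('|')));
    }
    return patterns;
}

var patterns = buildBatchedPatterns(["D.C.", "New York", "Boston"], 2);

// In the legacy mongo shell, each pattern could drive one query, for example:
// patterns.forEach(function (re) {
//     db.A.find({location: {$regex: re}}).forEach(function (doc) {
//         db.test_result.save(doc); // save() replaces a document with the same _id
//     });
// });
```

Using save() rather than insert() in the commented-out shell loop is meant to avoid duplicate-key errors when the same person matches patterns in more than one batch.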

5 Comments

Thanks a lot, but unfortunately the list is not that short: I have all US cities in there, 18691 of them, so it's kind of complicated. Instead of simply querying, I was hoping to output the matches to another collection. My code in progress is this: for (var i=0; i<n.length(); i++){db.A.aggregate([ { $match: { location: {$regex: n[1]} } }, { $out: "inuscity" } ]);};
You should really consider what your goal is here. Regex searching for all terms in an 18691-item list is a bit dodgy. It's not really something you would do in a simple statement, as you are asking. Using mongo to do a fuzzy join like this is not optimal and not something mongo supports out of the box. You could, however, make it happen with some JS magic, as you are figuring out.
The goal is to filter out everyone not in the US. I initially wanted to set this up with Hadoop (mongodb connector) and use Hive to do the query, but it didn't work out. Using mongodb and JavaScript is plan B; plan C is to dump everything into MySQL. This is for a school project. Thanks a lot! I will test it when I get home.
Hi while, I can understand your second script, where it returns a cursor. However, I have a hard time understanding how the last one works. It gave me an error message saying $regex has to be a string.
Sorry. Forgot the field name city. It just loops over the result of the first query and uses each entry to query the second collection and inserts each result into a new collection.
