Currently I have a mongo document that looks like this:
{'_id': id, 'title': title, 'date': date}
What I'm trying is to search within this document by title, in the database I have like 5ks items which is not much, but my file has 1 million of titles to search.
I have ensure the title as index within the collection, but still the performance time is quite slow (about 40 seconds per 1000 titles, something obvious as I'm doing a query per title), here is my code so far:
Work repository creation:
class WorkRepository(GenericRepository, Repository):
def __init__(self, url_root):
super(WorkRepository, self).__init__(url_root, 'works')
self._db[self.collection].ensure_index('title')
The entry of the program (is a REST api):
start = time.clock()
for work in json_works: #1000 titles per request
result = work_repository.find_works_by_title(work['title'])
if result:
works[work['id']] = result
end = time.clock()
print end-start
return json_encoder(request, works)
and find_works_by_title code:
def find_works_by_title(self, work_title):
works = list(self._db[self.collection].find({'title': work_title}))
return works
I'm new to mongo and probably I've made some mistake, any recommendation?