I need to filter documents in an Elasticsearch index and then aggregate them by field. Here is the code of what I am trying to do:

import elasticsearch
from elasticsearch_dsl import Search, Q, Index, MultiSearch
es_client = elasticsearch.Elasticsearch([url],
        timeout=30, retry_on_timeout=True)
project_ids=['CSI'] 
family_ids=['SF6140691_WES_CIDR'] 
sample_ids=['S1379354_CIDR'] 
gene_symbols=['GLTPD1', 'CCNL2', 'MRPL20'] 

genes_filter = Q('bool', must=[Q('terms', project_id=project_ids),
                                   Q('terms', family_id=family_ids),
                                   Q('terms', sample_id=sample_ids),
                                   Q('terms', gene_symbol=gene_symbols)])
search = Search(using=es_client, index="GENES_DATA")
search = search.filter(genes_filter).execute()
results = search.aggs.bucket('by_family', 'terms', field='family_id', size=0)

Currently I am getting the following error:

'{!r} object has no attribute {!r}'.format(self.__class__.__name__, name))
AttributeError: 'Terms' object has no attribute 'execute'

I tried to switch filtering and aggregation, tried doing execute() at the very end, but it does not help. How could this simple transformation be achieved - filtering + aggregation? I found examples of doing aggregations separately or filtering separately but have trouble finding both in one query.

1 Answer

Instead of

search = search.filter(genes_filter)
results = search.aggs.bucket('by_family', 'terms', field='family_id', size=0)

you should have:

search = search.filter(genes_filter)
search.aggs.bucket('by_family', 'terms', field='family_id', size=0)
results = search.execute()

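First add the filter, then define the aggregations, and finally execute the search. Note that search.aggs.bucket(...) modifies the search in place and returns the aggregation object (here a Terms), not the Search, so calling execute() on its return value raises the AttributeError from the question.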


2 Comments

The issue is that the buckets give me only the family name and the count of the documents found, but not the whole objects. Is there a way to get the following result, because that is what I need: [{family_id_1: [doc_1, doc_2, ...]}, {family_id_2: [doc_8, doc_9, ...]}]? I need a regular group_by, not just a count of documents.
A regular group by doesn't give you rows either, just the value you grouped by and any aggregate functions you choose to apply. If you want to retrieve the documents in the buckets, use the top_hits aggregation - elastic.co/guide/en/elasticsearch/reference/current/…
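To make the top_hits suggestion concrete, here is a rough sketch of a raw request body with a top_hits sub-aggregation and a small helper that reshapes the response into the {family_id: [docs]} form asked for above. The sub-aggregation name docs, the helper group_docs_by_family, and the sample response are all made up for illustration. In elasticsearch_dsl the same sub-aggregation can, to my understanding, be added by chaining .metric('docs', 'top_hits', size=10) onto the bucket call.

```python
# Raw request body: filter first, then bucket by family_id, and keep up to
# 10 documents per bucket via a top_hits sub-aggregation named "docs".
request_body = {
    "size": 0,  # skip the regular hit list; only the aggregations are needed
    "query": {"bool": {"filter": [{"terms": {"gene_symbol": ["GLTPD1", "CCNL2"]}}]}},
    "aggs": {
        "by_family": {
            "terms": {"field": "family_id"},
            "aggs": {"docs": {"top_hits": {"size": 10}}},
        }
    },
}


def group_docs_by_family(response_body):
    """Map each bucket key to the list of document sources under it."""
    grouped = {}
    for bucket in response_body["aggregations"]["by_family"]["buckets"]:
        hits = bucket["docs"]["hits"]["hits"]
        grouped[bucket["key"]] = [hit["_source"] for hit in hits]
    return grouped


# Hand-written sample in the shape of a search response (not real data).
sample_response = {
    "aggregations": {
        "by_family": {
            "buckets": [
                {
                    "key": "family_a",
                    "doc_count": 2,
                    "docs": {"hits": {"hits": [
                        {"_source": {"gene_symbol": "GLTPD1"}},
                        {"_source": {"gene_symbol": "CCNL2"}},
                    ]}},
                },
            ]
        }
    }
}

print(group_docs_by_family(sample_response))
```

Keep in mind that top_hits returns at most size documents per bucket, so this is a capped sample of each group rather than a guaranteed full listing.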
