
In an Elasticsearch cluster I have about 30 indices with the same structure.

I need to find out which of the indices would return at least 1 result for my query.

The result itself does not matter. I will make the business logic decisions based on the names of the indices that contain at least 1 document satisfying the search criteria.

Depending on the input, the search might return anywhere from 0 to ~10 000 000 hits across all indices. It will be performed ~50 000 times with different inputs.

I see the following solutions:

  1. Use the search API with scrolling and look at all results to find out from which index they are. This is what is currently implemented and I'm looking for a faster solution.
  2. Use the count API and do a count for every index. This will lead to more requests. Might this be faster?
  3. Is there another possibility/API available?
  • This may help you: stackoverflow.com/questions/28472008/… Commented Apr 12, 2020 at 5:12
  • and try with _search?size=0 instead of search_type=count Commented Apr 12, 2020 at 5:20
  • @AlwaysSunny Thanks for the link. Could not find it before. Commented Apr 14, 2020 at 10:26
  • welcome sir. No Problem. Commented Apr 14, 2020 at 10:28

2 Answers


I would use a terms bucket aggregation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html) over the _index metadata field. The returned buckets then show which indices have at least 1 hit.

E.g.,

{
  "query": { your_query },
  "aggs": {
    "group_by_index": {
      "terms": {
        "field": "_index",
        "size": 30
      }
    }
  }
}
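As a rough illustration of how a client might use this aggregation, here is a minimal Python sketch. The helper names `build_index_agg_body` and `matching_indices` are hypothetical and not part of any Elasticsearch client. The sketch assumes the response has the usual terms-aggregation shape (`aggregations.group_by_index.buckets`, each bucket with `key` and `doc_count`), and it adds a top-level `"size": 0` on the assumption that the hits themselves are not needed.

```python
from typing import Any


def build_index_agg_body(query: dict, index_count: int) -> dict:
    """Build a search body that buckets matching documents by index name.

    `index_count` should be at least the number of indices being queried,
    otherwise the terms aggregation may leave some of them out.
    """
    return {
        "size": 0,  # assumption: only the buckets matter, not the hits
        "query": query,
        "aggs": {
            "group_by_index": {
                "terms": {"field": "_index", "size": index_count}
            }
        },
    }


def matching_indices(response: dict) -> list:
    """Return the names of indices with at least one matching document."""
    buckets: Any = response["aggregations"]["group_by_index"]["buckets"]
    return [bucket["key"] for bucket in buckets if bucket["doc_count"] > 0]
```

The body returned by `build_index_agg_body` would be sent to the `_search` endpoint with whatever HTTP or Elasticsearch client is already in use, and the parsed JSON response passed to `matching_indices`.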

1 Comment

I just had to set the size to the number of indices that I query. Otherwise, some indices were missing from the result.

I would use the aggs as @glenacota mentioned. In addition, you can run the query over multiple indices, or against an alias pointing to all 30 of your indices, like this:

GET my_index_1,another_index_*/_search?size=0

That said, I would also recommend profiling the query to see how it fares against your cluster, given the large number of indices, their document counts, and the number of requests.
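For the multi-index request, the index names and wildcard patterns are joined with commas and no spaces. A small sketch of building that path in Python follows. The function name `search_path` is hypothetical and the index names are just the placeholders used in this answer.

```python
def search_path(index_patterns: list) -> str:
    """Build the _search path for several indices or wildcard patterns.

    Names are joined with commas and stray whitespace is stripped.
    size=0 suppresses the hits so only aggregations come back.
    """
    cleaned = [p.strip() for p in index_patterns if p.strip()]
    if not cleaned:
        raise ValueError("at least one index name or pattern is required")
    return "/" + ",".join(cleaned) + "/_search?size=0"
```

For example, `search_path(["my_index_1", "another_index_*"])` produces the path used in the request line above.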
