I have a 3-node Elasticsearch cluster (version 6.2.4) for development. All settings are the defaults, including the number of shards. I am trying to run searches that return millions of records, so I decided to use the Scroll API with the Java High-Level REST Client. My code looks like this:
MatchQueryBuilder matchQueryBuilder = new MatchQueryBuilder("galaxy", galaxyName);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(matchQueryBuilder);
searchSourceBuilder.size(scrollSize);

SearchRequest searchRequest = new SearchRequest();
searchRequest.indices(galaxyIndexName);
searchRequest.source(searchSourceBuilder);
searchRequest.scroll(TimeValue.timeValueSeconds(scrollTimeValue));
SearchResponse searchResponse = restHighLevelClient.search(searchRequest);

StarCollection starCollection = new StarCollection();
boolean moreResultsExist = true;
int resultCount = 0;
while (moreResultsExist) {
    String scrollId = searchResponse.getScrollId();
    for (SearchHit searchHit : searchResponse.getHits()) {
        Star star = objectMapper.readValue(searchHit.getSourceAsString(), Star.class);
        resultCount++;
        starCollection.addContentsItem(star);
    }
    if (resultCount >= searchResponse.getHits().getTotalHits()) {
        moreResultsExist = false;
        ClearScrollRequest request = new ClearScrollRequest();
        request.addScrollId(scrollId);
        restHighLevelClient.clearScroll(request);
    }
    SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
    scrollRequest.scroll(TimeValue.timeValueSeconds(scrollTimeValue));
    searchResponse = restHighLevelClient.searchScroll(scrollRequest);
}
Now, when I run a search that returns 1.5 million documents, it takes forever and my method never finishes. Sometimes I get an exception like this:
org.elasticsearch.ElasticsearchException: Elasticsearch exception [type=search_context_missing_exception, reason=No search context found for id
So I have the following questions:
- Is this the right way to use Scroll?
- What's the best way to run searches that return millions of records?
scrollSize and also possibly increase the scrollTimeValue.
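One thing worth checking in the loop from the question: when the last page has been counted, the scroll is cleared, but the code still falls through to another searchScroll call with the ID that was just cleared. That alone could plausibly produce the search_context_missing_exception, as could a scroll keep-alive that expires between pages. The sketch below shows only the control flow of a loop that stops on the first empty page and clears the context exactly once afterwards. It deliberately does not use the Elasticsearch client: PageSource, FakeSource, and drain are hypothetical names made up for this illustration, with firstPage, nextPage, and clear standing in for the search, searchScroll, and clearScroll calls.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Control-flow sketch only. Nothing here is Elasticsearch API:
 * PageSource is a hypothetical stand-in for the REST client calls.
 */
class ScrollLoopSketch {

    /** Hypothetical abstraction over a scroll-style paged search. */
    interface PageSource<T> {
        List<T> firstPage();  // stands in for the initial search request
        List<T> nextPage();   // stands in for a searchScroll request
        void clear();         // stands in for clearScroll
    }

    /** Reads pages until an empty one arrives, then clears the context once. */
    static <T> List<T> drain(PageSource<T> source) {
        List<T> results = new ArrayList<>();
        try {
            List<T> page = source.firstPage();
            // An empty page is the end-of-results signal; no total count is needed.
            while (!page.isEmpty()) {
                results.addAll(page);
                page = source.nextPage();
            }
        } finally {
            // Runs after the last fetch, so no request is made on a cleared context.
            source.clear();
        }
        return results;
    }

    /** In-memory fake that serves the integers 0..total-1 in fixed-size pages. */
    static class FakeSource implements PageSource<Integer> {
        private final int total;
        private final int pageSize;
        private int cursor = 0;
        int clearCalls = 0;
        int fetchesAfterClear = 0;

        FakeSource(int total, int pageSize) {
            this.total = total;
            this.pageSize = pageSize;
        }

        @Override
        public List<Integer> firstPage() {
            return nextPage();
        }

        @Override
        public List<Integer> nextPage() {
            if (clearCalls > 0) {
                fetchesAfterClear++; // records the mistake this sketch avoids
            }
            List<Integer> page = new ArrayList<>();
            while (page.size() < pageSize && cursor < total) {
                page.add(cursor++);
            }
            return page;
        }

        @Override
        public void clear() {
            clearCalls++;
        }
    }

    public static void main(String[] args) {
        FakeSource source = new FakeSource(10, 3);
        List<Integer> all = drain(source);
        System.out.println(all.size() + " items, clear calls: " + source.clearCalls);
    }
}
```

Applied to the original code, the same shape would mean looping while the current response still contains hits, requesting the next page at the end of each iteration, and building the ClearScrollRequest only after the loop exits. Holding 1.5 million deserialized objects in one collection may also be part of why the method seems never to finish, so processing each page as it arrives is worth considering.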