13

I have a few documents in a folder, and I want to check whether all of them are indexed. To do so, for each document name in the folder, I would like to loop through the documents indexed in ES and compare them. So I want to retrieve all the indexed documents.

There are a few other possible duplicates of this question, such as "retrieve all records in a (ElasticSearch) NEST query" and one other, but they didn't help me because the documentation has changed since then (there is nothing about scan in the current documentation).

I tried using client.Search<T>(), but according to the documentation only 10 results are returned by default. I would like to get all the records without specifying a size (because the size of the index changes).

Or is it possible to get the size of the index first, pass that number as the size to get all the documents, and then loop through them?

9
  • Did you try using scroll? elastic.co/guide/en/elasticsearch/client/net-api/1.x/… Commented Jun 14, 2016 at 0:56
  • Hi Russ. I tried using it and was able to get the scrollId. Once I have a scrollId, I don't know how to run the search query again (which I believe will generate more scrollIds) until I have retrieved all the documents. I didn't find any example in NEST for this. (I was checking the 2.x version of the documentation. Anyway, I will try the example in the link you posted.) Thanks. Commented Jun 14, 2016 at 1:19
  • 1
    The link in the first comment has an example: it executes a search specifying a search type of scroll, then uses the scroll id to get the first page of results. It then loops to get all documents, using the scroll id returned by the latest response. You can also use fields to return, say, only one field of each document rather than the whole document. Commented Jun 14, 2016 at 1:20
  • Tried it and it's working. Thanks a ton, Russ. But SearchType(Nest.SearchType.Scan) doesn't seem to work; I had to use SearchType(Elasticsearch.Net.SearchType.Scan). After using the scrolls, do I have to delete them, or will they be cleared after the specified time? Commented Jun 14, 2016 at 1:24
  • 1
    elastic.co/guide/en/elasticsearch/reference/current/… Commented Jun 14, 2016 at 1:26

2 Answers

20

Here is how I solved my problem. Hope this helps. (References https://www.elastic.co/guide/en/elasticsearch/client/net-api/1.x/scroll.html , https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html#scroll-search-context)

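List<string> indexedList = new List<string>();

// Open a scan/scroll search. With SearchType.Scan the initial response carries no hits,
// only a scroll id, which is why the loop below starts with a Scroll call.
// In scan mode Size applies per shard, so a batch can hold up to Size * primary shards docs.
var scanResults = client.Search<ClassName>(s => s
    .From(0)
    .Size(2000)
    .MatchAll()
    .Fields(f => f.Field(fi => fi.propertyName)) // I used Fields to get only the value I needed rather than the whole document
    .SearchType(Elasticsearch.Net.SearchType.Scan)
    .Scroll("5m")                                // keep the search context alive for 5 minutes
);

// Each Scroll call fetches the next batch and renews the keep-alive (here 10 minutes).
var results = client.Scroll<ClassName>("10m", scanResults.ScrollId);
while (results.Hits.Any()) // stop when a scroll returns no hits
{
    foreach (var doc in results.Fields)
    {
        indexedList.Add(doc.Value<string>("propertyName"));
    }

    // Always continue with the scroll id from the latest response.
    results = client.Scroll<ClassName>("10m", results.ScrollId);
}
// The search context expires after the keep-alive; the clear scroll API can release it earlier.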
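Once indexedList has been filled, the comparison described in the question (checking each file in the folder against the indexed values) does not need a nested loop. A minimal sketch, assuming each document was indexed with its file name (no directory) in propertyName; IndexCheck and its methods are hypothetical names, not part of NEST:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class IndexCheck
{
    // Returns the file names that have no matching indexed value.
    // A HashSet gives O(1) lookups instead of scanning the indexed list for every file.
    public static List<string> FindUnindexed(IEnumerable<string> fileNames, IEnumerable<string> indexedNames)
    {
        // Ordinal comparison assumes names are stored exactly as they appear on disk;
        // switch to StringComparer.OrdinalIgnoreCase if the casing may differ.
        var indexed = new HashSet<string>(indexedNames, StringComparer.Ordinal);
        return fileNames.Where(name => !indexed.Contains(name)).ToList();
    }

    // Hypothetical convenience wrapper: lists the files directly inside `folder`
    // and compares their names (without directory) against the indexed values.
    public static List<string> FindUnindexedInFolder(string folder, IEnumerable<string> indexedNames)
    {
        var fileNames = Directory.EnumerateFiles(folder).Select(path => Path.GetFileName(path));
        return FindUnindexed(fileNames, indexedNames);
    }
}
```

Calling IndexCheck.FindUnindexedInFolder(folderPath, indexedList) after the scroll loop would return the files that still need indexing; an empty list means everything in the folder was found.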

EDIT

var response = client.Search<Document>(s => s
                         .From(fromNum)
                         .Size(PageSize)
                         .Query(q => q ....

10 Comments

I don't quite understand your logic. Your first matchAll query will return only 2000 documents. Are you scrolling over only 2000 docs? What if I have 5000 docs?
@batmaci that number 2000 is not the total number of records. It's the number of records fetched in each batch, and the search context stays valid for the time specified in the scroll. (For example, with a size of 10, I first fetch records 1-10, then records 11-20, and so on. 2000 is just an example.)
Do you know whether it is OK to set this to a high number, or can that cause performance issues? As far as I know, 10,000 is the maximum count for a search query.
@ASN What's the difference in the scroll times where you first use "5m", then use "10m"? What do they each do?
@wnbates The scroll parameter tells Elasticsearch to keep the search context open for another 5m or 10m.
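To make the batching concrete for the 5000-document case: the loop keeps requesting the next batch with the latest scroll id until a batch comes back empty, so the batch size never caps the total. The sketch below is an in-memory stand-in, not NEST; FakeScrollSource and ScrollDemo are hypothetical and only mimic how the cursor advances (keep-alive and the per-shard sizing of scan are ignored).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-in for a scroll cursor; it is not part of NEST.
public sealed class FakeScrollSource
{
    private readonly IReadOnlyList<string> _docs;
    private readonly int _batchSize;

    public FakeScrollSource(IReadOnlyList<string> docs, int batchSize)
    {
        _docs = docs;
        _batchSize = batchSize;
    }

    // Returns the batch starting at the offset encoded in `scrollId`,
    // plus the scroll id to use for the next call.
    public (List<string> Batch, int NextScrollId) Scroll(int scrollId)
    {
        var batch = _docs.Skip(scrollId).Take(_batchSize).ToList();
        return (batch, scrollId + batch.Count);
    }
}

public static class ScrollDemo
{
    // Same shape as the answer's loop: fetch, process, continue with the latest
    // scroll id, and stop when a batch is empty. Also counts the non-empty batches.
    public static (List<string> All, int Batches) ReadAll(FakeScrollSource source)
    {
        var all = new List<string>();
        var batches = 0;
        var (batch, scrollId) = source.Scroll(0);
        while (batch.Any())
        {
            all.AddRange(batch);
            batches++;
            (batch, scrollId) = source.Scroll(scrollId);
        }
        return (all, batches);
    }
}
```

With 5000 documents and a batch size of 2000, the loop gets three non-empty batches (2000 + 2000 + 1000) and stops on the fourth, empty one, having collected all 5000.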
-4

You can do the following to get all records in an index:

var searchResponse = client.Search<T>(s => s
    .Index("IndexName")
    .Query(q => q.MatchAll())
);

var documents = searchResponse.Documents.Select(f => f.fieldName).ToList();

1 Comment

ES returns at most 10,000 documents per search request (and only 10 by default). You have to use scroll to get more.
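The 10,000 figure is the default of the index setting index.max_result_window, which caps from + size for ordinary searches. That is why paging with From/Size, or setting Size to the document count, stops working on larger indexes. A minimal sketch of the arithmetic, assuming the default window; Paging.PageWindows is a hypothetical helper, not a NEST API:

```csharp
using System;
using System.Collections.Generic;

public static class Paging
{
    // Lists the (From, Size) pairs a plain from/size loop would send for `total` documents.
    // Stops as soon as a page would exceed `maxResultWindow`, because Elasticsearch rejects
    // requests where from + size is larger than index.max_result_window.
    public static List<(int From, int Size)> PageWindows(long total, int pageSize, int maxResultWindow = 10_000)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var pages = new List<(int From, int Size)>();
        for (long from = 0; from < total; from += pageSize)
        {
            var size = (int)Math.Min(pageSize, total - from);
            if (from + size > maxResultWindow)
            {
                break; // a plain from/size request past this point is rejected; use scroll instead
            }
            pages.Add(((int)from, size));
        }
        return pages;
    }
}
```

For 5000 documents and a page size of 2000 this yields three pages, but for 25,000 documents it yields only five pages (the first 10,000 hits), which is the gap scroll fills.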
