
I'm trying to get all the documents in an index using the Python client, but the result shows only the first document. This is my Python code:

res = es.search(index="92c603b3-8173-4d7a-9aca-f8c115ff5a18", doc_type="doc", body = {
'size' : 10000,
'query': {
    'match_all' : {}
}
})
print("%d documents found" % res['hits']['total'])
data = [doc for doc in res['hits']['hits']]
for doc in data:
    print(doc)
    return "%s %s %s" % (doc['_id'], doc['_source']['0'], doc['_source']['5'])
  • Seems like there is only 1 doc for doc_type=doc. Can you recheck? Commented May 7, 2018 at 8:54
  • This request should return 3 documents; the first print shows "3 documents found". Commented May 7, 2018 at 9:06
  • You are returning inside the loop! That's why you only see one. Commented May 7, 2018 at 9:16
  • I have the same problem when I return outside the loop statement. Commented May 7, 2018 at 9:47

6 Answers

10

Try "_doc" instead of "doc":

res = es.search(index="92c603b3-8173-4d7a-9aca-f8c115ff5a18", doc_type="_doc", body = {
'size' : 100,
'query': {
    'match_all' : {}
}
})

5

By default, Elasticsearch returns only 10 documents. You can change this with the size parameter (see the docs). The recommended approaches for paging through large result sets are search_after and scroll queries; which one fits depends on your needs. Please read this answer: "Elastic search not giving data with big number for page size".
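As a rough illustration of the scroll approach, here is a minimal sketch. It assumes `es` is an elasticsearch-py client that accepts the 7.x-style `search(index=..., body=..., scroll=...)` call used elsewhere on this page; the function name `iter_all_hits` and its parameters are hypothetical, not part of the library.

```python
def iter_all_hits(es, index, page_size=1000, keep_alive="1m"):
    """Yield every hit in `index`, one page at a time, via the scroll API.

    `es` is assumed to be an elasticsearch-py client (7.x-style calls).
    """
    resp = es.search(
        index=index,
        body={"size": page_size, "query": {"match_all": {}}},
        scroll=keep_alive,
    )
    scroll_id = resp.get("_scroll_id")
    try:
        # An empty page means the scroll is exhausted.
        while resp["hits"]["hits"]:
            for hit in resp["hits"]["hits"]:
                yield hit
            # Each response may carry a new scroll id; always use the latest.
            resp = es.scroll(scroll_id=scroll_id, scroll=keep_alive)
            scroll_id = resp.get("_scroll_id", scroll_id)
    finally:
        # Release the server-side scroll context once iteration ends.
        if scroll_id is not None:
            es.clear_scroll(scroll_id=scroll_id)
```

It could be used as `docs = list(iter_all_hits(es, "my-index"))`, where "my-index" is a placeholder. For an index with only a handful of documents, a single search with a larger size is simpler.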

To show all the results:

for doc in res['hits']['hits']:
    print(doc['_id'], doc['_source'])
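The function in the question returns inside the loop, so it exits on the first hit. If the goal is to return the formatted strings rather than print them, one option is to collect them in a list and return once the loop has finished. This is a minimal sketch: the function name `format_hits` is hypothetical, and the field names '0' and '5' are taken from the question.

```python
def format_hits(res):
    """Return one formatted string per hit in an Elasticsearch response dict."""
    lines = []
    for doc in res['hits']['hits']:
        source = doc['_source']
        # '0' and '5' are the field names used in the question.
        lines.append("%s %s %s" % (doc['_id'], source['0'], source['5']))
    # Return only after every hit has been processed.
    return lines
```

Calling `"\n".join(format_hits(res))` gives a single string with one line per document, if a string is needed instead of a list.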

4 Comments

I have only 3 documents to get from this index.
The problem is with the return statement of my function, so how can I return the result correctly?
Just return data, which is defined as data = [doc for doc in res['hits']['hits']].
I want to return this result: "%s %s %s" % (doc['_id'], doc['_source']['0'], doc['_source']['5'])
2

You can also use elasticsearch_dsl and its Search API, which lets you iterate over all your documents via the scan method.

import elasticsearch
from elasticsearch_dsl import Search

client = elasticsearch.Elasticsearch()
search = Search(using=client, index="92c603b3-8173-4d7a-9aca-f8c115ff5a18")

for hit in search.scan():
    print(hit)

3 Comments

search.scan() can go through all docs, but it is very slow. Is there a way to improve it?
I have now found the docs at elasticsearch-dsl.readthedocs.io/en/latest/… and it seems that's the best it can do.
It is unfortunate that this is as fast as it goes. I would be very interested to know if there is a way to speed things up.
0

You can find more information here: https://discuss.elastic.co/t/get-all-documents-from-an-index/86977/5

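from elasticsearch import Elasticsearch
from tqdm import tqdm

es = Elasticsearch(["http://x.x.x.x:9200/"])
doc = {"size": 10000, "query": {"match_all": {}}}

# Open a scroll context that is kept alive for 1 minute between requests.
resObj = es.search(index="myIndex", doc_type="myType", body=doc, scroll="1m")
scroll_id = resObj["_scroll_id"]

# hits.total is an object with a "value" key on Elasticsearch 7+.
total_data_size = resObj["hits"]["total"]["value"]

allbar = tqdm(total=total_data_size, desc="processing")

# Collect the first page of hits.
download_data = []
download_data.extend(resObj["hits"]["hits"])
allbar.update(len(resObj["hits"]["hits"]))

while True:
    page = es.scroll(scroll_id=scroll_id, scroll="1m")
    # The scroll id can change between requests, so always use the latest one.
    scroll_id = page["_scroll_id"]
    hits = page["hits"]["hits"]

    # An empty page means every document has been fetched.
    if len(hits) == 0:
        break

    download_data.extend(hits)
    allbar.update(len(hits))

allbar.close()
# Release the server-side scroll context.
es.clear_scroll(scroll_id=scroll_id)

print(len(download_data))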


-1

You can try the following query. It will return all the documents.

result = es.search(index="index_name", body={"query":{"match_all":{}}})

1 Comment

The "search" function returns only 10 docs by default. You should at least increase the size.
-1

I don't see it mentioned that the index must be refreshed if you have just added data. Use this:

es.indices.refresh(index="index_name")
