2

I am using elasticsearch-py (es version is 2.3) and would like to return just the 'title' field from all documents in an index with the mapping: actors, director, genre, plot, title, year.

I'm currently trying messages = es.search(index="movies", _source=['hits.hits.title']) and the resulting response is:

{u'hits': {u'hits': [{u'_score': 1.0, u'_type': u'movie', u'_id': u'tt0116996', u'_source': {}, u'_index': u'movies'}, {u'_score': 1.0, u'_type': u'movie', u'_id': u'1', u'_source': {}, u'_index': u'movies'}], u'total': 2, u'max_score': 1.0}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 2, u'timed_out': False}

I've tried different versions of filter paths and source field lists but can't seem to get it right.

2
  • You probably want just _source: ['title'] Commented Nov 29, 2016 at 16:46
  • Tried messages = es.search(index="movies", _source=['title']), which returned {u'hits': {u'hits': [{u'_score': 1.0, u'_type': u'movie', u'_id': u'tt0116996', u'_source': {u'title': u'Mars Attacks!'}, u'_index': u'movies'}, {u'_score': 1.0, u'_type': u'movie', u'_id': u'1', u'_source': {u'title': u'I saw a movie once a tale'}, u'_index': u'movies'}], u'total': 2, u'max_score': 1.0}, u'_shards': {u'successful': 1, u'failed': 0, u'total': 1}, u'took': 1, u'timed_out': False}, so I'm still missing something somewhere. Commented Nov 29, 2016 at 16:52

3 Answers

4

You can apply source filtering with:

messages = es.search(index="movies", _source=["title"])

but you'll still need to parse the response. For this you can do something like:

titles = [hit["_source"]["title"] for hit in messages["hits"]["hits"]]

There is nothing in the elasticsearch-py API (as far as I know) that will flatten down the rather verbose response you get from Elasticsearch.
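To make the parsing step concrete, here is a minimal sketch that runs without a cluster. The `response` dict is a hand-written stand-in shaped like the output quoted in the question's comment, and `extract_titles` is a hypothetical helper name, not part of elasticsearch-py.

```python
# Stand-in for the dict that es.search(index="movies", _source=["title"])
# returned in the question's comment (values copied from there).
response = {
    "hits": {
        "total": 2,
        "max_score": 1.0,
        "hits": [
            {"_index": "movies", "_type": "movie", "_id": "tt0116996",
             "_score": 1.0, "_source": {"title": "Mars Attacks!"}},
            {"_index": "movies", "_type": "movie", "_id": "1",
             "_score": 1.0, "_source": {"title": "I saw a movie once a tale"}},
        ],
    },
}


def extract_titles(resp):
    """Return the title of every hit in a search response dict.

    Hypothetical helper: assumes every hit has a `_source` with a `title`.
    """
    return [hit["_source"]["title"] for hit in resp["hits"]["hits"]]


titles = extract_titles(response)
print(titles)
```

Source filtering only trims what sits inside each hit's `_source`; the surrounding `hits.hits` envelope is still there, which is why the list comprehension has to walk through it.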

1 Comment

wouldn't it be rather : titles = [hit['_source']['title'] for hit in msg['hits']['hits']]
1

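You can now use the _source_exclude and _source_include kwargs in the search function to limit which fields are returned. (As far as I know, newer client versions spell these _source_excludes and _source_includes, so check the docs for the version you have installed.)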

So something like:

messages = es.search(index="movies", _source=["title"], _source_include=['title'])


1

I had a similar problem, and this is how I solved it. I needed it in a slightly different context: I had to use the title later in the loop:

res = es.search(index="movies", body={"query": {"match_all": {}}})
for hit in res['hits']['hits']:
    title = hit['_source'].get('title')
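If some documents might lack a title, or if you also trim the response envelope (the elasticsearch-py docs describe a `filter_path` keyword for that; I am assuming your client version supports it), keys can be missing at several levels. A defensive version of the loop might look like the sketch below. `iter_titles` is a hypothetical name, and the sample dicts are made up for illustration.

```python
def iter_titles(resp):
    """Yield (doc_id, title) pairs from a search response dict.

    Hypothetical helper: missing keys produce None (or no pairs at all)
    instead of raising KeyError.
    """
    for hit in resp.get("hits", {}).get("hits", []):
        yield hit.get("_id"), hit.get("_source", {}).get("title")


# Made-up sample: one hit with a title, one hit whose _source is empty.
sample = {
    "hits": {
        "hits": [
            {"_id": "tt0116996", "_source": {"title": "Mars Attacks!"}},
            {"_id": "1", "_source": {}},
        ]
    }
}

for doc_id, title in iter_titles(sample):
    if title is None:
        continue  # skip documents without a title
    print(doc_id, title)
```

The generator keeps the document id next to the title, which helps when the title is needed later in the loop to look something else up.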
