
I have an index whose documents have the following structure.

{
      "title": "Your top FIY tips",
      "content": "Fix It Yourself in April 2012.",
      "tags": [
        {
          "tagName": "Fix it yourself"
        },
        {
          "tagName": "customer tips"
        },
        {
          "tagName": "competition"
        }
      ]  
}

The mapping looks like

{
  "articles": {
    "mappings": {
      "article": {
        "properties": {
          "content": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "tags": {
            "type": "nested",
            "properties": {
              "tagName": {
                "type": "text",
                "fields": {
                  "raw": {
                    "type": "keyword"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

I am using the following DSL query to search the "content" and "title" fields and narrow the results down to a certain "tagName". I then use an aggregation to count the tagNames within the results of that query.

GET /articles/_search
{
  "from": 1,
  "size": 10,
  "aggs": {
    "tags": {
      "nested": {
        "path": "tags"
      },
      "aggs": {
        "tags-tagnames": {
          "terms": {
            "field": "tags.tagName.raw"
          }
        }
      }
    }
  },
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "FIY",
            "fields": [
              "title",
              "content"
            ]
          }
        },
        {
          "nested": {
            "query": {
              "terms": {
                "tags.tagName": [
                  "competition"
                ]
              }
            },
            "path": "tags"
          }
        }
      ]
    }
  }
}

The search query and the "tagName" filter work fine. However, the aggregation is not quite working: it doesn't seem to take the nested query into account. The aggregation results that come back are based only on the multi_match search.

How can I include the nested query in the aggregation?

Sample documents at

https://gist.github.com/anonymous/83bc2b1bfa0ac0d295d42297e1d76c00

  • what does the mapping look like in the index i.e. what does GET {index}/_mapping return? Is tags mapped as a nested type? Commented Mar 17, 2017 at 20:36
  • @RussCam tags is mapped as nested. Have updated the question to include the mappings Commented Mar 17, 2017 at 20:41
  • Do you have small example set to reproduce? It looks like you expect to get the raw tag names for tags that include the term competition in the tag name and match FIY in the title or content? Is that not what you're seeing? Commented Mar 17, 2017 at 21:02
  • The aggregates is returning results based on matching FIY in the title or content, it's not filtering on the tag competition. The query works fine. The aggregates is not correct Commented Mar 17, 2017 at 21:11
  • @RussCam using the latest version 5.2. Have added the documents to a gist gist.github.com/anonymous/83bc2b1bfa0ac0d295d42297e1d76c00 Commented Mar 17, 2017 at 22:40

1 Answer

After discussing this, I think I understand your problem better: you wish to run the aggregation only on those documents that are returned as hits, based on the "from" and "size" specified in the request.

"from" and "size" only affect which hits are returned for the query; aggregations are calculated over all documents that match the query.

What you want to do is currently not possible due to the way in which Elasticsearch works. There are two phases to a search request in Elasticsearch:

Query phase

In the query phase, the query is run against the shards of the targeted index, and the document ids of the docs that match the query are returned. Aggregations also run in the query phase.

Fetch phase

In the fetch phase, the actual documents that match the ids from the query phase are fetched and included in the result. In your scenario, you would need the aggregation to run in the fetch phase, so that it aggregates only over the docs that are actually returned as hits.
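To make the two phases concrete, here is a toy Python model. It is only a sketch and not how Elasticsearch is implemented: it ignores shards, scoring and sorting, and the in-memory documents and the `query_phase` / `fetch_phase` helpers are made up for illustration. It shows the tag counts being computed over every matching document, while "from" and "size" only decide which documents are loaded afterwards.

```python
from collections import Counter

# Hypothetical in-memory "index"; tags are flattened to plain strings.
DOCS = [
    {"id": 1, "title": "Your top FIY tips",
     "tags": ["Fix it yourself", "customer tips", "competition"]},
    {"id": 2, "title": "FIY winners", "tags": ["competition"]},
    {"id": 3, "title": "More FIY ideas", "tags": ["competition", "customer tips"]},
    {"id": 4, "title": "Gardening", "tags": ["competition"]},
]


def query_phase(docs, term, tag, from_, size):
    """Match documents, aggregate over ALL matches, return the top ids."""
    matched = [d for d in docs if term in d["title"] and tag in d["tags"]]
    # The aggregation sees every matching document, regardless of paging.
    tag_counts = Counter(t for d in matched for t in d["tags"])
    # Only enough ids to serve the requested page are kept.
    top_ids = [d["id"] for d in matched[: from_ + size]]
    return top_ids, tag_counts


def fetch_phase(docs, top_ids, from_, size):
    """Load only the documents on the requested page."""
    by_id = {d["id"]: d for d in docs}
    return [by_id[i] for i in top_ids[from_: from_ + size]]


top_ids, tag_counts = query_phase(DOCS, "FIY", "competition", from_=1, size=1)
page = fetch_phase(DOCS, top_ids, from_=1, size=1)
```

In this toy run the page holds a single document, yet `tag_counts` still reflects all three matching documents, which mirrors the behaviour described in the question.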

The only way to affect which documents are taken into account for the aggregation is to include additional queries/filters in the query of the request, but there is no query that says "documents in sort order positions 1 to 10" as far as I am aware.

You could always aggregate client-side for your particular use case, since you are effectively aggregating on the verbatim value of each tag.
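A minimal Python sketch of that client-side aggregation follows. It assumes the search response has already been parsed into a dict with the usual `hits.hits[]._source` layout and that each hit carries the `tags` array from the question; the `count_tags_in_hits` helper and the second sample hit are hypothetical.

```python
from collections import Counter


def count_tags_in_hits(response):
    """Count tagName values across the hits of one page of search results."""
    counts = Counter()
    for hit in response.get("hits", {}).get("hits", []):
        for tag in hit.get("_source", {}).get("tags", []):
            counts[tag["tagName"]] += 1
    return counts


# Trimmed, hypothetical response containing only the fields used above.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {
                "title": "Your top FIY tips",
                "tags": [
                    {"tagName": "Fix it yourself"},
                    {"tagName": "customer tips"},
                    {"tagName": "competition"},
                ],
            }},
            {"_source": {
                "title": "FIY winners",
                "tags": [{"tagName": "competition"}],
            }},
        ]
    }
}

page_counts = count_tags_in_hits(sample_response)
```

Because the counts are built only from the hits on the current page, they follow "from" and "size" automatically, at the cost of doing the counting outside Elasticsearch.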
