I have a set of article documents in Elasticsearch with the fields content and publish_datetime.

I am trying to retrieve the most frequent words from articles published in 2021.

GET articles/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "content"
      }
    },
    "publish_datetime": {
      "terms": {
        "field": "publish_datetime"
      }
    },
    "aggs": {
      "word_counts_2021": {
        "bucket_selector": {
          "buckets_path": {
            "word_counts": "word_counts",
            "pd": "publish_datetime"
          },
          "script": "LocalDateTime.parse(params.pd).getYear() == 2021"
        }
      }
    }
  }
}

This fails with:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "parsing_exception",
        "reason" : "Unknown aggregation type [word_counts_2021]",
        "line" : 17,
        "col" : 25
      }
    ],
    "type" : "parsing_exception",
    "reason" : "Unknown aggregation type [word_counts_2021]",
    "line" : 17,
    "col" : 25,
    "caused_by" : {
      "type" : "named_object_not_found_exception",
      "reason" : "[17:25] unknown field [word_counts_2021]"
    }
  },
  "status" : 400
}

which does not make sense, because word_counts_2021 is the name of the aggregation according to the docs, not an aggregation type. I am the one who picks the name, so I thought it could have basically any value.

Does anyone have any idea what's going on here? So far, this seems like a pretty unintuitive service to me.

1 Answer

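The agg as you have it written seems intended to filter the publish_datetime buckets so that only those in the year 2021 are kept. To do that, you must nest the sub-agg under that particular terms aggregation. In your request the inner "aggs" sits directly under the top-level "aggs", so Elasticsearch reads "aggs" as an aggregation name and word_counts_2021 as its type, hence the error.

Like so: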

GET articles/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "word_counts": {
      "terms": {
        "field": "content"
      }
    },
    "publish_datetime": {
      "terms": {
        "field": "publish_datetime"
      },
      "aggs": {
        "word_counts_2021": {
          "bucket_selector": {
            "buckets_path": {
              "pd": "publish_datetime"
            },
            "script": "LocalDateTime.parse(params.pd).getYear() == 2021"
          }
        }
      }
    }
  }
}

But if that field is mapped as a date, I would suggest simply filtering with a range query and then aggregating the matching documents.
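
A minimal sketch of that range-query approach, assuming publish_datetime is mapped as a date and content is a field that terms can aggregate on (a keyword field, or a text field with fielddata enabled). The "size": 0 setting and the aggregation name word_counts_2021 are my own choices here, not something your mapping requires:

```
GET articles/_search
{
  "size": 0,
  "query": {
    "range": {
      "publish_datetime": {
        "gte": "2021-01-01",
        "lt": "2022-01-01"
      }
    }
  },
  "aggs": {
    "word_counts_2021": {
      "terms": {
        "field": "content"
      }
    }
  }
}
```

Because the query already restricts the hits to 2021, the terms aggregation only counts words from those documents and no bucket_selector is needed.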
