We're using Elasticsearch to find offers based on 5 fields, such as some 'free text', offer state and client name. We also need to aggregate on two of those fields, client name and offer state. So when someone enters some free text and we find, say, 10 docs with state closed and 8 with state open, the 'state filter' should contain closed(10) and open(8).

Now the problem is, when I select the state 'closed' to be included in the filter, the aggregation result for open changes to 0. I want this to remain 8. So how can I prevent the filter on an aggregated field from influencing the aggregation itself?

Here is the first query, searching for 'java':

{
    "query": {
        "bool": {
            "filter": [
            ],
            "must": {
                "simple_query_string": {
                    "query" : "java"
                }
            }
        }
    },
    "aggs": {
        "OFFER_STATE_F": {
            "terms": {
                "size": 0,
                "field": "offer_state_f",
                "min_doc_count": 0
            }
        }
    },
    "from": 0,
    "size": 1,
    "fields": ["offer_id_ft", "offer_state_f"]
}

The result is this:

{
  "hits": {
    "total": 960,
    "max_score": 0.89408284000000005,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "40542",
        "fields": {
          "offer_id_ft": [
            "40542"
          ],
          "offer_state_f": [
            "REJECTED"
          ]
        },
        "_score": 0.89408284000000005
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        {
          "key": "REJECTED",
          "doc_count": 778
        },
        {
          "key": "ACCEPTED",
          "doc_count": 130
        },
        {
          "key": "CANCELED",
          "doc_count": 22
        },
        {
          "key": "WITHDRAWN",
          "doc_count": 13
        },
        {
          "key": "LONGLIST",
          "doc_count": 12
        },
        {
          "key": "SHORTLIST",
          "doc_count": 5
        },
        {
          "key": "INTAKE",
          "doc_count": 0
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 2
}

As you can see, the sum of the OFFER_STATE_F buckets is equal to the total hits (960). Now I include one of the states in the query, say 'ACCEPTED'. So my query becomes:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "should": [
                            {
                                "term": {
                                    "offer_state_f": "ACCEPTED"
                                }
                            }
                        ]
                    }
                }            
            ],
            "must": {
                "simple_query_string": {
                    "query" : "java"
                }
            }
        }
    },
    "aggs": {
        "OFFER_STATE_F": {
            "terms": {
                "size": 0,
                "field": "offer_state_f",
                "min_doc_count": 0
            }
        }
    },
    "from": 0,
    "size": 1,
    "fields": ["offer_id_ft", "offer_state_f"]
}

What I want is 130 hits, with the OFFER_STATE_F buckets still summing to 960. But what I got is this:

{
  "hits": {
    "total": 130,
    "max_score": 0.89408284000000005,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "16884",
        "fields": {
          "offer_id_ft": [
            "16884"
          ],
          "offer_state_f": [
            "ACCEPTED"
          ]
        },
        "_score": 0.89408284000000005
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        {
          "key": "ACCEPTED",
          "doc_count": 130
        },
        {
          "key": "CANCELED",
          "doc_count": 0
        },
        {
          "key": "INTAKE",
          "doc_count": 0
        },
        {
          "key": "LONGLIST",
          "doc_count": 0
        },
        {
          "key": "REJECTED",
          "doc_count": 0
        },
        {
          "key": "SHORTLIST",
          "doc_count": 0
        },
        {
          "key": "WITHDRAWN",
          "doc_count": 0
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 10
}

As you can see, only the ACCEPTED bucket is filled, all the others are 0.

2 Answers

You need to move your filters into the post_filter section instead of the query section.

That way, the filtering will be applied after the aggregations are computed, so the aggregations run over the whole set of documents matched by the query, while the returned hits are limited to those matching your filters.
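
A rough sketch of that split in Python, building the request body as a plain dict. The helper name `build_search_body` and its parameters are made up for illustration, and it assumes that only the filters on aggregated fields should go into `post_filter`, while any other filters stay in the query:

```python
def build_search_body(text, base_filters, facet_selections, facet_fields):
    """Build a search body whose facet counts ignore the facet selections.

    text             -- free-text query string
    base_filters     -- filter clauses that narrow both hits and counts
    facet_selections -- {field: [selected values]}; narrows the hits only
    facet_fields     -- fields to create a terms aggregation for
    """
    # OR within one facet: a `terms` query matches any of the listed values.
    post_clauses = [
        {"terms": {field: values}}
        for field, values in facet_selections.items()
        if values
    ]
    body = {
        "query": {
            "bool": {
                "must": {"simple_query_string": {"query": text}},
                "filter": list(base_filters),
            }
        },
        "aggs": {
            field.upper(): {"terms": {"field": field, "min_doc_count": 0}}
            for field in facet_fields
        },
    }
    if post_clauses:
        # Runs after the aggregations, so the bucket counts are not narrowed.
        body["post_filter"] = {"bool": {"must": post_clauses}}
    return body


body = build_search_body(
    "java",
    base_filters=[],
    facet_selections={"offer_state_f": ["ACCEPTED"]},
    facet_fields=["offer_state_f"],
)
```

The resulting dict can be serialized with `json.dumps` and sent as the search request body; paging options such as `from` and `size` are left out here.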

2 Comments

Hi, this does not achieve the desired result. I want to search on 'free text', count the occurrences per state / client name WITHIN the results, and then use those states / names as a multi-select filter to narrow the result. But the multi-select should be an 'OR'-wise filter. Any suggestions?
Sorry, I must have misunderstood the problem. Let me think about it, unless someone comes up with a good solution in the meantime. If you could share what you already have now, it would probably help picture it.

Ok, I found the answer with the help of a colleague, and the thing is, Val is right. +1 for him. What I had done was place ALL of my query filters in the post_filter, and that was the problem. Only the filters on the fields I want to aggregate on belong in the post_filter; the other filters stay in the query. Thus:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "broker_f": "false"
                    }
                }
            ],
            "must": {
                "simple_query_string": {
                    "query" : "java"
                }
            }
        }
    },
    "aggs": {
        "OFFER_STATE_F": {
            "terms": {
                "size": 0,
                "field": "offer_state_f",
                "min_doc_count": 0
            }
        }
    },
    "post_filter" : {
        "bool": {
            "should": [
                {
                    "term": {
                        "offer_state_f": "SHORTLIST"
                    }
                }
            ]
        }
    },
    "from": 0,
    "size": 1,
    "fields": ["offer_id_ft", "offer_state_f"]
}

And now the result is correct:

{
  "hits": {
    "total": 5,
    "max_score": 0.76667790000000002,
    "hits": [
      {
        "_type": "offer",
        "_index": "select",
        "_id": "24454",
        "fields": {
          "offer_id_ft": [
            "24454"
          ],
          "offer_state_f": [
            "SHORTLIST"
          ]
        },
        "_score": 0.76667790000000002
      }
    ]
  },
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "timed_out": false,
  "aggregations": {
    "OFFER_STATE_F": {
      "buckets": [
        {
          "key": "REJECTED",
          "doc_count": 777
        },
        {
          "key": "ACCEPTED",
          "doc_count": 52
        },
        {
          "key": "CANCELED",
          "doc_count": 22
        },
        {
          "key": "LONGLIST",
          "doc_count": 12
        },
        {
          "key": "WITHDRAWN",
          "doc_count": 12
        },
        {
          "key": "SHORTLIST",
          "doc_count": 5
        },
        {
          "key": "INTAKE",
          "doc_count": 0
        }
      ],
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0
    }
  },
  "took": 4
}
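
One caveat once both facets (state and client name) are selectable: with everything in `post_filter`, picking a client would not narrow the state counts. A common way around this is to wrap each facet's terms aggregation in a `filter` aggregation that applies the selections of the other facets only. The sketch below is an assumption-laden illustration, not something tested against this index: the helper `build_facet_aggs`, the inner aggregation name `values`, the `bucket_limit` parameter and the field name `client_name_f` are all hypothetical.

```python
def build_facet_aggs(facet_selections, facet_fields, bucket_limit=50):
    """Build one filter-wrapped terms aggregation per facet field.

    Each facet is counted over documents matching the selections made in
    the OTHER facets, so its own selection never zeroes its own buckets.
    """
    aggs = {}
    for field in facet_fields:
        # Selections from every facet except the one being counted.
        others = [
            {"terms": {other: values}}
            for other, values in facet_selections.items()
            if other != field and values
        ]
        scope = {"bool": {"must": others}} if others else {"match_all": {}}
        aggs[field.upper()] = {
            "filter": scope,
            "aggs": {
                "values": {
                    "terms": {
                        "field": field,
                        "size": bucket_limit,
                        "min_doc_count": 0,
                    }
                }
            },
        }
    return aggs


facet_aggs = build_facet_aggs(
    {"offer_state_f": ["ACCEPTED"], "client_name_f": []},
    ["offer_state_f", "client_name_f"],
)
```

With this shape the buckets sit one level deeper in the response, under `aggregations.OFFER_STATE_F.values.buckets`, and the `post_filter` still carries all facet selections so that the hits themselves are narrowed.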

1 Comment

Don't forget +1 "for him" ;-)
