
I'm using elasticsearch in my laravel-app and I'm trying to use the range-query. I have an array of companies, which in different periods have different amounts of employees, but I'm only interested in the newest period, which in this case means the last item of the employees array.

Basically, the array looks like this:

"company" => [
   "name" => "some company",
   "company_number" => "1234567",
   "status" => "normal",
   "employees" => [
      "period_1" => [
         "amount" => 10
       ],
       "period_2" => [
         "amount" => 15
       ],
       "period_3" => [
         "amount" => 24
       ],
       etc etc...
    ]
 ]

In the frontend, you can enter a minimum and a maximum value to search for companies with a certain number of employees. In my controller, I then do this:

"query":{
    "bool": {
        "should" : [
          { "match" : { "company.status" : "normal" } },
          {
           "range": {
              "company.employees": { // I WANT THE LAST ITEM FROM THIS ARRAY
                 "gte": "'. $min . '",
                 "lt" : "'.$max .'"
               }
            }
          }
        ]
    }
}

This basically works, but of course it doesn't pick the last entry of the employees array.

How can I solve this? Please help...

UPDATE

OK, so now I added the code that was suggested:

  "query": {
      "bool": {
        "should" : [
          { "match" : { "company.status" : "normal" },
          {
           "range": {
              "company.employees": { // I WANT THE LAST ITEM FROM THIS ARRAY
                 "gte": "'. $min . '",
                 "lt" : "'.$max .'"
               }
            }
          }
        ]
      },
      "script": {
           "source": """        
                def period_keys = new ArrayList(ctx._source.company.employees.keySet());
                Collections.sort(period_keys);
                Collections.reverse(period_keys);
                
                def latest_period = period_keys[0];
                def latest_amount = ctx._source.company.employees[latest_period].amount;
                
                ctx._source.company.current_employees = ["period": latest_period, "amount": latest_amount];
                """
            }
        }
    }

But I get the error: Unexpected character ('{' (code 123)): was expecting comma to separate Object entries...

Since I'm still learning, I must say I have no clue what is going on, and the error messages from Elasticsearch are horrible.

Anyway, does anyone have a clue? Thanks in advance

1 Answer

Looking up the last entry of an object like this at query time is quite difficult and poorly optimized. Here's an alternative.

I'm assuming a given company's employee counts don't change that often -- meaning when they do change (i.e. you update that document), you can run the following _update_by_query script to get the latest period's employee info and save it on the company level while leaving the employee section untouched:

POST companies_index/_update_by_query
{
  "query": {
    "match_all": {}
  },
  "script": {
    "source": """        
      def period_keys = new ArrayList(ctx._source.company.employees.keySet());
      Collections.sort(period_keys);
      Collections.reverse(period_keys);
      
      def latest_period = period_keys[0];
      def latest_amount = ctx._source.company.employees[latest_period].amount;
      
      ctx._source.company.current_employees = ['period': latest_period, 'amount': latest_amount];
    """
  }
}

One-liner:

POST companies_index/_update_by_query
{"query":{"match_all":{}},"script":{"source":"      def period_keys = new ArrayList(ctx._source.company.employees.keySet());\n      Collections.sort(period_keys);\n      Collections.reverse(period_keys);\n      \n      def latest_period = period_keys[0];\n      def latest_amount = ctx._source.company.employees[latest_period].amount;\n      \n      ctx._source.company.current_employees = ['period': latest_period, 'amount': latest_amount];"}}

Note that because the query above is match_all, the script will apply to all docs in your index. You could of course limit it to a single company.
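
The one-liner form is what a plain HTTP client has to send, since the triple-quoted form is only understood by Kibana. Rather than escaping the script by hand, the body can be serialized from a native structure. Below is a minimal sketch in Python (chosen for brevity; in PHP, json_encode on a nested array plays the same role). The helper name build_update_body and the term query on company.company_number are assumptions for illustration, not part of the original setup.

```python
import json

# Painless source kept as an ordinary multi-line string.
# It mirrors the script shown in the answer above.
PAINLESS_SOURCE = """
def period_keys = new ArrayList(ctx._source.company.employees.keySet());
Collections.sort(period_keys);
Collections.reverse(period_keys);
def latest_period = period_keys[0];
def latest_amount = ctx._source.company.employees[latest_period].amount;
ctx._source.company.current_employees = ['period': latest_period, 'amount': latest_amount];
"""


def build_update_body(company_number=None):
    """Return the JSON text for an _update_by_query request (hypothetical helper)."""
    if company_number is None:
        # No restriction: the script runs on every document in the index.
        query = {"match_all": {}}
    else:
        # Assumption: company_number is mapped so that a term query matches it exactly.
        query = {"term": {"company.company_number": company_number}}
    body = {"query": query, "script": {"source": PAINLESS_SOURCE}}
    # json.dumps escapes the newlines and quotes inside the script for us.
    return json.dumps(body)


body = build_update_body("1234567")
print(body)
```

The printed string contains no raw newlines or triple quotes, so it can be sent as the request body from any client.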

After that call, your documents will look like this:

{
  "company" : {
    "company_number" : "1234567",
    "name" : "some company",
    "current_employees" : {        <---
      "period" : "period_3",
      "amount" : 24
    },
    "employees" : {
      ...
    },
    ...
  }
}

and the range query from above becomes a piece of cake:

  ...
  "range": {
    "company.current_employees.amount": {     <--
       "gte": "'. $min . '",
       "lt" : "'.$max .'"
     }
  ...
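
For reference, here is a hedged sketch of the full search body built from a native structure instead of string concatenation, written in Python for brevity. Two things in it are assumptions rather than part of the original: the helper name build_search_body, and the use of "filter" instead of "should". With only "should" clauses, a document matching either clause is returned, so "filter" is used here on the guess that both conditions are meant to hold.

```python
import json


def build_search_body(min_employees, max_employees):
    """Return JSON text for a _search request on the denormalized field (hypothetical helper)."""
    body = {
        "query": {
            "bool": {
                # Assumption: both conditions must hold, hence "filter" rather than "should".
                "filter": [
                    {"match": {"company.status": "normal"}},
                    {
                        "range": {
                            "company.current_employees.amount": {
                                # int() turns form input such as "10" into a JSON number
                                # and rejects non-numeric input early.
                                "gte": int(min_employees),
                                "lt": int(max_employees),
                            }
                        }
                    },
                ]
            }
        }
    }
    return json.dumps(body)


print(build_search_body("10", "25"))
```

Passing the bounds as numbers also avoids having user input spliced directly into the JSON text.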

BTW, I also assumed that the period keys can be sorted alphabetically. If they contain dates, the script will need an adjustment in the form of a date-parsing comparator.
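
The same concern applies to numbered keys once there are ten or more periods, because a plain string sort places "period_10" before "period_2". The sketch below illustrates the pitfall and one possible fix in Python. It is not the Painless script itself, which would need an equivalent comparator; the helper names and the sample amounts are made up for illustration.

```python
import re


def period_number(key):
    """Extract the trailing integer from a key such as 'period_12'."""
    match = re.search(r"(\d+)$", key)
    if match is None:
        raise ValueError(f"no numeric suffix in {key!r}")
    return int(match.group(1))


def latest_period(employees):
    """Return (key, amount) for the period with the highest numeric suffix."""
    key = max(employees, key=period_number)
    return key, employees[key]["amount"]


# Made-up sample data with more than nine periods' worth of numbering.
employees = {
    "period_2": {"amount": 15},
    "period_10": {"amount": 31},
    "period_9": {"amount": 24},
}

print(sorted(employees)[-1])     # plain string sort picks 'period_9'
print(latest_period(employees))  # numeric comparison picks ('period_10', 31)
```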


10 Comments

Hmm, the script-thing returns Unexpected character ('"' (code 34)): was expecting comma to separate Object entries... :-/
Updated my answer -- the triple quotes can only be used in Kibana; otherwise they need to be escaped...
Weird. Are you sure you've run the one liner?
Yes, and also "script" should be inside the "query"...
No because it's _update_by_query, not _search.