22

I am trying to create a script using the script_score of the function_score. I have several documents whose rankings field is type="nested". The mapping for the field is:

"rankings": {
        "type": "nested",
        "properties": {
          "rank1": {
            "type": "long"
          },
          "rank2": {
            "type": "float"
          },
          "subject": {
            "type": "text"
          }
        }
      }

A sample document is:

"rankings": [
{
    "rank1": 1051,
    "rank2": 78.5,
    "subject": "s1"
},
{
    "rank1": 45,
    "rank2": 34.7,
    "subject": "s2"
}]

What I want to achieve is to iterate over the nested objects in rankings. Specifically, I need something like a for loop to find a particular subject and then use its rank1 and rank2 values to compute something. So far I have tried the following, but it does not work (it throws a compile error):

"function_score": {
"script_score": {
    "script": {
        "lang": "painless",
        "inline": 
                 "sum = 0;"
                 "for (item in doc['rankings_cug']) {"
                     "sum = sum + doc['rankings_cug.rank1'].value;"
                 "}"
         }
    }
}

I have also tried the following options:

  1. A for loop using : instead of in, i.e. for (item:doc['rankings']), with no success.
  2. A for loop using in, but iterating over a specific element of the object, e.g. rank1: for (item in doc['rankings.rank1'].values). This compiles, but it appears to find a zero-length array for rank1.

I have read that the _source element is the one that can return JSON-like objects, but as far as I can tell it is not supported in search queries.

Can you please give me some ideas of how to proceed with that?

Thanks a lot.

3 Answers

28

You can access _source via params._source. This one will work:

PUT /rankings/result/1?refresh
{
  "rankings": [
    {
      "rank1": 1051,
      "rank2": 78.5,
      "subject": "s1"
    },
    {
      "rank1": 45,
      "rank2": 34.7,
      "subject": "s2"
    }
  ]
}

POST rankings/_search
{
  "query": {
    "match": {
      "_id": "1"
    }
  },
  "script_fields": {
    "script_score": {
      "script": {
        "lang": "painless",
        "inline": "double sum = 0.0; for (item in params._source.rankings) { sum += item.rank2; } return sum;"
      }
    }
  }
}

DELETE rankings
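
To pick out one particular subject, as the question asks, the same params._source loop can carry a condition. The following is a minimal Python sketch, not a tested Elasticsearch request: it builds the search body as a plain dict and mirrors the Painless loop in Python so the arithmetic can be checked locally. The helper names (build_search_body, score_for_subject), the params.subject parameter, the subject_score field name, and the rank1 + rank2 formula are all hypothetical. The sketch uses "source" where the query above uses "inline", on the assumption that a newer Elasticsearch version is in use.

```python
import json

# Painless script: add up rank1 + rank2 for entries whose subject matches.
# The formula and the `subject` parameter are illustrative assumptions.
PAINLESS_SOURCE = (
    "double sum = 0.0; "
    "for (item in params._source.rankings) { "
    "if (item.subject == params.subject) { sum += item.rank1 + item.rank2; } "
    "} "
    "return sum;"
)


def build_search_body(doc_id, subject):
    """Build a _search body that evaluates the script as a script field."""
    return {
        "query": {"match": {"_id": doc_id}},
        "script_fields": {
            "subject_score": {  # hypothetical field name
                "script": {
                    "lang": "painless",
                    "source": PAINLESS_SOURCE,
                    "params": {"subject": subject},
                }
            }
        },
    }


def score_for_subject(source, subject):
    """Pure-Python mirror of the Painless loop, for checking the logic."""
    total = 0.0
    for item in source["rankings"]:
        if item["subject"] == subject:
            total += item["rank1"] + item["rank2"]
    return total


# Same sample document as in the question.
sample_source = {
    "rankings": [
        {"rank1": 1051, "rank2": 78.5, "subject": "s1"},
        {"rank1": 45, "rank2": 34.7, "subject": "s2"},
    ]
}

body = build_search_body("1", "s1")
print(json.dumps(body, indent=2))
print(score_for_subject(sample_source, "s1"))
```

Passing the subject through params rather than hard-coding it in the script string keeps the script text constant across requests.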

3 Comments

Holy shit, thanks. Why is it params instead of ctx now?
@tricky params is used to iterate over the params we provide the script. ctx is used to iterate over the existing data. You can access it with for ( item in ctx._source.rankings )
@tricky, you want to iterate over the rankings field, which is not a simple-valued field; it is a JSON array. Elasticsearch does not cache non-simple fields in memory in the doc field. More explanation at: github.com/elastic/elasticsearch/issues/… and elastic.co/guide/en/elasticsearch/reference/master/…
10

Unfortunately, Elasticsearch scripting in general (including Painless) does not support accessing nested documents in this way. If you need to iterate across the rankings like this, consider a different mapping structure in which they are stored in multi-valued fields. Ultimately, the nested data will need to be denormalized and put into the parent documents to get scores in the way described here.
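
One possible restructuring, sketched in Python under stated assumptions: re-key each ranking by its subject before indexing, so that every rank becomes an ordinary field of the parent document. The helper name key_by_subject and the field name rankings_by_subject are hypothetical, and the sketch assumes each subject appears at most once per document and is usable as a field name.

```python
def key_by_subject(doc):
    """Return a copy of doc with its rankings re-keyed by subject.

    Assumes each subject appears at most once per document.
    """
    by_subject = {}
    for entry in doc.get("rankings", []):
        subject = entry["subject"]
        if subject in by_subject:
            raise ValueError(f"duplicate subject: {subject}")
        # Keep every field except the subject, which is now the key.
        by_subject[subject] = {k: v for k, v in entry.items() if k != "subject"}

    # Copy the other top-level fields and swap in the re-keyed structure.
    flattened = {k: v for k, v in doc.items() if k != "rankings"}
    flattened["rankings_by_subject"] = by_subject  # hypothetical field name
    return flattened


doc = {
    "rankings": [
        {"rank1": 1051, "rank2": 78.5, "subject": "s1"},
        {"rank1": 45, "rank2": 34.7, "subject": "s2"},
    ]
}

flat = key_by_subject(doc)
print(flat)
```

With a plain object mapping instead of a nested one, a script should then be able to read a value by its full path, for example doc['rankings_by_subject.s1.rank1'].value, without a loop. The trade-off is one set of mapped fields per distinct subject, which may not suit indices with many subjects.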

3 Comments

Thanks for the answer. Yes, I actually changed the way to represent my data. I used each "subject" as the key of my inner object and accessed each object using the key.
@jdconrad is it supported on nested field types on ES 6.7?
@christinabo I am facing a similar problem. How did you achieve using the 'subject' as the key and then accessing objects using that key?
7

For nested objects in an array, I iterated over the items and it worked. The following is my sample data in the Elasticsearch index:

{
  "_index": "activity_index",
  "_type": "log",
  "_id": "AVjx0UTvgHp45Y_tQP6z",
  "_version": 4,
  "found": true,
  "_source": {
    "updated": "2016-12-11T22:56:13.548641",
    "task_log": [
      {
        "week_end_date": "2016-12-11",
        "log_hours": 16,
        "week_start_date": "2016-12-05"
      },
      {
        "week_start_date": "2016-03-21",
        "log_hours": 0,
        "week_end_date": "2016-03-27"
      },
      {
        "week_start_date": "2016-04-24",
        "log_hours": 0,
        "week_end_date": "2016-04-30"
      }
    ],
    "created": "2016-12-11T22:56:13.548635",
    "userid": 895,
    "misc": {

    },
    "current": false,
    "taskid": 1023829
  }
}

Here is the "Painless" script to iterate over nested objects:

{
  "script": {
    "lang": "painless",
    "inline": 
        "boolean contains(def x, def y) {
          for (item in x) {
            if (item['week_start_date'] == y) {
              return true;
            }
          }
          return false;
         }
         if (!contains(ctx._source.task_log, params.start_time_param)) {
           ctx._source.task_log.add(params.week_object);
         }",
         "params": {
            "start_time_param": "2016-04-24",
             "week_object": {
               "week_start_date": "2016-04-24",
               "week_end_date": "2016-04-30",
               "log_hours": 0
              }
          }
  }
}

I used the above script in an update request: POST /activity_index/log/AVjx0UTvgHp45Y_tQP6z/_update. In the script, I defined a function called 'contains' that takes two arguments, and then called it. The old Groovy style, ctx._source.task_log.contains(), will not work since ES 5.X stores nested objects in a separate document. Hope this helps!
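
The logic of that update script can be checked outside Elasticsearch. The following is a minimal Python mirror of the Painless code, assuming the same task_log shape as the sample document; the function add_week_if_missing and the 2016-05-01 week used in the second call are hypothetical additions for illustration.

```python
def contains(entries, week_start_date):
    """Return True if any entry starts on the given week_start_date."""
    for item in entries:
        if item["week_start_date"] == week_start_date:
            return True
    return False


def add_week_if_missing(source, start_time_param, week_object):
    """Append week_object to task_log unless that week is already logged."""
    if not contains(source["task_log"], start_time_param):
        source["task_log"].append(week_object)
    return source


# task_log entries copied from the sample document above.
source = {
    "task_log": [
        {"week_end_date": "2016-12-11", "log_hours": 16, "week_start_date": "2016-12-05"},
        {"week_start_date": "2016-03-21", "log_hours": 0, "week_end_date": "2016-03-27"},
        {"week_start_date": "2016-04-24", "log_hours": 0, "week_end_date": "2016-04-30"},
    ]
}

# This week is already present, so nothing is appended.
existing_week = {"week_start_date": "2016-04-24", "week_end_date": "2016-04-30", "log_hours": 0}
add_week_if_missing(source, "2016-04-24", existing_week)
count_after_existing = len(source["task_log"])

# Hypothetical new week: not present yet, so it is appended.
new_week = {"week_start_date": "2016-05-01", "week_end_date": "2016-05-07", "log_hours": 0}
add_week_if_missing(source, "2016-05-01", new_week)
count_after_new = len(source["task_log"])

print(count_after_existing, count_after_new)
```

Because the check runs before the append, sending the same update twice leaves task_log unchanged the second time.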
