Given the following index, how would I select the proper item in the nested array and access one of its values? The purpose is to use that value inside a script_score.

# Create mapping
curl -XPUT localhost:9200/test/user/_mapping -d '
{
  "user" : {
    "properties" : {
      "name" : {
        "type" : "string"
      },
      "skills" : {
        "type": "nested", 
        "properties" : {
          "skill_id" : {
            "type" : "integer"
          },
          "recommendations_count" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}
'

# Indexing Data
curl -XPUT localhost:9200/test/user/1 -d '
{
   "name": "John",
   "skills": [
      {
         "skill_id": 100,
         "recommendations_count": 5
      },
      {
         "skill_id": 200,
         "recommendations_count": 3
      }
   ]
}
'

curl -XPUT localhost:9200/test/user/2 -d '
{
   "name": "Mary",
   "skills": [
      {
         "skill_id": 100,
         "recommendations_count": 9
      },
      {
         "skill_id": 200,
         "recommendations_count": 0
      }
   ]
}
'

My query filters by skill_id and this works well. I then want to be able to use script_score to boost the score of the user documents with a higher recommendations_count for the given skill_id. (<-- this is key).

curl -XPOST localhost:9200/test/user/_search -d '
{      
    "query":{
      "function_score":{
        "query":{
          "bool":{
            "must":{
              "nested":{
                "path":"skills",
                "query":{
                  "bool":{
                    "must":{
                      "term":{
                        "skill_id":100
                      }
                    }
                  }
                }
              }
            }
          }
        },
        "functions":[
          {
            "script_score": {
               "script": "sqrt(1.2 * doc['skills.recommendations_count'].value)"   
            }
          }            
        ]
      }
    }
} 
'

How do I access the skills array from within the script, find the 'skill_id: 100' item in the array, and then use its recommendations_count value? The script_score above doesn't currently work: the score is always 0 regardless of the data, so I assume doc['skills.recommendations_count'].value is not looking in the right place.

1 Answer

For your specific question, the script needs to run in the nested context, just as your term query does.

This can be rewritten for ES 1.x:

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "query": {
            "function_score": {
              "functions": [
                {
                  "script_score": {
                    "script": "sqrt(1.2 * doc['skills.recommendations_count'].value)"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

For ES 2.x (filters became first-class citizens in ES 2.x, so the syntax changed a bit to catch up!):

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "bool": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "must": {
            "function_score": {
              "functions": [
                {
                  "script_score": {
                    "script": "sqrt(1.2 * doc['skills.recommendations_count'].value)"
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

Note: I made the term query a term filter because it has no logical impact on the score (it's either an exact match or not). I also added the nested field's name to the term filter, which is a requirement in Elasticsearch 2.x and later (and good practice earlier).
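
To see why the script has to run inside the nested query, here is a toy Python model, not Elasticsearch code. It assumes that each nested object is indexed as its own hidden document and that a missing numeric field reads as 0, which is consistent with the always-0 score reported in the question. The names root_doc, nested_docs, and field_value are hypothetical and exist only for this illustration.

```python
# Toy model: each nested object is stored as its own hidden document,
# separate from the root (parent) document.
root_doc = {"name": "John"}  # root-level fields only
nested_docs = [
    {"skills.skill_id": 100, "skills.recommendations_count": 5},
    {"skills.skill_id": 200, "skills.recommendations_count": 3},
]


def field_value(doc, field, default=0):
    """Mimic a script reading a field; fall back to a default (assumed 0) when absent."""
    return doc.get(field, default)


# Script evaluated against the root document: the nested field is not visible.
root_value = field_value(root_doc, "skills.recommendations_count")

# Script evaluated in the nested context, after filtering on skill_id.
matching = [d for d in nested_docs if d["skills.skill_id"] == 100]
nested_values = [field_value(d, "skills.recommendations_count") for d in matching]

print(root_value)     # 0
print(nested_values)  # [5]
```

Because the filter on skills.skill_id leaves a single matching nested document per user, the function only ever sees the recommendations_count that belongs to that skill.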

With that out of the way, you can (and should) avoid using a script whenever possible, and this is one of those cases. function_score supports a field_value_factor function that does exactly what you are trying to do, entirely without a script. You can also optionally supply a "missing" value to control what happens if the field is missing.

This computes the same value as the script, sqrt(1.2 * recommendations_count), but it will perform better:

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "filtered": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "query": {
            "function_score": {
              "functions": [
                {
                  "field_value_factor": {
                    "field": "skills.recommendations_count",
                    "factor": 1.2,
                    "modifier": "sqrt",
                    "missing": 0
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'

For ES 2.x:

curl -XGET 'localhost:9200/test/_search' -d'
{
  "query": {
    "nested": {
      "path": "skills",
      "query": {
        "bool": {
          "filter": {
            "term": {
              "skills.skill_id": 100
            }
          },
          "must": {
            "function_score": {
              "functions": [
                {
                  "field_value_factor": {
                    "field": "skills.recommendations_count",
                    "factor": 1.2,
                    "modifier": "sqrt",
                    "missing": 0
                  }
                }
              ]
            }
          }
        }
      }
    }
  }
}'
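
As a sanity check on the arithmetic, the following Python sketch reproduces what the field_value_factor settings above compute for the sample data, namely sqrt(1.2 * recommendations_count) for the skill_id 100 entry of each user. It is a toy re-implementation under that assumption, not Elasticsearch code, and the helper name field_value_factor is reused here only for readability. The final _score can also depend on the function_score boost_mode and the nested query's score_mode, which this sketch does not model.

```python
import math


def field_value_factor(value, factor=1.2, modifier=math.sqrt, missing=0):
    """Toy version of modifier(factor * value), using `missing` when value is None."""
    if value is None:
        value = missing
    return modifier(factor * value)


# recommendations_count for skill_id 100 in the sample data
counts = {"John": 5, "Mary": 9}
scores = {name: field_value_factor(count) for name, count in counts.items()}

# Print users from highest to lowest function value
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
# Mary: 3.286
# John: 2.449
```

Mary ranks above John for skill_id 100 because her recommendations_count is 9 against John's 5, which is the ordering the question asks for.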

Scripts are slow, and in Elasticsearch 1.x they also imply the use of fielddata, which is loaded into heap memory and is best avoided. You did mention doc values, which is a promising sign that suggests you are using Elasticsearch 2.x, but that may have just been terminology.

If you're just starting with Elasticsearch, then I strongly recommend starting with the latest version.


5 Comments

Wow, I appreciate the thorough response, thank you. It seems the filter at the bottom doesn't filter based on skill_id: 100.
Be sure to use "skills.skill_id" rather than just "skill_id".
I am, and if I remove the inner query{} block with the function inside, the filtering does work.
Can you update your question with the modified request that's not fully working? I tried it with the example data in both ES 1.7.3 and ES 2.0.0.
You were definitely right that the filter was ignored in the first version of this. I edited the answer to show the proper way to do it. Apparently the nested logic accepts query or filter, but it prefers query if it gets both! Fixed to show the proper way.
