
First of all, I'm very new to Elasticsearch. I'm using the Python library to run queries.

I have documents with lists embedded inside other lists, for example:

{"vendors": [{'id': 22603,
  'name': 'Facebook',
  'products': [{'id': 4469256,
    'name': 'osquery',
    'versions': [{'id': 169014,
      'name': '3.2.7',
      'affected': False},
     {'id': 44084,
      'name': '3.2.6',
      'affected': True}]}]}]}

For context, this is a vulnerabilities database.

A vendor can have multiple products, and each product has, besides its name, multiple versions.

Each version has a name and an affected flag.

What I need is all the documents where the product name is xxx, the version is yyy, and affected is zzz.

For example: product name is osquery, version is 3.2.7 and affected is True.

One of the many ways I've sent the query (with no success) is:

{'query': {'bool': {"must": [{"term": {"vendors.products.versions.affected": True}}, 
                             {'term': {'vendors.products.versions.name': "3.2.7"}}, 
                             {"term": {"vendors.products.name": "osquery"} } ] } }} 

The problem is that this query returns the document I posted above, even though version 3.2.7 has affected = False.

So it seems it's doing an OR instead of an AND across the elements of the versions list: it finds one version with a matching name and another, different version with a matching affected value, and returns the document. That is not the expected result.

Is there any way to force it to use AND? I've tried the default_operator parameter in different queries, but that seems to work only for query_string queries. Or is there a better way to query for elements inside lists?

2 Answers

1

As @Joe correctly pointed out, multiple levels of nesting can make your queries quite verbose. Nested queries can also be several times slower.

Still, if you want to query the data in its current format, you need to map vendors, products, and versions all as the nested type.

versions has to be nested because, if it is mapped as a plain array of objects, you cannot query each object independently of the other objects in the array.

Adding a working example with the index mapping, search query, and search result:

Index Mapping:

{
  "mappings": {
    "properties": {
      "vendors": {
        "type": "nested",
        "properties": {
          "products": {
            "type": "nested",
            "properties": {
              "versions": {
                "type": "nested"
              }
            }
          }
        }
      }
    }
  }
}

Search Query:

{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "vendors.products",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "vendors.products.name": "osquery"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "vendors.products.versions",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "vendors.products.versions.affected": true
                    }
                  },
                  {
                    "match": {
                      "vendors.products.versions.name": "3.2.7"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Search Result:

"hits": []
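
One caveat, as far as I can tell: the two sibling nested clauses above are evaluated independently, so the version match is not tied to the product that matched by name. Below is a minimal sketch of a variant that places the versions clause inside the products clause. The helper name `build_affected_query` is hypothetical, and the field names assume the mapping above. It only builds the request body as a plain dict; it does not talk to a cluster.

```python
import json


def build_affected_query(product, version, affected):
    """Build a request body where the versions clause sits inside the products clause.

    Assumes vendors, vendors.products and vendors.products.versions
    are all mapped as nested, as in the mapping above.
    """
    # Both conditions must hold for the same version object.
    version_clause = {
        "nested": {
            "path": "vendors.products.versions",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"vendors.products.versions.name": version}},
                        {"term": {"vendors.products.versions.affected": affected}},
                    ]
                }
            },
        }
    }
    # The version clause is evaluated within the product that matched by name.
    return {
        "query": {
            "nested": {
                "path": "vendors.products",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"vendors.products.name": product}},
                            version_clause,
                        ]
                    }
                },
            }
        }
    }


body = build_affected_query("osquery", "3.2.7", True)
print(json.dumps(body, indent=2))
```

The resulting dict should be usable as the search body with the Python client. Note that the Python `True` is serialized as the JSON boolean `true`.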

1 Comment

wow! that's a verbose query indeed! but it works and that's what matters. Thanks!
1

What you're seeing is a direct consequence of array flattening as described in this answer. If you're looking for a simple solution, apply the nested mapping, reindex, and wrap your bool-must clauses in a nested query so that they are evaluated against one object at a time.
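
To make the flattening concrete, here is a rough pure-Python simulation (not Elasticsearch code). It assumes that, without a nested mapping, each dotted field path ends up holding a flat list of all its values. The `flatten` helper is hypothetical and only illustrates the idea.

```python
from collections import defaultdict


def flatten(value, prefix="", out=None):
    """Collect every leaf value under its dotted path, losing object boundaries."""
    if out is None:
        out = defaultdict(list)
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        # Array elements share the same path, so their values get mixed together.
        for item in value:
            flatten(item, prefix, out)
    else:
        out[prefix].append(value)
    return out


doc = {"vendors": [{"id": 22603, "name": "Facebook", "products": [{
    "id": 4469256, "name": "osquery", "versions": [
        {"id": 169014, "name": "3.2.7", "affected": False},
        {"id": 44084, "name": "3.2.6", "affected": True},
    ]}]}]}

fields = flatten(doc)

# The three conditions from the question's bool/must query.
conditions = {
    "vendors.products.name": "osquery",
    "vendors.products.versions.name": "3.2.7",
    "vendors.products.versions.affected": True,
}
# Each condition is checked against the flat value list of its own field,
# so "3.2.7" and True may come from different version objects.
matches = all(value in fields[path] for path, value in conditions.items())
print(matches)
```

In this simulation the document matches because the names list contains 3.2.7 and the affected list contains True, even though they belong to different versions.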


I'd recommend converting at least products to the nested data type, and perhaps even the parent, vendors. Bear in mind, though, that multiple levels of nesting can make your queries quite verbose, and you may find yourself reversing the nesting when trying to determine top-level counts. It's worth considering whether the index's basic building block could be a product whose vendor is an attribute, instead of listing multiple products under a single vendor.
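
As a sketch of that restructuring, the hypothetical helper below turns one vendor-centric document into one document per product, with the vendor carried along as an attribute. The output field names (`vendor`, `versions`) are assumptions for illustration. With this shape, presumably only versions would still need the nested type.

```python
def products_as_documents(doc):
    """Yield one document per product, with its vendor stored as an attribute."""
    for vendor in doc.get("vendors", []):
        vendor_info = {"id": vendor["id"], "name": vendor["name"]}
        for product in vendor.get("products", []):
            yield {
                "id": product["id"],
                "name": product["name"],
                "vendor": vendor_info,
                "versions": product.get("versions", []),
            }


source = {"vendors": [{"id": 22603, "name": "Facebook", "products": [{
    "id": 4469256, "name": "osquery", "versions": [
        {"id": 169014, "name": "3.2.7", "affected": False},
        {"id": 44084, "name": "3.2.6", "affected": True},
    ]}]}]}

product_docs = list(products_as_documents(source))
for product_doc in product_docs:
    print(product_doc["name"], "by", product_doc["vendor"]["name"])
```

Each resulting dict could then be indexed as its own document, so a query only has to deal with a single level of nesting.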
