
I have an Elasticsearch instance that stores articles with various properties, although some articles may have no properties at all. There are countless properties and assigned values, all in random order.

Unfortunately, my query doesn't work the way I would like it to. The properties are taken into account, but the order of the properties seems to matter. An entry may have many properties while the search query asks for only one or two of them, and the queried values can deviate slightly from the stored ones.

The goal is to find entries that are as similar as possible, no matter the order of the properties.

Can anyone help me with this?

A sample entry from the index:

{
  "_index" : "articles",
  "_type" : "_doc",
  "_id" : "fYjaQXkBBdCju4scstN_",
  "_score" : 1.0,
  "_source" : {
    "position" : "400.000",
    "beschreibung" : "asc",
    "menge" : 24.0,
    "einheit" : "St",
    "properties" : [
      {
        "desc" : "Farbe",
        "val" : "rot"
      },
      {
        "desc" : "Material",
        "val" : "Holz"
      },
      {
        "desc" : "Länge",
        "val" : "20 cm"
      },
      {
        "desc" : "Breite",
        "val" : "100 km"
      }
    ]
  }
}
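For anyone who wants to reproduce this, here is a minimal sketch of a mapping consistent with the comments, where "properties" is confirmed to be of type nested. The text types for desc and val are assumptions, the other fields are left out, and the variable name is hypothetical.

```python
import json

# Hypothetical mapping for the "articles" index. Only "properties" being
# "nested" is confirmed; the "text" types for desc/val are assumptions.
# Note: the outer "properties" key is the mapping keyword, the inner one
# is the field name from the documents.
ARTICLES_MAPPING = {
    "mappings": {
        "properties": {
            "properties": {
                "type": "nested",
                "properties": {
                    "desc": {"type": "text"},
                    "val": {"type": "text"},
                },
            }
        }
    }
}

if __name__ == "__main__":
    # Request body for creating the index (PUT /articles).
    print(json.dumps(ARTICLES_MAPPING, indent=2, ensure_ascii=False))
```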

The nested part of my current query:

[nested] => Array
(
    [path] => properties
    [query] => Array
        (
            [0] => Array
                (
                    [0] => Array
                        (
                            [bool] => Array
                                (
                                    [should] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [match] => Array
                                                        (
                                                            [properties.desc] => Farbe
                                                        )

                                                )

                                            [1] => Array
                                                (
                                                    [match] => Array
                                                        (
                                                            [properties.val] => rot
                                                        )

                                                )

                                        )

                                )

                        )

                )

            [1] => Array
                (
                    [0] => Array
                        (
                            [bool] => Array
                                (
                                    [should] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [match] => Array
                                                        (
                                                            [properties.desc] => Länge
                                                        )

                                                )

                                            [1] => Array
                                                (
                                                    [match] => Array
                                                        (
                                                            [properties.val] => 22 cm
                                                        )

                                                )

                                        )

                                )

                        )

                )

            [2] => Array
                (
                    [0] => Array
                        (
                            [bool] => Array
                                (
                                    [should] => Array
                                        (
                                            [0] => Array
                                                (
                                                    [match] => Array
                                                        (
                                                            [properties.desc] => Material
                                                        )

                                                )

                                            [1] => Array
                                                (
                                                    [match] => Array
                                                        (
                                                            [properties.val] => Holz
                                                        )

                                                )

                                        )

                                )

                        )

                )

        )

)
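As suggested in the comments, the dump is easier to read as JSON. This Python sketch transcribes the print_r output above literally and serializes it, assuming the PHP client encodes sequential arrays as JSON arrays the way json_encode does. The helper clause() is hypothetical. Written this way, it is easy to see that query holds a list of lists rather than a single query object.

```python
import json

def clause(desc, val):
    # One entry of the dump: [0] => Array([bool] => Array([should] => ...))
    return [{"bool": {"should": [
        {"match": {"properties.desc": desc}},
        {"match": {"properties.val": val}},
    ]}}]

# Literal transcription of the nested part shown above.
nested_part = {
    "nested": {
        "path": "properties",
        "query": [
            clause("Farbe", "rot"),
            clause("Länge", "22 cm"),
            clause("Material", "Holz"),
        ],
    }
}

if __name__ == "__main__":
    print(json.dumps(nested_part, indent=2, ensure_ascii=False))
```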
  • Which field type are you using for "properties"? Commented May 6, 2021 at 15:53
  • The "properties" field has the type "nested" Commented May 7, 2021 at 5:39
  • What I don't understand is why it makes no difference whether I search for "Länge" "22 cm" or for "Länge" "20 cm", i.e. for exactly the values contained in the entry. The score stays exactly the same whether I search for a property and value exactly as stored in the entry or change the searched value slightly. Commented May 7, 2021 at 6:13
  • In my opinion the query you posted is hard to read; maybe you could edit the question and post it in JSON format. That makes it much easier for others to reproduce your errors :) Commented May 9, 2021 at 12:27

1 Answer

There are two problems in your query leading to strange results:

  1. You're using a match query on a text field whose value consists of multiple terms. So a query like

    "match": {
      "properties.val": "22 cm"
    }

    searches for "22" OR "cm" in the properties.val field. I assume you want to match the whole phrase, so you could, for example, use match_phrase here. Alternatively, you could put the unit into its own field. Another option would be to use the operator parameter:

    "match": {
      "properties.val": {
        "query": "20 cm",
        "operator": "and"
      }
    }
    

    But be aware that this doesn't look for the exact phrase. For example, "20 30 cm" would also match, but maybe this suits your case.

  2. You're using a should clause at the property level. So you're basically asking for documents with a property that "should have Farbe in its description and should have rot in its value". Since should clauses are optional (only one of them has to match), that matches all of the following examples:

    "properties" : [
      {
        "desc" : "Farbe",
        "val" : "blau"
      }
    ]

    "properties" : [
      {
        "desc" : "Material",
        "val" : "rot"
      }
    ]

    "properties" : [
      {
        "desc" : "Farbe",
        "val" : "blau"
      },
      {
        "desc" : "Material",
        "val" : "rot"
      }
    ]
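To make both points concrete, here is a toy Python model, not Elasticsearch itself. Its tokenizer is only a rough stand-in for the standard analyzer (an assumption about how the fields are analyzed), it models whether something matches rather than how it is scored, and all function names are hypothetical.

```python
import re

# Toy model only: answers "does this match?" and ignores scoring entirely.

def tokens(text):
    # Rough stand-in for the standard analyzer: lowercase, split on non-word chars.
    return [t for t in re.split(r"\W+", text.lower()) if t]

def match_or(query, value):
    # match with the default operator "or": one shared term is enough.
    return bool(set(tokens(query)) & set(tokens(value)))

def match_and(query, value):
    # match with "operator": "and": every query term must occur, in any order.
    return set(tokens(query)) <= set(tokens(value))

def match_phrase(query, value):
    # match_phrase (default slop 0): the terms must be adjacent and in order.
    q, v = tokens(query), tokens(value)
    return any(v[i:i + len(q)] == q for i in range(len(v) - len(q) + 1))

def should_per_property(properties, desc, val):
    # Question's shape: a nested object matches if desc OR val matches,
    # and the document matches if ANY of its nested objects does.
    return any(match_or(desc, p["desc"]) or match_or(val, p["val"]) for p in properties)

def must_per_property(properties, desc, val):
    # Answer's shape: desc AND the whole val phrase must match in the SAME object.
    return any(match_or(desc, p["desc"]) and match_phrase(val, p["val"]) for p in properties)

EXAMPLES = [
    [{"desc": "Farbe", "val": "blau"}],
    [{"desc": "Material", "val": "rot"}],
    [{"desc": "Farbe", "val": "blau"}, {"desc": "Material", "val": "rot"}],
]

if __name__ == "__main__":
    print(match_or("22 cm", "20 cm"))      # True: "cm" alone is enough
    print(match_and("20 cm", "20 30 cm"))  # True: adjacency is not checked
    print(match_phrase("22 cm", "20 cm"))  # False
    print([should_per_property(p, "Farbe", "rot") for p in EXAMPLES])  # [True, True, True]
    print([must_per_property(p, "Farbe", "rot") for p in EXAMPLES])    # [False, False, False]
```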

So you need a bool query (with must or filter clauses) for each property you want to match, and an outer bool query with one should clause per property. The query from your question could then look like this:

{
  "query": {
    "bool": {
      "should": [
        {
          "nested": {
            "path": "properties",
            "query": {
              "bool": {
                "filter": [
                  {
                    "match": {
                      "properties.desc": "Farbe"
                    }
                  }
                ],
                "must": [
                  {
                    "match_phrase": {
                      "properties.val": "rot"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "properties",
            "query": {
              "bool": {
                "filter": [
                  {
                    "match": {
                      "properties.desc": "Länge"
                    }
                  }
                ],
                "must": [
                  {
                    "match_phrase": {
                      "properties.val": "22 cm"
                    }
                  }
                ]
              }
            }
          }
        },
        {
          "nested": {
            "path": "properties",
            "query": {
              "bool": {
                "filter": [
                  {
                    "match": {
                      "properties.desc": "Material"
                    }
                  }
                ],
                "must": [
                  {
                    "match_phrase": {
                      "properties.val": "Holz"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Please check whether this gives the desired results. You could still tweak the query, for example by using minimum_should_match or by boosting individual properties to control how much each matched property adds to the score.
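The query above repeats the same clause for every property, so it may be easier to generate it. This is a minimal Python sketch; the function names and the optional minimum_should_match and per-property boost arguments are illustrative assumptions. Both parameters exist in the bool query DSL, but check them against your Elasticsearch version.

```python
import json

def property_clause(desc, val, boost=None):
    """One nested clause: desc and the whole val phrase must match in the same object."""
    inner = {
        "filter": [{"match": {"properties.desc": desc}}],      # does not affect the score
        "must": [{"match_phrase": {"properties.val": val}}],  # contributes to the score
    }
    if boost is not None:
        inner["boost"] = boost  # weight this property's contribution
    return {"nested": {"path": "properties", "query": {"bool": inner}}}

def similar_articles_query(props, minimum_should_match=None, boosts=None):
    """Outer bool with one should clause per (desc, val) pair."""
    boosts = boosts or {}
    outer = {"should": [property_clause(d, v, boosts.get(d)) for d, v in props]}
    if minimum_should_match is not None:
        outer["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": outer}}

if __name__ == "__main__":
    query = similar_articles_query(
        [("Farbe", "rot"), ("Länge", "22 cm"), ("Material", "Holz")],
        minimum_should_match=2,
        boosts={"Material": 2.0},
    )
    print(json.dumps(query, indent=2, ensure_ascii=False))
```

Called without the optional arguments, it produces the same structure as the JSON above. With minimum_should_match=2, an article would need at least two of the queried properties to match.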


3 Comments

Hi @Chules and thank you for this helpful answer! I will check your points; I'm sure it will help me a lot.
Hey @Maisen1886, I'm glad I could help. In case this solved your problem, please consider marking it as an accepted answer :)
Oh yeah sure! 8-)
