With the sample JSON shown below, I am trying to retrieve all documents that contain at least one category whose Text value contains 'drink'. The categories are an array of objects wrapped underneath Categories. I tried the following query, but the returned result is empty. Can someone help me get this right?

SELECT items.id
,items.description
,items.Categories
FROM items
WHERE ARRAY_CONTAINS(items.Categories.Category.Text, "drink")

{
  "id": "1dbaf1d0-6549-11a0-88a8-001256957023",
  "Categories": {
    "Category": [{
      "Type": "GS1",
      "Id": "10000266",
      "Text": "Stimulants/Energy Drinks Ready to Drink"
    }, {
      "Type": "GS2",
      "Id": "10000266",
      "Text": "Healthy Drink"
    }]
  }
},
Note: The JSON is a bit weird in that the array is itself wrapped by an object. It was converted from XML, hence the shape. Please assume I have no control over how this object is saved as JSON.

2 Answers

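Your query returns nothing because items.Categories.Category is an array, so the path items.Categories.Category.Text does not resolve to anything, and ARRAY_CONTAINS only tests whether an array contains a given element, not whether a string contains a substring. To get the result you want, you need to flatten the document in your query by joining the array back to the main document. The query you want would look like this: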

SELECT items.id, items.Categories
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")

However, because there is no concept of a DISTINCT query, this will return the document once for every Category item that contains the word "drink". So this query would produce your example document twice, like this:

[
  {
    "id": "1dbaf1d0-6549-11a0-88a8-001256957023",
    "Categories": {
      "Category": [
        {
          "Type": "GS1",
          "Id": "10000266",
          "Text": "Stimulants/Energy Drinks Ready to Drink"
        },
        {
          "Type": "GS2",
          "Id": "10000266",
          "Text": "Healthy Drink"
        }
      ]
    }
  },
  {
    "id": "1dbaf1d0-6549-11a0-88a8-001256957023",
    "Categories": {
      "Category": [
        {
          "Type": "GS1",
          "Id": "10000266",
          "Text": "Stimulants/Energy Drinks Ready to Drink"
        },
        {
          "Type": "GS2",
          "Id": "10000266",
          "Text": "Healthy Drink"
        }
      ]
    }
  }
]

This could be problematic and expensive if the Categories array holds a lot of Category items that have "drink" in them.
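The duplication can be reproduced outside the database. The following is only an in-memory sketch in plain JavaScript of what the JOIN and WHERE clauses do to the sample document; joinAndFilter is a hypothetical helper written for illustration and is not DocumentDB code.

```javascript
// Sample document from the question.
const items = [{
  id: "1dbaf1d0-6549-11a0-88a8-001256957023",
  Categories: {
    Category: [
      { Type: "GS1", Id: "10000266", Text: "Stimulants/Energy Drinks Ready to Drink" },
      { Type: "GS2", Id: "10000266", Text: "Healthy Drink" }
    ]
  }
}];

// Emulates: JOIN Category IN items.Categories.Category
//           WHERE CONTAINS(LOWER(Category.Text), needle)
// Each document is paired with each element of its array, and every
// pair that passes the filter emits one output row.
function joinAndFilter(docs, needle) {
  const rows = [];
  for (const doc of docs) {
    for (const category of doc.Categories.Category) {
      if (category.Text.toLowerCase().includes(needle)) {
        rows.push({ id: doc.id, Categories: doc.Categories });
      }
    }
  }
  return rows;
}

const rows = joinAndFilter(items, "drink");
console.log(rows.length); // 2, one row per matching Category item
```

Both Category items match, so the single document comes back twice.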

You can cut that down if you are only interested in a single Category by changing the query to:

SELECT items.id, Category
FROM items
JOIN Category IN items.Categories.Category
WHERE CONTAINS(LOWER(Category.Text), "drink")

This produces a more concise result: only the id field is repeated, and each matching Category item shows up once:

  [{
    "id": "1dbaf1d0-6549-11a0-88a8-001256957023",
    "Category": {
      "Type": "GS1",
      "Id": "10000266",
      "Text": "Stimulants/Energy Drinks Ready to Drink"
    }
  },
  {
    "id": "1dbaf1d0-6549-11a0-88a8-001256957023",
    "Category": {
      "Type": "GS2",
      "Id": "10000266",
      "Text": "Healthy Drink"
    }
  }]

Otherwise, you will have to filter the results when you get them back from the query to remove duplicate documents.
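A minimal sketch of that client-side filtering, assuming the results arrive as an array of objects that each carry an id field. dedupeById is a hypothetical helper, not part of any SDK.

```javascript
// Keeps the first row seen for each id and drops later repeats.
function dedupeById(rows) {
  const seen = new Set();
  const unique = [];
  for (const row of rows) {
    if (!seen.has(row.id)) {
      seen.add(row.id);
      unique.push(row);
    }
  }
  return unique;
}

// Two rows for the same document, as the JOIN query would return them.
const results = [
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023" },
  { id: "1dbaf1d0-6549-11a0-88a8-001256957023" }
];
console.log(dedupeById(results).length); // 1
```

This removes the duplicates after the fact, but the query still returns every repeated row before the filter runs.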

2 Comments

Unless I'm mistaken, the use of LOWER makes it such that this will always result in a full table scan, correct? It would be interesting to see if this or the UDF approach took more RUs. To be fair, you'd also have to factor in the de-duplication to the comparison.
Thank you! The query did work, but as stated it produced duplicates, which means extra work to de-duplicate. Some items have about 5-6 categories, and multiplying that by the number of items retrieved is not viable. Thank you again for giving me a working query :) It helps me write better queries and understand them better. Upvoted!

If it were me and I was building a production system with this requirement, I'd use Azure Search. Here is some info on hooking it up to DocumentDB.

If you don't want to do that and we must live with the constraint that you can't change the shape of the documents, the only way I can think to do this is to use a User Defined Function (UDF) like this:

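function GetItemsWithMatchingCategories(categories, matchingString) {
  // categories must be the Category array itself; anything else never matches.
  if (!Array.isArray(categories)) {
    return false;
  }
  var lowerMatchingString = matchingString.toLowerCase();
  for (var index = 0; index < categories.length; index++) {
    var category = categories[index];
    // Skip entries without a string Text property instead of throwing.
    if (category && typeof category.Text === "string" &&
        category.Text.toLowerCase().indexOf(lowerMatchingString) >= 0) {
      return true;
    }
  }
  // Return an explicit boolean rather than falling through with undefined.
  return false;
}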

Note, the code above was modified by the asker after actually trying it out so it's somewhat tested.

You would use it with a query like this:

SELECT * FROM items WHERE udf.GetItemsWithMatchingCategories(items.Categories.Category, "drink")

Also, note that this will result in a full table scan (unless you can combine it with other criteria that can use an index) which may or may not meet your performance/RU limit constraints.
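Because a UDF is just a JavaScript function, its matching logic can be checked locally before it is registered. The sketch below assumes Node.js. It uses a variant of the function above that returns false explicitly when nothing matches, and a tiny hypothetical check helper in place of a real test framework such as Jasmine.

```javascript
// Variant of the UDF; expects the Category array itself as the first argument.
function GetItemsWithMatchingCategories(categories, matchingString) {
  if (!Array.isArray(categories)) {
    return false;
  }
  var lowerMatchingString = matchingString.toLowerCase();
  for (var index = 0; index < categories.length; index++) {
    var category = categories[index];
    var text = category && category.Text;
    if (typeof text === "string" &&
        text.toLowerCase().indexOf(lowerMatchingString) >= 0) {
      return true;
    }
  }
  return false;
}

// Hypothetical stand-in for a test framework assertion.
function check(actual, expected, label) {
  if (actual !== expected) {
    throw new Error(label + ": expected " + expected + ", got " + actual);
  }
}

// Category array from the sample document in the question.
var sampleCategories = [
  { Type: "GS1", Id: "10000266", Text: "Stimulants/Energy Drinks Ready to Drink" },
  { Type: "GS2", Id: "10000266", Text: "Healthy Drink" }
];

check(GetItemsWithMatchingCategories(sampleCategories, "drink"), true, "lowercase match");
check(GetItemsWithMatchingCategories(sampleCategories, "DRINK"), true, "case-insensitive match");
check(GetItemsWithMatchingCategories(sampleCategories, "snack"), false, "no match");
// Passing the wrapper object instead of the array never matches.
check(GetItemsWithMatchingCategories({ Category: sampleCategories }, "drink"), false, "wrapper object");
console.log("all checks passed");
```

The last check shows why the argument matters: the function only inspects arrays, so it has to be given items.Categories.Category rather than the wrapping items.Categories object.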

3 Comments

Thank you :) The UDF was indeed the right and faster solution. One additional question (I can move this to a different post): do you know how to test (unit and integration) scripts such as these? I have a little experience with Jasmine, but was wondering whether the Azure SDK has anything for scripts. Thanks again! I marked it as the answer and will modify it a bit to match my attributes.
UDFs are easily tested with standard JavaScript testing tools on NodeJS since they are just JavaScript functions. I use NodeUnit but Jasmine is a common choice. For testing stored procedures (sprocs) on NodeJS, I created documentdb-mock.
Thanks for editing the code after you got it working. I feel bad posting untested code, but didn't have time to try it out myself. Hopefully, it wasn't too hard to figure it out from my hand-written example. :-)
