163

I'm curious as to the pros and cons of using subdocuments vs a deeper layer in my main schema:

var subDoc = new Schema({
  name: String
});

var mainDoc = new Schema({
  names: [subDoc]
});

or

var mainDoc = new Schema({
  names: [{
    name: String
  }]
});

I'm currently using subdocs everywhere but I am wondering primarily about performance or querying issues I might encounter.

4 Comments
  • I was trying to type in an answer to this, but I couldn't find how. Have a look here: mongoosejs.com/docs/subdocs.html Commented Mar 4, 2013 at 20:26
  • Here is a good response about MongoDB considerations to ask yourself when creating your database schema: stackoverflow.com/questions/5373198/… Commented Mar 4, 2013 at 20:41
  • Do you mean it's required to also declare the _id field? Isn't it added automatically if it's enabled? Commented Feb 8, 2014 at 19:02
  • Does anyone know if the _id fields of subdocuments are unique? (created using the 2nd way in the OP's question) Commented Jan 16, 2018 at 15:39

6 Answers

93

According to the docs, it's exactly the same. However, using a Schema would add an _id field as well (as long as you don't have that disabled), and presumably uses some more resources for tracking subdocs.

Alternate declaration syntax

New in v3: If you don't need access to the sub-document schema instance, you may also declare sub-docs by simply passing an object literal [...]


5 Comments

But I tried this. Why isn't the subdocument data stored in a separate collection? It is always stored inside the mainDoc collection.
That's how subdocuments work: they are embedded inside the parent document. Before playing with Mongoose, make sure you understand the underlying MongoDB.
Regarding the Schema adding _id, that makes sense but I created a schema with an array of sub-docs and an array of object literals and an _id was added to both. Has the behavior changed?
@DrewGoodwin seems like it's been like this for a while: stackoverflow.com/questions/17254008/…
@DrewGoodwin yes, Mongoose automatically creates a schema for object literals declared within an array. mongoosejs.com/docs/subdocs.html#altsyntaxarrays
50

If you have schemas that are reused in various parts of your model, it can be useful to define individual schemas for the child docs so you don't have to repeat yourself.

3 Comments

This is a great answer. Sometimes I use subdocuments in more than one model, or I have two fields in a model that need to be distinguished but still have the same subdocument structure.
You should also consider the benefits/disadvantages of saving redundant information.
Object structure can do the same.
34

You should use embedded documents if they are static, or if there will be no more than a few hundred of them, because of the performance impact. I ran into that issue a while ago. Recently, Asya Kamsky, who works as a solutions architect for MongoDB, wrote an article about "using subdocuments".

I hope that helps anyone looking for solutions or best practices.

Original post: http://askasya.com/post/largeembeddedarrays. You can find her Stack Overflow profile at https://stackoverflow.com/users/431012/asya-kamsky

First of all, we have to consider why we would want to do such a thing. Normally, I would advise people to embed things that they always want to get back when they are fetching this document. The flip side of this is that you don't want to embed things in the document that you don't want to get back with it.

If you embed activity I perform into the document, it'll work great at first because all of my activity is right there and with a single read you can get back everything you might want to show me: "you recently clicked on this and here are your last two comments" but what happens after six months go by and I don't care about things I did a long time ago and you don't want to show them to me unless I specifically go to look for some old activity?

First, you'll end up returning a bigger and bigger document while caring about a smaller and smaller portion of it. You can use projection to return only some of the array, but the real pain is that the document on disk will get bigger and will still be read in full even if you're only going to return part of it to the end user. And since my activity is not going to stop as long as I'm active, the document will keep growing and growing.
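As an illustration (not part of the article), the projection mentioned here can be expressed with MongoDB's `$slice` projection operator. This sketch only builds the projection object; the `activity` field and the `User` model in the usage comment are hypothetical.

```javascript
// Build a MongoDB projection that returns only the last `n` elements of an
// array field, using the $slice projection operator.
function recentItemsProjection(field, n) {
  return { [field]: { $slice: -n } };
}

const projection = recentItemsProjection("activity", 10);
console.log(JSON.stringify(projection)); // {"activity":{"$slice":-10}}

// Hypothetical usage with a Mongoose model named User:
//   const doc = await User.findById(userId, projection);
// Only 10 array elements travel over the network, but the server still
// has to read the whole document to produce them.
```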

The most obvious problem with this is eventually you'll hit the 16MB document limit, but that's not at all what you should be concerned about. A document that continuously grows will incur higher and higher cost every time it has to get relocated on disk, and even if you take steps to mitigate the effects of fragmentation, your writes will overall be unnecessarily long, impacting overall performance of your entire application.
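One way to stop an array from growing without bound, which the article does not cover, is MongoDB's `$push` with the `$each` and `$slice` modifiers: every write appends the new entry and trims the array to a fixed length, discarding the oldest entries. The sketch below only builds the update document; the field name, entry shape and the limit of 100 are assumptions.

```javascript
// Build an update document that appends one entry to an array and keeps
// only the most recent `maxLen` entries.
function pushCapped(field, entry, maxLen) {
  return {
    $push: {
      [field]: {
        $each: [entry],  // $slice on $push requires the $each modifier
        $slice: -maxLen, // a negative value keeps the last maxLen elements
      },
    },
  };
}

const update = pushCapped("activity", { action: "click" }, 100);
console.log(JSON.stringify(update));
// {"$push":{"activity":{"$each":[{"action":"click"}],"$slice":-100}}}

// Hypothetical usage with a Mongoose model named User:
//   await User.updateOne({ _id: userId }, update);
```

This only fits data you are willing to lose; if old entries must be kept, they have to live somewhere other than the growing array.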

There is one more thing that you can do that will completely kill your application's performance and that's to index this ever-increasing array. What that means is that every single time the document with this array is relocated, the number of index entries that need to be updated is directly proportional to the number of indexed values in that document, and the bigger the array, the larger that number will be.
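A rough way to picture why indexing such an array gets expensive (an illustration, not from the article): a multikey index on an array field holds about one key per distinct array value for each document, so every new element is another key to maintain. The helper below is hypothetical and ignores compound indexes and nested paths.

```javascript
// Approximate number of keys a multikey index on `field` holds for one
// document: one key per distinct array value (hypothetical helper).
function multikeyKeyCount(doc, field) {
  const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
  return new Set(values).size;
}

const user = { name: "asya", activityDays: [1, 2, 3] };
console.log(multikeyKeyCount(user, "activityDays")); // 3

// Six months of daily activity later, the same document carries far more
// index keys, each of which has to be maintained as the array changes.
for (let day = 4; day <= 180; day++) user.activityDays.push(day);
console.log(multikeyKeyCount(user, "activityDays")); // 180
```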

I don't want this to scare you from using arrays when they are a good fit for the data model - they are a powerful feature of the document database data model, but like all powerful tools, it needs to be used in the right circumstances and it should be used with care.
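The article stops short of prescribing a fix. One commonly used approach, which MongoDB's blog describes as the Bucket Pattern, is to spread an unbounded array across several documents of bounded size. The sketch below only builds the filter, update and options for an upsert; the collection layout, the field names and the limit of 100 are assumptions.

```javascript
// Bucket Pattern sketch: rather than one ever-growing array per user, events
// go into "bucket" documents that each hold at most BUCKET_SIZE events.
// BUCKET_SIZE and the field names userId, count and events are hypothetical.
const BUCKET_SIZE = 100;

function bucketUpsert(userId, event) {
  return {
    // Match a bucket for this user that still has room
    filter: { userId, count: { $lt: BUCKET_SIZE } },
    // Append the event and bump the counter the filter relies on
    update: { $push: { events: event }, $inc: { count: 1 } },
    // When every bucket is full, nothing matches and a new bucket is inserted
    options: { upsert: true },
  };
}

const op = bucketUpsert("u1", { action: "click" });
console.log(JSON.stringify(op.filter)); // {"userId":"u1","count":{"$lt":100}}

// Hypothetical usage with a Mongoose model named ActivityBucket:
//   await ActivityBucket.updateOne(op.filter, op.update, op.options);
```

Each bucket stays small and its array stays bounded, at the cost of reading several documents when a user's full history is needed.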

3 Comments

This should be the top answer; it's bang on the money. MongoDB's own white papers say pretty much the same thing.
This article about the Bucket Pattern complements what Asya talks about nicely. mongodb.com/blog/post/building-with-patterns-the-bucket-pattern I think the subDoc schema in OP's question would work well with the Bucket Pattern.
A few hundred whats?
23

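Basically, define the nested schema in its own variable (nestedDoc) and reference it in the array: names: [nestedDoc]

Simple Version:

const mongoose = require('mongoose');
const Schema = mongoose.Schema;

// Schema for each element of the "names" array
const nestedDoc = new Schema({
  name: String
});

// Parent schema embedding an array of nestedDoc subdocuments
const mainDoc = new Schema({
  names: [nestedDoc]
});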

JSON Example

{
    "_id" : ObjectId("57c88bf5818e70007dc72e85"),
    "name" : "Corinthia Hotel Budapest",
    "stars" : 5,
    "description" : "The 5-star Corinthia Hotel Budapest on the Grand Boulevard offers free access to its Royal Spa",
    "photos" : [
        "/photos/hotel/corinthiahotelbudapest/1.jpg",
        "/photos/hotel/corinthiahotelbudapest/2.jpg"
    ],
    "currency" : "HUF",
    "rooms" : [
        {
            "type" : "Superior Double or Twin Room",
            "number" : 20,
            "description" : "These are some great rooms",
            "photos" : [
                "/photos/room/corinthiahotelbudapest/2.jpg",
                "/photos/room/corinthiahotelbudapest/5.jpg"
            ],
            "price" : 73000
        },
        {
            "type" : "Deluxe Double Room",
            "number" : 50,
            "description" : "These are amazing rooms",
            "photos" : [
                "/photos/room/corinthiahotelbudapest/4.jpg",
                "/photos/room/corinthiahotelbudapest/6.jpg"
            ],
            "price" : 92000
        },
        {
            "type" : "Executive Double Room",
            "number" : 25,
            "description" : "These are amazing rooms",
            "photos" : [
                "/photos/room/corinthiahotelbudapest/4.jpg",
                "/photos/room/corinthiahotelbudapest/6.jpg"
            ],
            "price" : 112000
        }
    ],
    "reviews" : [
        {
            "name" : "Tamas",
            "id" : "/user/tamas.json",
            "review" : "Great hotel",
            "rating" : 4
        }
    ],
    "services" : [
        "Room service",
        "Airport shuttle (surcharge)",
        "24-hour front desk",
        "Currency exchange",
        "Tour desk"
    ]
}


6 Comments

That doesn't address the question at all which is one of performance.
I have edited a bit in order to make more sense. What do you think?
The question is not asking how to do nested schemas. It's a discussion of whether Mongoose performs better with nested schemas or with embedded subdocuments. Basically we are talking about benchmarks or edge cases where Mongoose prefers one over the other. And as the selected answer mentions, it doesn't appear to make any difference, at least from v3 on.
Maybe doesn't work for the OP, but I found this very helpful. Thanks.
This is good when all 3 schemas are declared in one .js file. How can we handle it when they are declared in 3 different .js files?
10

I think this has been covered elsewhere by multiple posts on SO.

Just a few:

The big key is that there is no single answer here, only a set of rather complex trade-offs.

1 Comment

Perhaps I am not phrasing my question correctly. This is not a question of how I should structure my database, but rather about the internals of using a subschema vs. just writing the array in a deeper layer. My primary reason for using a subschema is that I can make use of custom schema types and have them validate, something that doesn't work with nested arrays (from a previous question I had on SO). As near as I can tell, a subdoc is pretty much the same as a nested array; I just don't know the internals of it, or whether using them would create performance issues.
3

There are some differences between the two:

  • Using a nested schema is helpful for validation.

  • A nested schema can be reused in other schemas.

  • A nested schema adds an '_id' field to each subdocument unless you pass "_id: false" in its options.

