5

Suppose I have a MongoDB where I have separate texts which consist of statements.

I need to be able to search for the texts, which have certain keywords in statements (also multiple texts that have an occurrence of a search term).

I also need to be able to find all the statements in all the texts added by a particular user, which contain a particular search phrase.

My question: do I need to create a separate collection for statements or can I simply add them as nested into the texts collection?

So, option 1 (separate collections):

Texts collection


text: {
    name: 'nabokov',
    id: '1'
}

Statements collection:

statement: {
    text_id: '1',
    id: '24',
    text: 'He opened the window and saw the sky`
}

Option 2 (nested):


text: {
    name: 'nabokov',
    id: '1'
    statements: [
        id: '24',
        text: 'He opened the window and saw the sky`
    ]
}

Which MongoDB storage schema is better if I want to retrieve statements separately based on keyword search and retain the contextual data (e.g. which text they belong to etc.)

How would this affect the write / read speed for larger DBs (e.g. > 100 Gb).

My texts would be limited to 16 Mb.

3
  • 1
    how big(how many members) is expected to be the statements array in option2? this is important factor to decide. big documents => slower updates(load in memory) => problem if you update them often Commented Oct 1, 2021 at 15:06
  • @Takis_ there could be tens of thousands of statements there. Is i faster you think to write them into a separate collection rather than inside a document? Commented Oct 4, 2021 at 19:45
  • i think so, but its better to ask more experienced people also, if you want ask here also giving them all information possible Commented Oct 4, 2021 at 21:49

1 Answer 1

3

For MongoDB document schema design w.r.t. performance, there are several factors that could be helpful to take into consideration:

  1. What are the cardinalities of the relationships between collections?
  2. What is the expected number/size of documents in a collection?
  3. What are the most frequently used queries?
  4. how often are documents getting updated?

For your scenario, we actually need more context / details from you to work out a more sensible "answer". But here are some common scenarios that I have personally come into before and it might be useful for you as a reference.

  1. text as a root document that is not frequently updated; Most of the queries are based on the statement collection as a child collection.

In this case, it could be a good idea to denormalize the text document and replicating the field name into corresponding statement document. e.g.

statement: {
    text_id: '1',
    text_name: 'nabokov',
    id: '24',
    text: 'He opened the window and saw the sky`
}

In this way, you gain performance boost by avoiding a $lookup to the text collection, while only incurring a small cost of maintaining the new text_name column. The cost is small since the text document is not going to be updated frequently anyway.

  1. a text document will be associated with small number of statements objects/documents only.

In this case, it could be a good idea to go for your option 1 (i.e. keep the statements in an array of text document). The advantage is you can compose rather simple queries and avoid the cost in maintaining another collection of statement.

Here is a very good document to read more about MongoDB schema design.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.