
I have a few questions about Mongoid and embedded relations in Rails.

In my model I have:

class Board
  include Mongoid::Document

  attr_accessible :title

  field :title, :type => String

  #has_and_belongs_to_many :users
  embeds_many :items
end

When I call

Board.all

it returns the whole collection, including :items.

I've read in many articles and forums that MongoDB embedded relations should be preferred over referenced ones, but I have some questions:

  • What about performance? Each time I retrieve a board I also retrieve the items inside it. That may be useful sometimes, but when I want only the board's information and not its items, I would have to write a separate method that skips the items.
  • When I want to update an item, the DB will reload the whole document and not just that item, right?
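
On the second point, a hedged sketch rather than a definitive answer: MongoDB's `$set` combined with the positional `$` operator can target a single element of an embedded array, so an update does not have to send the whole board back from the client. The helper name `item_title_update` and the `title` field on items are assumptions made for illustration; how Mongoid issues such updates internally is not shown here.

```ruby
# Hypothetical helper: builds the filter and update documents for changing
# one embedded item's title. The "title" field on items is an assumption.
def item_title_update(board_id, item_id, new_title)
  # Match the board and the specific embedded item inside it.
  filter = { "_id" => board_id, "items._id" => item_id }
  # "items.$" refers to the first array element matched by the filter.
  update = { "$set" => { "items.$.title" => new_title } }
  [filter, update]
end

filter, update = item_title_update(1, 10, "Renamed")
```

With the MongoDB Ruby driver, these two hashes would be passed to something like `collection.update_one(filter, update)`.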

So far, the only advantage I've noticed in using embedded documents is avoiding what SQL calls "joins", but I also see a lot of performance problems. Are there important reasons to use embedded relations over referenced relations?

EDIT

As pointed out by Adam C, my thoughts are related to situations like these:

As explained before, I will have Boards, each with many Items inside it. Rails scaffolding generates methods that retrieve the whole Board document from the database, but many times (for example when editing a Board) I want to load the document without the Items part.

Since I will mostly be using JSON calls, my idea was to add an optional parameter to the URL, like "?get_items", set to TRUE when I also want the items. In other situations I would use Mongoid's:

Model.without

For example, let's take the index action:

  def index
    @boards = Board.all

    respond_to do |format|
      format.html # index.html.erb
      format.json { render json: @boards }
    end
  end

I need only the fields specified in the Board model (in this case only :title), without the items, so I could use:

  def index
    @boards = Board.without :items

    respond_to do |format|
      format.html # index.html.erb
      format.json { render json: @boards }
    end
  end

Might that cause any problems?
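
The optional-parameter idea could be sketched roughly as follows. `boards_for` and `include_items?` are hypothetical helper names, and the sketch assumes the parameter arrives as the string "true"; only `Board.all` and `Board.without :items` come from the actions shown earlier.

```ruby
# Hypothetical helper: true only when the request explicitly asks for items,
# e.g. /boards.json?get_items=true
def include_items?(params)
  params[:get_items].to_s.downcase == "true"
end

# Returns the full criteria, or one that leaves out the embedded items.
# `model` is expected to respond to `all` and `without`, as Board does.
def boards_for(model, params)
  include_items?(params) ? model.all : model.without(:items)
end
```

In the index action this would be called as `@boards = boards_for(Board, params)`.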

1 Answer

If you need to retrieve items separately, then you should not embed them.

My rules of thumb:

  1. Top-level domain objects (things that you work with on their own, that don't always appear in the context of their "parent") should get their own collections.

  2. Embed when the related things:

    a. Don't grow unbounded. That is, in the 1-N relation, N is bounded.

    b. Always (or nearly always) appear with their parent.

  3. You can also embed if you can prove to yourself that the performance improvements to be gained by embedding outweigh the costs of the multiple queries required to obtain all objects.

Neither embedding nor relating should be preferred. They should be considered equally.
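
To make the trade-off concrete, here is a minimal sketch using plain Ruby hashes in place of real MongoDB documents. The field names (`name`, `board_id`), the sample values, and the helper `items_for` are made up for illustration.

```ruby
# Embedded: items live inside the board document, so one read returns everything.
embedded_board = {
  "_id"   => 1,
  "title" => "Roadmap",
  "items" => [
    { "_id" => 10, "name" => "Design" },
    { "_id" => 11, "name" => "Build" }
  ]
}

# Referenced: items sit in their own collection and point back via board_id.
referenced_board = { "_id" => 1, "title" => "Roadmap" }
referenced_items = [
  { "_id" => 10, "board_id" => 1, "name" => "Design" },
  { "_id" => 11, "board_id" => 1, "name" => "Build" },
  { "_id" => 12, "board_id" => 2, "name" => "Unrelated" }
]

# With references, assembling a board's items takes a second lookup,
# but each item can also be fetched or listed on its own.
def items_for(board, items)
  items.select { |item| item["board_id"] == board["_id"] }
end

board_items = items_for(referenced_board, referenced_items)
```

The embedded shape answers "give me the board with its items" in one read; the referenced shape costs a second query but lets items be read without their board.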


6 Comments

In my case embedded items will ALWAYS appear with their parents, and I chose embedding for that reason. My concern is that updating one embedded item would require loading the whole document. Everywhere I read that embedded documents are faster, but in these cases how can they be faster?
Oh, another thing: what do you mean by 'don't grow unbounded'?
Unbounded growth is allowing a field to constantly be added to, or a value to be changed such that it overflows its original storage space - imagine an array that you are always adding new elements to versus one that is X elements long and no more. Unbounded growth can lead to expensive move operations and data fragmentation. The classic example is blog comments: you usually want to return a bounded subset with each post (say the 5 most recent) rather than all comments ever (unbounded). FYI - it's tip number 6 from "50 Tips and Tricks for MongoDB Developers" by O'Reilly :)
@AdamC This is about retrieving data, not storing it, right? With blog comments I may decide to show only N comments in my post view, but in my database I will always have all of them.
Right, you want to keep them all, but do you want to return all of them every time you return a document? This is the usual embed versus reference argument. Unbounded fields are bad from a performance perspective, so it can be better to summarize and store in your main collection, then reference the full version if needed. That gives you performance for the most common use case while keeping full functionality. It does complicate your schema and updates though, so there is always a trade-off - it's not a simple decision in a lot of cases.
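
As a sketch of the "bounded subset" idea, MongoDB's `$push` with the `$each` and `$slice` modifiers can cap an embedded array at the N most recent entries while the full history lives in a separate collection. The field name `recent_comments` and the helper `push_recent_comment` are assumptions for illustration, not something taken from the book or this thread.

```ruby
# Hypothetical helper: builds an update document that appends a comment and
# keeps only the last `limit` entries of the embedded array.
def push_recent_comment(comment, limit = 5)
  {
    "$push" => {
      "recent_comments" => {
        "$each"  => [comment],
        "$slice" => -limit # a negative value keeps the last `limit` elements
      }
    }
  }
end

update = push_recent_comment({ "author" => "ann", "body" => "Nice post" })
```

The full comment would still be inserted into its own collection; this update only maintains the small summary embedded in the post.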