I have a fairly large application that stores its data in MongoDB (via Mongoose), even though the data is thoroughly relational and fits tables with schemas very well. Specifically, there are many relations between objects, so I need to perform very deep populations: 25+ per request in total.

One option is to rewrite the app for MySQL. However, there is a ton of code bound to MongoDB. The question is: as the number of relations between objects by ObjectID grows, will MongoDB stay as efficient as MySQL, or should I dive into the code and move the app completely to MySQL?

In both cases I use an ORM: Mongoose now, Sequelize if I move.

Is Mongo really efficient at working with relations? SQL was designed to join tables through relations, so I assume it has some optimisations under the hood. Relations seem to be a somewhat unusual use case for Mongo. So I worry that logically the same query, gathering data from 25 collections in Mongo versus joining 25 tables in MySQL, may be slower in Mongo.

Here's an example of the schema I'm using. Populated fields are marked with *.

Man
 -[friends_ids] --> [Man]*
                     -friends_ids*: ...
                     -pets_ids*: ...
                     -...
 -[pets_ids] -> [Pet]*
                 -name
                 -avatars*: [Avatar]
                            -path
                            -size
 -...

My thoughts about relations. Let's imagine a Man object that should have a [friends] field, and work through how it gets loaded.

MySQL ORM:

  1. From the MANS table, find the Man where id = :id.
  2. From the MAN-TO-MANS table, find all records where friend id = :id of the Man from step 1.
  3. From the MANS table, find all records where id = :id of the Men from step 2.
  4. Join them into one Man object with the friends field populated.

Mongo:

  1. From the MANS collection, find the Man where _id = :_id. This step also returns his array of friends' _ids (not yet populated).
  2. From the MANS collection, find all documents where _id = :_id of the Men from step 1.
  3. Join them into one Man object with the friends field populated.

No requests to join tables. Am I right?
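The two walkthroughs above can be sketched with plain in-memory data. This is only an illustration, not Mongoose or Sequelize code: the Maps stand in for indexed lookups by id, and every name and record here is hypothetical.

```javascript
// Hypothetical data. Document layout: friend ids live inside each Man.
const mansCollection = new Map([
  ["a", { _id: "a", name: "Ann", friends_ids: ["b", "c"] }],
  ["b", { _id: "b", name: "Bob", friends_ids: ["a"] }],
  ["c", { _id: "c", name: "Cid", friends_ids: [] }],
]);

// Hypothetical data. Relational layout: friendships live in a link table.
const mansTable = new Map([
  ["a", { id: "a", name: "Ann" }],
  ["b", { id: "b", name: "Bob" }],
  ["c", { id: "c", name: "Cid" }],
]);
const manToMans = [
  { man_id: "a", friend_id: "b" },
  { man_id: "a", friend_id: "c" },
  { man_id: "b", friend_id: "a" },
];

// Document layout: two lookups, the Man and then his friends by _id.
function friendsViaDocuments(id) {
  const man = mansCollection.get(id); // lookup 1
  const friends = man.friends_ids.map((fid) => mansCollection.get(fid)); // lookup 2 (one batched query by id)
  return { name: man.name, friends: friends.map((f) => f.name), lookups: 2 };
}

// Relational layout issued as separate statements: the link table adds a lookup.
function friendsViaLinkTable(id) {
  const man = mansTable.get(id); // lookup 1
  const links = manToMans.filter((l) => l.man_id === id); // lookup 2
  const friends = links.map((l) => mansTable.get(l.friend_id)); // lookup 3
  return { name: man.name, friends: friends.map((f) => f.name), lookups: 3 };
}
```

Both functions return the same friends for the same Man. Note that a SQL engine can also run the relational steps as a single JOIN statement on the server, so the number of steps alone says little about which side is faster.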

2 Answers

1

So I need to perform very deep populations — 25+ for each request in total.

A common misconception is that MongoDB does not support JOINs. This is partly true and partly untrue: what MongoDB does not support is server-side joins.

The MongoDB motto is client-side JOINing.

This motto can work against you: the application does not always know the best way to JOIN, so you have to pick your schema, queries and JOINs very carefully in MongoDB to make sure you are not querying inefficiently.

25+ is perfectly possible in MongoDB; that's not the problem. The problem will be which JOINs you are doing.

This leads on to:

Is Mongo really efficient in working with relations?

Let me give you an example of where MongoDB would actually be faster than MySQL.

Imagine you have a group collection in which each group document contains a user_ids field: an array of ObjectIds that relate directly to the _id field in the user collection.

Doing two queries, one for the group and one for the users, would likely be faster than MySQL in this specific case, since MongoDB, for one, would not need to atomically write out a result set using your IO bandwidth for common tasks.

That being said, with anything complex you will get hammered by the fact that the application does not truly know how to use index intersection and merging to create an even slightly performant JOIN.

So, for example, say you wish to JOIN 3 tables in one query, paginating by the third JOINed table. That would probably kill MongoDB's performance, while not being such an inefficient JOIN to perform in MySQL.
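A rough sketch of why that case hurts when the join happens in the application: the sort key lives two hops away, so every row has to be joined before a single page can be cut out. The collections and field names below (posts, users, regions) are hypothetical, and plain arrays stand in for query results.

```javascript
// Hypothetical data: posts -> users -> regions, paginated by region name.
const regions = [
  { _id: 1, name: "North" },
  { _id: 2, name: "East" },
];
const users = [
  { _id: 10, region_id: 1 },
  { _id: 11, region_id: 2 },
  { _id: 12, region_id: 2 },
];
const posts = [
  { _id: 100, user_id: 10, title: "p1" },
  { _id: 101, user_id: 11, title: "p2" },
  { _id: 102, user_id: 12, title: "p3" },
  { _id: 103, user_id: 10, title: "p4" },
];

function pageByRegionName(page, pageSize) {
  const regionById = new Map(regions.map((r) => [r._id, r]));
  const userById = new Map(users.map((u) => [u._id, u]));

  // Every post is joined up front, because the sort key is on the third collection.
  const joined = posts.map((p) => {
    const user = userById.get(p.user_id);
    return { ...p, regionName: regionById.get(user.region_id).name };
  });

  // Sort by the joined field, with _id as a stable tie-breaker.
  joined.sort((a, b) => a.regionName.localeCompare(b.regionName) || a._id - b._id);

  const start = page * pageSize;
  return {
    rows: joined.slice(start, start + pageSize),
    joinedCount: joined.length, // rows materialised just to serve one page
  };
}
```

Asking for a page of 2 still joins all 4 posts here. With real collections, that work grows with the whole data set rather than with the page size.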

However, you might also find that those JOINs are not scalable anyway and are in fact killing any performance you get on MySQL.

if there will be growing amount of relations between objects by ObjectID, will it be still so efficient as MySQL or should I dive into code and move app complete to MySQL?

Depends on the queries but I have at least given you some pointers.


9 Comments

Thank you! I'm not sure I really understood "atomically write out a result set using your IO bandwidth for common tasks". I've attached a sample schema for my current DB. As you can see, I have two levels of population for the pets_ids line: pets_ids -> Pets -> Pet Avatar. Man has a lot of such [ObjectID] fields that should be populated and that imply some inner population.
@f1nn MySQL, unlike MongoDB, does not read directly from the data files where your information is stored. Instead it takes a "snapshot" which it writes out to a temporary result table. It is from this table that MySQL actually reads.
@f1nn as for the relation, avatars could easily be embedded into pets, making this scenario a two-query solution as I stated above. This could work in MongoDB.
Well, the same avatar may be used for other pets; that's why I've separated it into a standalone collection.
@f1nn ah ok, I see, I thought it would be a picture of the pet. Hmm, yeah it should still be doable but it might require either some extra queries or some pulling out of IDs client side and querying using an $in. But I would try it out, see what you get out of it
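A minimal sketch of the "pull out the IDs client side and query with $in" idea for shared avatars. It is in-memory only: findAvatarsIn is a hypothetical stand-in for a query shaped like { _id: { $in: ids } }, and the pet and avatar records are made up.

```javascript
// Hypothetical avatar collection, keyed by _id.
const avatarCollection = new Map([
  ["av1", { _id: "av1", path: "/a/1.png", size: 64 }],
  ["av2", { _id: "av2", path: "/a/2.png", size: 128 }],
]);

// Hypothetical pets; both of them reference the shared avatar av1.
const pets = [
  { _id: "p1", name: "Rex", avatars: ["av1", "av2"] },
  { _id: "p2", name: "Tom", avatars: ["av1"] },
];

// Stand-in for one query of the form { _id: { $in: ids } }.
function findAvatarsIn(ids) {
  return ids.map((id) => avatarCollection.get(id)).filter(Boolean);
}

function populateAvatars(petDocs) {
  // Collect every referenced avatar id once, even when pets share avatars.
  const ids = [...new Set(petDocs.flatMap((p) => p.avatars))];

  // One batched lookup for the whole level instead of one per pet.
  const byId = new Map(findAvatarsIn(ids).map((a) => [a._id, a]));

  const populated = petDocs.map((p) => ({
    ...p,
    avatars: p.avatars.map((id) => byId.get(id)),
  }));
  return { pets: populated, requestedIds: ids, queries: 1 };
}
```

The point of the design is that the number of queries depends on the number of levels (pets, then avatars), not on how many pets or avatars there are.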
0

Your question is a bit broad, but I interpret it in one of two ways.

One, you are saying that you have references 25 levels deep, and in that case using populate is just not going to work. I dearly hope this is not the pickle you find yourself in. Moving to SQL won't help you either; the fact is you'll be going back to the database too many times no matter what. But if this is how it's got to be, you can tackle it using a variation of the materialized path pattern, which will allow you to select subtrees much more efficiently within your very deep data tree. See here for a discussion: http://docs.mongodb.org/manual/tutorial/model-tree-structures-with-materialized-paths/
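As a rough illustration of the materialized path idea: each node stores the chain of its ancestors as a delimited string, so a whole subtree can be selected with one pattern match instead of one lookup per level. The node names below are hypothetical, and an in-memory filter stands in for a query that matches the path field against a pattern such as /,Programming,/.

```javascript
// Hypothetical tree; each node stores its ancestors as a comma-delimited path.
const nodes = [
  { _id: "Books", path: null },
  { _id: "Programming", path: ",Books," },
  { _id: "Databases", path: ",Books,Programming," },
  { _id: "MongoDB", path: ",Books,Programming,Databases," },
  { _id: "Languages", path: ",Books,Programming," },
];

// All descendants of a node, at any depth, found in a single pass.
function descendantsOf(id) {
  const needle = `,${id},`;
  return nodes
    .filter((n) => n.path !== null && n.path.includes(needle))
    .map((n) => n._id);
}

// Ancestors can be read straight from the stored path, with no extra lookups.
function ancestorsOf(id) {
  const node = nodes.find((n) => n._id === id);
  if (!node || node.path === null) return [];
  return node.path.split(",").filter(Boolean);
}
```

The trade-off is that moving a node means rewriting the stored path of every node beneath it.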

The other interpretation is that you have 25 relations between collections. Let's say in this case there is one collection in Mongo for every letter of the English alphabet, and documents in collection A have references to one or more documents in each of collections B-Z. In this case, you might be OK. Mongoose populate lets you populate multiple reference paths, and I doubt that the limit, if there is one, is anywhere near as low as 25. So you'd do something like docA.populate("B C ... Z"). In this case also, moving to SQL won't help you per se; you'll still be required to join on multiple tables.
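A small in-memory sketch of what populating several paths amounts to. It assumes one batched lookup per referenced path, which is an assumption about populate's behaviour rather than something stated in this thread; populatePaths and the collections are hypothetical and are not the Mongoose API.

```javascript
// Hypothetical collections B and C, keyed by _id.
const collections = {
  b: new Map([["b1", { _id: "b1", label: "first B" }]]),
  c: new Map([
    ["c1", { _id: "c1", label: "first C" }],
    ["c2", { _id: "c2", label: "second C" }],
  ]),
};

// A document in collection A that references documents in B and C by id.
const docA = { _id: "a1", b: ["b1"], c: ["c1", "c2"] };

// Takes space-delimited paths, mirroring the docA.populate("B C ... Z") call shape.
function populatePaths(doc, paths) {
  const result = { ...doc };
  let lookups = 0;
  for (const path of paths.split(" ").filter(Boolean)) {
    // Assumption: one batched lookup per path, however many ids it holds.
    result[path] = doc[path].map((id) => collections[path].get(id));
    lookups += 1;
  }
  return { doc: result, lookups };
}
```

Under that assumption, 25 referenced paths cost about 25 lookups per request regardless of how many documents each path points to.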

Of course, your original statement that this could all be done in SQL is valid. There doesn't seem to have been a specific reason to use (or not use) Mongo here; it just seems to be the way things were done. However, it also seems that whether you use a NoSQL or SQL approach here isn't the determining factor in whether you will see inefficiency. Rather, it's whether you model the data correctly within whatever solution you choose.

4 Comments

Hi, @user1417684, thanks for your reply. I meant obviously the second case. I have about 30 collections and some requests have to join data from 25 of them.
@user1417684 Anyway, the main question is: is Mongo really efficient at working with relations? SQL was designed to join tables through relations, so I assume it has some optimisations under the hood. Relations seem to be a somewhat unusual use case for Mongo. So I worry that logically the same query, gathering data from 25 collections in Mongo versus joining 25 tables, may be slower in Mongo.
It feels like if you have simple joins among relations, SQL has been doing this a lot longer and is probably quite good at it, so it would be faster. But that's nothing more than a qualitative statement that I have no actual data to back up. You might find such data out there on the net though.
The answer to that question is known, SQL is more efficient with joins and relations. But this doesn't mean it's the only way. Depending on your code, your queries may be optimized and perform well enough. But without specific details, one can hardly tell.
