OK, let's first look at a screenshot. This is a screenshot of a text file called a VCF file. How many rows might it have? Maybe 100,000 rows like this:

[Image: screenshot of a VCF file]

I am totally new to MongoDB, so I thought of a schema like this:

[Image: proposed schema]

So, for example, notice that REF in that text file is a key/value pair in my schema. But as I said, it might have 200,000 rows, so:

  1. Are arrays still a good choice here? Can I store 200,000 members in one array?
  2. How powerfully can I query it? In the text file we have rows; for example, #CHROM 20 at POS 14370 has a REF of "G" and an ALT of "A". With my schema, can we find and return that row? Say we search for patients that have "G" in their REF field: are MongoDB queries powerful enough to search and return such a result?
  3. Is it a bad schema? Do you have better recommendations or advice?
  4. Any sample query for the case in my question would be very helpful to give me some ideas.
  • Matthew Shopsin is asking: when searching for patients that have "G" in their REF field, does ref:[TCG,TA] count, or only ref:[A,T,ATC,G]? Commented May 29, 2012 at 19:14

2 Answers

Sorry for the very slow reply; I had left for holiday when you replied. The following syntax achieves the desired outcome.

> db.refs.insert({ref:['A','T','ATC','G']})

> db.refs.findOne()
{
    "_id" : ObjectId("4fda21bb8a807d87a65aba37"),
    "ref" : [
        "A",
        "T",
        "ATC",
        "G"
    ]
}
> db.refs.insert({ref:['TCG','TA']})
> db.refs.find()
{ "_id" : ObjectId("4fda21bb8a807d87a65aba37"), "ref" : [ "A", "T", "ATC", "G" ] }
{ "_id" : ObjectId("4fda22438a807d87a65aba38"), "ref" : [ "TCG", "TA" ] }


> db.refs.find({ref :{$all : ['G']}})
{ "_id" : ObjectId("4fda21bb8a807d87a65aba37"), "ref" : [ "A", "T", "ATC", "G" ] }

Is this what you had in mind?
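
On the question raised in the comment: as far as I know, matching a string against an array field (with `$all` or with a plain value) compares whole elements, so "TCG" would not count as having "G". If substring matches were wanted instead, a regular expression is one option. A minimal sketch, assuming the same `refs` collection as above; the variable names are only for illustration:

```javascript
// Exact element match: selects documents whose ref array has an element equal to "G".
// For a single value this selects the same documents as {ref: {$all: ["G"]}}.
var exactQuery = { ref: "G" };

// Substring match: selects documents where any ref element contains "G",
// so [ "TCG", "TA" ] would be returned as well.
var substringQuery = { ref: /G/ };

// In the mongo shell these would be run as:
//   db.refs.find(exactQuery)
//   db.refs.find(substringQuery)
```

With the two sample documents above, the first query should return only the [ "A", "T", "ATC", "G" ] document, while the second should return both.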

A big concern in schema design is avoiding the 16MB document limit. While you can have as many documents as can be addressed within a 64-bit address space, I don't know how your document is likely to grow. This limit may require you to break some of the fields out into other documents that you reference.
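
One way to stay well below that limit, sketched here under my own assumptions rather than taken from the question's schema, is to store each VCF row as its own small document that refers back to the patient, instead of keeping one array with 200,000 members. The function name, the field names, the `patientId` value and the `variants` collection below are all hypothetical:

```javascript
// Hypothetical helper: turn one tab-separated VCF data row into one small document.
// Assumes the usual VCF column order: #CHROM, POS, ID, REF, ALT, ...
function vcfRowToDoc(patientId, row) {
  var cols = row.split("\t");
  return {
    patientId: patientId,        // reference back to the patient (assumed field)
    chrom: cols[0],
    pos: parseInt(cols[1], 10),
    ref: cols[3],
    alt: cols[4].split(",")      // ALT may list several alleles, comma-separated
  };
}

// The row from the question: CHROM 20, POS 14370, REF "G", ALT "A" ("." = no ID).
var doc = vcfRowToDoc("patient-1", "20\t14370\t.\tG\tA");

// In the mongo shell, each document could then be stored and queried, e.g.:
//   db.variants.insert(doc)
//   db.variants.find({ chrom: "20", pos: 14370, ref: "G" })
```

Each row then becomes a document of a few dozen bytes, and a query on `ref` returns the matching rows directly rather than the whole patient document.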


Let's say we search for patients that have "G" in their REF field.

Does ref:[TCG,TA] count, or only ref:[A,T,ATC,G]?

1 Comment

This should be a comment, but you don't have the reputation to post one. So I have posted your comment below the question. I suggest you delete your answer (or turn it into a real answer once you get the information you require).
