
I have a collection containing data like the following:

   "sel_att" : {
        "Technical Specifications" : {
            "In Sales Package" : "Charger, Handset, User Manual, Extra Ear Buds, USB Cable, Headset",
            "Warranty" : "1 year manufacturer warranty for Phone and 6 months warranty for in the box accessories"
        },
        "General Features" : {
            "Brand" : "Sony",
            "Model" : "Xperia Z",
            "Form" : "Bar",
            "SIM Size" : "Micro SIM",
            "SIM Type" : "Single Sim, GSM",
            "Touch Screen" : "Yes, Capacitive",
            "Business Features" : "Document Viewer, Pushmail  (Mail for Exchange, ActiveSync)",
            "Call Features" : "Conference Call, Hands Free, Loudspeaker, Call Divert",
            "Product Color" : "Black"
        },
        "Platform/Software" : {
            "Operating Frequency" : "GSM - 850, 900, 1800, 1900; UMTS - 2100",
            "Operating System" : "Android v4.1 (Jelly Bean), Upgradable to v4.4 (KitKat)",
            "Processor" : "1.5 GHz Qualcomm Snapdragon S4 Pro, Quad Core",
            "Graphics" : "Adreno 320"
        }
    }

The data shown above is quite large and the fields are all inserted dynamically. How can I index such fields to get faster query results?

3 Comments

  • Which MongoDB driver are you using? Commented Aug 26, 2014 at 12:49
  • What to index depends on what you are querying. Commented Aug 26, 2014 at 12:50
  • You cannot. If MongoDB complains that you are attempting to index too much, then you are above the threshold where indexes start becoming more of a burden. Exactly what index specification are you trying? Can you post your ensureIndex() command? Commented Aug 26, 2014 at 12:51

5 Answers


It seems to me that you have not fully understood the power of document-based databases such as MongoDB.

Below are just a few thoughts:

  • you have 1 million records
  • you have 1 million index values for that collection
  • you need enough RAM available to store 1 million index values in memory, otherwise the benefits of indexing will hardly show up
  • yes, you can use sharding, but you need lots of hardware to accommodate even basic needs

What you definitely need is something that can dynamically map arbitrary text to useful indexes and that lets you search vast amounts of text very fast. For that you should use a tool like Elasticsearch.

Note that you can and should still store your content in a NoSQL database, and yes, MongoDB is a viable option. For the indexing part, Elasticsearch has plugins available to enhance the communication between the two.

P.S. If I recall correctly, the plugin is called MongoDB River.

EDIT:

I've also added a more comprehensive definition of Elasticsearch. I won't take credit for it, since I've grabbed it from Wikipedia:

Elasticsearch is a search server based on Lucene. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface and schema-free JSON documents.

EDIT 2:

I've scaled down a bit on the numbers, since they might have been far-fetched for most projects. But the main idea remains the same: indexes are not recommended for the use-case described in the question.


6 Comments

Where does he say at all that he has 100m of anything?
@Sammaye well, the usual use-case for MongoDB is large unstructured document-based collections, and from experience they usually get quite big quite fast. Besides, it was just a simple computation to make him understand that indexing doesn't solve any complex problem.
I'm not sure who has the ability to solve that problem, 100m entries all in RAM at the same time... that's one big server, especially if each index entry is a dynamic specification of fields
that's why I don't suggest indexing as a good idea for his use-case. I can go down on the numbers if you think they're overstretched (and for texts of the size he suggests, it's for sure overstretched)
It would seem more plausible. I use ES in a sizeable environment and 100m all in memory at once is a bit... big

Based on what you want to query, you will end up indexing those fields. You can also have secondary indexes in MongoDB. But beware: while the right indexes improve query performance, creating too many of them consumes additional disk space and makes inserts slower, since every index has to be updated on each write.

MongoDB indexes
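
For example, a minimal sketch in the mongo shell, assuming the documents shown in the question live in a hypothetical products collection and you often query by brand:

    // Index a nested field via dot notation (quotes are required because
    // the key contains spaces); ensureIndex is a no-op if it already exists.
    db.products.ensureIndex({ "sel_att.General Features.Brand" : 1 })

    // This query can now use the index instead of scanning the collection:
    db.products.find({ "sel_att.General Features.Brand" : "Sony" })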

3 Comments

The more indexes you have, the slower your writes are. You should optimize your database and the way you use it before taking the index approach. And this is a piece of advice I found reading the official MongoDB website.
@BogdanEmilMariesan I meant the same thing => the more indexes, the slower the writes, but the right use of indexes on a few relevant fields will definitely improve query performance.
oh sorry mate, didn't fully understand your answer :)

Short answer: you can't. Use Elasticsearch. Here is a good tutorial for setting up the MongoDB River on Elasticsearch.

The reason is simple: MongoDB does not work like that. It helps you store complex schemaless sets of documents, but you cannot index dozens of different fields and hope to get good performance. Generally a maximum of 5-6 indexes is recommended per collection.

Elasticsearch is commonly used in the fashion described above in many other use-cases, so it is an established pattern. For example, Titan Graph DB has a built-in option to use ES for this purpose. If I were you, I would just use that and would not try to make MongoDB do something it is not built to do.

If you have the time, and if your data structure lends itself to it (I think it might, judging from the JSON above), you could also use an RDBMS to break these pieces down and store them on the fly with an EAV-like pattern. Elasticsearch would be easier to start with, though, and probably quicker to get good performance from.

Comments


Well, there are lots of problems with having many indexes, and they have been discussed here already. But if you really do need to add indexes for dynamic fields, you can actually create the index from your MongoDB driver.

So, let's say you are using the MongoDB Java driver; then you could create an index like below: http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-java-driver/#creating-an-index

coll.createIndex(new BasicDBObject("i", 1));  // create index on "i", ascending

For the Python driver, see:

http://api.mongodb.org/python/current/api/pymongo/collection.html#pymongo.collection.Collection.create_index

So, when you are populating data using any of the drivers and you find that a new field has come through, you can fire the index creation from the driver itself instead of doing it manually.

P.S.: I have not tried this, and it might not be suitable or advisable.

Hope this helps!

Comments


Indexing of dynamic fields is tricky. There is no such thing as a wildcard index. Your options would be:

Option A: Whenever you insert a new document, do an ensureIndex with the option sparse:true for each of its fields. This does nothing when the index already exists and creates a new one when the field is new. The drawback is that you will end up with a very large number of indexes, and that inserts could get slow because of all the new and old indexes which need to be created and updated.
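
A minimal sketch of Option A in the mongo shell, assuming a hypothetical products collection (the sub-document being scanned is an assumption as well):

    // Hypothetical sketch: give every field of a sub-document its own sparse
    // index. Sparse indexes only contain entries for documents that actually
    // have the field, and ensureIndex does nothing for indexes that already
    // exist. Keep in mind MongoDB allows at most 64 indexes per collection.
    var doc = db.products.findOne();
    for (var key in doc.sel_att["General Features"]) {
        var spec = {};
        spec["sel_att.General Features." + key] = 1;  // dot notation into the sub-document
        db.products.ensureIndex(spec, { sparse : true });
    }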

Option B: Forget about the field-names and refactor your documents to an array of key/value pairs. So

    "General Features" : {
        "Brand" : "Sony",
        "Form" : "Bar"
    },
    "Platform/Software" : {,
        "Processor" : "1.5 GHz Qualcomm",
        "Graphics" : "Adreno 320"
    }

becomes

 properties: [
     { category: "General Features", key: "Brand", value: "Sony" },
     { category: "General Features", key: "Form", value: "Bar" },
     { category: "Platform/Software", key: "Processor", value: "1.5 GHz Qualcomm" },
     { category: "Platform/Software", key: "Graphics", value: "Adreno 320" }
 ]

This allows you to create a single compound index on properties.category and properties.key to cover all the array entries.
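
A minimal sketch in the mongo shell (the products collection name and the query are assumptions):

    // A single compound multikey index spans all entries of the properties array:
    db.products.ensureIndex({ "properties.category" : 1, "properties.key" : 1 })

    // Use $elemMatch so that category, key and value must match within
    // the same array element:
    db.products.find({
        properties : {
            $elemMatch : { category : "General Features", key : "Brand", value : "Sony" }
        }
    })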

1 Comment

I think he needs to search through large texts for keywords and, to be honest, I don't think that good old Mongo indexes would do him any good.
