We host lots of websites for businesses. Each business will have a number of document types it may want indexed and searchable via ES.

Normally, each business has fewer than 20 document types, and each type has fewer than 100k documents (usually far fewer).

I'm not sure how I should set up the data for these websites. Should I put each site into a separate index, or should I put them all into the same index with different document types? Or is there another option?

Or should I even go as far as indexing small and medium sites differently? What are some worst-case scenarios I should be prepared for if I plan to grow to 50K sites?

2 Answers

If you create one index with several mapping types, you face a big constraint: fields with the same name in different mapping types must have the same data type. For example, you can't have a field named blablaCount that is a long in one mapping type and a double in another mapping type within the same index.
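To make that constraint concrete, here is a minimal sketch that checks a set of mapping types for same-name fields declared with different data types before they are put into one index. The function name `find_field_type_conflicts` and the sample mappings are hypothetical, and the check only looks at flat, top-level fields (no nested objects).

```python
from collections import defaultdict


def find_field_type_conflicts(mappings):
    """Return {field: {data_type: [mapping types]}} for fields declared
    with more than one data type across the given mapping types.

    `mappings` maps a mapping type name to its top-level field definitions.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for mapping_type, properties in mappings.items():
        for field, spec in properties.items():
            seen[field][spec["type"]].append(mapping_type)
    return {
        field: {data_type: sorted(names) for data_type, names in types.items()}
        for field, types in seen.items()
        if len(types) > 1  # more than one data type for the same field name
    }


# Hypothetical mapping types that would clash inside a single index.
mappings = {
    "invoice": {"blablaCount": {"type": "long"}, "created": {"type": "date"}},
    "report": {"blablaCount": {"type": "double"}, "created": {"type": "date"}},
}

conflicts = find_field_type_conflicts(mappings)
print(conflicts)
```

Here `blablaCount` is reported because it is a long in one mapping type and a double in the other, while `created` is fine because both declare it as a date.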

Your mileage may vary, but since ES 2.0 and the great mapping refactoring, it is usually recommended to go with several indices and one mapping type per index.

What I would do is create several indices with one mapping/document type per index, and then group all indices belonging to a given business under an alias. That way, if you need to query all indices of a given business, you can simply query the alias for that business.
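As a rough sketch of the alias approach, the snippet below builds the request body for the `_aliases` endpoint, which accepts a list of `add` actions. The naming scheme (`business-<id>-<doctype>` for indices, `business-<id>` for the alias) and the helper name are assumptions made for illustration only; the body would be sent with a POST to `_aliases` using whatever HTTP client you prefer.

```python
import json


def build_alias_actions(business_id, doc_types):
    """Build an `_aliases` request body that groups one index per
    document type under a single per-business alias.

    The index and alias naming scheme here is hypothetical.
    """
    alias = f"business-{business_id}"
    actions = [
        {"add": {"index": f"{alias}-{doc_type}", "alias": alias}}
        for doc_type in doc_types
    ]
    return {"actions": actions}


body = build_alias_actions("acme", ["products", "articles"])
print(json.dumps(body, indent=2))
```

Searching the alias `business-acme` would then hit all of that business's indices at once.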

Another option is to put all documents of all businesses in the same set of indices and simply discriminate each business using a term query on its businessId field, or even by routing on the businessId.
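A sketch of what the shared-index option could look like at query time: the business's own query is wrapped in a bool query with a term filter on businessId, and the same id is passed as the `routing` request parameter. This assumes businessId is indexed as an exact (not analyzed) value and that documents were indexed with the same routing value; the helper name is hypothetical.

```python
def build_business_search(business_id, user_query):
    """Return (request body, query-string parameters) restricting
    `user_query` to a single business in a shared index."""
    body = {
        "query": {
            "bool": {
                "must": [user_query],
                # Still needed with routing: one shard can hold several businesses.
                "filter": [{"term": {"businessId": business_id}}],
            }
        }
    }
    # Routing sends the search only to the shard(s) holding this business.
    params = {"routing": business_id}
    return body, params


body, params = build_business_search("acme", {"match": {"title": "invoice"}})
print(body)
print(params)
```

Routing narrows which shards are searched but does not filter documents by itself, which is why the term filter stays in the query.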

However, in your case, since each business doesn't have that many documents, it might be a waste of resources to create a full set of indices for each business. I'd probably go with the second option, i.e. create one set of indices, each with its own mapping/document type, and then store all documents from all businesses in those indices.

5 Comments

Thank you very much. Since each business will define its own document types (different field names, field types, etc.), it seems impossible to let them share the same set of indices unless we put the document types generated by each business into separate mapping types. So it seems the only choice left for me is to use a different index per website? Everyone seems to say it will create loads of overhead; just how much overhead do you think I'm looking at?
OK, if each business has total latitude over the mappings it wants to create and over the types and names of its fields, then indeed you're better off giving each business its own set of indices. How well that works depends on how many businesses we're talking about: with 20+ indices per business, the cluster can hog resources pretty quickly. However, if you know in advance that each index won't contain that many documents, you may need only one or two primary shards per index instead of the default five.
Also, given the business you seem to be in, I'd strongly encourage you to read up on how WordPress went about their Elasticsearch migration: here and here and then follow all the links in the latter ;)
Super useful, I will make sure to read up on it before proceeding any further.
Glad I could help! All the blog posts by Greg Brown are great reads with many detailed insights.
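To put a rough number on the overhead discussed in the comments above, here is a back-of-the-envelope shard count for the index-per-document-type layout. It takes the 50K sites and the upper bound of 20 document types from the question, and assumes one replica per primary shard; the function name is hypothetical and the figures are worst-case estimates, not measurements.

```python
def total_shards(businesses, indices_per_business, primary_shards, replicas):
    """Total shard copies in the cluster: every primary shard
    plus its replica copies, across all indices."""
    return businesses * indices_per_business * primary_shards * (1 + replicas)


# Worst case from the question: 50K sites, up to 20 document types each.
# Assumption: one replica per primary shard.
with_five_primaries = total_shards(50_000, 20, 5, 1)
with_one_primary = total_shards(50_000, 20, 1, 1)

print(with_five_primaries)  # 10000000
print(with_one_primary)     # 2000000
```

Dropping from five primary shards to one cuts the total by a factor of five, but the result is still in the millions, which is why sharing indices across businesses is worth considering wherever the mappings allow it.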

Elasticsearch is removing mapping types completely as of 7.0, so a single index per document type is encouraged.

https://www.elastic.co/guide/en/elasticsearch/reference/current/removal-of-types.html
