When I asked questions about using ES for web applications, people suggested having one index for things like user profiles, another index for data, and several more for logs.

With all of these on a cluster that hosts several web applications, it seems like things could get messy or disorganized.

In that case, are people using one cluster per application? I am a bit confused because when I read articles about indexing logs, they seem to refer to storing the data in multiple indices, rather than types within an index.

Secondly, why not have one index per app, with types for logs, user profiles, data, etc.?

Is there some benefit to using multiple indices rather than many types within an index for a web application?

-- UPDATE --

To add to this, the comments on the question "Elastic search, multiple indexes vs one index and types for different data sets?" don't seem to go far enough in explaining why:

data retention: for application log/metric data, use different indexes if you require different retention period

Is that recommended because it's simply easier to delete an entire index than a type within an index? Or does it have to do with how the data is stored and how space is recovered after the data is deleted?
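To make the retention question concrete, here is a minimal sketch of what I understand the quoted comment to suggest, assuming one log index per day. The `app1-logs-YYYY.MM.DD` naming scheme and both function names are hypothetical, chosen only for illustration; the code just computes which index names fall outside a retention window and does not talk to a cluster.

```python
from datetime import date, timedelta


def daily_index_name(prefix: str, day: date) -> str:
    """Build a per-day index name (hypothetical naming scheme)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def expired_indices(prefix: str, today: date, retention_days: int,
                    history_days: int) -> list[str]:
    """List daily index names older than the retention window.

    history_days is how far back daily indices are assumed to exist.
    """
    cutoff = today - timedelta(days=retention_days)
    days = (today - timedelta(days=n) for n in range(history_days))
    return [daily_index_name(prefix, d) for d in days if d < cutoff]


# With 10 days of history and a 7-day retention period,
# the two oldest daily indices are candidates for deletion.
print(expired_indices("app1-logs", date(2015, 8, 10),
                      retention_days=7, history_days=10))
```

If that reading is right, each log stream could get its own prefix and retention period, and expiring old data would mean dropping whole indices rather than deleting documents out of a shared one.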

  • this is opinion based/too-broad. Commented Aug 9, 2015 at 13:16
  • I would rather hear good opinions, pros, rather than the bad ones many architects think are actually good. Rather than down-voting and not offering anything constructive, perhaps if you would suggest a better way to ask the question that would be more beneficial to the community. Commented Aug 9, 2015 at 13:23
  • It is preferable to use many different indexes, each containing its own type, rather than one index containing many types. From experience, when two types in one index have a property with the same name, that property must have the same data type in every type where it exists. If it does not, if I recall correctly, Elasticsearch throws an exception. Indexing types into separate indexes on the same cluster prevents this situation from arising. Commented Aug 10, 2015 at 11:00
  • @RussCam - I believe the issue that you are referring to can be found here: elastic.co/guide/en/elasticsearch/guide/current/mapping.html - We can avoid this problem either by naming the fields differently—for example, title_en and title_es—or by explicitly including the type name in the field name and querying each field separately Commented Aug 10, 2015 at 18:50

1 Answer

The main reason for creating multiple indices, and the one that satisfied my quest for an answer, is in Elasticsearch's pagination documentation:

To understand why deep paging is problematic, let’s imagine that we are searching within a single index with five primary shards. When we request the first page of results (results 1 to 10), each shard produces its own top 10 results and returns them to the requesting node, which then sorts all 50 results in order to select the overall top 10.

Now imagine that we ask for page 1,000—results 10,001 to 10,010. Everything works in the same way except that each shard has to produce its top 10,010 results. The requesting node then sorts through all 50,050 results and discards 50,040 of them!

You can see that, in a distributed system, the cost of sorting results grows exponentially the deeper we page. There is a good reason that web search engines don’t return more than 1,000 results for any query.
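The arithmetic in the quoted passage can be reproduced with a short sketch. The function name and its parameters are my own and hypothetical; it only models the counts described above (each shard returns its top `from + size` hits and the requesting node keeps `size` of them), not how Elasticsearch is actually implemented.

```python
def deep_paging_cost(num_shards: int, from_: int, size: int) -> dict:
    """Count the hits handled for one page request in the quoted model."""
    per_shard = from_ + size              # each shard returns its top from+size hits
    gathered = num_shards * per_shard     # hits the requesting node must sort
    discarded = gathered - size           # everything except the requested page
    return {"per_shard": per_shard, "gathered": gathered, "discarded": discarded}


# First page: results 1 to 10 on five primary shards.
print(deep_paging_cost(num_shards=5, from_=0, size=10))

# Deep page: results 10,001 to 10,010 on five primary shards.
print(deep_paging_cost(num_shards=5, from_=10_000, size=10))
```

In this simple model the number of gathered hits is `num_shards * (from + size)`, so an index with fewer shards, or a query that targets only the relevant index, gives the requesting node less to sort.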
