
We are storing a handful of polymorphic document subtypes in a single index (e.g. vehicles, with subtypes car, van, motorcycle, and Batmobile).

At the moment, there is >80% commonality in fields across these subtypes (e.g. manufacturer, number of wheels, ranking of awesomeness as a mode of transport).

The standard case is to search across all types, but sometimes users will want to filter the results to a subset of the subtypes (e.g. "find only cars with ...").

How much overhead (if any) is incurred at search/index time from modelling these subtypes as distinct ElasticSearch types vs. modelling them as a single type using some application-specific field to distinguish between subtypes?

I've looked through several related answers already, but can't find the answer to my exact question.

Thanks very much!

  • Based on my previous experience with Lucene/Solr, I'm assuming the two approaches look and perform about the same at the Lucene index level. Either way, they'd share the same schema. The only difference would be that ElasticSearch is aware of the type difference at a semantic level, instead of the application having to manage it. Is my understanding correct? Commented Dec 11, 2013 at 18:15
  • Yes, that sounds about right. I strongly disagree with what Rotem states in his answer, though: if your proposed types have different types for the same field key, they shouldn't be stored in the same index at all, because the values will be stored in the same underlying field. If both are loaded during a query, you'll get weird results, failed queries, and/or backend exceptions. Obligatory example showing how some of the facets are missing: the field is assumed to be numeric, so faceting can't count the values of a string-typed field. Commented Dec 12, 2013 at 11:08

1 Answer


There shouldn't be any noticeable overhead.

If you keep everything under the same type, you can filter results by subtype by adding a "class" field to your documents and adding a condition on this field to your search.
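As a rough sketch of the single-type approach, the Python below builds a query-DSL request body as a plain dict: a free-text `query_string` clause, plus an optional `terms` filter on the subtype field. It assumes the field is literally named `class` and is mapped for exact matching (e.g. `not_analyzed` or `keyword`). `build_search_body` is a hypothetical helper, and the `bool`/`filter` syntax is that of later Elasticsearch releases (releases from 2013 used a `filtered` query for the same purpose).

```python
import json

# Assumed name of the application-specific subtype field.
SUBTYPE_FIELD = "class"


def build_search_body(text_query, subtypes=None):
    """Build a search body over all vehicles, optionally restricted to
    the given subtypes (e.g. ["car", "van"])."""
    bool_query = {"must": [{"query_string": {"query": text_query}}]}
    if subtypes:
        # "terms" matches documents whose subtype field equals any listed
        # value; in the "filter" clause it restricts results without
        # contributing to the relevance score.
        bool_query["filter"] = [{"terms": {SUBTYPE_FIELD: list(subtypes)}}]
    return {"query": {"bool": bool_query}}


if __name__ == "__main__":
    # Standard case: search across every subtype.
    print(json.dumps(build_search_body("red"), indent=2))
    # Restricted case: only cars.
    print(json.dumps(build_search_body("red", ["car"]), indent=2))
```

Leaving `subtypes` empty gives the standard search-everything case; passing e.g. `["car"]` adds the subtype condition, so the subtype restriction is just one more clause in the same request.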

A good reason to model your different classes into different ES types is if there can be a conflict between type of fields with the same name.

That is, assume your "car" class has a "color" field that holds an integer, while your "van" class also has a "color" field, but this one holds a string. (A silly example, I know; I couldn't think of a better one.)

Elasticsearch holds the mapping (the data "schema") for each type. So if you index both "car" and "van" documents under the same type, you will have a field type conflict: a field in a type can have only one specific type. If the field is mapped as an integer and you then try to index a non-numeric string into it, indexing will fail.
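The conflict can be illustrated with a toy model (plain Python, not Elasticsearch code): each mapping binds a field name to one type the first time it sees a value, roughly like dynamic mapping, and rejects later values of a different type. `ToyTypeMapping` and `MappingConflictError` are hypothetical names, and the model deliberately ignores details such as Elasticsearch coercing numeric strings like "5" into integer fields.

```python
class MappingConflictError(Exception):
    """Raised when a value does not fit the type already mapped for its field."""


def infer_type(value):
    """Map a Python value to a coarse field type name."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


class ToyTypeMapping:
    """Toy model of one type's mapping: one field type per field name."""

    def __init__(self):
        self.fields = {}  # field name -> type name

    def index(self, doc):
        # Validate every field before recording anything, so a rejected
        # document leaves the mapping unchanged.
        inferred = {name: infer_type(value) for name, value in doc.items()}
        for name, value_type in inferred.items():
            expected = self.fields.get(name)
            if expected is not None and expected != value_type:
                raise MappingConflictError(
                    f"field {name!r} is mapped as {expected}, got {value_type}"
                )
        # New fields get the type of the first value seen for them.
        for name, value_type in inferred.items():
            self.fields.setdefault(name, value_type)


if __name__ == "__main__":
    # One shared type for both subtypes: the van's "color" conflicts.
    vehicles = ToyTypeMapping()
    vehicles.index({"class": "car", "color": 3})
    try:
        vehicles.index({"class": "van", "color": "red"})
    except MappingConflictError as err:
        print("rejected:", err)

    # Separate types: each mapping holds its own "color" definition.
    cars, vans = ToyTypeMapping(), ToyTypeMapping()
    cars.index({"color": 3})
    vans.index({"color": "red"})
    print(cars.fields, vans.fields)
```

In the shared mapping the van document is rejected, while two separate mappings accept both documents. As the second comment on the question points out, separate types in the same index still store their values in the same underlying field, so this isolation exists only at the mapping level.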

This is one of the main guidelines on how to use Elasticsearch types: treat a type as a specific data schema that can't have conflicts.
