Elasticsearch: one index or multiple indices for the same data?

Question

I'm building an application that could benefit greatly from Elasticsearch. My current version uses a single index, "messages", with a single type, "message".

Messages have the following structure (averaging 10 KB each):

messages
- id
- subject (string)
- date (date) (format: dateOptionalTime)
- account_id (integer)
- body (string)
- receivers (nested)
   properties:
      name (string)
      email (string) 
- files (nested)
   properties:
      content_type (string)
      filename (string) 
      size (long)
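
For reference, the field list above could be written as a mapping body along these lines. This is only a sketch: the name `MESSAGE_MAPPING` is made up, the field types follow recent Elasticsearch versions (where `string` was split into `text` and `keyword`), and the choice of `text` versus `keyword` per field is an assumption rather than something stated in the question.

```python
from typing import Any, Dict

# Hypothetical mapping body for the "messages" index.
# Assumption: recent Elasticsearch field types ("text"/"keyword"
# instead of the older "string" type used in the question).
MESSAGE_MAPPING: Dict[str, Any] = {
    "properties": {
        "subject": {"type": "text"},
        "date": {"type": "date", "format": "date_optional_time"},
        "account_id": {"type": "integer"},
        "body": {"type": "text"},
        # Nested objects are indexed separately so that name/email
        # pairs stay associated with each other when queried.
        "receivers": {
            "type": "nested",
            "properties": {
                "name": {"type": "text"},
                "email": {"type": "keyword"},
            },
        },
        "files": {
            "type": "nested",
            "properties": {
                "content_type": {"type": "keyword"},
                "filename": {"type": "keyword"},
                "size": {"type": "long"},
            },
        },
    }
}
```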

Searches are currently on a per-account basis (an account_id filter is added to each query). In my MySQL database each account has a company_id (one company can have multiple accounts). In the future I might want to allow a user to search company-wide instead of within a single account. My dataset is fairly large (>50M documents).

My question is which would be best: keeping this single index (messages) with a single type (message), or using per-company indices, where I would create a new index for each company (like messages_%company_id%)?

My dataset will grow by 1 to 5M documents a month, and documents almost never have to be deleted. Old data can be as valuable as a freshly inserted document.

yahermann · Accepted Answer · 2014-11-06 03:12:33Z

I would stick with a single index and a single type.

An ES "index" is analogous to a SQL "database". An ES "type" is analogous to a SQL "table". Would you create separate databases or separate tables for separate companies? Probably not.

ES scales very nicely, and makes it plenty easy to search by just about anything you wish within the type. 50M documents should be no problem as long as you give ES the necessary hardware.
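
As a rough illustration of searching one shared index by account or by company, the sketch below builds a search body that restricts results with a filter on `account_id`. It is not taken from the question: the helper name `build_message_search` and the sample IDs are hypothetical, and it uses the `bool` query with a `filter` clause from Elasticsearch 2.0 and later (in the 1.x releases the same idea was expressed with a `filtered` query).

```python
from typing import Any, Dict, List


def build_message_search(text: str, account_ids: List[int]) -> Dict[str, Any]:
    """Build a search body scoped to one or more accounts (hypothetical helper)."""
    if not account_ids:
        raise ValueError("at least one account_id is required")
    return {
        "query": {
            "bool": {
                # Scored full-text match on the message fields.
                "must": [
                    {"multi_match": {"query": text, "fields": ["subject", "body"]}}
                ],
                # Non-scoring restriction to the allowed accounts.
                "filter": [{"terms": {"account_id": account_ids}}],
            }
        }
    }


# Single-account search, as the question does today (ID is made up).
single = build_message_search("invoice", [42])

# Company-wide search: pass every account_id belonging to the company,
# for example looked up in MySQL (IDs are made up).
company = build_message_search("invoice", [42, 43, 57])
```

Either body would be sent to the same `messages` index, so moving from account-wide to company-wide search only changes the filter. Storing a `company_id` field on each document and filtering on that instead would be an alternative, at the cost of reindexing existing documents.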

One additional note: If there's any temptation of making ES your sole data store, I would resist it. I don't think it's quite there yet. Keep your MySQL database as your "authoritative" storage engine, and use ES for your searching.


4 Comments

Floris Over a year ago

Currently I'm using MySQL as my main datastore. Some of the (important) metadata of the S3 documents is in there; the rest is in raw files on S3. So ES provides the search functionality, but I would always be able to fully rebuild/restore it from MySQL combined with S3.

yahermann Over a year ago

Yes, as long as you're able to rebuild from MySQL+S3 you should be just fine!

Floris Over a year ago

So I wouldn't get better performance or lower resource usage by creating multiple indexes and/or types?

yahermann Over a year ago

Nope. ES scales just as well with a single large type within a single index. The only real reason to separate is if the SCHEMA is different. For example, you might have a "message" type and a separate "person" type, each type having a completely different schema/fields. Remember, an ES "type" is roughly equivalent to a SQL "table".
