
I'm building an application which could greatly benefit from ElasticSearch. In my current version I'm using 1 single index: "messages" with just 1 type: "message".

Messages have the following format (averaging 10 KB each):

messages
- id
- subject (string)
- date (date) (format: dateOptionalTime)
- account_id (integer)
- body (string)
- receivers (nested)
   properties:
      name (string)
      email (string) 
- files (nested)
   properties:
      content_type (string)
      filename (string) 
      size (long) 
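
For reference, the field list above could be written out as a mapping roughly like the following. This is only a sketch: the Python constant name `MESSAGE_MAPPING` is made up for illustration, the field types are copied as named in the list (newer Elasticsearch releases replace `string` with `text`/`keyword`), and `id` is left out on the assumption that it is used as the document's own id rather than as a mapped field.

```python
import json

# Hypothetical mapping for the "message" type, using the field names and
# types exactly as listed in the question. "string" and "dateOptionalTime"
# are the names used there; newer Elasticsearch releases spell these differently.
MESSAGE_MAPPING = {
    "message": {
        "properties": {
            "subject": {"type": "string"},
            "date": {"type": "date", "format": "dateOptionalTime"},
            "account_id": {"type": "integer"},
            "body": {"type": "string"},
            # Nested objects: each receiver/file is indexed as its own hidden
            # document so its fields can be matched together.
            "receivers": {
                "type": "nested",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                },
            },
            "files": {
                "type": "nested",
                "properties": {
                    "content_type": {"type": "string"},
                    "filename": {"type": "string"},
                    "size": {"type": "long"},
                },
            },
        }
    }
}

if __name__ == "__main__":
    # Print the JSON body that would be sent when creating the mapping.
    print(json.dumps(MESSAGE_MAPPING, indent=2))
```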

Searches are currently done on a per-account basis (an account_id filter is added to each query). In my MySQL database each account has a company_id (one company can have multiple accounts). In the future I may want to allow a user to search company-wide instead of within a single account. My dataset is fairly large (>50M documents).
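
To make the per-account filter concrete, here is a minimal sketch of how such a query body could be built. The helper name `build_message_query` is hypothetical, and it assumes an Elasticsearch release whose `bool` query supports a `filter` clause (older releases expressed the same idea with a `filtered` query). A company-wide search is modelled here simply as a `terms` filter over all account ids belonging to the company, looked up in MySQL first.

```python
def build_message_query(text, account_ids):
    """Build a search body for `text`, restricted to the given account ids.

    Hypothetical helper: one id gives the current per-account search,
    several ids give a company-wide search.
    """
    account_ids = list(account_ids)
    if not account_ids:
        raise ValueError("at least one account id is required")

    if len(account_ids) == 1:
        # Single account: exact match on account_id.
        account_filter = {"term": {"account_id": account_ids[0]}}
    else:
        # Company-wide: match any of the company's account ids.
        account_filter = {"terms": {"account_id": account_ids}}

    return {
        "query": {
            "bool": {
                # Full-text part, scored as usual.
                "must": {
                    "multi_match": {"query": text, "fields": ["subject", "body"]}
                },
                # Account restriction, which does not affect scoring.
                "filter": account_filter,
            }
        }
    }
```

Because the restriction lives in the query rather than in the index name, moving from account-level to company-level search only changes the list of ids passed in.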

My question is which would be best: keeping this single index (messages) with a single type (message), or splitting the data per company by creating a new index for each company (like messages_%company_id%)?

My dataset will grow by 1-5M documents a month, and documents almost never have to be deleted. Old data can be just as valuable here as a freshly inserted document.

1 Answer

I would stick with a single index and a single type.

An ES "index" is analogous to a SQL "database". An ES "type" is analogous to a SQL "table". Would you create separate databases or separate tables for separate companies? Probably not.

ES scales very nicely, and makes it easy to search on just about any field within the type. 50M documents should be no problem as long as you give ES the necessary hardware.

One additional note: If there's any temptation of making ES your sole data store, I would resist it. I don't think it's quite there yet. Keep your MySQL database as your "authoritative" storage engine, and use ES for your searching.
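
As a rough illustration of treating MySQL as the source of truth, rows read from the database can be turned into a bulk-indexing payload and re-sent whenever the index needs rebuilding. This is a sketch under assumptions: `to_bulk_lines` is a made-up helper, each row is assumed to be a dict with an `id` key, and the action line carries only `_index` and `_id` (releases that still use types would also need the type, either in the action line or in the request URL).

```python
import json


def to_bulk_lines(rows, index_name="messages"):
    """Convert row dicts into a newline-delimited bulk request body.

    Each document becomes two lines: an "index" action line followed by
    the document source. Returns an empty string when there are no rows.
    """
    lines = []
    for row in rows:
        doc = dict(row)          # copy so the caller's row is not modified
        doc_id = doc.pop("id")   # assumed primary key from MySQL
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk format requires every line, including the last, to end in a newline.
    return "\n".join(lines) + "\n"
```

Reusing the MySQL primary key as the document id means re-running the export overwrites existing documents instead of duplicating them.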

4 Comments

Currently I'm using MySQL as my main datastore; some of the (important) metadata of the S3 documents is in there, and the rest is in raw files on S3. So ES provides the search functionality, but I would always be able to fully rebuild / restore it from MySQL combined with S3.
Yes, as long as you're able to rebuild from MySQL+S3 you should be just fine!
So I wouldn't get better performance or need fewer resources if I created multiple indexes and/or types?
Nope. ES scales just as well with a single large type within a single index. The only real reason to separate is if the SCHEMA is different. For example, you might have a "message" type and a separate "person" type, each having a completely different schema/fields. Remember, an ES "type" is roughly equivalent to a SQL "table".
