I am working with a team that uses two data sources:
- MSSQL as the primary data source for transactional calls.
- Elasticsearch (ES) as a back-up/read-only source of truth for viewing the data.
For example, when I place an order, the order is inserted into the DB, and a RabbitMQ listener/batch job then synchronizes the data from the DB to ES.
Somehow this system fails even with just a million records. By "fails" I mean that records are not updated in ES in a timely fashion. Say I create a coupon: the coupon is generated in the DB, and the customer tries to redeem it immediately, but ES doesn't have the information about the coupon yet, so the redemption fails. Of course there are options such as RabbitMQ's priority queues, but the questions I have are more basic.
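To make the failure concrete, here is a minimal sketch of the flow as described above. It is not our real code: plain in-memory structures stand in for MSSQL, the RabbitMQ queue, and ES, and all names (`create_coupon`, `sync_once`, `redeem_coupon`, the coupon code) are hypothetical.

```python
from collections import deque

# In-memory stand-ins (assumptions for illustration only).
primary_db = {}        # plays the role of MSSQL
sync_queue = deque()   # plays the role of the RabbitMQ queue
search_index = {}      # plays the role of ES


def create_coupon(code, discount):
    """Write path: insert into the DB, then enqueue a sync message."""
    primary_db[code] = {"discount": discount, "redeemed": False}
    sync_queue.append(code)


def sync_once():
    """Listener/batch: copy every queued record from the DB to the index."""
    synced = 0
    while sync_queue:
        code = sync_queue.popleft()
        search_index[code] = dict(primary_db[code])
        synced += 1
    return synced


def redeem_coupon(code):
    """Read path: only the index is consulted, never the DB."""
    return code in search_index


create_coupon("WELCOME10", 10)
before_sync = redeem_coupon("WELCOME10")  # listener has not run yet
sync_once()
after_sync = redeem_coupon("WELCOME10")   # record is now visible
print(before_sync, after_sync)  # prints "False True"
```

The redemption fails in the window between the DB insert and the listener run, which is the lag we are seeing in production.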
I have a few questions, which I asked the team, but I still haven't received satisfactory answers:
- What is the minimum load we should expect before using Elasticsearch makes sense, and isn't it overkill if we have just 1M records?
- Does it really make sense to use ES as the source of truth for real-time data?
- Is ES designed to handle relational-style data, and data that gets continuously updated? AFAIK such search-optimized stores are of the write-once, read-many kind.
- If we are doing it to handle load, how is it different from making a cluster of MSSQL databases the source of truth and using ES just for analytics?
The main question I have in mind is: how can we optimize this architecture so that it scales better?
PS: When I asked about minimum load, what I really meant was: what is the number of records/transactions beyond which we can say ES will be faster than a conventional relational database? Or is there no such threshold at all?