I am writing a service that will create and manage user records, 100+ million of them. For each new user, the service will generate a unique user ID and write it to the database. The database is sharded on that generated user ID.
Each user record has several fields. One of the requirements is that the service be able to check whether a user with a matching field value exists, so those fields are declared as indexes in the database schema.
However, since the database is sharded on the primary key (the unique user ID), I will need to search all shards to find a user record that matches a particular column.
To make that lookup fast, one thing I am thinking of doing is setting up an ElasticSearch cluster. The service will write to the ES cluster every time it creates a new user record, and the ES cluster will index the user record on the relevant fields.
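To make the idea concrete, here is a rough sketch of the lookup pattern I have in mind. It is only an illustration, not real code from the service: plain in-memory dicts stand in for both the database shards and the ES index, and the shard count, the field names (`email`, `phone`), and the `UserStore` class are all hypothetical.

```python
import uuid
from collections import defaultdict

NUM_SHARDS = 4                       # hypothetical shard count
INDEXED_FIELDS = ("email", "phone")  # hypothetical searchable fields


class UserStore:
    """Toy model: dicts stand in for the DB shards and for the ES index."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [dict() for _ in range(num_shards)]
        # (field, value) -> set of user IDs; plays the role of the ES index
        self.secondary = defaultdict(set)

    def shard_for(self, user_id):
        # Route by primary key: the hex user ID modulo the shard count.
        return int(user_id, 16) % len(self.shards)

    def create_user(self, fields):
        user_id = uuid.uuid4().hex
        self.shards[self.shard_for(user_id)][user_id] = dict(fields)
        # Second write on every create, mirroring the write to ES.
        for name in INDEXED_FIELDS:
            if name in fields:
                self.secondary[(name, fields[name])].add(user_id)
        return user_id

    def find_scatter_gather(self, field, value):
        # Without a global index: every shard has to be queried.
        return [uid for shard in self.shards
                for uid, rec in shard.items() if rec.get(field) == value]

    def find_via_index(self, field, value):
        # With the index: resolve user IDs first, then touch only their shards.
        ids = self.secondary.get((field, value), set())
        return [uid for uid in ids if uid in self.shards[self.shard_for(uid)]]
```

In this sketch `find_scatter_gather` is what I would have to do today, and `find_via_index` is the path I hope ES can give me: one index lookup to get the user ID, followed by a single-shard read.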
My questions are:
-- What kind of performance can I expect from ES here? Assume I have 100+ million user records, where 5 columns of each user record need to be indexed. I know it depends on the hardware configuration as well, but please assume well-tuned hardware.
-- Here I am trying to use ES as a memcache alternative that supports lookups by multiple keys. So I want the whole dataset to be in memory, and it does not need to be durable. Is ES the right tool for that?
Any comments or recommendations based on experience with ElasticSearch for large datasets are very much appreciated.