I'm currently experiencing issues with a service I've developed that relies heavily on large payload reads from the db (500 rows). I'm seeing huge throughput, in the range of 35,000+ requests per minute for up to 500 rows per request going through the DB and it is not handling the scaling at all.
The data in question is retrieved primarily on a latitude / longitude where statement that checks if the latitude and longitude of the row can be contained within a minimum latitude longitude coordinate, and a maximum latitude longitude coordinate. This is effective checking if the row in question is within the bounding box created by the min / max passed into the where.
This is the where portion of the query we rely on for reference.
s.latitude > {minimumLatitude} AND
s.longitude > {minimumLongitude} AND
s.latitude < {maximumLatitude} AND
s.longitude < {maximumLongitude}
SO, with that said. MySQL is handling this find, I'm presently on RDS and having to rely heavily on an r3.8XL master, and 3 r3.8XL reads just to get the throughput capacity I need to prevent the application from slowing down and throwing the CPU into 100% usage.
Obviously, with how heavy the payload is and how frequently it's queried this data needs to be moved into a more fitting service. Something like Elasticache's services or DynamoDB.
I've been leaning towards DynamoDB, but my only option here seems to be using SCAN as there is no useful primary key I can associate on my data to reduce the result set as it relies on calculating if the latitude / longitude of a point is within a bounding box. DynamoDB filters on attributes would work great as they support the basic conditions needed, however on a table that would be 250,000+ rows and growing by nearly 200,000 a day or more would be unusably expensive.
Another option to reduce the result set was to use a Map Binning technique to associate a map region with the data, and reduce on that in dynamo as the primary key and then further filter down on the latitude / longitude attributes. This wouldn't be ideal though, we'd prefer to get data within specific bounds and not have excess redundant data passed back as the min/max lat/lng can overlap multiple bins and would then pull data from pins that a majority may not be needed for.
At this point I'm continuously having to deploy read replicas to keep the service up and it's definitely not ideal. Any help would be greatly appreciated.