1

I'm trying to figure out how to model the following data in AWS DynamoDB table.

I have a lot of IOT devices, each sends telemetry data every few seconds.

Attributes

  1. device_id
  2. timestamp
  3. malware_name
  4. company_name
  5. action_performed (two possible values)

Queries

  1. Show all incidents that happened in the last week.
  2. Show all incidents for a specific device_id.
  3. Show all incidents with action "unable_to_remove".
  4. show all incidents related to specific malware.
  5. Show all incidents related to specific company.

Thoughts

  1. I understand that I can add GSI's for each attribute, but I would like to use GSI's only if there is no other choice as it costs me more money.

  2. What would be the main primary-key (partition-key:sort-key) ?

Please share you thoughts, I care about them more than I care about the perfect answer as I'm trying to learn how to think and what to consider instead of having an answer for a specific question.

Thanks a lot !

1

1 Answer 1

2

If you absolutely need the querability patterns mentioned, you have no way out but create GSIs for each. That too has its set of caveats:

  • For query #1, your GSI would be incident_date (or whatever) as partition-key and device_id as sort-key. This might lead to hot partitioning in DynamoDB, based on your access patterns.
  • There is a limit of 5 GSIs per table, that you'll use up right away. What'll you do if you need to support another kind of query in future?

While evaluating pros and cons of using NoSQL for a given situation, one needs to consider both read and write access patterns. So, the question you should ask is, why DynamoDB?

For e.g., do you really need realtime queries? If not, you can use DynamoDB as the main database and periodically sync data (using AWS Lambda or Kinesis Firehose) to EMR or Redshift for later batch processing.

Edit: Proposed primary key:

  • device_id as partition-key and incident_date as sort-key, if you know that no 2 or more incidents, for a given device_id, can come at exact same time.
  • If above doesn't work, then incident_id as partition-key and incident_date as sort-key.
Sign up to request clarification or add additional context in comments.

4 Comments

What would be the primary key in your way?
So @ketan, you would suggest using those as primary and sort key and use another 4 GSIs?
Yes. If you need those 5 query patterns in real-time, you have no other choice
How would I retrieve all incidents from last week in your modeling? Btw, I do not have incident_id

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.