5

I am rather new to DynamoDB and currently we are thinking about migrating an existing project to a serverless application using DynamoDB where we want to adapt the following setup from a RDMS database:

Tables:

  • Projects (ProjectID)
  • Files (FileID, ProjectID, Filename)
  • Tags (FileID, Tag)

We want to make a query with DynamoDB to fetch all Files for a specific Project (by ProjectID) with one or multiple Tags (by Tag). In an RDMS this query would be simple with something like:

SELECT * FROM Files JOIN Tags ON Tags.FileID = Files.FileID WHERE Files.ProjectID = ?PROJECT AND Tags.Tag = ?TAG_1 OR ?TAG_2 ...

At the moment, we have the following DynamoDB setup (but it can still be changed):

  • Projects (ProjectID [HashKey], ...)
  • Files (ProjectID [HashKey], FileID [RangeKey], ...)

Please also consider that the number of project entries is huge (between 1000 - 30000) and also the number of files for each project (is between 50 and 100.000) and the query should be really fast.

How can this be achieved using DynamoDB-query, best without using filter expressions since they are applied after data selection? It would be perfect if the table Files could have a StringSet Tags as column but I guess that this cannot be used for an efficient DynamoDB-query (so without using DynamoDB-scan) since DynamoDB-indices can only be of type String, Binary and Number and not of type StringSet? Is this maybe an applicable use case for the Global Secondary Index (GSI)?

2 Answers 2

6

A bit late, just saw this question referenced from another one.

I guess you've went and solved it something like this?

DynamoDB tables

  • Projects (ProjectID [HashKey], ...)
  • Files (ProjectID [HashKey], FileID [RangeKey], ...)
  • Tags (Tag [HashKey], FileID [RangeKey], ProjectID [LSI Sort Key])

On the FileTags, you need the FileID to make the primary key unique, but you can add the ProjectID as a sort key for a Local Secondary Index, so you can search on Tag + ProjectID.

It's some sort of Data Denormalization, but that's what it takes to go NoSQL :-( . E.g. if your File would be switched to another Project, you'll need to update the ProjectID not only on the File, but also on all the Tags.

Sign up to request clarification or add additional context in comments.

1 Comment

or just put all in single table, which is super popular dynamodb design pattern. "single table design". Also you would keep connected items updated always together with transactional write.
4

The question is almost three years old, but it's still in the Google result. So if anybody else lands here, maybe the following page from DynamoDB docs can help. Just found it myself and didn't try it yet, but it looks promising. Seems to be newer than the other replies here and shows a nice approach to solve the problem.

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-adjacency-graphs.html

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.