NoSQL: Getting the latest values from tables DynamoDB/Azure Table Storage

Question

I have a little problem that needs some suggestions:

Lets say we have a few hundred data tables with a few dozen million rows each.
Data tables are timestamp(key) - value
Data tables are written once every second

The latest entry of each table should be quickly obtainable and will most likely be queried the most (sorta like "follow data in real time"). With the lack of 'Last()' or similar, I was thinking of creating another table "LatestValues" where the latest entry of each data table is updated for a faster retrieval. This, however, would add an extra update for each write operation. Also, most of the traffic would be concentrated on this table (good/bad?). Is there a better solution for this or am I missing something?

Also, lets say we want to query for the values in data tables. Since scanning is obviously out of the question, is the only option left to create a secondary index by duplicating the data, effectively doubling the storaging requirements and the amount write operations? Any other solutions?

I'm primarily looking at DynamoDB and Azure Table Storage, but I'm also curious how BigTable handles this.

jarmod · Accepted Answer · 2021-12-08 12:04:29Z

66

I just published an article today with some common "recipes" about DynamoDB. One of them is "Storing article revisions, getting always the latest" I think it might interest you :)

In a nutshell, you can get the latest item using Query(hash_key=..., ScanIndexForward=True, limit=1)

But, this assumes you have a range_key_defined.

With Scan, you have no such parameter as ScanIndexForward=false and anyway, you can not rely on the order as data is spread over partitions and the Scan request is then load balanced.

To achieve you goal with DynamoDB, you may "split" your timestamp this way:

hash_key: date
range_key: time or full timestamp, as you prefer

Then, you can use the 'trick' of Query + Limit=1 + ScanIndexForward=false

edited Dec 8, 2021 at 12:04

jarmod

79.8k18 gold badges132 silver badges137 bronze badges

answered Oct 9, 2012 at 23:03

yadutaf

7,2201 gold badge41 silver badges48 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

user1597701 Over a year ago

Thank you for the answer and your article was an interesting read. I still have one question regarding ScanIndexForward though. The documentation says: Specifies ascending or descending traversal of the index... Does ScanIndexForward work like GROUP BY and just reverse the order of query results or does it actually read the range_key in reverse order i.e. how many reads does this require? My concern is that by the end of the day (assuming 1 write/s) there is over 86k entries and constantly going through them again and again when getting the most recent value would be expensive.

yadutaf Over a year ago

The range_key is indexed so that it's efficient and, with Query, you pay only for the retrieved results. This said, I don't know how this is implemented internally.

Aylen Over a year ago

Just a heads up: in my case, I needed ScanIndexForward=False instead of True. The default behaviour probably changed at some point after the article was written. The docs for the query method read: If ScanIndexForward is true , DynamoDB returns the results in order, by range key. This is the default behavior. If ScanIndexForward is false, DynamoDB sorts the results in descending order by range key, and then returns the results to the client.

alehro Over a year ago

As of now it seems that ScanIndexForward was replaced by BackwardSearch with the same meaning.Unfortunately I couln't found any documentation for this change.

Jon Peterson Over a year ago

Looks like the link you have here has expired. I found what could be the same content on blog.yadutaf.fr/2012/10/07/…

|

user94559 · Accepted Answer · 2012-10-09 22:47:59Z

-1

In general, you probably just want to reverse the timestamp, so it decreases over time, leaving the newest row on top.

Here's a blog post of mine outlining how to do this with Windows Azure storage: http://blog.smarx.com/posts/using-numbers-as-keys-in-windows-azure.

UPDATE

I use DynamoDB for one project, but in a very simplistic way, so I don't have much experience. That said, http://docs.amazonwebservices.com/amazondynamodb/latest/developerguide/QueryAndScan.html suggest to me that you can just specify ScanIndexForward=false and Limit=1 to get the last item.

answered Oct 9, 2012 at 22:47

user94559

60.3k6 gold badges108 silver badges107 bronze badges

2 Comments

user1597701 Over a year ago

Thank you for your answer. Hadn't considered of trying to solve it like that, although I'm a bit hesitant on formatting the timestamp as it is 'part of the data' and as such it will be queried and it should be in a format that an user can understand. With this solution I'd have to reprocess every timestamp to reverse the transformation for every query.

user94559 Over a year ago

I would suggest storing another column with the timestamp in its normal representation.

Alec · Accepted Answer · 2020-06-12 01:04:56Z

-6

For folks who found this thread but only care about 1 table:

You can get the latest item from a table in the UI by clicking on the column to sort by those values.

answered Jun 12, 2020 at 1:04

Alec

2463 silver badges5 bronze badges

1 Comment

zhark Over a year ago

This only sorts the currently page of results (~100 records), presumably to avoid scanning the whole table.

Collectives™ on Stack Overflow

NoSQL: Getting the latest values from tables DynamoDB/Azure Table Storage

3 Answers 3

7 Comments

2 Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

3 Answers 3

7 Comments

2 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related