
I have a web application with an Express/Node backend using TypeORM and PostgreSQL. The home page in my app runs a query with lots of inner joins to show the user a complex report. This query takes about 30 seconds to run, which is a bad experience.

I could easily add caching with a TTL value, but that has two problems. First, the report could be out of date if the user hits the cache after updating data. Second, the first page load after the TTL expires will be slow.

Since the report only changes when more records are added to the database, I could use the record count as a cache key to tell me whether the cached value is out of date, solving the first problem. Then a queued background process could update the cache any time the record count changes, solving the second problem.
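Roughly what I'm picturing, sketched with an in-memory cache (a real setup would use Redis or similar, and every name below is made up for illustration):

```typescript
// Sketch of a count-keyed cache: the cached report is valid only while
// the table's record count matches the count it was built against.
type Report = { rows: number[]; builtAtCount: number };

class CountKeyedCache {
  private cached?: Report;

  constructor(
    private countRecords: () => Promise<number>,  // cheap COUNT(*) query
    private buildReport: () => Promise<number[]>, // the slow ~30s query
  ) {}

  async getReport(): Promise<number[]> {
    const count = await this.countRecords();
    // Cache hit only if no records were added since the last build.
    if (this.cached && this.cached.builtAtCount === count) {
      return this.cached.rows;
    }
    const rows = await this.buildReport();
    this.cached = { rows, builtAtCount: count };
    return rows;
  }
}
```

The background worker would just call `getReport()` whenever it notices the count changed, so users almost always land on the fast path.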

The only thing is, I don't know whether any third-party libraries already do this or if I'm somehow reinventing existing functionality. Does this strategy have a name?

  • If you go through all the hassle of detecting new records to cache the query, you could also consider going all the way and building a reporting service that generates the report itself. Commented Aug 2, 2020 at 20:02
  • Have you considered using a database trigger to invalidate and rebuild the cache record? If you set it up to fire before the record is written/committed, it should eliminate the possibility of the cache going stale. Commented Aug 3, 2020 at 19:30

2 Answers


First the report could be out of date if the user hits the cache after updating data

An update operation should also invalidate the cache. With proper invalidation, you'll always have up-to-date results.
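As a sketch, assuming an in-memory `Map` for the cache (the `persist` callback and key name are hypothetical stand-ins for your real write path, e.g. a TypeORM repository save):

```typescript
// Sketch: invalidate the cached report in the same code path that writes
// data, so a user never sees a stale report after their own update.
const reportCache = new Map<string, unknown>();
const REPORT_KEY = "home-report";

async function saveRecord(
  record: object,
  persist: (r: object) => Promise<void>,
): Promise<void> {
  await persist(record);          // commit the write first
  reportCache.delete(REPORT_KEY); // then drop the now-stale report
}
```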

Second the first page load after the ttl expires will be slow.

It depends on what you cache. If you cache the final result, you'll indeed hit the thirty-second delay every time you need to regenerate it. Instead, cache only the parts of the data you need, and build the report from the cached data.

This way, you gain flexibility. Imagine a sales report which needs, among other things, the full names of the customers who purchased the most goods over a period of time. Since those names rarely change, and since slightly outdated info here shouldn't be a problem, you can cache the mapping (user ID → person's full name) for a long period, such as a week. On the other hand, the most expensive recent purchase may be highly volatile, so you may cache it for only a few seconds, but not more.
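A sketch of such per-entry TTLs, using an in-memory store and made-up keys:

```typescript
// Sketch: each cache entry carries its own expiry, so stable data
// (customer names) can live for a week while volatile data (the latest
// big purchase) expires within seconds.
interface Entry<V> { value: V; expiresAt: number }

class TtlCache {
  private store = new Map<string, Entry<unknown>>();

  set<V>(key: string, value: V, ttlMs: number): void {
    this.store.set(key, { value, expiresAt: Date.now() + ttlMs });
  }

  get<V>(key: string): V | undefined {
    const e = this.store.get(key);
    if (!e || e.expiresAt <= Date.now()) return undefined; // miss or expired
    return e.value as V;
  }
}

const cache = new TtlCache();
const WEEK_MS = 7 * 24 * 3600 * 1000;
cache.set("customer-name:42", "Ada Lovelace", WEEK_MS); // stable: long TTL
cache.set("top-purchase", 199.99, 5_000);               // volatile: 5s TTL
```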


This query takes about 30 seconds to run which is a bad experience.

What actions have you taken to improve the performance of this query? Hiving the results off into a cache is just kicking this particular Problem's "can" further down the road.

... a query with lots of inner joins ...

Lots of joins do not imply a poorly performing Query.

Yes, caching the result is probably a Good Idea, until your caching software is restarted or starts "blinking" of its own accord and drops your cached data. At that point your Application starts throwing hundreds of these poorly performing queries at your database all at once, to re-populate the cache!
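That pile-up is usually mitigated with "single-flight" loading: when the cache is cold, only the first caller runs the expensive query, and every concurrent caller awaits the same in-flight promise instead of hitting the database. A rough in-memory sketch with hypothetical names:

```typescript
// Sketch of single-flight stampede protection: at most one load of the
// expensive value is in progress at any time; concurrent callers share it.
class SingleFlightCache<V> {
  private value?: V;
  private inFlight?: Promise<V>;

  constructor(private load: () => Promise<V>) {}

  async get(): Promise<V> {
    if (this.value !== undefined) return this.value; // warm cache
    if (!this.inFlight) {
      // First caller kicks off the expensive load...
      this.inFlight = this.load().then((v) => {
        this.value = v;
        this.inFlight = undefined;
        return v;
      });
    }
    // ...everyone else awaits the same promise.
    return this.inFlight;
  }

  invalidate(): void { this.value = undefined; }
}
```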
