
We have a Node.js application and a SQL Server database, and there are a couple of badly written queries with a lot of inner joins.

Problem and Use Case

We have a use case of generating reports (15-20 thousand of them) in PDF/Excel format, and one query with a lot of joins takes almost 8-9 seconds because there is a huge amount of data: 2-3 of the tables used in the query have a few million rows each.

For report generation we don't need real-time data; day-old or even week-old data is fine.
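Because day-old or week-old data is acceptable, the slow query does not have to run once per report. A minimal sketch, assuming the query result fits in memory and reports are generated from a single Node.js process: the hypothetical `SnapshotCache` class below runs an injected loader once (the `load` function stands in for the slow multi-join query) and keeps serving that result until it is older than `maxAgeMs`.

```typescript
// Hypothetical helper: caches the result of an expensive loader (e.g. the slow
// multi-join query) and reuses it until it is older than maxAgeMs.
class SnapshotCache<T> {
  private value: T | undefined = undefined;
  private hasValue = false;
  private builtAt = 0;
  private pending: Promise<T> | undefined = undefined;
  private readonly load: () => Promise<T>;
  private readonly maxAgeMs: number;
  private readonly now: () => number;

  constructor(load: () => Promise<T>, maxAgeMs: number, now: () => number = Date.now) {
    this.load = load;
    this.maxAgeMs = maxAgeMs;
    this.now = now; // injectable clock, useful for tests
  }

  get(): Promise<T> {
    // Serve the cached snapshot while it is still fresh enough for reporting.
    if (this.hasValue && this.now() - this.builtAt < this.maxAgeMs) {
      return Promise.resolve(this.value as T);
    }
    // Share a single in-flight load so concurrent callers don't each run the query.
    if (this.pending === undefined) {
      this.pending = this.load()
        .then((v) => {
          this.value = v;
          this.hasValue = true;
          this.builtAt = this.now();
          return v;
        })
        .finally(() => {
          this.pending = undefined;
        });
    }
    return this.pending;
  }
}
```

With `maxAgeMs` set to one day, thousands of report jobs calling `get()` would trigger the expensive query at most once per day. The cache lives only in the current process; if several processes generate reports, a shared store such as the separate table in option 1 below is the more natural place for the snapshot.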

What I'm looking for: a few suggestions for handling this situation in the best possible way.

We have a few options on the table:

  1. Dump the data from multiple queries into a separate table and use that (we plan to do this periodically with a scheduler or something similar).

  2. Use a time series DB to store the query results, populated by a scheduler, and read from it at report-generation time.

  3. Limit report generation to at most the last 1 year of data.

  4. Implement sharding in SQL Server.
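Option 3 amounts to a rolling date window. A minimal sketch, assuming the cutoff is computed in Node.js and bound as a query parameter (for example in a clause like `WHERE o.CreatedAt >= @fromDate`, where the table alias, column, and parameter name are hypothetical); `reportWindowStart` is likewise a hypothetical helper name.

```typescript
// Returns the start of a rolling reporting window: `now` minus `months`
// calendar months, computed in UTC so the server's time zone doesn't matter.
// Intended to be bound to a hypothetical @fromDate parameter in the WHERE clause.
function reportWindowStart(now: Date, months = 12): Date {
  const start = new Date(now.getTime()); // copy so the caller's Date is not mutated
  // setUTCMonth accepts negative values and rolls back into earlier years.
  start.setUTCMonth(start.getUTCMonth() - months);
  return start;
}
```

For example, a run on 2022-12-26 (UTC) starts the window at 2021-12-26. Note that JavaScript date arithmetic rolls invalid days forward, so a run on 2024-02-29 yields 2023-03-01. Binding the value as a parameter rather than concatenating it into the SQL string also lets SQL Server reuse the cached plan across runs.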

And yes, improving the query is also something we are working on, but I think there is scope to make this better overall, which is why I'm reaching out here for a few more suggestions.

  • I would suggest you start simple: do index/query tuning, investigate locking/blocking and storage speed, and see how far this gets you. If that is not enough, you can create indexed views with the data for the reports and recreate them every X days with the new timeframe in the WHERE clause. Commented Dec 26, 2022 at 12:38
  • I agree with the above suggestion. Additionally, I would say: use the other CPU cores via Node worker threads. Commented Dec 26, 2022 at 15:34
  • I did this first: the query used to take almost 14-16 seconds, and after indexing and tuning it now takes around 8-9 seconds, which is exactly why I opened this thread. Commented Dec 26, 2022 at 20:15
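Whether the work runs on worker threads or just overlaps I/O, generating 15-20 thousand reports benefits from a cap on how many run at once, so the database and CPU are not flooded. A minimal sketch of such a cap, assuming each report is produced by an async function; `mapWithConcurrency` is a hypothetical helper, and it is a plain bounded-concurrency loop, not a `worker_threads` pool by itself.

```typescript
// Runs `task` for every item with at most `limit` tasks in flight at once.
// Results come back in the same order as `items`.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0; // index of the next item to hand out; safe because JS callbacks run on one thread

  // Each runner keeps claiming the next unprocessed item until none are left.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}
```

For CPU-heavy PDF/Excel rendering, the `task` passed in could hand each report to a `worker_threads` worker, with `limit` set to roughly `os.cpus().length`; for DB-bound work, a smaller limit keeps the number of concurrent queries manageable.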

1 Answer


Denormalization is a tried-and-true method of speeding up reporting. As Preben suggested, creating an indexed view in SQL Server is an efficient way to do this with minimal plumbing. Alternatively, it may be worth considering whether a data warehouse implementation is needed for future queries.

If this is a one-off issue, put together your indexed view (paying attention to the indexed-view requirements, such as WITH SCHEMABINDING and a unique clustered index) and move on. If this is the first of many reports you need to optimize, consider a more substantial solution.


1 Comment

Data warehouse and time series DB are the two options I'm leaning towards, but I wanted to get as many opinions as possible.
