
One of my projects uses a very large database whose indexes etc. I can't edit; I have to work with it as it is.

While testing some queries that I will be running against their database via a service I am writing in .NET, I noticed that they are quite slow the first time they are run.

Here is what they have been doing: they have 2 main (large) tables that are used in most of their queries. They showed me that they open SQL Server Management Studio and run

SELECT * 
FROM table1 
JOIN table2

a query that takes around 5 minutes the first time, but only about 30 seconds if you run it again without closing SQL Server Management Studio. So they keep SQL Server Management Studio open 24/7, so that when one of their programs executes queries involving these 2 tables (which seems to be almost all of the queries their program runs), they get the 30-second run time instead of the 5 minutes.

I assume this happens because the 2 tables get cached, after which there are no (or close to no) disk reads.
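If that assumption is right, it should be verifiable. A sketch (using the placeholder table names from the query above) that counts how many pages of each table currently sit in SQL Server's buffer pool:

```sql
-- Sketch: how much of each table is currently cached in the buffer pool.
-- sys.dm_os_buffer_descriptors is a standard DMV; table names are placeholders.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       COUNT(*)                 AS cached_pages,
       COUNT(*) * 8 / 1024      AS cached_mb     -- pages are 8 KB each
FROM sys.dm_os_buffer_descriptors AS b
JOIN sys.allocation_units AS a
    ON b.allocation_unit_id = a.allocation_unit_id
JOIN sys.partitions AS p
    ON a.container_id = p.hobt_id
   AND a.type IN (1, 3)          -- in-row and row-overflow data
WHERE b.database_id = DB_ID()
  AND p.object_id IN (OBJECT_ID('table1'), OBJECT_ID('table2'))
GROUP BY p.object_id;
```

Running this before and after the warm-up query would show whether the 5-minute vs. 30-second difference really tracks the number of cached pages.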

Is it a good idea to have a service that runs a query every now and then to keep these 2 tables cached? Or is there a better solution, given that I can't edit indexes, split the tables, etc.?

Edit: Sorry, I was possibly unclear: the DB hopefully has indexes already; I am just not allowed to edit them or anything else.

Edit 2: Query plan

  • Please persuade your DBA that indexing is a development task. Other than that, could you show us your query plan? Commented Sep 5, 2018 at 19:15
  • Why can't that program wait 5 minutes? 30 seconds does not sound like an immediate response either, and it's apparently still fine for that program. Anyway, yes, there is even a specific word for this: warm-up. A service may intentionally run queries (usually at start or after a restart) for the sake of filling its caches and so on. However, you are describing a service that has no caches of its own and would just poke SQL Server from time to time, making it evict something useful from its cache in favor of data that hadn't been in demand for a long time. That does not sound like a great idea. Commented Sep 5, 2018 at 19:24
  • I guess it's not reasonable for them to wait, as this would really slow down their workday; plus I can't really just tell them to wait 5 minutes instead of forcing the query into cache as they do now, can I... Here is the generated query plan for the query they use to cache what is later used by their application: brentozar.com/pastetheplan/?id=ByC5s06Dm Commented Sep 5, 2018 at 22:43
  • @Viktor1926 Why is the WHERE clause missing from your query? Do you really need all the rows (and all the columns)? If you select too many rows (as a rough rule of thumb, more than 10% of the table), SQL Server may decide that even if an index exists it is not worth using. Also, did you consider "precomputing" the JOIN via an indexed view? Commented Sep 6, 2018 at 9:49

4 Answers


This could be a candidate for an indexed view (if you can persuade your DBA to create it!), something like:

CREATE VIEW transhead_transdata
WITH SCHEMABINDING
AS
    SELECT
        <columns of interest>
    FROM
        transhead th
        JOIN transdata td
            ON th.GID = td.HeadGID;

GO

CREATE UNIQUE CLUSTERED INDEX transjoined_uci ON transhead_transdata (<something unique>);

This will "precompute" the JOIN (and keep it in sync as transhead and transdata change).
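One caveat worth noting: outside Enterprise edition, the optimizer does not consider indexed views automatically; the query has to reference the view with the NOEXPAND hint, roughly:

```sql
-- On Standard/Express editions the indexed view must be referenced
-- explicitly with NOEXPAND for its clustered index to be used.
SELECT *
FROM transhead_transdata WITH (NOEXPAND);
```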


9 Comments

If all columns are of interest (so we're not saving on actual columns), would this provide any significant improvement in performance? In my tests, running SELECT * FROM transhead and SELECT * FROM transdata separately had a similar run time to running the join query.
@Viktor1926 According to your query plan, Hash Match is estimated to consume 94% of the total cost, so I'm surprised that eliminating it didn't yield improvement. The JOIN overhead (such as Hash Match) is exactly what indexed view is supposed to eliminate.
@Viktor1926 Also, could you answer my question about WHERE? It's unusual to retrieve a million and a half rows, and then have to do it repeatedly over and over again. Couldn't you retrieve just what has changed in the meantime?
This is the query plan for running the 2 selects, not sure if it's of any help: brentozar.com/pastetheplan/?id=rJMvuCpDX
@Viktor1926 The estimated cost is 15+54, while JOIN is 1225. Looks like eliminating JOIN should yield pretty spectacular improvement, no? What is your actual timing? Maybe you are waiting on some locks without knowing it? Have you tried SET TRANSACTION ISOLATION LEVEL SNAPSHOT?

You can't create indexes? That is your biggest problem regarding performance. A better solution would be to create the proper indexes and address performance by checking wait stats, resource contention, etc. I'd start with Brent Ozar's blog and open-source tools, and move forward from there.

Keeping SSMS open doesn't prevent the plan cache from being cleared.

Aside from that, the query itself is suspect. I wouldn't expect your application to actually use those results; that is, I wouldn't expect you to load every row and column from two tables into your application every time it is called. Understand that a different query on those same tables, such as one selecting fewer columns or adding a predicate, could and likely would cause SQL Server to generate a new, more optimized query plan. The current query, with no predicates, selecting every column, and (as you stated) no usable indexes, would simply do two table scans. Any increase in performance on subsequent runs isn't because the plan was cached, but because the data was stored in memory and subsequent reads didn't incur physical reads, i.e. it is reading from memory versus disk.
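That distinction is easy to observe. A sketch, using the placeholder table name from the question: with SET STATISTICS IO ON, a cold run reports physical reads, while a repeat run against the now-warm buffer pool reports almost exclusively logical reads:

```sql
SET STATISTICS IO ON;

-- Cold run: the Messages tab shows "physical reads" / "read-ahead reads"
-- as pages are fetched from disk into the buffer pool.
SELECT COUNT_BIG(*) FROM table1;

-- Warm run: the same statement now reports mostly "logical reads",
-- i.e. pages served straight from memory.
SELECT COUNT_BIG(*) FROM table1;
```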

There's a lot more that could be said, but I'll stop here.

4 Comments

I can't create indexes, as this is beyond my reach: they sell a product (a POS system) that is not written or managed by me, plus support for POS systems. I am only writing a small tool (a Windows service) that will let them call procedures in the DB automatically every x hours. These could be simple clean-up procedures or anything they want. So they asked me to include this "caching query" and leave its connection open (possibly executing it every now and then to ensure it isn't cleared from the cache), as that would make their original program work faster.
Sorry, I was possibly unclear: the DB hopefully has indexes already; I am just not allowed to edit them or anything.
"data was stored in memory and subsequent reads wouldn't experience physical reads, i.e. it is reading from memory versus disk" - I think this is where the actual problem lies. Even if I make a view from the join query, what is actually wanted (not sure if it's possible) is to keep the results in memory. Given that the DB runs on a machine with 4 GB or 8 GB of RAM, this might be hard for me to achieve, I guess...
Yes, that is pretty low RAM, so that'll be hard. You could use memory-optimized tables (In-Memory OLTP), though.

You might also consider putting this query into a stored procedure, which can then be scheduled through SQL Agent to run at a regular interval and keep the required pages cached.
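A minimal sketch of that idea (the procedure and table names are placeholders, not from the original post):

```sql
-- Sketch of a warm-up procedure to be scheduled via SQL Agent.
CREATE PROCEDURE dbo.WarmCache
AS
BEGIN
    SET NOCOUNT ON;
    -- Touch every page of both tables so they are read into the buffer
    -- pool; aggregating means no result set is streamed to a client.
    SELECT COUNT_BIG(*) FROM dbo.table1;
    SELECT COUNT_BIG(*) FROM dbo.table2;
END;
```

A SQL Agent job step would then simply run `EXEC dbo.WarmCache;` on the desired schedule.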

3 Comments

Yes, but that doesn't differ much from running the query itself at a regular interval via a service. The question then is: will the DB cache be cleared when there are no active connections to the DB?
It can, @Viktor1926: if some system process, scheduled job, etc. kicks off and needs memory, SQL Server will flush plans and data to free up memory.
So then we are back to the initial solution: my service runs the query at some interval, so that even if the cache was cleared, the data is re-cached the next time it runs. Of course that doesn't guarantee it will always be cached; it just makes it more likely to be cached when the main application needs it.

Thanks to both @scsimon and @Branko Dimitrijevic for their answers; I think they were really useful and guided me in the right direction.

In the end, it turned out that the 2 biggest issues were hardware resources (low RAM, no SSD) and the Auto Close option, which was set to True.
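For reference, Auto Close is a database-level option; turning it off looks roughly like this (the database name is a placeholder):

```sql
-- With AUTO_CLOSE ON, the database is shut down (and its caches discarded)
-- whenever the last connection closes, forcing a cold start every time.
ALTER DATABASE YourDatabase SET AUTO_CLOSE OFF;
```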

Other fixes that I have made (writing them here for anyone else trying to improve a similar setup):

  • A helper service tool will reorganize (defragment) the indexes once a week and rebuild them once a month.
  • Created a view which has all the columns from the 2 tables in question, to eliminate the JOIN cost.
  • Advised that a DBA could probably help with better tables/indexes.
  • Advised them to improve the server hardware.
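The index-maintenance step above can be sketched as follows (the table name is a placeholder; real maintenance scripts usually check avg_fragmentation_in_percent in sys.dm_db_index_physical_stats first):

```sql
-- Weekly: reorganize (lightweight, online defragmentation).
ALTER INDEX ALL ON dbo.table1 REORGANIZE;

-- Monthly: full rebuild (recreates the indexes from scratch).
ALTER INDEX ALL ON dbo.table1 REBUILD;
```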

I will accept @Branko Dimitrijevic's answer, as I can't accept both.
