
So, OK, I couldn't think of a great way of expressing this in a title, but the scenario is this:

You're building a report. The report is on an ASPX page with a C# code-behind which accesses T-SQL databases.

The table you're getting the data from is pretty darn large (millions of rows). Two of the columns you need (Group and SuperGroup; see below) require a lookup on another table, and this lookup table just happens to be a good few tens of thousands of rows too (not to mention you actually have to join two tables to build the lookup table properly; see #partGroups below).

Now bearing in mind the page running this will time out after 2 minutes...

Here are a few assumptions that have to be made along the way:

  1. The tables and their layout are immutable; regardless of whether the design is bad or whatever, they are what they are and you have to work with them (Assets, CoreStockParts and CoreStockPartsGroups).
  2. The page timeout can NOT be altered.
  3. PartNumbers (Text01 in Assets, PartNo in CoreStockParts) could, can and do contain -'s and/or spaces in one table but not the other, so they need removing.
  4. PartNumbers in Assets are sometimes prefixed with a character that does not appear in CoreStockParts.

This is basically what I've got so far:

    select rtrim(ltrim(Replace(Replace(csp.PartNo,' ',''), '-',''))) as PartNumber, 
        csp.[Description], csp.GroupCode, coalesce(cspg.[Group], 'Unknown') as [Group], coalesce(cspg.SuperGroup, 'Unknown') as SuperGroup
    into #partGroups
    from CoreStockParts as csp
        left join CoreStockPartsGroups as cspg on csp.GroupCode = cspg.Code

    select p.ID, 
        rtrim(Replace(Replace(p.Text01,' ',''), '-','')) as PartNumber1, 
        right(p.Text01, len(p.Text01)-1) as PartNumber2,
        p.Numeric01 as CostAmount, p.Numeric02 as SaleAmount, p.Numeric03 as ExtendedCostAmount, 
        p.Numeric04 as ExtendedSaleAmount, p.Numeric05 as Quantity, p.Date01 as InvoiceDate
    INTO #coreParts
    FROM Assets as p
    WHERE p.Category = 'PART'
    and len(p.Text01) > 0

    select ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount, 
        ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
    from #coreParts as cp
        inner join #partGroups as pg on cp.PartNumber1 = pg.PartNumber
    union
    select ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount, 
        ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
    from #coreParts as cp
        inner join #partGroups as pg on cp.PartNumber2 = pg.PartNumber

This is currently finishing in about 1 minute and 45 seconds under a medium server load. There are still restrictions that need adding, which include but are not limited to filtering based on Group, SuperGroup and a date range based on InvoiceDate. On top of this, once I finally HAVE this data I then need to start performing aggregate functions across it to produce graphs of sales quantities/values etc. for various Groups/SuperGroups.
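For what it's worth, the aggregation step on top of the final result set could be sketched roughly like this (the #report table name and the monthly bucketing are assumptions, not part of the question's code; the column names come from the queries above):

    -- Sketch: sales quantity/value per SuperGroup per month, assuming the
    -- result of the final union has been captured into a #report temp table.
    SELECT SuperGroup,
           DATEADD(MONTH, DATEDIFF(MONTH, 0, InvoiceDate), 0) AS InvoiceMonth,
           SUM(Quantity)   AS TotalQuantity,
           SUM(SaleAmount) AS TotalSales
    FROM   #report
    GROUP BY SuperGroup,
             DATEADD(MONTH, DATEDIFF(MONTH, 0, InvoiceDate), 0)
    ORDER BY SuperGroup, InvoiceMonth;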

Now I'm THINKING if I can keep it to this speed, that will do, though it's hardly ideal. If I can speed it up then great! Anything more than 15 seconds slower, however, and we hit a wall.

So the crux of this question is, I guess, several questions in one:

  1. Am I missing anything obvious that I could be doing to optimize this in general?
  2. Would it be better at this point to return the results to C# and LINQ the numbers I need?
  3. I THINK if I'm filtering in the T-SQL, the best places to do so would be in the SELECT INTOs populating the temporary tables, rather than on the resulting mash in the last statement?
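To illustrate question 3, filtering at population time might look something like this (a sketch only: the @From/@To parameters and the sargable `> ''` test in place of `len(...) > 0` are my assumptions, not code from the report):

    -- Sketch: push the date and non-empty filters into the SELECT INTO,
    -- so the temp table (and every later join) handles fewer rows.
    DECLARE @From date = '2014-01-01', @To date = '2014-07-01';  -- placeholder range

    SELECT p.ID,
           RTRIM(REPLACE(REPLACE(p.Text01, ' ', ''), '-', '')) AS PartNumber1,
           RIGHT(p.Text01, LEN(p.Text01) - 1) AS PartNumber2,
           p.Numeric02 AS SaleAmount,
           p.Numeric05 AS Quantity,
           p.Date01    AS InvoiceDate
    INTO   #coreParts
    FROM   Assets AS p
    WHERE  p.Category = 'PART'
      AND  p.Text01 > ''                          -- sargable, unlike LEN(p.Text01) > 0
      AND  p.Date01 >= @From AND p.Date01 < @To;  -- prune rows before any joins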

EDIT: Ok some updates on this!

Firstly, I was wrong in my assessment of what would be allowed, it seems: we've got authorization to add a snapshot table which can do all the work of pulling the data we need together overnight, so the report code can run against it as and when needed the following day.
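A minimal sketch of what that overnight snapshot might look like (the table name, the rebuild approach, and the clustered index choice are all my assumptions; only the PartNumber1-style match is shown, and the job would be scheduled via SQL Server Agent or similar):

    -- Hypothetical nightly snapshot: do the expensive cleaning/joining once,
    -- so the report page only filters and aggregates an indexed table.
    IF OBJECT_ID('dbo.PartSalesSnapshot') IS NOT NULL
        DROP TABLE dbo.PartSalesSnapshot;

    SELECT p.ID,
           p.Numeric02 AS SaleAmount, p.Numeric05 AS Quantity, p.Date01 AS InvoiceDate,
           COALESCE(cspg.[Group], 'Unknown')     AS [Group],
           COALESCE(cspg.SuperGroup, 'Unknown')  AS SuperGroup
    INTO   dbo.PartSalesSnapshot
    FROM   Assets AS p
           JOIN CoreStockParts AS csp
             ON RTRIM(REPLACE(REPLACE(p.Text01, ' ', ''), '-', ''))
                  = RTRIM(LTRIM(REPLACE(REPLACE(csp.PartNo, ' ', ''), '-', '')))
           LEFT JOIN CoreStockPartsGroups AS cspg ON csp.GroupCode = cspg.Code
    WHERE  p.Category = 'PART' AND p.Text01 > '';

    -- Date-range report filters then become a clustered index seek.
    CREATE CLUSTERED INDEX IX_PartSalesSnapshot_InvoiceDate
        ON dbo.PartSalesSnapshot (InvoiceDate);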

Special thanks to Blindy and user17594 for your input regarding indexing and bits that would prevent use of indexes. (Bits, that's technical language ya know 8D.)

Comments:

  • Can you build a view and query it instead? Commented Jun 10, 2014 at 15:41
  • No it's not, @O.O. However, to really help you we'd need to see the definitions of your tables, along with the execution plan being run. You're most likely missing vital indices, or just outright refusing to use them (len(p.Text01) > 0). Commented Jun 10, 2014 at 15:51
  • "It's generally better to return a lot of data to C# and then let C# handle the manipulation of the data" - What? I'd say the exact opposite is true! Databases are specifically designed for manipulating sets of data. Commented Jun 10, 2014 at 15:52
  • Have you tried doing away with the temp tables and just having one big query that returns the data? Commented Jun 10, 2014 at 15:54
  • @O.O, I get the distinct feeling you're trolling; your solution avoids the same memory space, and indeed forces network trips for huge sets of data. @TMNT2014, I think he needs them because his key is a nested scalar mess which would avoid indices. His solution is only partial anyway; he needs an index on both #coreParts and #partGroups for the final join. Commented Jun 10, 2014 at 16:01
4 Answers

Instead of doing your reports in an ASP page, use a background service:

  1. Create a service which waits for queries (through pipes, sockets, wcf, whateveryoulike...)

  2. When you need a report, from your ASP page ask the service whether that report already exists; if not, tell the service to create it.

  3. Show the user a "Loading" message and, through Ajax, have your page ask your service whether the report is ready.

  4. When the service has "munched" all the data, present your reports/charts through your ASP page.

It's always a good idea to separate long-running processes from web pages, both to avoid timeouts and to avoid leaving the user with a page which looks hung (imagine a very nervous user who starts pressing F5 because the page took 2 minutes to render... your server will crash under the load of reports being generated).


2 Comments

Thanks for the input @Gusman, but unfortunately that falls safely in the realm of "No, do it our way.". That and like you say, separating long term services is good - this isn't ideally meant to take that long. Probably running fine for our smaller databases but as a cover all dev is generally done on our larger customer databases as test data.
When arguing against "Do it our way" there's always weight to the fact that "your way won't work"

You may wish to try a few things to see which, if any, will work. Always note that with any change you need to test, test, and test some more to see if it is improving performance on your systems.

  • Use UNION ALL instead of UNION to remove the DISTINCT check that is associated with UNION (unless you know this will duplicate results)
  • Alternatively, remove your UNION and replace your join with pg.PartNumber IN (cp.PartNumber1,cp.PartNumber2)
  • Check your indices. and len(p.Text01) > 0 will prevent the use of an index seek. Try AND p.Text01 IS NOT NULL AND p.Text01 != '' instead.
  • Try indexing your temp tables CREATE UNIQUE CLUSTERED INDEX IX_partGroups_tmp ON #partGroups (PartNumber) as well as CREATE NONCLUSTERED INDEX IX_coreParts_tmp ON #coreParts (PartNumber1,PartNumber2)

Try to keep your data processing within the database, as the data platform is optimized for this. Filter as much as you can, and return only what you need to your client.
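Combining the first two bullets, the final union could collapse into a single query along these lines (a sketch against the temp tables from the question; note that UNION's implicit DISTINCT is gone, so verify duplicates are not a concern):

    -- One pass instead of a UNION of two joins: the IN (...) predicate
    -- matches a group row against either cleaned part number.
    SELECT cp.ID, cp.PartNumber1, cp.PartNumber2, pg.[Description],
           cp.CostAmount, cp.SaleAmount, cp.ExtendedCostAmount,
           cp.ExtendedSaleAmount, cp.Quantity, cp.InvoiceDate,
           pg.[Group], pg.SuperGroup
    FROM   #coreParts AS cp
           INNER JOIN #partGroups AS pg
               ON pg.PartNumber IN (cp.PartNumber1, cp.PartNumber2);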

1 Comment

All good general points, but we still need to see how you do your final filtering (I hope you're not transferring over the entire table contents with tens of thousands of rows) and whether indexes are used or not for that.

Another idea is to create a Reporting Data service.

This runs over the existing data, populating a separate database with a structure optimized for reporting; your reports then run over this, which should be orders of magnitude faster, given that all the heavy lifting was done by the reporting service.

One drawback is that the new data will be "stale", depending on how quickly changes populate into the reporting database. Lots of businesses won't notice or care that data is 60 or 90 seconds out of date, but it obviously depends on your own use case.


My approach to this would be to add computed columns to your tables:

ALTER TABLE CoreStockParts 
ADD PartNumber AS RTRIM(LTRIM(REPLACE(REPLACE(PartNo, ' ', ''), '-', '')));

ALTER TABLE Assets
ADD PartNumber1 AS RTRIM(REPLACE(REPLACE(Text01, ' ', ''), '-', '')),
    PartNumber2 AS RIGHT(Text01, LEN(Text01) - 1);

Now that you have these computed columns you can index them:

CREATE NONCLUSTERED INDEX IX_CoreStockParts_PartNumber ON CoreStockParts (PartNumber)
    --INCLUDE ([Description], GroupCode)

I'm not sure of your exact table structure, but INCLUDEing non-key columns in the index may help performance; uncomment the second line if you think it will.

CREATE NONCLUSTERED INDEX IX_Assets_PartNumber1__part ON Assets (PartNumber1)
    --WHERE Category = 'PART' AND LEN(Text01) > 0

CREATE NONCLUSTERED INDEX IX_Assets_PartNumber2__part ON Assets (PartNumber2)
    --WHERE Category = 'PART' AND LEN(Text01) > 0

Depending on how you will use the table elsewhere, these indexes may benefit from being filtered indexes; again, uncomment if necessary. You may also want further indexes on CoreStockPartsGroups.Code and CoreStockParts.GroupCode.

Finally, I would not use temporary tables; it is usually better to avoid them, as you lose the benefit of existing indexes on the source tables:

WITH PartGroups AS
(   SELECT  csp.PartNumber,
            csp.[Description], 
            csp.GroupCode, 
            ISNULL(cspg.[Group], 'Unknown') AS [Group], 
            ISNULL(cspg.SuperGroup, 'Unknown') AS SuperGroup
    FROM    CoreStockParts AS csp
            LEFT JOIN CoreStockPartsGroups AS cspg 
                ON csp.GroupCode = cspg.Code
)
SELECT  p.ID, 
        p.PartNumber1, 
        p.PartNumber2, 
        pg.[Description], 
        CostAmount = p.Numeric01, 
        SaleAmount = p.Numeric02,
        ExtendedCostAmount = p.Numeric03, 
        ExtendedSaleAmount = p.Numeric04, 
        Quantity = p.Numeric05, 
        InvoiceDate = p.Date01, 
        pg.[Group], 
        pg.SuperGroup
FROM    Assets as p
        INNER JOIN partGroups AS pg 
            ON p.PartNumber1 = pg.PartNumber
WHERE   p.Category = 'PART'
AND     LEN(p.Text01) > 0
UNION 

SELECT  p.ID, 
        p.PartNumber1, 
        p.PartNumber2, 
        pg.[Description], 
        CostAmount = p.Numeric01, 
        SaleAmount = p.Numeric02,
        ExtendedCostAmount = p.Numeric03, 
        ExtendedSaleAmount = p.Numeric04, 
        Quantity = p.Numeric05, 
        InvoiceDate = p.Date01, 
        pg.[Group], 
        pg.SuperGroup
FROM    Assets as p
        INNER JOIN partGroups AS pg 
            ON p.PartNumber2 = pg.PartNumber
WHERE   p.Category = 'PART'
AND     LEN(p.Text01) > 0;

Running this in SSMS with the actual execution plan displayed will suggest further improvements via additional indexes.

