OK, I couldn't think of a great way of expressing this in a title, but the scenario is this:
You're building a report. The report is on an aspx page with a C# code-behind which accesses T-SQL databases.
The table you're getting the data from is pretty darn large (millions of rows). Two of the columns you need (Group and SuperGroup, see below) require a lookup on another table, and this lookup table just happens to be a good few tens of thousands of rows too (not to mention you actually have to join two tables to create the lookup table properly, see #partGroups below).
Now bearing in mind the page running this will time out after 2 minutes...
Here are a couple of assumptions that have to be made along the way:
- The tables and their layout are immutable. Regardless of whether the design is bad, they are what they are and you have to work with them (Assets, CoreStockParts and CoreStockPartsGroups).
- The page timeout can NOT be altered.
- PartNumbers (Text01 in Assets, PartNo in CoreStockParts) can and do contain dashes and/or spaces in one table but not the other, so these need removing.
- PartNumbers are sometimes prefixed with a character in Assets but not in CoreStockParts.
This is basically what I've got so far:
-- Lookup table: normalized part number -> Group / SuperGroup
SELECT RTRIM(LTRIM(REPLACE(REPLACE(csp.PartNo, ' ', ''), '-', ''))) AS PartNumber,
       csp.[Description], csp.GroupCode,
       COALESCE(cspg.[Group], 'Unknown') AS [Group],
       COALESCE(cspg.SuperGroup, 'Unknown') AS SuperGroup
INTO #partGroups
FROM CoreStockParts AS csp
LEFT JOIN CoreStockPartsGroups AS cspg ON csp.GroupCode = cspg.Code;
-- Part rows from Assets, with two candidate part numbers to match on
SELECT p.ID,
       RTRIM(REPLACE(REPLACE(p.Text01, ' ', ''), '-', '')) AS PartNumber1,  -- spaces and dashes removed
       RIGHT(p.Text01, LEN(p.Text01) - 1) AS PartNumber2,                  -- first (prefix) character dropped
       p.Numeric01 AS CostAmount, p.Numeric02 AS SaleAmount, p.Numeric03 AS ExtendedCostAmount,
       p.Numeric04 AS ExtendedSaleAmount, p.Numeric05 AS Quantity, p.Date01 AS InvoiceDate
INTO #coreParts
FROM Assets AS p
WHERE p.Category = 'PART'
  AND LEN(p.Text01) > 0;
-- Match on either candidate part number; UNION removes rows that match both ways
SELECT ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount,
       ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
FROM #coreParts AS cp
INNER JOIN #partGroups AS pg ON cp.PartNumber1 = pg.PartNumber
UNION
SELECT ID, PartNumber1, PartNumber2, [Description], CostAmount, SaleAmount, ExtendedCostAmount,
       ExtendedSaleAmount, Quantity, InvoiceDate, [Group], SuperGroup
FROM #coreParts AS cp
INNER JOIN #partGroups AS pg ON cp.PartNumber2 = pg.PartNumber;
This is currently finishing in about 1 minute and 45 seconds with a medium server load. There are still restrictions that need adding, which include but are not limited to filtering on Group and SuperGroup and on a date range over InvoiceDate. On top of this, once I finally HAVE this data I then need to start performing aggregate functions across it to produce graphs of sales quantities/values etc. for various Groups/SuperGroups.
Now I'm THINKING that if I can keep it to this speed, that will do, though it's hardly ideal. If I can speed it up then great! Anything more than 15 seconds longer, however, and we hit a wall.
So the crux of this question is, I guess, several questions in one:
- Am I missing anything obvious that I could be doing to optimize this in general?
- Would it be better at this point to return the results to C# and LINQ the numbers I need?
- I THINK that if I'm filtering in the T-SQL, the best place to do so would be in the SELECT INTO statements for the temporary tables rather than on the resulting mash in the last statement?
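To make the last point concrete, here is a minimal sketch of what filtering inside the SELECT INTO for #coreParts could look like. The @FromDate and @ToDate parameters and their values are hypothetical (they would come from the C# code-behind), and the sketch assumes Date01 is a date/datetime column; it is an illustration of the idea, not a tested fix.

```sql
-- Hypothetical report parameters, not part of the original query.
DECLARE @FromDate datetime = '20110101';
DECLARE @ToDate   datetime = '20120101';  -- exclusive upper bound

-- Same #coreParts build as above, but the date range is applied here,
-- so fewer rows reach the temp table and the final joins.
SELECT p.ID,
       RTRIM(REPLACE(REPLACE(p.Text01, ' ', ''), '-', '')) AS PartNumber1,
       RIGHT(p.Text01, LEN(p.Text01) - 1) AS PartNumber2,
       p.Numeric01 AS CostAmount, p.Numeric02 AS SaleAmount, p.Numeric03 AS ExtendedCostAmount,
       p.Numeric04 AS ExtendedSaleAmount, p.Numeric05 AS Quantity, p.Date01 AS InvoiceDate
INTO #coreParts
FROM Assets AS p
WHERE p.Category = 'PART'
  AND LEN(p.Text01) > 0
  AND p.Date01 >= @FromDate
  AND p.Date01 <  @ToDate;
```

Comparing the bare Date01 column against the parameters (rather than wrapping it in a function) leaves the optimizer free to use an index on that column, if one exists.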
EDIT: OK, some updates on this!
Firstly, it seems I was wrong in my assessment of what would be allowed: we've got authorization to add a snapshot table, which can do all the work of gathering the data we need overnight, ready for the report code to run against as and when the following day.
Special thanks to Blindy and user17594 for your input regarding indexing and bits that would prevent use of indexes. (Bits, that's technical language ya know 8D.)
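As a rough sketch of how such an overnight snapshot could be built: the table name dbo.PartSalesSnapshot, the index names and the choice of index columns are all assumptions for illustration, and the statements assume #coreParts and #partGroups have already been populated as shown earlier in the same session.

```sql
-- Hypothetical snapshot table, rebuilt from scratch by a scheduled overnight job.
IF OBJECT_ID('dbo.PartSalesSnapshot', 'U') IS NOT NULL
    DROP TABLE dbo.PartSalesSnapshot;

-- Persist the result of the expensive join once, outside the page request.
SELECT cp.ID, cp.PartNumber1, cp.PartNumber2, pg.[Description], cp.CostAmount, cp.SaleAmount,
       cp.ExtendedCostAmount, cp.ExtendedSaleAmount, cp.Quantity, cp.InvoiceDate,
       pg.[Group], pg.SuperGroup
INTO dbo.PartSalesSnapshot
FROM #coreParts AS cp
INNER JOIN #partGroups AS pg ON cp.PartNumber1 = pg.PartNumber
UNION
SELECT cp.ID, cp.PartNumber1, cp.PartNumber2, pg.[Description], cp.CostAmount, cp.SaleAmount,
       cp.ExtendedCostAmount, cp.ExtendedSaleAmount, cp.Quantity, cp.InvoiceDate,
       pg.[Group], pg.SuperGroup
FROM #coreParts AS cp
INNER JOIN #partGroups AS pg ON cp.PartNumber2 = pg.PartNumber;

-- Index the columns the report is expected to filter and group on.
CREATE CLUSTERED INDEX IX_PartSalesSnapshot_InvoiceDate
    ON dbo.PartSalesSnapshot (InvoiceDate);
CREATE NONCLUSTERED INDEX IX_PartSalesSnapshot_Groups
    ON dbo.PartSalesSnapshot (SuperGroup, [Group]);
```

The report page would then only need to filter and aggregate over this one table, for example a SUM(Quantity) grouped by SuperGroup and [Group] within a date range, which keeps the per-request work well away from the two-minute timeout.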
len(p.Text01) > 0), #coreParts and #partGroups for the final join
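One way to act on the indexing advice for the final join is to index the temp tables on their join columns once they are populated. This is only a sketch: the index names are made up, and it assumes the part-number columns are short enough to be index keys (a varchar(max) or text column cannot be used as a key column).

```sql
-- Run after both SELECT ... INTO statements, before the final UNION query.
-- The lookup side is joined on PartNumber by both halves of the UNION.
CREATE CLUSTERED INDEX IX_partGroups_PartNumber
    ON #partGroups (PartNumber);

-- One index per candidate part number on the large side.
CREATE NONCLUSTERED INDEX IX_coreParts_PartNumber1
    ON #coreParts (PartNumber1);
CREATE NONCLUSTERED INDEX IX_coreParts_PartNumber2
    ON #coreParts (PartNumber2);
```

Whether the time spent building these indexes pays for itself depends on the row counts, so it is worth comparing timings with and without them.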