
I have a question about how to handle errors with holistic SQL queries. We are using Oracle PL/SQL. Most of our codebase does row-by-row processing, which results in extremely poor performance. As far as I understand, the biggest problem with that is the context switches between the PL/SQL and SQL engines.
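
To illustrate what I mean (table and column names are made up): a row-by-row loop pays for a switch between the two engines on every fetch and every UPDATE, while a bulk-bound FORALL pays roughly once per statement:

-- Row-by-row: one PL/SQL-to-SQL switch per fetched row plus one per UPDATE
BEGIN
  FOR r IN (SELECT order_id FROM orders WHERE status = 5) LOOP
    UPDATE orders SET status = 10 WHERE order_id = r.order_id;
  END LOOP;
END;
/

-- Bulk-bound: the IDs are fetched in one call and updated in one call
DECLARE
  TYPE t_ids IS TABLE OF orders.order_id%TYPE;
  l_ids t_ids;
BEGIN
  SELECT order_id BULK COLLECT INTO l_ids FROM orders WHERE status = 5;
  FORALL i IN 1 .. l_ids.COUNT
    UPDATE orders SET status = 10 WHERE order_id = l_ids(i);
END;
/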

The problem with such holistic queries is that the user doesn't know what went wrong. The old style would be something like:

  • Cursor over some data
  • Fire a SELECT (count) against another table to check whether the data exists; if not, show an error message
  • SELECT that data
  • Fire a second SELECT (count) against another table to check whether the data exists; if not, show an error message
  • SELECT that data
  • Modify some other table
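
Written out, that pattern looks roughly like this (a simplified sketch using the TAB1/TAB2 names from the rewrite below; error texts are made up):

DECLARE
  l_cnt PLS_INTEGER;
BEGIN
  FOR r IN (SELECT * FROM tab1 WHERE fieldy = 2 AND status = 5) LOOP
    -- check the next table; complain if nothing matches
    SELECT COUNT(*) INTO l_cnt FROM tab2 WHERE fieldx = r.fieldx;
    IF l_cnt = 0 THEN
      raise_application_error(-20001, 'No TAB2 entry for ' || r.fieldx);
    END IF;
    -- select that data, check TAB3 the same way, and so on ...
    UPDATE tab1 SET status = 10 WHERE fieldx = r.fieldx;
  END LOOP;
END;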

And that could go on for 10-20 tables. It's basically written like a C program. It's possible to remodel that into something like:

UPDATE (
  SELECT TAB1.Status,
         10 AS New_Status
  FROM   TAB1
  INNER  JOIN TAB2 ON TAB1.FieldX = TAB2.FieldX
  INNER  ..
  INNER  ..
  INNER  ..
  INNER  ..
  LEFT   ..
  LEFT   ..
  WHERE  TAB1.FieldY = 2
  AND    TAB3.FieldA = 'ABC'
  AND    ..
  AND    ..
  AND    ..
  AND    ..
) TAB
SET   TAB.Status = New_Status
WHERE TAB.Status = 5;

A holistic statement like that speeds things up enormously. I changed some queries that way and the runtime went from 5 hours down to 3 minutes, but that was fairly easy because it was a service without human interaction.

The question is how you would handle cases like that where someone fills in a form and waits for a response. If something goes wrong, they need an error message. The only solution that came to my mind was checking whether any rows were updated and, if not, jumping into another code section that still does all the single selects to determine the error, roughly as sketched below. But after every change we would have to update both the holistic select and all the single selects. I guess after some time they would drift apart and lead to more problems.
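
That fallback would look roughly like this (a simplified sketch; diagnose_failure stands for a hypothetical procedure that still runs the old single selects purely to build the error message):

DECLARE
  l_rows PLS_INTEGER;
BEGIN
  -- simplified version of the holistic update above
  UPDATE (SELECT t1.status, 10 AS new_status
          FROM   tab1 t1
          JOIN   tab2 t2 ON t1.fieldx = t2.fieldx
          WHERE  t1.fieldy = 2) t
  SET    t.status = t.new_status
  WHERE  t.status = 5;

  l_rows := SQL%ROWCOUNT;

  IF l_rows = 0 THEN
    -- nothing matched: fall back to the slow row-by-row checks
    -- only to work out which condition failed
    diagnose_failure;  -- hypothetical procedure containing the old checks
  END IF;
END;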

Another solution would be a generic error message, which would lead to a hundred support calls a day and to us substituting 50 variables into the query and removing some of the WHERE conditions/joins to find out which condition filtered away the needed rows.

So what is the right approach here to get performance and still be reasonably user friendly? At the moment our system feels unusably slow. If you press a button you often have to wait a long time (typically 3-10 seconds, on some more complex tasks 5 minutes).

2 Comments
  • So, in short, you're saying that you have some complex update statements that the application runs, based on user input, that may update 0 rows, and in that case, you need to return an error to the user and indicate where the problem lies in the data they supplied? Sounds like you maybe have some sort of validation issue in your application, if the user is allowed to input information that doesn't exist. Do foreign keys exist between the tables? How is the information cached on the application side? What kind of data and application are we talking about? Commented Jan 24, 2017 at 13:44
  • We are talking about a warehouse management system. Let's imagine the user wants to put a pallet into a storage bin he just scanned the barcode for. We need to check whether that storage bin is locked, whether that pallet type is allowed in that type of storage bin, whether only certain articles are allowed in that storage bin, how many pallets fit on that storage bin, etc. So there is a lot of real-time data to be checked, and because many other people are working with the same data, state is important. Commented Jan 24, 2017 at 13:54

1 Answer


Set-based operations are faster than row-based operations for large amounts of data. But set-based operations mostly apply to batch tasks. UI tasks usually deal with small amounts of data in a row by row fashion.

So it seems your real aim should be understanding why your individual statements take so long.

" If you press a button you often have to wait a long time (typically 3-10 seconds on some complexer tasks 5 minutes"

That's clearly unacceptable. Equally clearly it's not possible for us to explain it: we don't have the access or the domain knowledge to diagnose systemic performance issues. Probably you need to persuade your boss to spring for a couple of days of on-site consultancy.
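
One concrete first step in that direction is to look at the execution plan of a single slow statement, e.g. (the statement and table name here are invented):

EXPLAIN PLAN FOR
  SELECT b.*
  FROM   storage_bin b            -- hypothetical table from the comments
  WHERE  b.bin_id = 'B-000123';

SELECT * FROM TABLE(dbms_xplan.display);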

But here is one avenue to explore: locking.

"many other people working with the same data, so state is important"

Maybe your problems aren't due to slow queries, but to update statements waiting on shared resources? If so, a better (i.e. pessimistic) locking strategy could help.
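
For instance, a pessimistic approach would take the lock up front and fail fast rather than letting the user's session queue behind someone else. A minimal sketch (the STORAGE_BIN table and the error text are invented):

DECLARE
  c_scanned_bin CONSTANT VARCHAR2(20) := 'B-000123';  -- bin the user just scanned
  l_bin_id      VARCHAR2(20);
  row_locked    EXCEPTION;
  PRAGMA EXCEPTION_INIT(row_locked, -54);             -- ORA-00054: resource busy
BEGIN
  SELECT bin_id
  INTO   l_bin_id
  FROM   storage_bin
  WHERE  bin_id = c_scanned_bin
  FOR UPDATE NOWAIT;                                   -- fail fast instead of queueing

  -- ... run the validations and the update while holding the lock ...
EXCEPTION
  WHEN row_locked THEN
    raise_application_error(-20010,
      'This storage bin is currently being processed by another user.');
END;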


"That's why I say people don't need to know more"

Data structures determine algorithms. The particular nature of your business domain and the way its data is stored are key to writing performant code. Why are there twenty tables involved in a search? Why does it take so long to run queries on these tables? Is STORAGE_BIN_ID not a primary key on all those tables?

Alternatively, why are users scanning barcodes on individual bins until they find one they want? It seems like it would be more efficient for them to specify criteria for a bin, then a set-based query could allocate the match nearest to their location.
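
A sketch of that idea (all table, column and bind names invented): one statement applies every rule and hands back the single best candidate, so an empty result simply means "no suitable bin":

SELECT bin_id
FROM  (SELECT b.bin_id
       FROM   storage_bin b
       JOIN   bin_type    bt ON bt.bin_type_id = b.bin_type_id
       WHERE  b.locked = 'N'
       AND    bt.allowed_pallet_type = :pallet_type
       AND    b.used_slots < bt.max_slots
       ORDER  BY b.aisle, b.position)      -- "nearest first" in whatever sense fits
WHERE  ROWNUM = 1;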

Or perhaps you are trying to write one query to solve multiple use cases?


5 Comments

Locking isn't the problem. Let's imagine someone wants to create a transport into an area without specifying the exact storage bin. That cursor might loop over 50k storage bins, checking each one against 20 different tables until it finds one that works. That might happen pretty fast, or only after 2,000 rounds inside that cursor, i.e. after 40,000 fast queries.
Let's imagine I know absolutely nothing about your business or your data architecture. Which is my main point. I don't know. Nobody else on SO knows either. This is a problem which is way beyond the capabilities of SO to answer.
Nobody has to know anything more. The problem should be pretty clear. If I batch process large amounts of data it's fast, but I don't know how to tell the user what went wrong. If I process the data row-by-row it's extremely slow, but I know the exact problem and can display it. People don't need other information unless they suspect the problem lies somewhere else. It's just many variables of stateful data that have to be checked, no way around it. So the question is whether there is any better way than row-by-row, or batch processing combined with row-by-row (only for errors).
@aLpenbog you only think nobody needs to know any more because you (presumably) know your database and application. We don't. APC is suggesting (and I agree) that the answer is probably not as simple as you think it is. For a start, if you're looking for the first available storage bin, why not write a single query that queries all the relevant tables, and if it doesn't find a bin then there's no bin available. That's not a case of digging into the individual queries to work out why there isn't a bin available, is it? You either have an available bin or you don't.
Yes, that would be the situation where the client calls us and tells us there are plenty of free bins, and we would have to look at what's wrong so we can tell him that his product is not allowed to be mixed with other products, or that it is too heavy for the only bins left, etc., and that is only one example out of thousands of different interactions. That's why I say people don't need to know more. My problem is still that row-by-row is slow but I know why rows are filtered out and can output something to the user, while I can't do that with fast batch processing.
