
We're about to start a new process. It will involve over fifty tables with a total of more than 2 million rows. The process will loop over the tables in a For Each container. Inside the loop, each table will undergo different processes, basically updates (the most frequent one is a delete that looks for duplicates). At the end we'll get a new table with the full content of all those 50 tables, with all the updates applied.

So my question is: in terms of speed, is it better to look for duplicates in each table during every loop iteration, or to do a single delete at the end of the process, checking the full result? The amount of work is more or less the same either way.

Thanks a lot.

EDIT.

More info.

The loop is kind of needed. Those 50 tables are located on two different servers, one Oracle and one Access. The loop pulls them into the local SQL Server. On every iteration I do some updates and other work on the table so it is ready.

The question is whether the work we do on the tables is faster if it is run inside the loop or outside it.

Thanks, I hope that makes it clearer.
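
To make the dedup step concrete, here is a minimal sketch of the "one delete at the end" option. The table and column names are made up for illustration: dbo.StagingAll stands for the combined table, Id for an identity column, and rows count as duplicates when ColA and ColB match.

    -- Hypothetical sketch: after all 50 tables have been loaded into
    -- dbo.StagingAll, keep only the first row of each (ColA, ColB) group
    -- and delete the rest in one set-based statement.
    ;WITH Ranked AS (
        SELECT Id,
               ROW_NUMBER() OVER (PARTITION BY ColA, ColB ORDER BY Id) AS rn
        FROM dbo.StagingAll
    )
    DELETE FROM Ranked
    WHERE rn > 1;

The "inside the loop" variant would be the same kind of statement run against each per-table staging area on every iteration.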

  • It is better to do things in sets, without looping at all. Commented Apr 17, 2012 at 14:00
  • @HLGEM - Always? I have found that it can sometimes be a good idea to break a large write into smaller writes to save transaction-log overhead, etc. (see the batching sketch after these comments). Commented Apr 17, 2012 at 14:03
  • @Dems Not always, but virtually always when you are looping one record at a time. If you are looping through batches, that is often a good idea. Commented Apr 17, 2012 at 15:59
  • 2 million rows is chump change to most modern databases - potentially even the in-memory ones, given modern systems. A slightly more in-depth description would be helpful, but this can probably be done without any looping (or denormalization). Although @Dems' point about transaction logs is good. Commented Apr 17, 2012 at 16:01
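
As a rough illustration of the batching idea mentioned in the comments above (a sketch only; the table name, the hypothetical IsDuplicate flag, and the batch size are all assumptions, not part of the question):

    -- Hypothetical sketch: perform a large delete in 50,000-row batches so
    -- each batch commits as its own transaction, keeping individual
    -- transactions (and transaction-log growth) small.
    DECLARE @rows INT = 1;
    WHILE @rows > 0
    BEGIN
        DELETE TOP (50000) FROM dbo.StagingAll
        WHERE IsDuplicate = 1;   -- hypothetical flag marking rows to remove

        SET @rows = @@ROWCOUNT;
    END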

1 Answer


Sounds like a single statement to me.

Also, why denormalize prematurely?
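
As a hedged sketch of what "a single statement" per source could look like, assuming a linked server named ORCL and hypothetical schema, table and column names (none of which appear in the question):

    -- Hypothetical sketch: copy one remote table into the local target in a
    -- single set-based statement, keeping only one row per (ColA, ColB) and
    -- skipping rows that already exist locally.
    INSERT INTO dbo.TargetTable (ColA, ColB)
    SELECT s.ColA, s.ColB
    FROM (
        SELECT ColA, ColB,
               ROW_NUMBER() OVER (PARTITION BY ColA, ColB ORDER BY ColA) AS rn
        FROM ORCL..SCHEMA1.SOURCE_TABLE   -- four-part linked-server name
    ) AS s
    WHERE s.rn = 1
      AND NOT EXISTS (
          SELECT 1 FROM dbo.TargetTable t
          WHERE t.ColA = s.ColA AND t.ColB = s.ColB
      );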


2 Comments

I tried a small replica of what it will be once the full process is done, using just 2 tables. First deleting inside the loop: it took 3.5 seconds. Then deleting after the loop: it took 4.2 seconds. I don't know if that difference is just noise from the particular tables I chose or if it's some sort of indication. Once all 50 tables are involved it could take several minutes, so I'd like to start the denormalizing right away, if possible of course.
It feels to me that you should not be writing a loop at all; that should all be done by the SQL statement. Perhaps edit the question to show some example tables, data and desired results - then you can get a good SQL statement to compare against.
