0

I have 8 pairs of data sets.
Each pair is comprised of a 1-dimensional array with typically 400 string elements and a column with typically 2000 string elements.
Within each pair of data sets, I want to check each element of the array against each element of the column for a match.

The simplified version of my approach:

For i = 1 to 8
    For j = 0 to 400
        For k = 0 to 2000
           If Cells(k,i) = myArray[1, then 2, then 3, etc.](j) then [action]
        next K
    next j
next i

I'm looping through column A's first 2000 cells 400 times, then column B's first 2000 cells 400 times, and so on. That seems redundant and it's around 6.4 million cells checked.

The actual code. It's looking for dupes that also have a qualifying boolean. When these conditions are met, then it's taking their associated integer value and row number.
It sums all the integer values and replaces those integers with their sum.
It does this for every unique name, and then (not shown), repeats this for the remaining 7 data sets.

For i = 0 To j - 1 'starts with the first unique element and searches for dupes...
    v = 0 'initial qty value 
    r = 0 'initial number of rows with dupes
    For k = 2 To WULastRow 'go though each cell in the first column
        If Cells(k, colPair + 2) = True And Cells(k, colPair).Text = uniqueNames(i) Then '...if one is found and the row's boolean is true...
            v = v + Cells(k, colPair + 1).Value '...sum it's qty with previous dupes qty's...
            r = r + 1 'increment number of dupes found
            ReDim Preserve rngMatch(r - 1) 'increase the size of the array that will hold the row numbers of the dupes.
            rngMatch(r - 1) = k '...and input dupe's row number to said array.
        End If
    Next k

    k = 0
    
    'if 1 or more duplicate is found for the given unique item name is found...
    If r > 1 Then
        k = 0
        For k = 0 To r - 1
            Cells(rngMatch(k), colPair + 1).Value = v '...input the summed quantity in their qty cells...
        Next k
    End If
Next i 'then regardless, move on to the name unique name and repeat this process.
9
  • what's the [action]? Commented Jan 30, 2015 at 21:11
  • the [action] is a bit involved. To go a little bit past the simple code I listed, Along side each column A of 2000 names is a column B of 2000 integers and a column C of 2000 booleans. If a given row where a duplicate is found in column A and the boolean in that row is true, [action] is to take the integer in column B and sum it with all the other integers for this unique array item on other rows where a dupe was found with a "true" in column C, then replace each integer involved in that addition by the resulting sum. Does that help? Commented Jan 30, 2015 at 21:25
  • your "then replace each integer involved" is what consules time, not the size of the loop. Commented Jan 30, 2015 at 21:29
  • Show the rest of the code, there are probably a couple of other things that are taking time besides the loop. Commented Jan 30, 2015 at 21:34
  • @Gene, I don't know, I rarely find any duplicates, so I rarely replace any integers. Does that mean I'm doing unnecessary things in my code (posted above now)? Commented Jan 30, 2015 at 21:40

2 Answers 2

2

There is a large overhead every time you access a Cell(i,j)

You need to avoid this by using a single statement to get the data into a variant arry, then loop the array, then if needed put it back: here is some demo code

Option Base 1
dim vArr as variant
varr=Range("A1").resize(Lastrow,LastCol).Value2
for j=1 to LastRow
for k=1 to LastCol
if vArr(j,k) .... then Varr(j,k)= ...
next k
next j
Range("A1").resize(Lastrow,LastCol)=Varr
Sign up to request clarification or add additional context in comments.

1 Comment

Oh, very interesting, so a 2 dimensional array? I will try this (on Monday at work, now I'm going home haha). Thanks a lot!
1

What you need to do is implement some sort of variable ordering heuristic to order your elements in matrices (lowest to highest, etc.). Here's an example to get you started:

If I have one array a = [1,8,16] (ordered least to most) and another array b = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16] (again ordered) I can introduce a rule that each search iteration happens at the ending position of the last search. Here's some pseudo for what I mean:

index = 0
for i in a:
   for j in range(index, len(b)):
      if a == b:
          do something
          index = j

What this effectively does is allow you to search an iteratively smaller and smaller space each time you match a value. Hopefully this makes sense - applying a variable-ordering ranking to your b matrices may be tough though.

4 Comments

Indeed, I believe you can use something called the quick sort algorithm. The original poster also probably needs some handy computer science algorithms to make his or her problem resolve faster.
@Kyle Booth, Yes this is what I'm looking for. Whenever a duplicate is found, by definition it will not be a duplicate for the next uniqueName() element, so it should be automatically ignored on all following iterations. I think the difficulty will be in ordering the arrays (which is where all the benefits come from it seems). If I sort a dataset, it will mangle the rest of them. This is definitely a good starting point, and I'll check into that "quick sort algorithm" that bobort mentioned. Thanks guys!
@AnthonyJS while not wrong, this A ignores the elephant in the room: accessing Cells object inside the loop is orders of magnitude slower than accessing an array. Charles answer broadly addresses this, and there are many A's on SO that demonstrate the technique ( one example ). Another opportunity is that Redim Preserve is time expensive. Do it outside the loop if you can, or in large chunks inside the loop if you must ( see this example ). Only after applying these tips would I apply Kyles advise.
@chrisneilsen thanks a lot! This is exactly what I was hoping to hear, you seem very knowledgeable about what takes time and how to avoid those things. I will check out the examples you gave me and see what I can come up with.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.