I wrote a script that works (it does what I want), but it is painfully slow: at the current pace it will take about 20 days to finish. I can't wait that long, and I'm not experienced enough to speed it up on my own.

Here's a brief description of the task. Masterlist is a sheet with 23 columns and 29000+ rows. Seed is an empty sheet that I need to copy the Masterlist into. Duplicates is an empty sheet where I will store any duplicate rows.

The process: Get the first line from Masterlist. Check if line already in Seed. If line not in Seed, add line. If line already in Seed, add line to Duplicates. Either way, delete the original line from the Masterlist.

The definition of a duplicate: Each line has an emails column. A cell in that column can contain either a single email address or multiple email addresses separated by "; ". If any email in a Masterlist line already appears in any line in Seed, the whole Masterlist line is considered a duplicate.

Example:

"[email protected]" is not a duplicate of "[email protected]; [email protected]"

"[email protected]" is a duplicate of "[email protected]; [email protected]"

Furthermore, if the emails cell is empty in a Masterlist line, that line is not considered a duplicate.
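
The duplicate rule above can be sketched as a small predicate on two email cells. This is only an illustration, not the script from the question: `parseEmails` and `sharesEmail` are hypothetical helper names, and splitting on ";" followed by trimming (rather than on the exact "; " separator) is an assumption made to tolerate stray spaces. Comparison is case-sensitive, as in the script below.

```javascript
// Hypothetical helper: turn an emails cell into a list of addresses.
// An empty cell yields an empty list, so it can never match anything.
function parseEmails(cell) {
  if (!cell) return [];
  return String(cell)
    .split(";")
    .map(function (email) { return email.trim(); })
    .filter(function (email) { return email !== ""; });
}

// Hypothetical helper: true when the two cells have at least one address in common.
function sharesEmail(masterCell, seedCell) {
  var seedEmails = parseEmails(seedCell);
  return parseEmails(masterCell).some(function (email) {
    return seedEmails.indexOf(email) !== -1;
  });
}
```

With made-up addresses, `sharesEmail("a@example.com", "a@example.com; c@example.com")` is true, while `sharesEmail("a@example.com", "b@example.com; c@example.com")` and `sharesEmail("", "a@example.com")` are both false.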

Here comes my code - it works but is not fast enough.

function getSheet(name){
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(name);
  return sheet;
}

function getRowByID(sheet, rowID) {
  var range = sheet.getRange(rowID, 1, 1, 23);
  var value = range.getValues();
  return [range, value];
}

//main executes the entire thing
function main(){
  var sourceSheet = getSheet('Masterlist');
  var targetSheet = getSheet('Seed');
  var remainingSheet = getSheet('Duplicates');
  var counter = sourceSheet.getLastRow();
  var start = new Date();

  while(counter >= 2){
    var sourceLine = getRowByID(sourceSheet, 2)[1];
    var duplicates = checkEmailMatch(sourceLine, targetSheet);

    if(duplicates == 0){
      targetSheet.appendRow(sourceLine[0]);
      sourceSheet.deleteRow(2);
    }
    else{
      remainingSheet.appendRow(sourceLine[0]);
      sourceSheet.deleteRow(2);
    }
    counter--;
  }
}

//iterates through existing lines in the Seed sheet (locates the email cell and reads its contents)
function checkEmailMatch(row, seed){
  var sourceEmail = row[0][7];
  var counter = seed.getLastRow();
  var result = [];

  if(!counter){
    return 0;
  }
  else{
    var j = 0;
    var i = 2;
    for(i; i <= counter; i++){
      var seedLine = getRowByID(seed, i)[1];
      var seedEmail = seedLine[0][7];
      if(!seedEmail){}
      else if(compareEmails(seedEmail, sourceEmail) == true) {
        result[j] = i; 
        j++;
      }
    }
    return result;
  }
}

//Compares each email in the Seed cell ("; " separated) with each email in the Masterlist cell ("; " separated)
function compareEmails(emailSeedCell, emailSourceCell){
  var seedEmails = emailSeedCell.split("; ");
  var sourceEmails = emailSourceCell.split("; ");
  for(var i = 0; i < seedEmails.length; i++){
    for(var j = 0; j < sourceEmails.length; j++){
      if(seedEmails[i] == sourceEmails[j]) return true;
    }
  }
  return false;
}

Please help me - if you need any additional info, I'd be happy to provide! Please note that this is my third script ever, so any feedback is welcome!

  • Some reasonable research is expected of SO users. Have you already read "Best practices" in official documentation or tried to time each part of your script to figure out the time consuming part? Commented Oct 13, 2021 at 15:50
  • You're making a typical mistake, you're making an API call with getRow() on every loop iteration. With your volume of data that's pretty bad. Read through the best practices. Commented Oct 13, 2021 at 16:18
  • Thank you @TheMaster @Dmitry! I have managed to remove the getRow() from almost everywhere now. I am currently struggling to figure out how not to call it when I'm appending the line, but even like this, it works about 100 times faster! I figure it will take hours to finish, which is way better than what I had in the first place! Thank you so much! Please note this is my third script, so I'm sorry if you thought this question was stupid. Let's say I'm learning how to do research and hopefully I'll do a better job next time! Commented Oct 13, 2021 at 17:03
  • The code in your question was reasonable. I can remember writing my code like that. But if you can get to the point of capturing all of your data in two-dimensional arrays, operating on them all at once, and saving the final results with setValues(), you will find that Apps Script is fairly fast. Note that deleting lines one at a time is slow. Commented Oct 13, 2021 at 17:44
  • If you know arrays, that's the way to go. Repeated calls to .appendRow(), .deleteRow(), and .getLastRow() inside a loop are costly: every "non-batch" operation adds to the total. Avoid touching the spreadsheet with any calls until all manipulations of the data are complete. Commented Oct 13, 2021 at 17:56

1 Answer

Thanks to everyone who chipped in to help. I managed to come up with the code below, which cut the execution time by a factor of more than 10,000:

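function sheetToArray(name){
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(name);
  var lastRow = sheet.getLastRow();
  var columns = sheet.getLastColumn();
  // Nothing below the header row: return an empty array instead of requesting a zero-row range.
  if(lastRow < 2) return [];
  // Data starts on row 2, so there are lastRow - 1 data rows (using lastRow would read one extra blank row).
  var array = sheet.getRange(2, 1, lastRow - 1, columns).getValues();
  return array;
}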

// Returns true as soon as the two "; " separated cells share an email address.
function compareEmails(emailSeedCell, emailSourceCell){
  var seedEmails = emailSeedCell.split("; ");
  var sourceEmails = emailSourceCell.split("; ");
  for(var i = 0; i < seedEmails.length; i++){
    for(var j = 0; j < sourceEmails.length; j++){
      if(seedEmails[i] == sourceEmails[j]) return true;
    }
  }
  return false;
}

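function save2DArrayToSpreadsheet(name, array){
  // An empty array has no array[0] and no rows to write, so skip the write.
  if(!array.length) return;
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(name);
  // Write all rows in one batch call, starting below the header row.
  sheet.getRange(2, 1, array.length, array[0].length).setValues(array);
}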

function main(){
  var masterArray = sheetToArray('Masterlist');
  var seedArray = [];
  var duplicateArray = [];

  for(var i = 0; i < masterArray.length; i++){
    Logger.log(i);
    if(!seedArray.length){
      seedArray.push(masterArray[i]);
    }
    else if(!masterArray[i][7]){
      seedArray.push(masterArray[i]);
    }
    else{
      var result = false;
      for(var j = 0; j < seedArray.length; j++){
        if(compareEmails(seedArray[j][7], masterArray[i][7]) == true){
          result = true;
        }
      }
      if(result == true){
        duplicateArray.push(masterArray[i]);
      }
      else{
        seedArray.push(masterArray[i]);
      }
    }
  }
  
  save2DArrayToSpreadsheet("Seed", seedArray);
  save2DArrayToSpreadsheet("Duplicates", duplicateArray);
}
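
The nested loops in `main` compare every Masterlist row against every row already placed in Seed, so the work grows roughly quadratically with the number of rows. A possible further speed-up, sketched here as an assumption rather than as part of the answer above, is to remember the emails already placed in Seed in a `Set` and look each new email up directly. `partitionByEmail` is a hypothetical name, `Set` assumes the V8 runtime in Apps Script, and the sketch works on plain arrays, so it touches no spreadsheet API.

```javascript
// Hypothetical helper: split rows into seed rows and duplicate rows in one pass.
// rows is a 2D array (as returned by getValues()); emailCol is the 0-based index
// of the emails column (7 in the code above).
function partitionByEmail(rows, emailCol) {
  var seen = new Set(); // every email that belongs to a row already in seed
  var seed = [];
  var duplicates = [];

  rows.forEach(function (row) {
    // An empty cell yields an empty list, so the row is never a duplicate.
    var emails = String(row[emailCol] || "").split("; ").filter(Boolean);
    var isDuplicate = emails.some(function (email) { return seen.has(email); });

    if (isDuplicate) {
      // Duplicates do not go into seed, so their emails are not remembered.
      duplicates.push(row);
    } else {
      seed.push(row);
      emails.forEach(function (email) { seen.add(email); });
    }
  });

  return { seed: seed, duplicates: duplicates };
}
```

In `main`, this would be called as `partitionByEmail(masterArray, 7)`, with the returned `seed` and `duplicates` arrays then passed to `save2DArrayToSpreadsheet`. It is meant to classify rows the same way as the loops above: only emails from rows that actually end up in Seed count when checking later rows.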