I made a script that works properly (it does what I want), but it's painfully slow: at this pace, it will finish in about 20 days. I can't wait that long, and I'm not good enough at this to make it faster on my own.
Here's a brief description of the task:
Masterlist: a sheet with 23 columns and 29000+ rows.
Seed: an empty sheet that I'm to copy the Masterlist to.
Duplicates: an empty sheet where I will store any duplicate rows.
The process: Get the first line from Masterlist. Check whether the line is already in Seed. If it is not in Seed, add it to Seed. If it is already in Seed, add it to Duplicates. Either way, delete the original line from the Masterlist.
The definition of a duplicate: Each line has an emails column. The cell can hold either a single email address or multiple email addresses separated by "; ". If any email in a Masterlist line already exists in any line in Seed, the whole line is considered a duplicate.
Example:
"[email protected]" is not a duplicate of "[email protected]; [email protected]"
"[email protected]" is a duplicate of "[email protected]; [email protected]"
Furthermore, if the emails cell is empty in the Masterlist, the line is not considered a duplicate.
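To make the rule above concrete, here is a minimal sketch of the duplicate test as plain JavaScript. The names splitEmails and sharesEmail are hypothetical (they are not in my script), the example.com addresses used to try it are placeholders, and the sketch assumes a runtime that supports Set (for example the Apps Script V8 runtime). It splits on ";" and trims, which is slightly more lenient than my exact "; " split.

```javascript
// Hypothetical helper: turn an emails cell ("a; b; c") into a list of
// trimmed, non-empty addresses. An empty cell yields an empty list.
function splitEmails(cell) {
  return String(cell || '')
    .split(';')
    .map(function (e) { return e.trim(); })
    .filter(function (e) { return e !== ''; });
}

// Hypothetical helper: true if any address in sourceCell also appears
// in seedCell. The comparison is exact (case-sensitive), like the
// == comparison in compareEmails.
function sharesEmail(sourceCell, seedCell) {
  var seen = new Set(splitEmails(seedCell));
  return splitEmails(sourceCell).some(function (e) { return seen.has(e); });
}
```

With these, sharesEmail('a@example.com', 'b@example.com; a@example.com') is a match, while an empty Masterlist cell never matches anything.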
Here comes my code - it works but is not fast enough.
function getSheet(name){
  var sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName(name);
  return sheet;
}
function getRowByID(sheet, rowID) {
  var range = sheet.getRange(rowID, 1, 1, 23);
  var value = range.getValues();
  return [range, value];
}
//main executes the entire thing
function main(){
  var sourceSheet = getSheet('Masterlist');
  var targetSheet = getSheet('Seed');
  var remainingSheet = getSheet('Duplicates');
  var counter = sourceSheet.getLastRow();
  var start = new Date();
  while(counter >= 2){
    var sourceLine = getRowByID(sourceSheet, 2)[1];
    var duplicates = checkEmailMatch(sourceLine, targetSheet);
    if(duplicates == 0){
      targetSheet.appendRow(sourceLine[0]);
      sourceSheet.deleteRow(2);
    }
    else{
      remainingSheet.appendRow(sourceLine[0]);
      sourceSheet.deleteRow(2);
    }
    counter--;
  }
}
//iterates through existing lines in the Seed sheet (locates the email cell and reads its contents)
function checkEmailMatch(row, seed){
  var sourceEmail = row[0][7];
  var counter = seed.getLastRow();
  var result = [];
  if(!counter){
    return 0;
  }
  else{
    var j = 0;
    var i = 2;
    for(i; i <= counter; i++){
      var seedLine = getRowByID(seed, i)[1];
      var seedEmail = seedLine[0][7];
      if(!seedEmail){}
      else if(compareEmails(seedEmail, sourceEmail) == true) {
        result[j] = i;
        j++;
      }
    }
    return result;
  }
}
//Compares each email in the Seed cell ("; " separated) with each email in the Masterlist (source) cell ("; " separated)
function compareEmails(emailSeedCell, emailSourceCell){
  var seedEmails = emailSeedCell.split("; ");
  var sourceEmails = emailSourceCell.split("; ");
  for(var i = 0; i < seedEmails.length; i++){
    for(var j = 0; j < sourceEmails.length; j++){
      if(seedEmails[i] == sourceEmails[j]) return true;
    }
  }
  return false;
}
Please help me - if you need any additional info, I'd be happy to provide! Please note that this is my third script ever, so any feedback is welcome!
Have you tried to time each part of your script to figure out the time-consuming part?
You call getRow() on every loop iteration. With your volume of data that's pretty bad. Read through the best practices.
.appendRow(), deleteRow(), .getLastRow() inside a loop: every "non-batch" operation adds to the cost. Avoid touching the spreadsheet with any calls until all manipulations of the data are complete.
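The batching advice in these comments could look roughly like the sketch below: read the Masterlist once, decide Seed versus Duplicates entirely in memory, then write each sheet once. This is only a sketch under assumptions, not a tested drop-in replacement. The names splitEmails, partitionRows, writeRows and dedupeMasterlist are hypothetical; it assumes the emails sit at 0-based index 7 (as in row[0][7] in the question), that Seed starts empty, and that the script runs on a runtime with Set support such as V8. It also splits on ";" and trims rather than splitting on exactly "; ".

```javascript
// Hypothetical helper: emails cell -> list of trimmed, non-empty addresses.
function splitEmails(cell) {
  return String(cell || '')
    .split(';')
    .map(function (e) { return e.trim(); })
    .filter(function (e) { return e !== ''; });
}

// Pure in-memory pass. rows: array of row arrays; emailCol: 0-based index.
// A row is a duplicate if any of its emails was already seen in a row that
// went to Seed. Rows with an empty emails cell always go to Seed.
function partitionRows(rows, emailCol) {
  var seenEmails = new Set();
  var seed = [];
  var duplicates = [];
  rows.forEach(function (row) {
    var emails = splitEmails(row[emailCol]);
    var isDuplicate = emails.some(function (e) { return seenEmails.has(e); });
    if (isDuplicate) {
      duplicates.push(row);
    } else {
      seed.push(row);
      emails.forEach(function (e) { seenEmails.add(e); });
    }
  });
  return { seed: seed, duplicates: duplicates };
}

// Write all rows below the sheet's existing content with one setValues call.
function writeRows(sheet, rows) {
  if (rows.length === 0) return;
  sheet.getRange(sheet.getLastRow() + 1, 1, rows.length, rows[0].length)
       .setValues(rows);
}

// Apps Script entry point: one read, two writes, one clear.
function dedupeMasterlist() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var source = ss.getSheetByName('Masterlist');
  var lastRow = source.getLastRow();
  if (lastRow < 2) return; // nothing below the header row
  var dataRange = source.getRange(2, 1, lastRow - 1, 23);
  var result = partitionRows(dataRange.getValues(), 7);
  writeRows(ss.getSheetByName('Seed'), result.seed);
  writeRows(ss.getSheetByName('Duplicates'), result.duplicates);
  // Clears the processed rows instead of deleting them one by one.
  dataRange.clearContent();
}
```

The Set lookup replaces the row-by-row scan of Seed, so each Masterlist row is checked against its own few emails instead of every earlier row, and the spreadsheet is touched a handful of times in total rather than several times per row.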