I'm processing a CSV file whose last column is not always in the same format. Each row has this structure:
"Root/Word1","some string","some string","some œ0'fqw[唃#”≠§
\nfw@\tfa0j
"
"Root/Word2","some string","some string","some string"
...
So there are 6 columns, and the last one can contain \n, which makes it hard to split by components. Another restriction is that the strings can contain any possible special character, which makes it hard to use regex.
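To make the failure mode concrete, here is a hypothetical two-record sample (the values are made up, not from my real file) showing why a plain newline split over-counts rows:

```swift
import Foundation

// Hypothetical sample in the same shape as my data: 2 logical records,
// where one field contains an embedded newline.
let sample = "\"Root/Word1\",\"a\",\"multi\nline\"\n\"Root/Word2\",\"b\",\"c\""

// Naive splitting treats the embedded \n as a record boundary:
let naiveRows = sample.components(separatedBy: "\n")
print(naiveRows.count)  // 3, even though there are only 2 records
```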
I decided to solve the problem brute force first. (Yes, I have seen that index(_:offsetBy:) is O(n), but I can't come up with an alternative.)
static func importData(_ db: DB) {
    let csvString = readDataFromCSV(fileName: "data", fileType: "csv")!
    let totalCharCount = csvString.count
    print("total: \(totalCharCount)")
    for i in 0..<totalCharCount {
        print(i)
        if i + 5 >= totalCharCount {
            continue
        }
        // Slice out the next 5 characters and check for the marker.
        let index = csvString.index(csvString.startIndex, offsetBy: i)
        let endIndex = csvString.index(csvString.startIndex, offsetBy: i + 5)
        let part = csvString[index ..< endIndex]
        if part == "Root/" {
            // Collect everything from here up to the next "Root/" occurrence.
            let accum = lookInside(i: i, totalCharCount: totalCharCount, csvString: csvString)
            var rows = accum.components(separatedBy: "\",\"")
            if var lastField = rows.last {
                // Drop the closing quote and trailing newline of the record.
                lastField.removeLast()
                lastField.removeLast()
                rows[rows.count - 1] = lastField
            }
        }
    }
}
static func lookInside(i: Int, totalCharCount: Int, csvString: String) -> String {
    var accum = ""
    var found = false
    var j = i + 5
    while !found {
        if j + 5 >= totalCharCount {
            found = true
        }
        let index2 = csvString.index(csvString.startIndex, offsetBy: j)
        let endIndex2 = csvString.index(csvString.startIndex, offsetBy: j + 5)
        if csvString[index2 ..< endIndex2] == "Root/" {
            found = true
            // Drop the quote picked up just before the marker.
            accum.removeLast()
        } else {
            accum += String(csvString[index2])
        }
        j += 1
    }
    return accum
}
Basically, I'm traversing the whole string looking for the pattern "Root/". When it's found, I advance from that position to the next occurrence of the pattern.
The problem is that the CSV results in a string about 200k characters long, and running this on the simulator takes far too long (~30 min).
So now I'm asking for help here because, according to Instruments, all the time is spent in the String.index(_:offsetBy:) method, which is called too many times.
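A sketch of the direction I'm considering (my own assumption, not tested on the real file): pay the O(n) conversion to [Character] once up front, so integer subscripting becomes O(1) and the scan never calls index(_:offsetBy:). It assumes the marker never appears inside a field:

```swift
// Toy input standing in for the real file.
let csvString = "\"Root/Word1\",\"a\",\"b\"\n\"Root/Word2\",\"c\",\"d\""
let chars = Array(csvString)   // one O(n) conversion up front
let pattern = Array("Root/")

// Record the start offset of every "Root/" marker in a single linear pass.
var markerOffsets: [Int] = []
var i = 0
while i + pattern.count <= chars.count {
    if chars[i..<(i + pattern.count)].elementsEqual(pattern) {
        markerOffsets.append(i)
        i += pattern.count     // skip past the marker
    } else {
        i += 1
    }
}
print(markerOffsets)  // [1, 22] for this sample
```

Each record's text would then be the character range between consecutive marker offsets, with no repeated walks from startIndex.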
Comments received so far:
index(_:offsetBy:) is O(n), and since you begin from csvString.startIndex every time, it quickly becomes an O(n^2) operation.
Since your question is about performance optimization, please upload the full CSV file to Pastebin and include a link here.
Is it legal to quote a double-quote (i.e. \")? Or is it true that there will be precisely 8 (12?) double-quotes per record, and that the end of a record will be a double-quote? Is removing Root/ important, or is it just how you're finding the records?