
I need a way to split a string at certain character sequences into an array of strings. The problem with my current implementation is that when two or more splits occur in the same string, each one overwrites the previous one.

For example, consider splitting the following string at st and ck:

stockwell

With a regexp match I get two matches with ranges, but if I try to assemble them into an array afterwards in a loop, the second match overwrites the first:

stockwell: (0,2) --> st --> _st_ockwell
stockwell: (3,2) --> ck --> sto_ck_well

What would be ideal here is a String extension method that splits a string into a [String] at several indices, but I haven't managed to come up with one yet.

Can somebody give me some help here?

  • Do you have any guarantees as to the order of the separating strings? Will they possibly match multiple times or just once each? Can you guarantee that they won't overlap? For example, "st" and "tr" can both be part of two different matches in the word "strength". What is your current implementation? Commented May 19, 2015 at 16:30
  • There are no guarantees as to the order or how often separating parts occur in a string. As for overlap, I can add strings like "str" to the exclusion list so that it matches on those instead. In fact, my regexp already seems to ensure there are no overlaps like that. The problem is how to assemble an array with the proper segments. My current implementation relies on quite a bit of external code from ExSwift and my own util extensions; I'll see if I can put together a simplified example. Commented May 19, 2015 at 16:51
  • If it were me, I'd just use componentsSeparatedByString: for each of my tokens, which builds a tree-like structure, and then simply flatten the tree. This may not be optimal, but it'd work. Commented May 19, 2015 at 16:56
  • The problem with componentsSeparatedByString is that the separating string is removed from the result. Commented May 19, 2015 at 17:09

3 Answers


How about this:

extension String {

  func multiSplit(inds: [String.Index]) -> [String] {

    if let loc = inds.last {

      return self[self.startIndex..<loc].multiSplit(Array(dropLast(inds))) + [self[loc..<self.endIndex]]

    } else {

      return [self]

    }
  }
}

let string = "abcdef"

let inds = [
  string.startIndex.successor().successor(),
  string.endIndex.predecessor().predecessor()
]

string.multiSplit(inds) // ["ab", "cd", "ef"]

The indices you pass in must be in ascending order for this to work.

It's a recursive function: it takes the last index in the array of indices it's given (inds) and splits the string at that index. The first half of that split is passed recursively back to the function, with the last element of the index array removed. When the index array is empty, the "if let" fails, so the function returns the string unsplit, wrapped in a one-element array.

In the example, it first splits at the last index it's given and gets two strings: "abcd" and "ef". Then the function calls itself with "abcd" and the array of indices minus its last element (dropLast()). In that call, it splits "abcd" into "ab" and "cd" and calls itself again on "ab". Since it passes an empty array to that call, inds.last is nil, so nothing is split and it just returns ["ab"] (the return [self] line). That result goes back up to the caller, which appends its own split ("cd") and returns ["ab", "cd"]. Finally, this goes back to the outermost call, which appends its split and returns the answer: ["ab", "cd", "ef"].

It's all plain Swift, though: no imports or third-party code needed.

If your indices might be out of order, you can just sort them first:

let string = "abcdef"

let inds = [
  string.endIndex.predecessor().predecessor(),
  string.startIndex.successor().successor()
]

string.multiSplit(sorted(inds)) // ["ab", "cd", "ef"]

Also, if you want to remove empty strings from within the function:

extension String {

  func multiSplit(inds: [String.Index]) -> [String] {

    if let loc = inds.last {

      return (
        self[self.startIndex..<loc]
          .multiSplit(Array(dropLast(inds)))
          + [self[loc..<self.endIndex]]
      ).filter{!$0.isEmpty}

    } else {

      return [self]

    }
  }
}
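For readers on current Swift (the code above uses Swift 1.x APIs such as successor(), the free functions dropLast(_:) and sorted(_:), and String-returning slices), here is a minimal sketch of the same split-at-indices idea written iteratively instead of recursively. The method name multiSplitIterative is hypothetical, and the sketch assumes the indices are sorted and belong to the string being split.

```swift
// Hypothetical iterative variant of multiSplit, in current Swift syntax.
extension String {
    // Slices the string between consecutive cut points.
    // Assumes `inds` are sorted, valid indices of this string.
    func multiSplitIterative(_ inds: [String.Index]) -> [String] {
        // Pad the cut points with the string's start and end.
        let bounds = [startIndex] + inds + [endIndex]
        // Pair each bound with the next one and take the slice in between.
        return zip(bounds, bounds.dropFirst()).map { lower, upper in
            String(self[lower..<upper])
        }
    }
}

let string = "abcdef"
let inds = [
    string.index(string.startIndex, offsetBy: 2),  // replaces successor().successor()
    string.index(string.endIndex, offsetBy: -2)    // replaces predecessor().predecessor()
]
let pieces = string.multiSplitIterative(inds)
print(pieces) // ["ab", "cd", "ef"]
```

Each piece is the slice between two consecutive bounds, so the string is walked once instead of the index array being copied at every level of recursion. Out-of-order indices would form an invalid range and trap, so pass inds.sorted() when the order isn't guaranteed.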

4 Comments

What do last() and dropLast() do? Are they part of your own code? Can you include them?
No - last() returns an optional - the last element in the array, if the array isn't empty, or nil if it is. I'll edit the answer a bit to make it clearer.
Strings don't have integer indices; they have their own index type. If you want to get a String.Index from an integer, you have to use advance(someString.startIndex, intIndex).
Well, I'll be damned, but it works! Thanks a lot for this solution! The resulting arrays contain some empty strings if the separator was at the beginning or end, but those should be easy to filter out.
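Picking up both comments: in current Swift, advance(_:_:) no longer exists, and the equivalent is index(_:offsetBy:). The following sketch (the method name multiSplit(atOffsets:) is hypothetical) takes plain integer character offsets, sorts them, converts them to String.Index values, and drops the empty pieces that appear when a cut falls at the start or end of the string.

```swift
extension String {
    // Hypothetical helper: split at integer character offsets.
    // Offsets must lie in 0...count; they are sorted here, so their order doesn't matter.
    func multiSplit(atOffsets offsets: [Int]) -> [String] {
        // index(_:offsetBy:) is the current-Swift replacement for advance().
        let cuts = offsets.sorted().map { index(startIndex, offsetBy: $0) }
        let bounds = [startIndex] + cuts + [endIndex]
        return zip(bounds, bounds.dropFirst())
            .map { lower, upper in String(self[lower..<upper]) }
            .filter { !$0.isEmpty }  // drop empty pieces, e.g. from a cut at offset 0
    }
}

let result = "stockwell".multiSplit(atOffsets: [5, 0, 2, 3])
print(result) // ["st", "o", "ck", "well"]
```

An offset outside 0...count makes index(_:offsetBy:) trap, so validate offsets that come from user input before calling this.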

You can use lookahead and lookbehind for matching the positions:

(?=st)|(?<=st)|(?=ck)|(?<=ck)

Then replace each match with _.
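A minimal sketch of that replacement with NSRegularExpression in current Swift. Because every match is zero-width, the replacement inserts _ without consuming any characters. Splitting the marked string on _ afterwards is an extra step not in the answer, and it assumes _ never occurs in the input.

```swift
import Foundation

let input = "stockwell"
// Zero-width pattern: matches the empty position just before or just after "st" or "ck".
let regex = try! NSRegularExpression(pattern: "(?=st)|(?<=st)|(?=ck)|(?<=ck)", options: [])
let marked = regex.stringByReplacingMatches(
    in: input,
    options: [],
    range: NSRange(input.startIndex..., in: input),
    withTemplate: "_")
print(marked) // _st_o_ck_well

// Assumes "_" never occurs in the input, so it can serve as a split marker.
let segments = marked.components(separatedBy: "_").filter { !$0.isEmpty }
print(segments) // ["st", "o", "ck", "well"]
```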


1 Comment

OK, that's quite useful, but I'm not sure how it helps with my original issue: how to assemble the matches/ranges from the regexp into a string array properly. It would be great if NSRegularExpression or Swift's String class had some API to split into an array via a regexp (like some other languages), but there seems to be nothing like that.
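One way to assemble the pieces, sketched here in current Swift, is to collect every match position into a single array first and only then cut the string, so no later match can overwrite an earlier one. The function name splitKeepingSeparators is hypothetical. It uses the lookaround pattern from the answer above, whose matches are all zero-width, so each match location is a cut point.

```swift
import Foundation

// Hypothetical helper: split `text` at every match of `pattern`, keeping all pieces.
// Intended for zero-width patterns, where each match marks a cut point.
func splitKeepingSeparators(_ text: String, pattern: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let fullRange = NSRange(text.startIndex..., in: text)
    // Collect every cut point first, so no match overwrites an earlier one.
    var cuts: [String.Index] = []
    for match in regex.matches(in: text, options: [], range: fullRange) {
        if let range = Range(match.range, in: text) {
            cuts.append(range.lowerBound)
        }
    }
    // Slice between consecutive cut points, skipping empty slices.
    let bounds = [text.startIndex] + cuts + [text.endIndex]
    var pieces: [String] = []
    for (lower, upper) in zip(bounds, bounds.dropFirst()) where lower < upper {
        pieces.append(String(text[lower..<upper]))
    }
    return pieces
}

let parts = try! splitKeepingSeparators("stockwell", pattern: "(?=st)|(?<=st)|(?=ck)|(?<=ck)")
print(parts) // ["st", "o", "ck", "well"]
```

Only the start of each match is used as a cut, which is why the pattern should be zero-width. With a plain pattern such as st|ck you would also need to cut at each match's upperBound.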

Here's another way to do this. It depends on your separators not overlapping or containing each other.

import Foundation

let initial = "stockwell"
let firstSeparator = "st"
let secondSeparator = "ck"
var separated = [String]()
let firstSplit = initial.componentsSeparatedByString(firstSeparator)
for next in [firstSeparator].join(firstSplit.map{[$0]}) {
  let secondSplit = next.componentsSeparatedByString(secondSeparator)
  separated += [secondSeparator].join(secondSplit.map{[$0]})
}
separated // => ["", "st", "o", "ck", "well"]

It works by separating the string into an array using the first delimiter, inserting the delimiter into the array between elements, then doing the same to every element in the new array with the next delimiter.
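The same idea generalizes to any number of separators by repeating the split-and-reinsert step once per separator. Below is a sketch in current Swift, where components(separatedBy:) is the modern name for componentsSeparatedByString and joined(separator:) replaces the old join. The function name splitKeeping is hypothetical, and like the answer it assumes the separators don't overlap or contain each other.

```swift
import Foundation

// Hypothetical generalization: apply the split-and-reinsert step once per separator.
// Assumes the separators never overlap or contain one another.
func splitKeeping(_ text: String, separators: [String]) -> [String] {
    var pieces = [text]
    for separator in separators {
        var nextPieces: [String] = []
        for piece in pieces {
            // Split this piece, then put the separator back between the parts.
            let parts = piece.components(separatedBy: separator)
            nextPieces += parts.map { [$0] }.joined(separator: [separator])
        }
        pieces = nextPieces
    }
    return pieces
}

let separatedAll = splitKeeping("stockwell", separators: ["st", "ck"])
print(separatedAll) // ["", "st", "o", "ck", "well"]
```

The leading empty string comes from "st" sitting at the very start of the input; append .filter { !$0.isEmpty } if you don't want it.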
