4

what is the elegant way for finding how many times a given string is a substring of another string using scala?

The below test cases that should make clear what are the requirements:

import org.scalatest.FunSuite

class WordOccurrencesSolverTest extends FunSuite {

  private val solver = new WordOccurrencesSolver()

  test("solve for a in a") {
    assert(solver.solve("a", "a") === 1)
  }

  test("solve for b in a") {
    assert(solver.solve("b", "a") === 0)
  }

  test("solve for a in aa") {
    assert(solver.solve("a", "aa") === 2)
  }

  test("solve for b in ab") {
    assert(solver.solve("b", "ab") === 1)
  }

  test("solve for ab in ab") {
    assert(solver.solve("ab", "ab") === 1)
  }

  test("solve for ab in abab") {
    assert(solver.solve("ab", "abab") === 2)
  }

  test("solve for aa in aaa") {
    assert(solver.solve("aa", "aaa") === 2)
  }
}

Here is my solution to the problem of which I am not particularly proud:

class WordOccurrencesSolver {

  def solve(word: String, text: String): Int = {
    val n = word.length
    def solve(acc: Int, word: String, sb: String): Int = sb match {
      case _ if sb.length < n => acc
      case _ if sb.substring(0, n) == word => solve(acc + 1, word, sb.tail)
      case _ => solve(acc, word, sb.tail)
    }
    solve(0, word, text)
  }

}

I assume there must be a clean one liner which takes advantage of Scala's higher order functions instead of recursion and match/case clause.

3
  • 2
    Well... I will advise you to atay away from the temptation of finding one liners. First try to focus on the time-complexity of your solutions. Only after taking care of that should you look for one-liners or elegance. Try to think of a solution which avoids n time usage of substring (which is actually O(n)) making your solution very poor in performence. Commented Apr 10, 2017 at 12:50
  • Similarly .tail for String is also O(n) and should be avoided. Commented Apr 10, 2017 at 12:55
  • Good point, I missed it somehow. Commented Apr 10, 2017 at 13:09

3 Answers 3

14

If you are looking for an idiomatic Scala solution then you can use the sliding to create a sliding window iterator and count the wondows which are equal to your target String.

This solution while being functional also offers you an acceptable performence.

def countOccurrences(src: String, tgt: String): Int =
  src.sliding(tgt.length).count(window => window == tgt)
Sign up to request clarification or add additional context in comments.

5 Comments

Looks great. What is the time complexity for this solution?
Well... lets say that your src is of lenght m and tgt is of length n. Then step 1 - creating the window iterator is O(m). this iterator will have m-n window strings. step 2 - count involves a comparision of window string of length n with tgt string for every window string. so it is O(n*m). So the over all complextity is O(m*n)
Great, this is exactly the solution I wanted to find. I am still curious though. Are there algorithms which could achieve the same thing (not necessarily one liners) in O(n) following your notation? That is algorithms for which the length of src is irrelevant, speaking in O terms.
SubString search has been a very worked upon field and hence there are many wonderful algorithms, some of which are condition dependent where as some are generally applicable. You can read about them in detail here - www-igm.univ-mlv.fr/~lecroq/string . But be-warned this written with a mathematics centric perspective. But you can just search for simpler explanations of these algorithms on Google.
But there will not be any O(n) algorithms as that will be impossible. Even O(m) algorithms should be impossible.
2

You can probably use this java function:

StringUtils.countMatches(stringToFindMatchesIn, keyWordToFind );

This will return the number of occurences of the keyword in the string

5 Comments

I do not think OP is allowed to use that. This looks like a homework question/excercise.
StringUtils is from Apache Commons, not from a Core Java library! And it's not a Scala-only approach anyway.
@ViacheslavRodionov it is work at Scala and it's seems correct (does what is need). Also it is probably more efficient than answers upper
@MikhailIonkin please read the original question again and also the comment above mine. The sole fact that this works is not enough to propose this code as a solution.
@ViacheslavRodionov At google i found this question, and this answer helps me. It's elegant, tested (production quality) and "one-line" solution. Yes, end of this question about clear Scala solution, but I'm not sure that it is best for all.
0

If somebody else is looking for a very efficient and semi-idomatic solution, use

extension (s: String) def countSubstr(sub: String, overlap: Boolean = false): Int = {
  if (sub.isEmpty) throw IllegalArgumentException("substring must not be empty")
  val d = if (overlap) 1 else sub.length
  @tailrec
  def count(pos: Int, acc: Int): Int =
    s.indexOf(sub, pos) match {
      case -1 => acc
      case j => count(j + d, acc + 1)
    }
  count(0, 0)
}

For Scala 2 compatibility or if you do not like extensions, use

def countSubstr(s: String, sub: String, overlap: Boolean = false): Int = {
  ...

Runtime is O(s.length) for overlap = false. There are no object allocations, the @tailrec method gets optimized to a jump and the match to an if.

As your last example suggests, you wanted to allow overlaps, so it is slightly slower (but worst case is O(s.length*sub.length)).


If you are looking for a (slower, but may still be faster than sliding) one-liner this may be for you:

  def countSubstr(s: String, regex: String): Int =
    s.split(regex, -1).length - 1

Note:

  • Second argument is a regex, this may lead to unexpected results and be slower if a real regex is used.
  • It does not count overlapping substrings, so it fails on the last test.
  • The -1 as second argument to split is important. Otherwise, substrings at the end would be ignored.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.