
I'm trying to split an incoming stream of strings into cumulative tokens per line item, using the function below:

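def cumulativeTokenise(string: String): Array[String] = {
  val array = string.split(" +")
  var result: Array[String] = Array()
  // Each token starts a new entry (first token) or extends the previous cumulative string.
  // foreach instead of map: the mapped values were never used, only the side effect on result.
  array.foreach { i =>
    result = result :+ (
      if (result.isEmpty) i
      else result.last + " " + i
    )
  }
  result
}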

For example, cumulativeTokenise("TEST VALUE DESCRIPTION . AS") returns Array(TEST, TEST VALUE, TEST VALUE DESCRIPTION, TEST VALUE DESCRIPTION ., TEST VALUE DESCRIPTION . AS).

I'm trying to figure out whether there is a more efficient built-in method in Scala, or a better functional way of doing this without a mutable array. Any help is much appreciated.


  • Have you tried scanLeft where the initial parameter is an empty list? Commented Apr 22, 2020 at 12:53
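A minimal sketch of the scanLeft idea from the comment, assuming the same " +" separator as the question. The accumulator is the list of tokens seen so far, starting from an empty list; the object name ScanLeftSketch is hypothetical.

```scala
object ScanLeftSketch {
  // scanLeft emits every intermediate accumulator: List(), List(A), List(A, B), ...
  def cumulativeTokenise(string: String): Array[String] =
    string.split(" +")
      .scanLeft(List.empty[String])((acc, token) => acc :+ token)
      .tail                 // drop the initial empty list
      .map(_.mkString(" ")) // join each prefix back into a single string

  def main(args: Array[String]): Unit =
    println(cumulativeTokenise("TEST VALUE AS").mkString(" | ")) // TEST | TEST VALUE | TEST VALUE AS
}
```

No var is needed: scanLeft keeps every intermediate state itself, which is exactly the cumulative output the question asks for.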

1 Answer


You can get the same results a little more directly.

def cumulativeTokenise(string: String): Array[String] =
  string.split("\\s+")
        .inits
        .map(_.mkString(" "))
        .toArray
        .reverse
        .tail
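To see why the trailing .reverse.tail is needed, here is a small sketch (the object and helper names are hypothetical) that prints what inits produces before and after those two steps.

```scala
object InitsDemo {
  // Every prefix of the token array, joined with spaces, in the order inits yields them.
  def joinedInits(string: String): Array[String] =
    string.split("\\s+").inits.map(_.mkString(" ")).toArray

  def main(args: Array[String]): Unit = {
    val raw = joinedInits("TEST VALUE AS")
    // inits starts with the whole array and ends with the empty array (joined as "").
    println(raw.mkString("[", ", ", "]"))              // [TEST VALUE AS, TEST VALUE, TEST, ]
    // reverse puts the shortest prefix first; tail then drops the empty string.
    println(raw.reverse.tail.mkString("[", ", ", "]")) // [TEST, TEST VALUE, TEST VALUE AS]
  }
}
```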

Or, perhaps simpler, a two-step procedure:

def cumulativeTokenise(string: String): Array[String] = {
  val tokens = string.split("\\s+")
  Array.tabulate(tokens.length)(n => tokens.take(n+1).mkString(" "))
}
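A sketch of the tabulate version with one addition that is not in the answer: trimming and dropping empty tokens first. Without it, input with leading whitespace such as "  TEST VALUE" splits into an empty first token, and every cumulative string would start with a stray space. The object name is hypothetical.

```scala
object TabulateSketch {
  def cumulativeTokenise(string: String): Array[String] = {
    // Assumption: trim + filter guard against leading whitespace and empty input.
    val tokens = string.trim.split("\\s+").filter(_.nonEmpty)
    // Element n joins the first n + 1 tokens.
    Array.tabulate(tokens.length)(n => tokens.take(n + 1).mkString(" "))
  }

  def main(args: Array[String]): Unit =
    println(cumulativeTokenise("  TEST VALUE").mkString(" | ")) // TEST | TEST VALUE
}
```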

One problem I see here is that you rely on whitespace to separate tokens. That might not always be the case.

def cumulativeTokenise(string: String): Array[String] =
  string.split("((?=\\W)|(?<=\\W))")
        .filter(_.trim.nonEmpty)
        .inits
        .map(_.mkString(" "))
        .toArray
        .reverse
        .tail

cumulativeTokenise("here@there")
//res0: Array[String] = Array(here, here @, here @ there)
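The regex uses zero-width lookahead (?=\W) and lookbehind (?<=\W), so it splits just before and just after every non-word character without consuming anything. This sketch (hypothetical names) shows the raw pieces before the filter step. Spaces are non-word characters too, which is why blank pieces have to be filtered out.

```scala
object SplitDemo {
  // Same pattern as the answer: a zero-width match before and after each \W character.
  val Boundary = "((?=\\W)|(?<=\\W))"

  def rawPieces(s: String): Array[String] = s.split(Boundary)

  def main(args: Array[String]): Unit = {
    println(rawPieces("here@there").mkString("[", "|", "]"))   // [here|@|there]
    println(rawPieces("here @ there").mkString("[", "|", "]")) // [here| |@| |there]
    // Dropping blank pieces leaves only the real tokens.
    println(rawPieces("here @ there").filter(_.trim.nonEmpty).mkString("[", "|", "]")) // [here|@|there]
  }
}
```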

Probably not the best solution to the problem, but it's something to think about.

  • I like the Array.tabulate approach. I'm using whitespace because that was the requirement given to me. Thanks, jwvh. Commented Apr 23, 2020 at 13:51
