4

I’m afraid this is another noob question.

What I want to do is use a Map to count how often each word appears in a poe…m and then print the results to the console. I came up with the following code, which I believe works (though it is probably not quite idiomatic):

val poe_m="""Once upon a midnight dreary, while I pondered weak and weary,
            |Over many a quaint and curious volume of forgotten lore,
            |While I nodded, nearly napping, suddenly there came a tapping,
            |As of some one gently rapping, rapping at my chamber door.
            |`'Tis some visitor,' I muttered, `tapping at my chamber door -
            |Only this, and nothing more.'"""

val separators=Array(' ',',','.','-','\n','\'','`')
var words=new collection.immutable.HashMap[String,Int]
for(word<-poe_m.stripMargin.split(separators) if(!word.isEmpty))  
    words=words+(word.toLowerCase -> (words.getOrElse(word.toLowerCase,0)+1))

words.foreach(entry=>println("Word : "+entry._1+" count : "+entry._2))

As far as I understand, in Scala immutable data structures are preferred to mutable ones, and val is preferred to var, so I’m facing a dilemma: if the results are stored in an immutable Map, words has to be a var (so that a new Map instance can be assigned on each iteration), while turning words into a val implies using a mutable Map.

Could someone enlighten me about the proper way to deal with this existential problem?

    You should use "\\W+" to split. It matches any run of characters that are not letters, digits, or underscores. If you want to discard numbers as well, "\\P{Alpha}+" will do that. Commented Apr 26, 2012 at 16:28

5 Answers

10

In this case you can use groupBy and mapValues:

val tokens = poe_m.stripMargin.split(separators).filterNot(_.isEmpty)
val words = tokens.groupBy(w => w).mapValues(_.size)

More generally this is a job for a fold:

 val words = tokens.foldLeft(Map.empty[String, Int]) {
   case (m, t) => m.updated(t, m.getOrElse(t, 0) + 1)
 }

The Wikipedia entry on folds gives some good clarifying examples.
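
To make the link with the loop in the question explicit, here is a minimal sketch suggesting that foldLeft threads the map through each step much as the var did. The object name FoldVsVar, the two method names, and the sample tokens are made up for illustration.

```scala
object FoldVsVar {
  // The question's approach: a var rebound to a new immutable Map on each step.
  def countWithVar(tokens: Seq[String]): Map[String, Int] = {
    var counts = Map.empty[String, Int]
    for (t <- tokens) counts = counts.updated(t, counts.getOrElse(t, 0) + 1)
    counts
  }

  // The same computation as a fold: the accumulator plays the role of the var,
  // so no reassignment appears in user code.
  def countWithFold(tokens: Seq[String]): Map[String, Int] =
    tokens.foldLeft(Map.empty[String, Int]) { (counts, t) =>
      counts.updated(t, counts.getOrElse(t, 0) + 1)
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input standing in for the tokenized poem.
    val tokens = Seq("tapping", "at", "my", "chamber", "door", "tapping")
    println(countWithFold(tokens))
    println(countWithVar(tokens) == countWithFold(tokens))
  }
}
```

Both versions build a fresh immutable Map per token; the fold only hides the rebinding inside the library call.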


3 Comments

Great! Exactly the kind of answer I love: a concrete solution to my problem and new territories to explore.
Instead of w => w, you can use identity.
Right—I was using the explicit identity function for clarity, but it's worth noting that identity exists.
2

Well, in functional programming it is preferred to use immutable objects and to update them through functions (for example, a tail-recursive function that returns the updated map). However, if you are not dealing with heavy loads, you should prefer a mutable map to a var, not because it is more powerful (even if I think it should be) but because it is easier to use.

Finally, Travis Brown's answer is a solution to your concrete problem; mine is more of a personal philosophy.
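
For what the tail-recursive variant mentioned above might look like, here is a rough sketch under my own assumptions: the object TailRecCount and the method count are hypothetical names, and the input is assumed to be already split into tokens.

```scala
import scala.annotation.tailrec

object TailRecCount {
  // Counts tokens by passing the map built so far as an argument,
  // so neither a var nor a mutable Map is needed.
  @tailrec
  def count(remaining: List[String],
            acc: Map[String, Int] = Map.empty): Map[String, Int] =
    remaining match {
      case Nil          => acc
      case head :: tail => count(tail, acc.updated(head, acc.getOrElse(head, 0) + 1))
    }

  def main(args: Array[String]): Unit =
    println(count(List("nothing", "more", "nothing")))
}
```

The @tailrec annotation asks the compiler to reject the method if the recursive call is not in tail position, which keeps the stack flat on long inputs.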

1 Comment

Thanks for your point of view, since I was looking as much for a philosophy as for a solution to this very problem.
2

I am a Scala noob too, so there may be better ways to do it. I came up with the following:

poe_m.stripMargin.split(separators)
     .filter(x => !x.isEmpty)
     .groupBy(x => x).foreach {
        case(w,ws) => println(w + " " + ws.size)
     }

By applying successive functions, you avoid the need for vars and mutable collections.

1 Comment

As far as I know, the preferred way is to write groupBy(identity) instead of groupBy(x => x).
1

This is how it is done in the very good book "Programming in Scala: A Comprehensive Step-by-Step Guide, 2nd Edition" by Martin Odersky, Lex Spoon, and Bill Venners:

import scala.collection.mutable

def countWords(text: String) = {
  val counts = mutable.Map.empty[String, Int]
  // Split on runs of spaces, commas, exclamation marks, and periods.
  for (rawWord <- text.split("[ ,!.]+")) {
    val word = rawWord.toLowerCase
    val oldCount = 
      if (counts.contains(word)) counts(word)
      else 0
    counts += (word -> (oldCount + 1))
  }
  counts
}

However, it also uses a mutable Map.
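
One possible way to reconcile this with the preference for immutability, offered as a sketch and not as the book's code: keep the mutable Map local to the function and hand back an immutable snapshot. The names LocalMutation and countWordsImmutable are hypothetical; the split pattern is the one from the listing above.

```scala
import scala.collection.mutable

object LocalMutation {
  def countWordsImmutable(text: String): Map[String, Int] = {
    // The mutable map never escapes this method.
    val counts = mutable.Map.empty[String, Int]
    for (rawWord <- text.split("[ ,!.]+") if rawWord.nonEmpty) {
      val word = rawWord.toLowerCase
      counts(word) = counts.getOrElse(word, 0) + 1
    }
    // Callers only ever see an immutable copy.
    counts.toMap
  }

  def main(args: Array[String]): Unit =
    println(countWordsImmutable("See Spot run. Run, Spot. Run!"))
}
```

From the caller's point of view the function is pure, which is arguably what matters most; whether the extra copy is acceptable depends on the size of the input.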

1

Credit lies elsewhere (Travis and Daniel in particular) for what follows, but there was a simpler one-liner needing to get out.

val words = poe_m split "\\W+" groupBy identity mapValues {_.size}

There's a simplification in that you won't need stripMargin, because the regex, as suggested by Daniel, disposes of the margin characters as well.

If you want, you could retain the _.isEmpty filtering to protect against the edge case of an empty String token, which would yield ("" -> 1).
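
A small sketch of that edge case, with hypothetical names (SplitEdgeCase, rawTokens, tokens): when the text begins with a separator, or is itself empty, split returns an empty first element, and filterNot(_.isEmpty) removes it.

```scala
object SplitEdgeCase {
  // Splits on runs of non-word characters; a leading separator
  // leaves an empty string at the head of the result.
  def rawTokens(text: String): List[String] = text.split("\\W+").toList

  // Same split, with empty tokens dropped.
  def tokens(text: String): List[String] = rawTokens(text).filterNot(_.isEmpty)

  def main(args: Array[String]): Unit = {
    val line = "`'Tis some visitor"
    println(rawTokens(line)) // first element is the empty string
    println(tokens(line))    // empty string filtered out
  }
}
```

The poem as given starts with a letter, so the one-liner does not hit this case, but a text that opens with a quote or a space would.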
