Parsing sentences using Scala parser combinator

Question

I just started playing with parser combinators in Scala, but I got stuck writing a parser for sentences such as "I like Scala." (a word ends at whitespace or at a period).

I started with the following implementation:

package example

import scala.util.parsing.combinator._

object Example extends RegexParsers {
  override def skipWhitespace = false

  def character: Parser[String] = """\w""".r

  def word: Parser[String] =
    rep(character) <~ (whiteSpace | guard(literal("."))) ^^ (_.mkString(""))

  def sentence: Parser[List[String]] = rep(word) <~ "."
}

object Test extends App {
  val result = Example.parseAll(Example.sentence, "I like Scala.")

  println(result)
}

The idea behind using guard() is to have a period mark the end of a word without consuming it, so that the sentence parser can consume it. However, the parser gets stuck (adding log() reveals that it is repeatedly trying the word and character parser).

If I change the word and sentence definitions as follows, it parses the sentence, but the grammar description doesn't look right and will not work if I try to add a parser for paragraphs (rep(sentence)) etc.

def word: Parser[String] =
  rep(character) <~ (whiteSpace | literal(".")) ^^ (_.mkString(""))

def sentence: Parser[List[String]] = rep(word) <~ opt(".")

Any ideas what may be going on here?

DaoWen · Accepted Answer · 2014-01-25 21:34:59Z

However, the parser gets stuck (adding log() reveals that it is repeatedly trying the word and character parser).

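The rep combinator corresponds to * in Perl-style regex notation, so it matches zero or more repetitions. That is what makes the parser loop: once the input is positioned at the period, rep(character) matches zero characters and guard(literal(".")) succeeds without consuming anything, so word succeeds on an empty string and rep(word) keeps trying it at the same position. I think you want word to match one or more characters. Changing rep to rep1 (corresponding to + in Perl-style regex notation) should fix the problem.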
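
The change described above can be sketched as follows. This is only an illustration built on the question's grammar: the object names Rep1Example and Rep1Demo and the looseWord parser are hypothetical names introduced here, and the code assumes the scala-parser-combinators library is on the classpath.

```scala
import scala.util.parsing.combinator._

// Sketch: the question's grammar with rep swapped for rep1 in word.
object Rep1Example extends RegexParsers {
  override def skipWhitespace = false

  def character: Parser[String] = """\w""".r

  // Original definition, kept only for comparison (hypothetical name).
  // It can succeed on an empty string without consuming any input.
  def looseWord: Parser[String] =
    rep(character) <~ (whiteSpace | guard(literal("."))) ^^ (_.mkString(""))

  // rep1 requires at least one character, so word fails at "."
  // instead of succeeding with an empty match.
  def word: Parser[String] =
    rep1(character) <~ (whiteSpace | guard(literal("."))) ^^ (_.mkString(""))

  def sentence: Parser[List[String]] = rep(word) <~ "."
}

object Rep1Demo {
  def main(args: Array[String]): Unit = {
    // The rep1-based grammar parses the whole sentence.
    println(Rep1Example.parseAll(Rep1Example.sentence, "I like Scala."))

    // The original word succeeds at "." with an empty result and no progress,
    // which is why rep(word) never terminated.
    val loose = Rep1Example.parse(Rep1Example.looseWord, ".")
    println(s"successful=${loose.successful}, consumed=${loose.next.offset}")
  }
}
```

With rep1, the last call to word fails at the period, rep(word) stops with the three words collected so far, and the trailing "." is left for sentence to consume.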

However, your definition still seems a little verbose to me. Why are you parsing individual characters instead of just using \w+ as the pattern for a word? Here's how I'd write it:

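import scala.util.parsing.combinator._

object Example extends RegexParsers {
  // Whitespace is significant here, so don't skip it automatically.
  override def skipWhitespace = false

  def word: Parser[String] = """\w+""".r

  def sentence: Parser[List[String]] = rep1sep(word, whiteSpace) <~ "."
}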

Notice that I use rep1sep to parse a non-empty list of words separated by whitespace. There's a repsep combinator as well, but I think you'd want at least one word per sentence.
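
As a rough sketch of how this grammar might extend to the paragraph parser mentioned in the question, the same rep1sep idea can be applied one level up. The paragraph definition and the object names ParagraphExample and ParagraphDemo are assumptions made for illustration, not something from the original problem, and the sketch assumes sentences are separated by whitespace.

```scala
import scala.util.parsing.combinator._

// Sketch: the rep1sep-based grammar plus a hypothetical paragraph parser.
object ParagraphExample extends RegexParsers {
  override def skipWhitespace = false

  def word: Parser[String] = """\w+""".r

  // A non-empty list of words separated by whitespace, ended by a period.
  def sentence: Parser[List[String]] = rep1sep(word, whiteSpace) <~ "."

  // Assumption: a paragraph is one or more sentences separated by whitespace.
  def paragraph: Parser[List[List[String]]] = rep1sep(sentence, whiteSpace)
}

object ParagraphDemo {
  def main(args: Array[String]): Unit = {
    val text = "I like Scala. It is fun."
    println(ParagraphExample.parseAll(ParagraphExample.paragraph, text))

    // rep1sep needs at least one word, so a lone period is rejected.
    println(ParagraphExample.parseAll(ParagraphExample.sentence, ".").successful)
  }
}
```

Because each sentence consumes its own period and nothing else, the separator between sentences stays unambiguous. Swapping rep1sep for repsep in sentence would instead accept a lone "." as an empty sentence.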

edited Jan 25, 2014 at 21:34

answered Jan 25, 2014 at 21:27

DaoWen


1 Comment

ramnivas Over a year ago

Thanks. As for simplifying word, you are right that in the example, your solution makes more sense. The original problem I was trying to solve has a more complex domain, where the equivalent of character is more involved and requires its own parser.
