3

I am trying to parse a file that looks like:

a b c 
f e d

I want to match each of the symbols in the line and parse everything into a list of lists such as:

[[A, B, C], [D, E, F]]

In order to do that I tried the following:

import           Control.Monad
import           Text.ParserCombinators.Parsec
import           Text.ParserCombinators.Parsec.Language
import qualified Text.ParserCombinators.Parsec.Token    as P

parserP :: Parser [[MyType]]
parserP = do
  x  <- rowP
  xs <- many (newline >> rowP)
  return (x : xs)

rowP :: Parser [MyType]
rowP = manyTill cellP $ void newline <|> eof

cellP :: Parser (Cell Color)
cellP = aP <|> bP <|> ... -- rest of the parsers, they all look very similar

aP :: Parser MyType
aP = symbol "a" >> return A

bP :: Parser MyType
bP = symbol "b" >> return B

lexer = P.makeTokenParser emptyDef
symbol  = P.symbol lexer

But it fails to return multiple inner lists. Instead what I get is:

[[A, B, C, D, E, F]]

What am I doing wrong? I was expecting manyTill to parse cellP until the newline character, but that's not the case.

3
  • 2
    Is there a reason you can't post cellP? It's possible with Parsec that cellP is too eager and eating input, so it might help if we saw that implementation. Commented Jun 29, 2017 at 20:53
  • There is no need to use parser for that, are you using it for some particular reason? Commented Jun 29, 2017 at 21:00
  • I have added the code for cellP @SilvioMayolo . There is no particular reason to use parsec, I just thought it would make the code look more declarative. Commented Jun 29, 2017 at 21:11

2 Answers 2

5

Parser combinators are overkill for something this simple. I'd use lines :: String -> [String] and words :: String -> [String] to break up the input and then map the individual tokens into MyTypes.

toMyType :: String -> Maybe MyType
toMyType "a" = Just A
toMyType "b" = Just B
toMyType "c" = Just C
toMyType _ = Nothing

parseMyType :: String -> Maybe [[MyType]]
parseMyType = traverse (traverse toMyType) . fmap words . lines
Sign up to request clarification or add additional context in comments.

Comments

4

You're right that manyTill keeps parsing until a newline. But manyTill never gets to see the newline because cellP is too eager. cellP ends up calling P.symbol, whose documentation states

symbol :: String -> ParsecT s u m String

Lexeme parser symbol s parses string s and skips trailing white space.

The keyword there is 'white space'. It turns out, Parsec defines whitespace as being any character which satisfies isSpace, which includes newlines. So P.symbol is happily consuming the c, followed by the space and the newline, and then manyTill looks and doesn't see a newline because it's already been consumed.

If you want to drop the Parsec routine, go with Benjamin's solution. But if you're determined to stick with that, the basic idea is that you want to modify the language's whiteSpace field to correctly define whitespace to not be newlines. Something like

lexer = let lexer0 = P.makeTokenParser emptyDef
        in lexer0 { whiteSpace = void $ many (oneOf " \t") }

That's pseudocode and probably won't work for your specific case, but the idea is there. You want to change the definition of whiteSpace to be whatever you want to define as whiteSpace, not what the system defines by default. Note that changing this will also break your comment syntax, if you have one defined, since whiteSpace was previously equipped to handle comments.

In short, Benjamin's answer is probably the best way to go. There's no real reason to use Parsec here. But it's also helpful to know why this particular solution didn't work: Parsec's default definition of a language wasn't designed to treat newlines with significance.

1 Comment

Thanks! That was very useful.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.