Parsec parsing in Haskell

Question

I have 2 parsers:

nexpr::Parser (Expr Double)
sexpr::Parser (Expr String)

How do I build a parser that tries one and then the other if it doesn't work? I can't figure out what to return. There must be a clever way to do this.

Thanks.

EDIT:

Adding a bit more info...

I'm learning Haskel, so I started with :

data Expr a where
    N::Double -> Expr Double
    S::String -> Expr String
    Add::Expr Double -> Expr Double -> Expr Double
    Cat::Expr String -> Expr String -> Expr String

then I read about F-algebra (here) and so I changed it to:

data ExprF :: (* -> *) -> * -> * where
    N::Double -> ExprF r Double
    S::String -> ExprF r String
    Add::r Double -> r Double -> ExprF r Double
    Cat::r String -> r String -> ExprF r String

with

type Expr = HFix ExprF

so my parse to:

Parser (Expr Double)

is actually:

Parser (ExprF HFix Double)

Maybe I'm biting off more than I can chew...

Since the return types are different, you probably want to return Parser (Either (Expr Double) (Expr String)). So something like (Left <$> nexpr) <|> (Right <$> sexpr) — user2407038
– user2407038, Commented Sep 1, 2014 at 8:23
I just got this to work - data ExprVal = N (Expr Double) | S (Expr String) and expr = (fmap N nexpr) <|> (fmap S sexpr). Which is best? — b1g3ar5
– b1g3ar5, Commented Sep 1, 2014 at 11:47
@b1g3ar5 In the case of two different types, I would say that Either is better, at least because there are built in library functions for manipulating either. But if you have more than 2 types, using nested Eithers becomes very clunky, so at that point you should define your own datatype. Note that Either (Expr Double) (Expr String) simply hides the underlying type and it would be the same thing to just have an Expr with no type parameter. — user2407038
– user2407038, Commented Sep 1, 2014 at 17:35
@user2407038, Thanks for the clarification. I think I'll stick to my own extra datatype in case I need a B Bool... — b1g3ar5
– b1g3ar5, Commented Sep 1, 2014 at 18:30

Vic Smith · Accepted Answer · 2014-09-03 09:58:40Z

As noted in the comments, you can have a parser like this

nOrSexpr :: Parser (Either (Expr Double) (Expr String))
nOrSexpr = (Left <$> nexpr) <|> (Right <$> sexpr)

However, I think the reason that you are having this difficulty is because you are not representing your parse tree as a single type, which is the more usual thing to do. Something like this:

data Expr = 
      ExprDouble Double 
    | ExprInt Int 
    | ExprString String

That way you can have parsers for each kind of expression that are all of type Parser Expr. This is the same as using Either but more flexible and maintainable. So you might have

doubleParser :: Parser Expr
doubleParser = ...

intParser :: Parser Expr
intParser = ...

stringParser :: Parser Expr
stringParser = ...

exprParser :: Parser Expr
exprParser = intParser <|> doubleParser <|> stringParser

Note that the order of the parsers does matter and use can use Parsec's try function if backtracking is needed.

So, for example, if you want to have a sum expression now, you can add to the data type

data Expr = 
      ExprDouble Double 
    | ExprInt Int 
    | ExprString String
    | ExprSum Expr Expr

and make the parser

sumParser :: Parser Expr
sumParser = do
    a <- exprParser
    string " + "
    b <- exprParser
    return $ ExprSum a b

UPDATE

Well, I take my hat off to you diving straight into GADTs if you are just starting with Haskell. I have been reading through the paper you linked and noticed this immediately in the first paragraph:

The jury is still out on whether the additional type-safety provided by GADTs is worth the added inconvenience of working with them.

There are three points worth taking away here I think. The first is simply that I would have a go with the simpler way of doing things first, to get an idea of how it works and why you might want to add more type safety, before trying to more complicated type theoretical stuff. That comment may not help so feel free to ignore it!

Secondly, and more importantly, your representation...

data ExprF :: (* -> *) -> * -> * where
    N :: Double -> ExprF r Double
    S :: String -> ExprF r String
    Add :: r Double -> r Double -> ExprF r Double
    Cat :: r String -> r String -> ExprF r String

...is specifically designed to not allow ill formed type expressions. Contrasted with mine which can, eg ExprSum (ExprDouble 5.0) (ExprString "test"). So the question you really want to ask is what should actually happen when the parser attempts to parse something like "5.0 + \"test\""? Do you want it to just not parse, or do you want it to return a nice message saying that this expression is the wrong type? Compilers are usually designed in multiple stages for this reason. The first pass turns the input into an abstract syntax tree (AST), and further passes annotate this tree with type judgements. This annotated AST can then be transformed into the semantic representation that you really want it in.

So in your case I would recommend two stages. first, parse into a dumb representation like mine, that will give you the correct tree shape but allow ill-typed expressions. Like

data ExprAST = 
      ExprASTDouble Double 
    | ExprASTInt Int 
    | ExprASTString String
    | ExprASTAdd Expr Expr

Then have another function that will typecheck the ExprAST. Something like

 typecheck :: ExprAST -> Maybe (ExprF HFix a)

(You could also use Either and return either the typechecked GADT or an error string saying what the problem is.) The further problem here is that you don't know what a is statically. The other answer solves this by using type tags and an existential wrapper, which you might find to be the best way to go. I feel like it might be simpler to have a top level expression in your GADT that all expressions must live in, so an entire parse will always have the same type. In the end there is usually only one program type.

My third, and last, point is related to this

The jury is still out on whether the additional type-safety provided by GADTs is worth the added inconvenience of working with them.

The more type safety, generally the more work you have to do to get it. You mention you are new to Haskell, yet this adventure has taken us right to the edge of what it is capable of doing. The type of the parsed expression cannot depend only on the input string in a Haskell function, because it does not allow for dependant types. If you want to go down this path, I might suggest you have a look at a language called Idris. A great introduction to what it is capable of can be found in this video, in which he constructs a typesafe printf.

See my EDIT. So with my ExprF r a the underlying type (a) is not hidden as it is in your Expr. I can hide it in an Either or I can hide it in an ExpVal (see my comment). Thanks.
Thanks for the edit. I think that in this case GADTs do seem quite good at doing some type checking automatically (so that I don't add strings or concat ints). Not sure about the F-algebra, I need to re-read that stuff. Idris looks interesting but a step too far for me.
Yep, and I do agree with this use of GADTs, I'm just trying to recommend an interim step with an untyped, non-GADT, representation before you apply your type checker which will ultimately output the GADT if it typechecks. This is how GHC works for example. The dependant type comments are probably taking things a bit far. Anyway, if I have answered the question satisfactorily, please do accept it :)

willtim · Accepted Answer · 2014-09-02 21:59:36Z

3

The problem described looks to be using Parsec to parse into a GADT representation, for which probably the easiest solution would be parse into a monotype representation and then have a (likely partial) type checking phase to produce the well-typed GADT, if it can. The monotype representation could be an existential wrapper over a GADT term, with a type-tag to reify the GADT index.

EDIT: a quick example

Let's define a type for type-tags and an existential wrapper:

data Type :: * -> * where
  TDouble :: Type Double
  TString :: Type String

data Judgement f = forall ix. Judgement (f ix) (Type ix)

With the example GADT given in the original post, we only have a problem with the outer-most production, which we need to parse to a monotype as we don't know statically which expression type we will get at runtime:

pExpr :: Parser (Judgement Expr)
pExpr =  Judgement <$> pDblExpr <*> pure TDouble
     <|> Judgement <$> pStrExpr <*> pure TString

We can write a type check phase to produce a GADT or fail, depending on whether the type assertion succeeds or not:

typecheck :: Judgement Expr -> Type ix  -> Maybe (Expr ix)
typecheck (Judgement e TDouble) TDouble = Just e
typecheck (Judgement e TString) TString = Just e
typecheck _ _ = Nothing

edited Sep 2, 2014 at 21:59

answered Sep 2, 2014 at 18:59

willtim

312 bronze badges

2 Comments

StormeHawke Over a year ago

An example would greatly improve your answer

SEC Over a year ago

Just wanted to thank you very belatedly for this answer. I have been trying for some time to understand how the use of existentials helps you parse an untyped expression to a GADT. This is where it finally clicked for me.

Collectives™ on Stack Overflow

Parsec parsing in Haskell

2 Answers 2

UPDATE

3 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

UPDATE

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related