Parse Haskell itself with parser combinators

Question

Given parser combinators as defined by libraries such as Parsec, Attoparsec or various other functional implementations, is it possible to parse languages such as C or Haskell themselves?

Here is an example of what I have in mind:

-- constructor defined by its name, and a list of arguments           
data Constructor = Constructor String [Type]

-- type definition contains a type name, list of type variables, and a list of constructors
data Type = Type String [Char] [Constructor]

In this very simplified example, parsing of a type could be:

typeParser :: Parser Type
typeParser = do
  string "data"
  spaces
  name <- takeWhile letters
  spaces
  typeVars <- many1 letter
  ...

I noticed that the package http://hackage.haskell.org/package/haskell-src-1.0.3.1 parses the Haskell 98 language, but it does not depend on any of the parser combinator libraries.

Yes it would be possible. You may be interested in language-c for parsing C. Both language-c and haskell-src use parser generators for the actual parsing though. When your grammer grows to be large, a parser combinator is useful to help reason about ambiguities. — Alec
– Alec, Commented Apr 3, 2020 at 17:14
Helium is a (simplified) Haskell compiler that uses Parsec for its parser. — K. A. Buhr
– K. A. Buhr, Commented Apr 4, 2020 at 17:38

Ryan Farmer · Accepted Answer · 2023-03-07 19:26:14Z

3

TL;DR Yes, you can parse Haskell using a monadic parser combinator library like Parsec.

Some programming languages like Haskell are not fully context-free. This means that some contextual information is needed in order to parse them. Haskell is not fully context-free because it is indentation-sensitive.

Some monadic parser combinator libraries like Parsec and Megaparsec allow for more easily parsing context-sensitive languages. Parsec'sParsecT and Parsec types can keep track of contextual information, which the library refers to as "user state", which allows for parsing the context-sensitive parts of languages like indentation level. The "user state" can be accessed through the getState, putState, and modifyState functions. The tricky part is mixing parsers that have "user states" of different types.

It is possible to use approaches other than monadic parser combinators, however they are often more limited and/or less straightforward and can require more workarounds to get them working. For example, a parser generator library like Flex/Bison could be used to parse the context-free parts of Haskell, however a workaround would be required to parse the context-sensitive parts because parser generator libraries can only parse context-free languages.

edited Mar 7, 2023 at 19:26

answered Apr 9, 2020 at 21:12

Ryan Farmer

1241 silver badge8 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

luser droog Over a year ago

I wonder if you strictly need to have a monadic implementation to handle context sensitive issues. I was just reading the old 1975 Budge book where he uses the combinators just to compose a data structure of the tree of sets, then you can apply different parsing strategies to the same "grammar" data. ISTM that monads are only really needed when you have strictly pure functions for everything. With a little relaxation, your parsing functions can manipulate a little bit of state and pass it around, right? Or am I missing something?

Collectives™ on Stack Overflow

Parse Haskell itself with parser combinators

1 Answer 1

TL;DR Yes, you can parse Haskell using a monadic parser combinator library like Parsec.

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

TL;DR Yes, you can parse Haskell using a monadic parser combinator library like Parsec.

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related