Parser Combinators for Trees instead of Strings

Question

Suppose I need a parser to handle tree-like input (e.g. scala.xml.Elem) instead of strings. I would like to use the parser combinators from this article. If I linearize the input tree I can write such a parser easily.

type Parser[A] = seq: Seq[Elem] => List[(A,Seq[Elem])]

I can add parsers return, failure, item, etc. and finally write my parsers on top of them.

Now I wonder if I can make a parser without linearizing the input tree. Is it possible ?

Can you share why you want to avoid linearizing the input tree? You can always create a linear view of a tree structure. Also, parsing is commonly used to convert a linear structure into a tree!...what do you want out of your parsing that isn't already present in the tree structure? — Rex Kerr
– Rex Kerr, Commented Aug 25, 2014 at 19:55
@RexKerr I would like to check an XML structure: e.g. suppose I have functions that check if the input is an element with a certain given label. Now I would like to compose those functions to check if the input element is <a><a1/><a2/></a>. — Michael
– Michael, Commented Aug 25, 2014 at 20:29
@RexKerr Linearizing is Ok. I am just checking alternatives. — Michael
– Michael, Commented Aug 25, 2014 at 20:32
You might be more lucky if you search for "tree matching" instead of parsing. A quick Google search for "tree matching" revealed quite a few papers and articles. I don't know how much tree matching is related to pattern matching, though. It's probably some special case? — Ionuț G. Stan
– Ionuț G. Stan, Commented Aug 26, 2014 at 12:33
@IonuțG.Stan Thanks. I will search for "tree matching". I do not know how it is related to pattern matching though. — Michael
– Michael, Commented Aug 26, 2014 at 13:26

Matt Fenwick · Accepted Answer · 2014-08-26 13:19:27Z

3

Great question. It is absolutely possible to do this, and I've been looking for a tool that does it for a while.

I think the key is that the structure of your tree will be reflected in your primitive combinators. For instance, the primitive item parser is tied to the [] container type, and provides the ability to sequentially walk down a list by first/resting it. return and failure do not depend on the [] container type, so they shouldn't have to be changed to support tree parsing.

You'll need to replace it with one or more combinators that allow you to traverse a tree. I would guess that you'd want one combinator to let you move between siblings (i.e. children of the same parent node, at the same depth), and a second combinator to allow you to go deeper into the tree.

What I'm not sure about is whether you'll need duplicate combinators to correspond to capture the patterns of sequencing, alternation, lookahead, and so on. Having to implement each of those twice could get pretty obnoxious.

answered Aug 26, 2014 at 13:19

Matt Fenwick

49.3k24 gold badges130 silver badges198 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Michael Over a year ago

Thank you ! Two combinators -- nextbro to move to the next sibling and oldson to move to the "oldest" child -- are what I am thinking of.

Matt Fenwick Over a year ago

@Michael exactly! I hope to get some time to implement this soon; if so, will update accordingly.

Michael Over a year ago

:) Maybe I will do it too. In Scala.

Collectives™ on Stack Overflow

Parser Combinators for Trees instead of Strings

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related