
I need to parse a file containing XML comments. Specifically, it's a C# file using the MS /// convention.

From this I'd need to pull out foobar (or /// foobar would be acceptable, too). (Note - this still doesn't work if you put the XML all on one line...)

testStr = """
    ///<summary>
    /// foobar
    ///</summary>
    """

Here is what I have:

import pyparsing as pp

_eol = pp.Literal("\n").suppress()
_cPoundOpenXmlComment = pp.Suppress('///<summary>') + pp.SkipTo(_eol)
_cPoundCloseXmlComment = pp.Suppress('///</summary>') + pp.SkipTo(_eol)
_xmlCommentTxt = ~_cPoundCloseXmlComment + pp.SkipTo(_eol)
xmlComment = _cPoundOpenXmlComment + pp.OneOrMore(_xmlCommentTxt) + _cPoundCloseXmlComment

match = xmlComment.scanString(testStr)

and to output:

for item,start,stop in match:
    for entry in item:
        print(entry)

But I haven't had much success getting the grammar to work across multiple lines.

(Note - I tested the above sample in Python 3.2; it runs, but (per my issue) does not print any values.)

Thanks!

3 Answers


I think Literal('\n') is your problem. You don't want to build a Literal with whitespace characters (since Literals by default skip over whitespace before trying to match). Try using LineEnd() instead.

EDIT 1: Just because you get an infinite loop with LineEnd doesn't mean that Literal('\n') is any better. Try adding .setDebug() on the end of your _eol definition, and you'll see that it never matches anything.
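To see the difference concretely, here is a small sketch (assuming pyparsing is installed) contrasting the two:

```python
# A minimal sketch of why Literal("\n") never matches: pyparsing skips
# leading whitespace (newlines included) before a Literal tries to match,
# so the newline is consumed as whitespace before the match is attempted.
import pyparsing as pp

sample = "abc\ndef"

lit_eol = pp.Literal("\n")
matches = list(lit_eol.scanString(sample))
print(matches)  # [] -- the literal newline is never seen

# LineEnd() excludes "\n" from its skippable whitespace, so it can match it
line_end = pp.LineEnd()
print(len(list(line_end.scanString(sample))) > 0)  # True
```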

Instead of trying to define the body of your comment as "one or more lines that are not a closing line, but get everything up to the end-of-line", what if you just do:

xmlComment = _cPoundOpenXmlComment + pp.SkipTo(_cPoundCloseXmlComment) + _cPoundCloseXmlComment 

(The reason you were getting an infinite loop with LineEnd() was that you were essentially doing OneOrMore(SkipTo(LineEnd())), but never consuming the LineEnd(), so the OneOrMore just kept matching and matching and matching, parsing and returning an empty string since the parsing position was at the end of line.)
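Putting the suggestion together, a minimal end-to-end sketch (reusing the testStr from the question) might look like this:

```python
import pyparsing as pp

# the sample input from the question
testStr = """\
    ///<summary>
    /// foobar
    ///</summary>
"""

_cPoundOpenXmlComment = pp.Suppress('///<summary>')
_cPoundCloseXmlComment = pp.Suppress('///</summary>')

# SkipTo grabs everything between the markers in a single token;
# the close marker is then consumed explicitly, so nothing loops forever
xmlComment = (_cPoundOpenXmlComment
              + pp.SkipTo(_cPoundCloseXmlComment)("body")
              + _cPoundCloseXmlComment)

bodies = [toks.body.strip() for toks, start, end in xmlComment.scanString(testStr)]
print(bodies)  # ['/// foobar']
```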


2 Comments

thanks for the suggestion; however, changing to _eol = pp.LineEnd().suppress() results in a hang/infinite loop. Could you be a little more specific? (Note - paste the 3 sections together in one .py file and the code runs as-is.) Thanks, Mike
vote up for the explanation of what is wrong. Duh! I should have seen that I never consumed the end of line :)

How about using nestedExpr:

import pyparsing as pp

text = '''\
///<summary>
/// foobar
///</summary>
blah blah
///<summary> /// bar ///</summary>
///<summary>  ///<summary> /// baz  ///</summary> ///</summary>    
'''

comment=pp.nestedExpr("///<summary>","///</summary>")
for match in comment.searchString(text):
    print(match)
    # [['///', 'foobar']]
    # [['///', 'bar']]
    # [[['///', 'baz']]]
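If only the comment text is wanted, one option (a sketch; flatten is a hypothetical helper here, not part of pyparsing) is to walk the nested result and drop the '///' markers:

```python
import pyparsing as pp

text = "///<summary>\n/// foobar\n///</summary>\n"

comment = pp.nestedExpr("///<summary>", "///</summary>")

def flatten(tokens):
    # recursively walk the nested ParseResults, skipping '///' markers
    for tok in tokens:
        if isinstance(tok, str):
            if tok != "///":
                yield tok
        else:
            yield from flatten(tok)

extracted = [" ".join(flatten(m)) for m in comment.searchString(text)]
print(extracted)  # ['foobar']
```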

1 Comment

@PaulMcGuire's solution would work, too, but this is exactly what I should be using (it's the simplest...). Thanks!

You could use an XML parser to parse the XML. It should be easy to extract the relevant comment lines:

import re
from xml.etree import ElementTree as etree

# the C# comment text from the question
text = '''\
///<summary>
/// foobar
///</summary>
'''

# extract all /// lines
lines = re.findall(r'^\s*///(.*)', text, re.MULTILINE)

# parse the extracted fragment as xml
root = etree.fromstring('<root>%s</root>' % ''.join(lines))
print(root.findtext('summary'))
# -> foobar

3 Comments

I thought you were great in Blade Runner.
@JFSebastian Unfortunately this wouldn't work in the bigger picture I'm encountering this problem in. Yes, I could extract all the XML fragments as you suggest, but I also need to parse source code after the comment, and a grammar is ~necessary for that; doing the regex search line by line would add an additional loop through the file.
@mike: the regex is just an example of how to extract comment lines. In the bigger picture, you use your parser to extract the relevant comments (a much simpler task than parsing XML), and that doesn't prevent you from using an XML parser on the XML itself if you find it necessary.
