I'm parsing XML in Python with xml.sax, but my handler never gets notified about entities. Why is neither skippedEntity() nor resolveEntity() called in the following code?

import os
import cStringIO
import xml.sax
from xml.sax.handler import ContentHandler,EntityResolver,DTDHandler

#Class to parse and run test XML files
class TestHandler(ContentHandler,EntityResolver,DTDHandler):

    #SAX handler - Entity resolver
    def resolveEntity(self,publicID,systemID):
        print "TestHandler.resolveEntity: %s  %s" % (publicID,systemID)

    def skippedEntity(self, name):
        print "TestHandler.skippedEntity: %s" % (name)

    def unparsedEntityDecl(self,publicID,systemID,ndata):
        print "TestHandler.unparsedEntityDecl: %s  %s" % (publicID,systemID)

    def startElement(self,name,attrs):
        # name = string.lower(name)
        summary = '' + attrs.get('summary','')
        arg = '' + attrs.get('arg','')
        print 'TestHandler.startElement(), %s : %s (%s)' % (name,summary,arg)


def run(xml_string):
    try:
        parser = xml.sax.make_parser()
        stream = cStringIO.StringIO(xml_string)

        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setDTDHandler( curHandler )
        parser.setEntityResolver( curHandler )

        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException, e:
        print "*** PARSER error: %s" % e

def main():
    try:
        XML = "<!DOCTYPE page[ <!ENTITY num 'foo'> ]><test summary='step: &num;'>Entity: &not;</test>"
        run(XML)
    except Exception, e:
        print 'FATAL ERROR: %s' % (str(e))

if __name__== '__main__':
    main()

When run, all I see is:

 TestHandler.startElement(), test : step: foo ()
 *** PARSER error: <unknown>:1:36: undefined entity

Why don't I see the resolveEntity print for &num; or the skippedEntity print for &not;?

2 Answers

I think resolveEntity and skippedEntity are only called for external DTDs. I got this to work by modifying the XML.

XML = """<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE test SYSTEM "external.dtd" >
<test summary='step: &foo; &bar;'>Entity: &not;</test>
"""

The external.dtd contains two simple entity declarations.

<!ENTITY foo "bar">
<!ENTITY bar "foo">

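Also, I got rid of resolveEntity (the version in the question returns nothing, whereas the parser expects a system identifier or an InputSource back).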

This outputs:

TestHandler.startElement(), test : step: bar foo ()
TestHandler.skippedEntity: not

Hope this helps.

1 Comment

Thanks, I didn't get that the DTD had to be external.

Here is a modified version of your program that I hope makes sense. It demonstrates a case where all TestHandler methods are called.

import StringIO
import xml.sax
from xml.sax.handler import ContentHandler

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # Called only for external entities. Must return a system identifier or an InputSource.
    def resolveEntity(self, publicID, systemID):
        print "TestHandler.resolveEntity(): %s %s" % (publicID, systemID)
        return systemID

    def skippedEntity(self, name):
        print "TestHandler.skippedEntity(): %s" % (name)

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print "TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID)

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print 'TestHandler.startElement():', summary

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)

        stream = StringIO.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException, e:
        print "*** PARSER error: %s" % e

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)

test.dtd contains:

<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>

Output:

TestHandler.resolveEntity(): None test.dtd
TestHandler.unparsedEntityDecl(): None bar.gif
TestHandler.startElement(): step: FOO
TestHandler.skippedEntity(): not
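For readers on Python 3, here is a rough, self-contained port of the same experiment. It is a sketch under a few assumptions rather than a drop-in replacement: it assumes the expat-based reader returned by xml.sax.make_parser(), it switches on feature_external_ges because that reader has ignored external entities by default since Python 3.7.1, and it serves test.dtd from memory instead of from disk. The names RecordingHandler, DTD and run are illustrative only.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges
from xml.sax.xmlreader import InputSource

# In-memory stand-in for test.dtd (assumption: same content as in the answer).
DTD = b"""<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
"""

XML = b"""<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""


class RecordingHandler(ContentHandler):
    """Records every callback instead of printing it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def resolveEntity(self, publicId, systemId):
        self.events.append(("resolveEntity", publicId, systemId))
        # Hand the DTD to the parser as an InputSource backed by bytes.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(DTD))
        return source

    def skippedEntity(self, name):
        self.events.append(("skippedEntity", name))

    def unparsedEntityDecl(self, name, publicId, systemId, ndata):
        self.events.append(("unparsedEntityDecl", name, publicId, systemId, ndata))

    def startElement(self, name, attrs):
        self.events.append(("startElement", name, attrs.get("summary", "")))


def run(xml_bytes):
    handler = RecordingHandler()
    parser = xml.sax.make_parser()
    # Without this, recent Python versions never call resolveEntity.
    parser.setFeature(feature_external_ges, True)
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.setDTDHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.events


for event in run(XML):
    print(event)
```

If the reader behaves as it did in the Python 2 run above, the recorded events should correspond to the four output lines listed there.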

Addition

As far as I can tell, skippedEntity is called only when the document declares an external DTD (at least I can't come up with a counterexample; it would be nice if the documentation were a little clearer).
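The difference can be seen side by side with a small Python 3 sketch (the code above is Python 2). It assumes the expat-based reader from xml.sax.make_parser() on Python 3.7.1 or later, where an external DTD is not fetched by default, so no test.dtd file is needed. SkipRecorder and try_parse are hypothetical helper names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler


class SkipRecorder(ContentHandler):
    """Collects the names passed to skippedEntity."""

    def __init__(self):
        super().__init__()
        self.skipped = []

    def skippedEntity(self, name):
        self.skipped.append(name)


def try_parse(xml_bytes):
    """Return (skipped entity names, parser error message or None)."""
    handler = SkipRecorder()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    try:
        parser.parse(io.BytesIO(xml_bytes))
    except xml.sax.SAXParseException as exc:
        return handler.skipped, exc.getMessage()
    return handler.skipped, None


# Internal subset only: the parser has seen every declaration,
# so an undeclared entity is a well-formedness error.
INTERNAL_ONLY = b"<!DOCTYPE test [<!ENTITY num 'foo'>]><test>Entity: &not;</test>"

# External DTD declared: the entity might be declared there,
# so the parser may skip the reference instead of failing.
EXTERNAL_DECLARED = b'<!DOCTYPE test SYSTEM "test.dtd"><test>Entity: &not;</test>'

print(try_parse(INTERNAL_ONLY))
print(try_parse(EXTERNAL_DECLARED))
```

In the first case the parser should stop with the same "undefined entity" error as in the question; in the second, the reference should be reported through skippedEntity.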

The other answer says that resolveEntity is called only for external DTDs, but that is not quite true. resolveEntity is also called when processing a reference to an external entity that is declared in an internal or external DTD subset. For example:

<!DOCTYPE test [
<!ENTITY num SYSTEM "bar.txt">
]>

where the content of bar.txt could be, say, FOO. In this case the entity cannot be referenced in an attribute value, because XML does not allow references to external entities there.
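A minimal Python 3 sketch of this case follows (the code above is Python 2). It assumes the expat-based reader from xml.sax.make_parser() and enables feature_external_ges, which has been off by default since Python 3.7.1. The content of bar.txt is served from memory, and RecordingResolver and calls are illustrative names that do not appear in the original program.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

calls = []  # (publicId, systemId) pairs passed to resolveEntity


class RecordingResolver(EntityResolver):
    def resolveEntity(self, publicId, systemId):
        calls.append((publicId, systemId))
        # Serve the entity from memory instead of opening bar.txt on disk.
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(b"FOO"))
        return source


# The external entity is declared in the internal subset and
# referenced in element content (not in an attribute value).
XML = b"""<!DOCTYPE test [
<!ENTITY num SYSTEM "bar.txt">
]>
<test>Entity: &num;</test>"""

parser = xml.sax.make_parser()
parser.setFeature(feature_external_ges, True)  # off by default since 3.7.1
parser.setContentHandler(ContentHandler())
parser.setEntityResolver(RecordingResolver())
parser.parse(io.BytesIO(XML))

print(calls)
```

There is no external DTD here, yet resolveEntity is still invoked once, for bar.txt.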

1 Comment

Thanks. Is there a way to get skippedEntity called without an external DTD?
