I am just getting started with Python/Jython and the SAX parser (xml.sax). I wrote a simple content handler as a test.

from __future__ import with_statement

import sys

from xml.sax import make_parser
from xml.sax.handler import ContentHandler

class CountingHandler(ContentHandler):
    """Counts the start tags seen while parsing."""

    def __init__(self):
        ContentHandler.__init__(self)  # initialise the base handler
        self.counter = 0

    def startElement(self, name, attrs):
        self.counter += 1

def main(argv=sys.argv):
    parser = make_parser()
    h = CountingHandler()
    parser.setContentHandler(h)
    with open(argv[1], "r") as input:
        parser.parse(input)

if __name__ == "__main__":
    sys.exit(main())

When I run this on some documents (not all), I get an error:

Traceback (most recent call last):
  File "src/sciencenetworks/xmltools.py", line 93, in <module>
    sys.exit(main())
  File "src/sciencenetworks/xmltools.py", line 88, in main
    parser.parse(input)
  File "/amd.home/home/staudt/workspace/jython/Lib/xml/sax/drivers2/drv_javasax.py", line 141, in parse
    self._parser.parse(JyInputSourceWrapper(source))
  File "/amd.home/home/staudt/workspace/jython/Lib/xml/sax/drivers2/drv_javasax.py", line 90, in resolveEntity
    return JyInputSourceWrapper(self._resolver.resolveEntity(pubId, sysId))
  File "/amd.home/home/staudt/workspace/jython/Lib/xml/sax/drivers2/drv_javasax.py", line 75, in __init__
    if source.getByteStream():
AttributeError: 'unicode' object has no attribute 'getByteStream'

Looking at the source code of drv_javasax.py, it seems that input is not recognized as a file-like object, even though it is one.
Any ideas on how to fix this?

2 Answers

1

I think it's this bug: http://bugs.jython.com/issue1488. Fixed in Jython 2.5.2-b1: http://www.jython.org/latest.html
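
If upgrading is not an option right away, one possible workaround is sketched below. It rests on two assumptions that I have not verified on Jython 2.5.1: that the default entity resolver returns the bare system ID string (as CPython's xml.sax.handler.EntityResolver does), and that the wrapper in drv_javasax.py is happy once it receives an object with a getByteStream() method, which is what the traceback suggests. The names InputSourceResolver and make_patched_parser are made up for this example.

```python
from xml.sax import make_parser
from xml.sax.handler import EntityResolver
from xml.sax.xmlreader import InputSource


class InputSourceResolver(EntityResolver):
    """Hypothetical resolver that returns an InputSource, not a string."""

    def resolveEntity(self, publicId, systemId):
        # Wrap the system ID so the caller gets an object that has
        # getByteStream(), getSystemId() and getPublicId().
        source = InputSource(systemId)
        source.setPublicId(publicId)
        return source


def make_patched_parser():
    """Create a SAX parser that uses the resolver above (assumed workaround)."""
    parser = make_parser()
    parser.setEntityResolver(InputSourceResolver())
    return parser
```

In the question's main(), make_parser() would be swapped for make_patched_parser(); whether this really avoids the AttributeError on 2.5.1 needs to be tested there.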


2 Comments

I am running Jython 2.5.1 (Release_2_5_1:6813, Sep 26 2009, 13:47:54), I'll try the latest version.
I guess that was the solution to my problem. Thanks. But now I have a new one: xml.sax._exceptions.SAXParseException: <unknown>:1:1: The parser has encountered more than "64,000" entity expansions in this document; this is the limit imposed by the application
0

When you insert print type(input) after your with statement, what do you see?

When you revert to old-style "try/finally" code instead of "with", does it work for all files?

What is different between files that work and files that don't work?

What happens if you change the name input to something that doesn't shadow a built-in function?
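
The checks above could be combined into one small test script, roughly like this. It is only a sketch: count_elements and xml_file are names I made up, and the handler is repeated so the snippet stands alone.

```python
import sys
from xml.sax import make_parser
from xml.sax.handler import ContentHandler


class CountingHandler(ContentHandler):
    def __init__(self):
        ContentHandler.__init__(self)
        self.counter = 0

    def startElement(self, name, attrs):
        self.counter += 1


def count_elements(path):
    """Parse the file at `path` and return the number of start tags."""
    parser = make_parser()
    counting_handler = CountingHandler()
    parser.setContentHandler(counting_handler)
    # xml_file does not shadow the built-in input()
    xml_file = open(path, "r")
    try:
        # Show what kind of object is actually handed to the parser.
        sys.stderr.write("%r\n" % (type(xml_file),))
        parser.parse(xml_file)
    finally:
        # Old-style replacement for the with statement.
        xml_file.close()
    return counting_handler.counter
```

Running count_elements on one file that works and one that fails should show whether the with statement or the variable name plays any role.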

1 Comment

a) Input is an open file, I've checked that already. b) Without the with-statement, the error remains. c) I can't say what the difference is. The one with the error contains a reference to a .dtd file, the other one does not. d) I've tried parser.parse(open("file.xml","r")), doesn't change anything.
