I am trying to get the text between the xml tags. There are several posts about it, but what I don't understand is how to save it in a variable. The code below prints what I want, but as soon as I replace "print" with "return" it doesn't save this text in the variable. I think I am missing something very simple here.

from xml.sax import make_parser, handler

line = '<text><p><s id="1">Some text <someothertag>some more text</someothertag></s></p></text>'
class extract_text(handler.ContentHandler):
    def characters(self, data):
        print(data.strip())

parser = make_parser()
parser.setContentHandler(extract_text())
parser.feed(line)

So I would like to have a variable equal to "Some text some more text". Any idea is very welcome!

1 Answer

If you just return a value from the handler, it is not stored anywhere: the parser calls characters() itself and ignores whatever it returns. You need to store the text yourself:

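from xml.sax import make_parser, handler

line = '<text><p><s id="1">Some text <someothertag>some more text</someothertag></s></p></text>'

# Module-level accumulator for the extracted text
result = ''

class extract_text(handler.ContentHandler):
    def characters(self, data):
        # Append each chunk to the global, since return values are discarded
        global result
        result += data.strip() + '\n'

parser = make_parser()
parser.setContentHandler(extract_text())
parser.feed(line)
parser.close()  # signal end of input

print(result)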
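If you would rather avoid the global (for example, because the parsing happens inside another function, or because you parse several strings and do not want them to run together), one option is to keep the text on the handler instance. The following is only a sketch under a few assumptions: Python 3, and the names TextCollector and get_text are made up for illustration, not part of xml.sax. It also collapses whitespace with split/join instead of adding newlines, so the result matches the "Some text some more text" string from the question.

```python
from xml.sax import make_parser, handler


class TextCollector(handler.ContentHandler):
    """Hypothetical handler that stores character data on the instance."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, data):
        # The parser may deliver text in several pieces, so collect them all.
        self.chunks.append(data)


def get_text(xml_string):
    """Parse xml_string and return its text with whitespace collapsed."""
    collector = TextCollector()  # fresh handler per call, so nothing carries over
    parser = make_parser()
    parser.setContentHandler(collector)
    parser.feed(xml_string)
    parser.close()
    # Join the chunks, then normalise runs of whitespace to single spaces.
    return " ".join("".join(collector.chunks).split())


line = '<text><p><s id="1">Some text <someothertag>some more text</someothertag></s></p></text>'
text = get_text(line)
print(text)
```

Because each call creates its own TextCollector, calling get_text on a second string does not append to the result of the first.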

2 Comments

When I put this function inside another one, I get the error "global name 'result' is not defined". And when it is outside of that other function, each string is concatenated with the previous one, so at the end it is one big string. Do you have an idea what I should adapt? I have only found advice to use "self", but it is already there. Thank you!
If you have some other problem, you need to ask another question and describe it in detail, that is, provide the code, the errors, etc. Sorry, I can't understand what you are doing and what is not working.
