0
#!/usr/bin/env python
import os, sys, os.path
import string 

def sort_strings_file(xmlfile,typee):
    """sort all strings within given strings.xml file"""

    all_strings = {}
    orig_type=typee

    # read original file
    tree = ET.ElementTree()
    tree.parse(xmlfile)

    # iter over all strings, stick them into dictionary
    for element in list(tree.getroot()):
        all_strings[element.attrib['name']] = element.text

    # create new root element and add all strings sorted below
    newroot = ET.Element("resources")
    for key in sorted(all_strings.keys()):
        # Check for IDs
        if typee == "id":
            typee="item"

        # set main node type
        newstring = ET.SubElement(newroot, typee)

        #add id attrib
        if orig_type == "id":
            newstring.attrib['type']="id"

        # continue on
        newstring.attrib['name'] = key
        newstring.text = all_strings[key]


    # write new root element back to xml file
    newtree = ET.ElementTree(newroot)
    newtree.write(xmlfile, encoding="UTF-8")

This works great and all, but if a string start with like <b> it breaks badly. EX

<string name="uploading_to"><b>%s</b> Odovzdávanie do</string>

becomes

<string name="uploading_to" />

I've looked into the xml.etree Element class, but it seems to only have .text method. I just need a way to pull everything in between xml tags. No, I can't change the input data. It comes directly from an Android APK ready to be translated, I cannot predict how / what the data comes in besides the fact that it must be valid XML Android code.

7
  • You seem to have mixed tabs and spaces. Now the indentation in the post is well out-of-whack, could you please clean that up? Commented Mar 6, 2013 at 21:24
  • Do you expect .text to *include*` the <b> tag, or it's contents? Commented Mar 6, 2013 at 21:25
  • I can't get StackOverflow to work correctly with Python. I post the code w/ correct indentation and it screws it up. I'm not messing around with it. Commented Mar 6, 2013 at 22:02
  • That is because you are mixing tabs and spaces. Tabs display at a width of 4 spaces once posted. Use your text editor to replace tabs with 8 spaces instead, then copy and paste. Commented Mar 6, 2013 at 22:05
  • I've fixed the tabs in this posting for you. Commented Mar 6, 2013 at 22:06

1 Answer 1

1

I think you are looking for the itertext() method instead. .text only returns text directly contained at the start of the element:

>>> test = ET.fromstring('<elem>Sometext <subelem>more text</subelem> rest</elem>')
>>> test.text
'Sometext '
>>> ''.join(test.itertext())
'Sometext more text rest'

The .itertext() iterator on the other hand let's you find all text contained in the element, including inside nested elements.

If, however, you only want text directly contained in an element, skipping the contained children, you want the combination of .text and the .tail values of each of the children:

>>> (test.text or '') + ''.join(child.tail for child in test.getchildren())
'Sometext  middle  rest'

If you need to capture everything contained, then you need to do a little more work; capture the .text, and cast each child to text with ElementTree.tostring():

>>> (test.text or '') + ''.join(ET.tostring(child) for child in test.getchildren())
'Sometext <subelem>more text</subelem> middle <subelem>other text</subelem> rest'

ET.tostring() takes the element tail into account. I use (test.text or '') because the .text attribute can be None as well.

You can capture that last method in a function:

def innerxml(elem):
    return (elem.text or '') + ''.join(ET.tostring(child) for child in elem.getchildren())
Sign up to request clarification or add additional context in comments.

3 Comments

In that example, if I want to retain the <subelem>, how would I do that? I can't ignore the nested elements, so I guess using the term text is wrong. I need everything literal in between the main element tags.
@ConnorTumbleson: Updated to include more examples and a way to capture the 'innerHTML' version.
I added a little check to make sure it had children, and it worked great. Thanks for your help and dealing with me not able to put code on SO.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.